Monday, October 05, 2009

http://hbr.harvardbusiness.org/2009/05/energy-ceo-shai-agassi-on-recognizing-a-sliding-doors-moment/ar/1

Una linda historia de un chabón que pudiendo llegar a CEO de SAP prefirió renunciar y hacer su propia empresa, aún bajo gran riesgo de fallar.

Claro que este tipo dice que prefirió arriesgarse a fallar en su emprendimiento a que ser exitoso en SAP porque ya tenia 100M U$D en el banco.

Como decía Nietzsche “No se piensa lo mismo en una choza que en un castillo”.

Sunday, October 04, 2009

About database scalability:

Abstract--A new n-tier database architecture is proposed. This approach transform the database scalability problem to logarithmic complexity, the cheapest possible.


I. INTRODUCTION:

To go behind the speed of one server database, there are two possible horizontal database architectures, but either one compromise scalability in one way or another.

The first approach uses n servers with the full database each (fig. 1).
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () full DB replica

This architecture is excellent for readings, just by adding a load balancer in the front you can scale to as many nodes as you need . This approach is very poor for writes. In order to perform a write, you have to lock the row you want to update in every server, thus ending with the same write performance of a single server plus the network latency overhead.
There is only one commercial DB that uses something similar to this approach, although it has only one disk-array for all the servers, but it has the same locking problem (although it uses shared memory for locking).

The second approach uses n servers with a different part of the DB each (fig. 2).
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () Tx: table x
T1 T2 Tn

In this case we solved the write problem (the lock is local to each server) but we add a read problem. In the event of a join query that read through all tables, the client need to know where each table is located and construct itself the final answer. All the others commercial DBs use this approach to scale beyond a single box. They usually use a hashing table located in each client to map tables to individual servers (usually you will need to split your biggest tables in multiple servers). The problem arises due to this hashing tables. It becomes real hard to maintain this tables up to date and optimized for each client, due to the temporal nature of the connections. And if an insert is performed to a splitted table, you need to publish this change to every client's hashing table, so they know which part of the table has the data they need.


II. N-TIER ARCHITECTURE:

To solve this problems, we introduce a small but significant difference. We transform the database to a n-tier architecture by adding several layers that construct the answer to the query the client created, avoiding the need of a hashing table at the client level (fig. 3).
___
|___| n-layer
/ \ intermediate
___ ___ servers
|___| |___|
/ \ \
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () Tx: table x
T1 T2 Tn

The top layer of the n-layer intermediate servers is composed of the servers that connect directly to the clients (a load balancer can be added on top of them). These servers need only to know how the tables have been mapped to the servers in the layer under them, not all the way down. This lower layer servers need only to know how the tables have been assigned to the servers in the layer under them and so on, up to the bottom DB servers that handle the requests.
With this approach we have an architecture similar to the parallel add (fig. 4)
___
|___| add unit
/ \
___ ___
|___| |___| add units
/ \ / \
2 3 6 8

This well-known parallel adder has logarithmic complexity. If we double the quantity of numbers we want to add, we will end with only one more layer (logarithmic growth). This is the desire property we added to the existing horizontal database architectures. We can double our computer power (duplicating the number of servers) and add the latency of only one more layer (logarithmic growth).


III. CONSIDERATIONS:

Due to single box performance problems it might be needed to split a bottom node (fig. 5). In this case, you start at first by adding two slaves (50/50 tables split) to the original node, thus it will became a master that construct the final answer using its individual slaves answers. We repeat this iteration until the performance bottleneck is gone. The same approach can be used to solve performance bottlenecks that can arise at one of the n-layers nodes (fig. 6), thus avoiding a network topology optimization (which, by the way is a NP problem). Every of the new slaves of the problematic n-layer node need to have full connection to every slave of the original n-layer node. The original n-layer node becomes a load balancer.


(fig. 5) Problematic node *
___
|___| n-layer
/ \ intermediate
___ ___ servers
|_*_| |___|
/ \ \
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () Tx: table x
T1 T2 Tn


(fig. 6) Problematic node splitted in three new nodes**

___
|___| n-layer
/ \ intermediate
___ ___ servers
|_**| |___|
/ \ \
___ ___ ___
|_**| |_**| |___| servers
| \ / | . . n . . |
___ ___ () Tx: table x
|___| |___| Tn
| |
() ()
T1 T2






In real big configurations, the publishing of the tables assignment con became a bottleneck itself. In those cases, the metadata information can be located into a smaller (will always have less entries than the main DB) n-tier DB, this new n-tier DB metadata can be located into a smaller n-tier DB as well and so on, until the publishing of the tables assignment is no longer a bottleneck.
As of today, a single server with 64 CPUs is several orders of magnitude more expensive than 64 one CPU servers. With a good implementation you can get a faster (although with a high initial latency) and cheaper solution through an n-tier DB architecture than with a single big server.
Availability is the main problem of this solution. One possible way to solve this is to create HA cluster pairs for each pairs of nodes. In the event a DB node fails, its partner assume its functionality. Notice that in this approach we don't need any stand-by server. Each node can assume the other node functionality plus its own, with a predictable performance degradation, until normal service is restored. This can be done at every level.

IV. CONCLUSIONS:

We introduced a new horizontal DB architecture that solves all scalability problems (massive updates, joins, etc.). It achieves this amazing property thanks to its n-tier, tree like, logarithmic complexity approach (similar to a parallel add).
By this work we wanted to provide the community with a way to move to the high-end corporate space. We feel that it's easy to add this n-tier architecture to any DB engine.



1
Recién termino de leer un reportaje a Victor Hugo ( http://www.diarioperfil.com.ar/edimp/0404/articulo.php?art=17165&ed=0404 ). Interesante cuando dice a quién habría votado desde el regreso de la democracia hasta ahora y coincido en todo. El único cambio tal vez es que en lugar de Alfonsin ahora de viejo hubiera preferido a Luder, simplemente porque tengo un recuerdo de una entrevista que le hicieron en la noticia rebelde y lo que mi débil memoria trae a colación fue humor de alto vuelo con un Luder superando a la elite de la comedia Argentina.

Después no se porque se me dió por poner en Google el nombre del entrevistador y encontré la siguiente historia:

Fontevecchia: de Golpista a demócrata
http://www.rodolfowalsh.org/spip.php?article2420

Como siempre no tenés forma de saber si es verdad o no (ya en el límite con la filosofía) pero por lo menos es más corto y entretenido de leer.

Seguramente la vredad esté en el medió, como decían los antiguos Griegos (no los Turcos que hay ahora en Grecia), o la otra versión más criolla "los boludos mienten por igual".