Monday, October 05, 2009

http://hbr.harvardbusiness.org/2009/05/energy-ceo-shai-agassi-on-recognizing-a-sliding-doors-moment/ar/1

Una linda historia de un chabón que pudiendo llegar a CEO de SAP prefirió renunciar y hacer su propia empresa, aún bajo gran riesgo de fallar.

Claro que este tipo dice que prefirió arriesgarse a fallar en su emprendimiento a que ser exitoso en SAP porque ya tenia 100M U$D en el banco.

Como decía Nietzsche “No se piensa lo mismo en una choza que en un castillo”.

Sunday, October 04, 2009

About database scalability:

Abstract--A new n-tier database architecture is proposed. This approach transform the database scalability problem to logarithmic complexity, the cheapest possible.


I. INTRODUCTION:

To go behind the speed of one server database, there are two possible horizontal database architectures, but either one compromise scalability in one way or another.

The first approach uses n servers with the full database each (fig. 1).
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () full DB replica

This architecture is excellent for readings, just by adding a load balancer in the front you can scale to as many nodes as you need . This approach is very poor for writes. In order to perform a write, you have to lock the row you want to update in every server, thus ending with the same write performance of a single server plus the network latency overhead.
There is only one commercial DB that uses something similar to this approach, although it has only one disk-array for all the servers, but it has the same locking problem (although it uses shared memory for locking).

The second approach uses n servers with a different part of the DB each (fig. 2).
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () Tx: table x
T1 T2 Tn

In this case we solved the write problem (the lock is local to each server) but we add a read problem. In the event of a join query that read through all tables, the client need to know where each table is located and construct itself the final answer. All the others commercial DBs use this approach to scale beyond a single box. They usually use a hashing table located in each client to map tables to individual servers (usually you will need to split your biggest tables in multiple servers). The problem arises due to this hashing tables. It becomes real hard to maintain this tables up to date and optimized for each client, due to the temporal nature of the connections. And if an insert is performed to a splitted table, you need to publish this change to every client's hashing table, so they know which part of the table has the data they need.


II. N-TIER ARCHITECTURE:

To solve this problems, we introduce a small but significant difference. We transform the database to a n-tier architecture by adding several layers that construct the answer to the query the client created, avoiding the need of a hashing table at the client level (fig. 3).
___
|___| n-layer
/ \ intermediate
___ ___ servers
|___| |___|
/ \ \
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () Tx: table x
T1 T2 Tn

The top layer of the n-layer intermediate servers is composed of the servers that connect directly to the clients (a load balancer can be added on top of them). These servers need only to know how the tables have been mapped to the servers in the layer under them, not all the way down. This lower layer servers need only to know how the tables have been assigned to the servers in the layer under them and so on, up to the bottom DB servers that handle the requests.
With this approach we have an architecture similar to the parallel add (fig. 4)
___
|___| add unit
/ \
___ ___
|___| |___| add units
/ \ / \
2 3 6 8

This well-known parallel adder has logarithmic complexity. If we double the quantity of numbers we want to add, we will end with only one more layer (logarithmic growth). This is the desire property we added to the existing horizontal database architectures. We can double our computer power (duplicating the number of servers) and add the latency of only one more layer (logarithmic growth).


III. CONSIDERATIONS:

Due to single box performance problems it might be needed to split a bottom node (fig. 5). In this case, you start at first by adding two slaves (50/50 tables split) to the original node, thus it will became a master that construct the final answer using its individual slaves answers. We repeat this iteration until the performance bottleneck is gone. The same approach can be used to solve performance bottlenecks that can arise at one of the n-layers nodes (fig. 6), thus avoiding a network topology optimization (which, by the way is a NP problem). Every of the new slaves of the problematic n-layer node need to have full connection to every slave of the original n-layer node. The original n-layer node becomes a load balancer.


(fig. 5) Problematic node *
___
|___| n-layer
/ \ intermediate
___ ___ servers
|_*_| |___|
/ \ \
___ ___ ___
|___| |___| |___| servers
| | . . n . . |
() () () Tx: table x
T1 T2 Tn


(fig. 6) Problematic node splitted in three new nodes**

___
|___| n-layer
/ \ intermediate
___ ___ servers
|_**| |___|
/ \ \
___ ___ ___
|_**| |_**| |___| servers
| \ / | . . n . . |
___ ___ () Tx: table x
|___| |___| Tn
| |
() ()
T1 T2






In real big configurations, the publishing of the tables assignment con became a bottleneck itself. In those cases, the metadata information can be located into a smaller (will always have less entries than the main DB) n-tier DB, this new n-tier DB metadata can be located into a smaller n-tier DB as well and so on, until the publishing of the tables assignment is no longer a bottleneck.
As of today, a single server with 64 CPUs is several orders of magnitude more expensive than 64 one CPU servers. With a good implementation you can get a faster (although with a high initial latency) and cheaper solution through an n-tier DB architecture than with a single big server.
Availability is the main problem of this solution. One possible way to solve this is to create HA cluster pairs for each pairs of nodes. In the event a DB node fails, its partner assume its functionality. Notice that in this approach we don't need any stand-by server. Each node can assume the other node functionality plus its own, with a predictable performance degradation, until normal service is restored. This can be done at every level.

IV. CONCLUSIONS:

We introduced a new horizontal DB architecture that solves all scalability problems (massive updates, joins, etc.). It achieves this amazing property thanks to its n-tier, tree like, logarithmic complexity approach (similar to a parallel add).
By this work we wanted to provide the community with a way to move to the high-end corporate space. We feel that it's easy to add this n-tier architecture to any DB engine.



1
Recién termino de leer un reportaje a Victor Hugo ( http://www.diarioperfil.com.ar/edimp/0404/articulo.php?art=17165&ed=0404 ). Interesante cuando dice a quién habría votado desde el regreso de la democracia hasta ahora y coincido en todo. El único cambio tal vez es que en lugar de Alfonsin ahora de viejo hubiera preferido a Luder, simplemente porque tengo un recuerdo de una entrevista que le hicieron en la noticia rebelde y lo que mi débil memoria trae a colación fue humor de alto vuelo con un Luder superando a la elite de la comedia Argentina.

Después no se porque se me dió por poner en Google el nombre del entrevistador y encontré la siguiente historia:

Fontevecchia: de Golpista a demócrata
http://www.rodolfowalsh.org/spip.php?article2420

Como siempre no tenés forma de saber si es verdad o no (ya en el límite con la filosofía) pero por lo menos es más corto y entretenido de leer.

Seguramente la vredad esté en el medió, como decían los antiguos Griegos (no los Turcos que hay ahora en Grecia), o la otra versión más criolla "los boludos mienten por igual".

Monday, September 28, 2009

Distintos softwares Open Source de DNS que no han tenido problemas de seguridad como Nominum.

http://tech.slashdot.org/story/09/09/23/1555245/Nominum-Calls-Open-Source-DNS-A-Recipe-For-Problems?from=rss

Distintos softwares Open Source de DNS que no han tenido problemas de seguridad como Nominum.

http://tech.slashdot.org/story/09/09/23/1555245/Nominum-Calls-Open-Source-DNS-A-Recipe-For-Problems?from=rss

Friday, September 25, 2009

Han vuelto a jugar Karpov-Kasparov http://www.clarin.com/diario/2009/09/24/um/m-02005604.htm

Cuando jugaron por el campeonato mundial, yo tenia 9 años y me fascinaba el ajedrez. Mi preferido era Karpov, sin entender mucho por que. Luego de entender más del juego me dí cuenta de la monstruosa superioridad de Kasparov y lo empecé a respetar.
Ya ahora con 34 años vuelvo a preferir a Karpov antes que a Kasparov ya que siempre me dio la sensación que se dejo perder contra la computadora por plata.
Además estudiando un poco el juego de Karpov, de viejo me gusta más el juego cerrado donde vences al contrario sin que este se de cuenta donde se equivoco.
Esa dualidad de juego abierto de Kasparov y cerrado de Karpov la mantengo en mi juego, abriendo siempre de Dama jugando de blancas y siempre Siciliana con negras.

En fin, locuras de cada uno.
Sobre la idea de "Connecting de Dots" de Steve Jobs, en la historia de Ricahrd Feynman de wikipedia ( http://en.wikipedia.org/wiki/Richard_Feynman ) dice lo siguiente:

"During a temporary depression following the destruction of Hiroshima by the bomb produced by the Manhattan Project, he focused on complex physics problems, not for utility, but for self-satisfaction. One of these was analyzing the physics of a twirling, nutating dish as it is moving through the air. His work during this period, which used equations of rotation to express various spinning speeds, would soon prove important to his Nobel Prize winning work."

Connecting the Dots funciona, aunque primero tenes que plantar muchos Dots.
Una linda forma de manejar versionamiento de documentos online:



Slashdot is hardly the right venue to get a good answer to this question (how the hell did it end up in the Hardware category?), but I've dealt with this a zillion times, so I'll give a pointer to what is very likely the correct answer: optimistic locking [wikipedia.org].

Hard locks are probably not what you want in a stateless web app. (E.g. What happens if someone locks a record and then is hit by a bus?) Instead, here's how it works:

1. User X fetches version 1 of Record A.
2. User Y fetches version 1 of Record A.
3. User X modifies her copy of Record A and attempts to save the change.
4. System checks whether incoming version (1) matches database version (1). It does, so the save proceeds and the version number on the record is updated to 2.
5. User Y modifies his copy of Record A and attempts to save the change.
6. System checks whether incoming version (1) matches database version (2). It does not, so User Y is notified that he cannot save his changes.
7. User Y fetches version 2 of Record A and tries again.

This is also known in the vernacular as "second save loses". It may sound too harsh, but it is much better than "first save loses and user isn't notified", which is what you get if you have no currency checking at all. And it's also much more web friendly that your old desktop app (which uses an approach that is technically called "pessimistic locking").
--
"We all assume the worst the best we can." - Iron & Wine

Tuesday, September 22, 2009

Otra linda prueba que el sistema de salud gringo esta muy mal. Los gringos compran saliva de africanos que tienen tuberculosis para así conseguir cobertura médica gratuita.

http://idle.slashdot.org/article.pl?sid=09/09/18/1737254

Saturday, September 19, 2009

Claramente, los autos tienen que tener el motor central atras del conductor.

Por eso el Cayman se porta tanto mejor que el 911.
http://es.autoblog.com/2009/09/18/la-supercomparativa-de-motor-trend-de-este-ano-en-video/

Ya para la elite del mundo, diversion al por mayor con la 458 http://www.youtube.com/watch?v=3lO4KQ74r3g&feature=player_embedded#t=130
Fijense que linda salida de una curva hace la 458 amarilla que dobla cruzada sin agarre pleno. Mucha envidia.