Thursday, December 24, 2009

A case for a hybrid SQL-NoSQL stack

Alex Popescu talks about Drizzle replication in his MyNoSQL column. He makes a very interesting observation in his post regarding Drizzle's ability to replicate into a host of NoSQL storage backends ..

"Leaving aside the technical details — which are definitely interesting .., the solution using the Erlang AMQP .. implementation RabbitMQ .. — I think this replication layer could represent a good basis for SQL-NoSQL hybrid solutions".

We are going to see more and more such hybrid solutions in the days to come. Drizzle does it at the product level. Using RabbitMQ as the transport, Drizzle can replicate data as serialized Java objects to Voldemort, as JSON-marshalled objects to Memcached, or as a hashmap to the column-family-based Cassandra.
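
To make the shape of such a message-based replication pipeline concrete, here is a minimal sketch of what a consuming end could look like, bridging a RabbitMQ queue into Memcached from Scala. The queue name, the JSON payload and the key extraction are all assumptions for illustration, not Drizzle's actual wire protocol:

    import com.rabbitmq.client.{ConnectionFactory, QueueingConsumer}
    import net.spy.memcached.MemcachedClient
    import java.net.InetSocketAddress

    // Consumes replication events off a RabbitMQ queue and applies them
    // to Memcached. Queue name, payload format and key extraction are
    // illustrative assumptions, not Drizzle's actual protocol.
    object ReplicationBridge {
      def main(args: Array[String]): Unit = {
        val factory = new ConnectionFactory
        factory.setHost("localhost")
        val channel = factory.newConnection.createChannel
        channel.queueDeclare("replication", false, false, false, null)

        val cache = new MemcachedClient(new InetSocketAddress("localhost", 11211))

        val consumer = new QueueingConsumer(channel)
        channel.basicConsume("replication", true, consumer)
        while (true) {
          val delivery = consumer.nextDelivery()
          val payload  = new String(delivery.getBody, "UTF-8")
          cache.set(keyOf(payload), 0, payload)   // 0 = no expiry
        }
      }

      // Hypothetical: a real pipeline would take the key from the event itself
      def keyOf(json: String): String = json.hashCode.toString
    }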

Custom implementation projects have also started using hybrid stacks of persistent stores. When you are dealing with really high volumes, and with access patterns that rule out joins, you need to denormalize anyway; ORMs cease to be part of your solution toolset. Data access patterns vary widely across the profile of clients using your system. If you are running an ecommerce suite, a product launch may bring explosive use of the shopping cart module. It makes perfect sense to move the shopping cart off the single relational data store where it was lying around and have it served by a more appropriate data store that lives up to the scalability requirements. It's not that you need to throw away the relational database that has served you so long. Like Alex mentioned, you can always go with a hybrid model.

In one of our recent projects, we were using Oracle as the main relational database for a securities trading back office implementation. The database load had been sized based on the calculations done initially. At a very late stage of the project, a new requirement came up that needed heavy processing and storage of semi-structured data and metadata from an external feed. Both the data and the metadata were extensible, which made them difficult to model with a fixed schema.

We could not afford frequent schema changes, since they would entail long downtimes of the production database. But there was also a requirement that, after processing, much of this semi-structured data would have to be made available in the production database. We could have modeled it following the key/value paradigm in Oracle itself, which we were using anyway as the primary database. But that would again be the age-old hammer-and-nail story.

We decided to supplement the stack with another data store that fit the bill for this specific use case. We used MongoDB, which gave us phenomenal performance for our requirements. We took the feed from the external data sources and loaded our MongoDB database with all the semi-structured data and metadata. All necessary processing was done in MongoDB on that data, and the relevant information was pushed to JMS-based queues for consumption by appropriate services that copied it asynchronously to our Oracle servers.
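
A minimal sketch of that hand-off, using the 2009-era Mongo Java driver and plain JMS from Scala. The database and collection names, the "processed" flag and the queue name are illustrative, not our actual schema:

    import com.mongodb.{Mongo, BasicDBObject}
    import javax.jms.{ConnectionFactory, Session}

    // Reads processed documents out of MongoDB and publishes them as JSON
    // text messages on a JMS queue for asynchronous consumption downstream.
    def publishProcessedFeed(jmsFactory: ConnectionFactory): Unit = {
      val mongo = new Mongo("localhost", 27017)
      val docs  = mongo.getDB("feeds").getCollection("instruments")

      val conn     = jmsFactory.createConnection()
      val session  = conn.createSession(false, Session.AUTO_ACKNOWLEDGE)
      val producer = session.createProducer(session.createQueue("oracle.sync"))

      val cursor = docs.find(new BasicDBObject("processed", java.lang.Boolean.TRUE))
      while (cursor.hasNext) {
        // DBObject.toString renders the document as JSON
        producer.send(session.createTextMessage(cursor.next().toString))
      }
      conn.close()
      mongo.close()
    }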

What did we achieve with the above architecture?

  1. Kept Oracle free to do what it does best.

  2. Took unnecessary load off the production database servers.

  3. Introduced a document database to serve a requirement tailor-made for its strengths - semi-structured data, mainly reads, no constraints, no overhead of ORM ceremony. MongoDB offers a very clean programming model and a very decent query interface; it is simple to use and easy to sell to your client.

  4. Used message-based mapping to sync up data ASYNCHRONOUSLY between the NoSQL MongoDB and the SQL-based Oracle (a consumer sketch follows this list). Each data store was doing what it does best, keeping us clear of the hammer-and-nail trap.
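
The consuming end of point 4 is equally small: a JMS listener that drains the queue asynchronously and writes each payload into Oracle over plain JDBC. The connection string, table and column here are placeholders:

    import javax.jms.{Connection, Message, MessageListener, Session, TextMessage}
    import java.sql.DriverManager

    // Drains the sync queue asynchronously and inserts each JSON payload
    // into an Oracle staging table.
    def startOracleSync(jmsConn: Connection): Unit = {
      val session  = jmsConn.createSession(false, Session.AUTO_ACKNOWLEDGE)
      val consumer = session.createConsumer(session.createQueue("oracle.sync"))

      val db = DriverManager.getConnection(
        "jdbc:oracle:thin:@//dbhost:1521/BACKOFFICE", "user", "password")
      val insert = db.prepareStatement(
        "insert into feed_staging (payload) values (?)")

      consumer.setMessageListener(new MessageListener {
        def onMessage(m: Message): Unit = m match {
          case t: TextMessage =>
            insert.setString(1, t.getText)
            insert.executeUpdate()
          case _ => () // ignore anything that is not a text message
        }
      })
      jmsConn.start() // begin asynchronous delivery
    }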


With more and more NoSQL stores coming up, message-based replication is going to play a very important role. Even within NoSQL datastores, we are seeing SQL-based storage backends offered as options. Voldemort offers MySQL as one of its storage backends - so the hybrid model starts right up there. It's always advisable to use multiple stores that fit your use cases rather than trying to force-fit everything into a single paradigm.

Monday, December 14, 2009

No Comet, Hacking with WebSocket and Akka

Over the weekend I upgraded my Chrome to the developer channel and played around a bit with Web Sockets. It's really, really exciting and will for sure make us think of Web applications in a different way. Web Sockets have been described as the "TCP for the Web" and will standardize bidirectional communication technology for Web applications.

The way we think today of push technology is mostly through Comet-based implementations that use "long polling". The browser sends a request to the server, which holds it open until an event occurs and only then sends a response. The client, on getting the response, consumes it, closes the socket and opens a new long-polling connection to the server. The communication is not symmetric, and connections can drop off and on between the client and the server. Also, the fact that Comet implementations are not portable across Web containers makes portability of applications an additional headache.

Atmosphere offers a portable framework for implementing Ajax/Comet-based applications for the masses. But it still does not make the underlying technology painless.

Web Sockets will make applications much more symmetric - instead of long polling, it's all symmetric push/pull. Once you get a Web Socket connection, you can exchange data between the browser and the server through the send() method and an onmessage event handler.

I used these primitives to hack up a bidirectional communication channel with an Akka actor based service stack that I was already using in an application. I needed to hack up a very, very experimental Web Socket server, which did not take much effort. The actors could exchange messages with the Web Socket and communicate further downstream to a CouchDB database. I already had all the goodness of transaction processing using Akka STM. But now I could get rid of much of the client code that looked like magic and replace it with something more symmetric from the application's point of view. It just feels like a natural extension of socket-based programming on the client. Also, the fact that JavaScript is ideally suited to an event-driven programming model makes the client code much cleaner with the Web Socket API.
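
The handler behind the socket reduces to a small actor. Here's a minimal sketch using the classic untyped-actor API (Akka's package names have shifted across its early releases), with the socket push and the CouchDB write left as hypothetical helpers rather than real integrations:

    import akka.actor.Actor

    case class FromSocket(json: String)   // an inbound frame from the browser

    // One actor per Web Socket connection: every inbound frame is persisted
    // downstream and an acknowledgement is pushed back over the same socket.
    class SocketHandler(push: String => Unit) extends Actor {
      def receive = {
        case FromSocket(json) =>
          persistToCouch(json)          // hypothetical CouchDB write
          push("""{"status":"ok"}""")   // symmetric server-to-browser send
      }

      // Placeholder: a real implementation would POST the document to CouchDB
      def persistToCouch(doc: String): Unit = ()
    }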

Talking about event-based programming, we all saw the awesomeness that Ryan Dahl demonstrated a few weeks back with evented I/O in Node.js. People have already started developing experimental implementations of the Web Socket protocol for Node.js. And I just noticed that Jetty also has a WebSocket server.