Tuesday, November 03, 2009

DSLs In Action - Updates on the Book Progress

DSLs In Action has been out in MEAP for a month now, with the first three chapters published. DSLs are an emerging topic, and I have been getting quite a bit of feedback from readers who have already purchased the MEAP edition. Thanks for all of it.

Writing a book has a lot in common with coding, particularly the refactoring part. The second attempt at articulating a thought is almost always better than the first, much like in code. You cannot imagine how many times I have discarded a first attempt and rewritten it, only to get a much better expression of what I want to convey to my readers.

Anyway, this post is not about book writing. Since the MEAP is out, I thought I should post a brief update on the progress since then .. here's a quick rundown ..

I delivered Chapter 4 and have already received quite a bit of feedback on it. Chapter 4 starts the DSL implementation section of the book. From here onwards expect lots of code snippets and zillions of implementation techniques that you can try out in your IDE. Chapter 4 deals with implementation patterns in internal DSLs, discussing some of the common idioms you will use while implementing them. As per the premise of the book, the snippets and idioms are in Ruby, Groovy, Clojure and Scala, with the focus on the strengths of each of these languages that help you implement well-designed DSLs. This chapter prepares the launching pad for some more extensive DSL implementations in Chapter 5. It's not in MEAP yet, but here's the detailed ToC for Chapter 4, followed by a small sketch of the flavor of one of its idioms.

Chapter 4 : Internal DSL Implementation Patterns

4.1. Building up your DSL Toolbox
4.2. Embedded DSL - Patterns in Meta-programming
4.2.1. Implicit Context and Smart APIs
4.2.2. Dynamic Decorators using Mixins
4.2.3. Hierarchical Structures using Builders
4.2.4. New Additions to your Toolbox
4.3. Embedded DSL - Patterns with Typed Abstractions
4.3.1. Higher Order Functions as Generic Abstractions
4.3.2. Explicit Type Constraints to model Domain logic
4.3.3. New Additions to your Toolbox
4.4. Generative DSL - Boilerplates for Runtime Generation
4.5. Generative DSL - One more tryst with Macros
4.6. Summary
4.7. Reference
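
The patterns themselves travel well across languages. Just to give a flavor of the builder idiom from 4.2.3, here's a minimal sketch in Python (not an excerpt from the book; the Order/OrderBuilder names are made up, and the book's versions are in Ruby, Groovy, Clojure and Scala) ..

    # A tiny method-chaining builder: each method returns self, so the
    # client code reads almost like the domain language itself.
    class Order:
        def __init__(self, instrument, quantity, price):
            self.instrument, self.quantity, self.price = instrument, quantity, price

        def __repr__(self):
            return f"Order({self.quantity} {self.instrument} @ {self.price})"

    class OrderBuilder:
        def __init__(self):
            self._instrument, self._quantity, self._price = None, 0, 0.0

        def buy(self, quantity, instrument):
            self._quantity, self._instrument = quantity, instrument
            return self                     # returning self enables chaining

        def at_limit_price(self, price):
            self._price = price
            return self

        def build(self):
            return Order(self._instrument, self._quantity, self._price)

    print(OrderBuilder().buy(100, "IBM").at_limit_price(52.5).build())

The same trick reads even more fluently in Ruby or Groovy, where you can drop the parentheses altogether.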

Chapter 5 is also in the labs right now, though almost complete. I hope to post another update soon ..

Meanwhile .. Enjoy!

Monday, November 02, 2009

NOSQL Movement - Excited by the Coexistence of Divergent Thoughts

Today we are witnessing a great deal of excitement around the NoSQL movement. Call it NoSQL (~SQL) or NOSQL (Not Only SQL), the movement has a mission: not all applications need to store and process data the same way, and the storage should be architected accordingly. Until now we have been force-fitting a single hammer to drive every nail. Irrespective of how we process data in our applications, we have traditionally stored it as rows and columns in a relational database.

When we talk about applications that need really big write scalability, relational databases suck big time. Normalized data, joins and ACID transactions are definite anti-patterns for write scalability. You may think sharding will solve your problems by splitting data into smaller chunks. But in reality, the biggest problem with sharding is that relational databases were never designed for it. Sharding takes away many of the benefits that relational databases have traditionally been built for. It cannot be an afterthought: sharding intrudes into the business logic of your application, and joining data from multiple shards is definitely a non-trivial effort. As long as you can scale your data store vertically by increasing the size of your box, that's possibly the sanest way to go. But Moore .. *cough* .. *cough* .. And even if you are able to scale up vertically, try migrating a really large MySQL database. It can take hours, even days. That's one of the reasons why some companies are moving to schemaless databases when their applications can afford to.
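
To make that intrusion concrete, here's a hedged sketch in Python. The shard URLs, table names and the fetch callback are all made up; the point is that once data is hash-partitioned, even a simple friends lookup becomes a fan-out-and-merge in your application code ..

    NUM_SHARDS = 4
    shards = [f"mysql://db{i}.example.com/app" for i in range(NUM_SHARDS)]

    def shard_for(user_id):
        # the routing logic lives in *your* code, not in the database
        return shards[user_id % NUM_SHARDS]

    def friends_of(user_id, fetch):
        # a cross-shard "join": friend rows can live on any shard, so the
        # application has to fan out and merge the results itself
        friend_ids = fetch(shard_for(user_id),
                           "SELECT friend_id FROM friends WHERE user_id = %s", user_id)
        return [fetch(shard_for(fid),
                      "SELECT name FROM users WHERE id = %s", fid)
                for fid in friend_ids]

    # demo with a stub standing in for a real MySQL driver call
    def fake_fetch(shard, sql, arg):
        return [2, 7] if "friend_id" in sql else f"user-{arg}"

    print(friends_of(42, fake_fetch))   # ['user-2', 'user-7']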

If we sacrifice normalization, joins and ACID transactions for horizontal scalability anyway, why should we use an RDBMS at all? You don't need to .. Digg is moving from MySQL to Cassandra. It all depends on your application and the kind of write scalability it needs in processing your data. Read scalability, on the other hand, you can often still manage with read-only slaves replicating everything from the master database in realtime, plus a smart proxy router between your clients and the databases.
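
The proxy part is simple enough to sketch. In this hedged Python version the master and slave objects stand in for real database connections; anything that isn't a SELECT goes to the master, while reads round-robin over the replicas ..

    import itertools

    class ReadWriteRouter:
        def __init__(self, master, slaves):
            self.master = master
            self.slaves = itertools.cycle(slaves)   # round-robin over read replicas

        def execute(self, sql, *args):
            # writes go to the master; reads fan out over the slaves
            is_read = sql.lstrip().upper().startswith("SELECT")
            conn = next(self.slaves) if is_read else self.master
            return conn.execute(sql, *args)

Real proxies (and replication lag) are messier than this, of course, but the shape of the solution is the same.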

The biggest excitement that the NOSQL movement has created today comes from the divergence of thoughts that each of the products promises. This is very much unlike the RDBMS movement, which started as a single hammer named SQL, capable of munging rows and columns of data based on the theory of mathematical set operations. Every application adopted the same storage architecture irrespective of how it processed the data. One thing led to another, people thought they could solve the resulting mismatch with yet another level of indirection .. and the strange thingy called the Object Relational Mapper was born.

At last it took the momentum of Web-shaped data processing to make us realize that not all data is processed alike. The storage that works so well for your desktop trading application will fail miserably in a social application, where you need to process linked data, more in the shape of a graph. The NOSQL community has responded with Neo4j, a graph database that offers easy storage and traversal of graph structures.
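
To see why, consider a friends-of-friends query: a recursive traversal that is trivial over an adjacency structure but a pile of self-joins in SQL. A toy sketch in plain Python, illustrative only and nothing like Neo4j's actual API ..

    from collections import deque

    friends = {
        "alice": ["bob", "carol"],
        "bob":   ["alice", "dave"],
        "carol": ["alice"],
        "dave":  ["bob"],
    }

    def within_degrees(start, max_depth):
        # breadth-first traversal up to max_depth hops: the kind of query
        # a graph database answers natively
        seen, queue = {start}, deque([(start, 0)])
        while queue:
            node, depth = queue.popleft()
            if depth == max_depth:
                continue
            for nbr in friends.get(node, []):
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append((nbr, depth + 1))
        return seen - {start}

    print(within_degrees("alice", 2))   # {'bob', 'carol', 'dave'}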

If you want to go big on write scalability, the only way out is decentralization and eventual consistency. The CAP theorem kicks in: you cannot have all three of consistency, availability and partition tolerance at once, so you have to compromise on at least one of them. Riak and Cassandra offer decentralized data stores that can potentially scale indefinitely. If your application needs more structure than a plain key-value store, you can go for Cassandra, the distributed, peer-to-peer, column-oriented data store. Have a look at the nice article from Digg comparing their use case on relational storage with the columnar storage that Cassandra offers. For a document-oriented database with all the goodness of REST and JSON, Riak is the option to choose. Riak also offers map/reduce over linked data items, much in the way the Web itself links documents. Riak is truly a Web-shaped data store.
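
The partitioning idea underneath both of them is consistent hashing: keys and nodes hash onto the same ring, so adding a node remaps only a fraction of the keys. Here's a bare-bones Python sketch of the idea; real implementations add virtual nodes and replication on top ..

    import bisect, hashlib

    def ring_position(key):
        # hash both keys and node names onto the same 128-bit ring
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    class Ring:
        def __init__(self, nodes):
            self.ring = sorted((ring_position(n), n) for n in nodes)
            self.positions = [pos for pos, _ in self.ring]

        def node_for(self, key):
            # walk clockwise to the first node at or after the key's position
            i = bisect.bisect(self.positions, ring_position(key)) % len(self.ring)
            return self.ring[i][1]

    ring = Ring(["node-a", "node-b", "node-c"])
    print(ring.node_for("user:42"))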

CouchDB has yet another very interesting value proposition in this whole ecosystem of NOSQL databases. Many applications are inherently offline and need seamless, painless replication facilities. CouchDB's B-tree based storage structure, append-only writes with MVCC-based concurrency control, lock-free operation, REST API and incremental map/reduce give it a sweet spot in the space of local, browser-side storage. Chris Anderson, one of the core developers of CouchDB, sums up the value of CouchDB in today's Web-based world very nicely ..

"CouchApps are the product of an HTML5 browser and a CouchDB instance. Their key advantage is portability, based on the ubiquity of the html5 platform. Features like Web Workers and cross-domain XHR really make a huge difference in the fabric of the web. Their availability on every platform is key to the future of the web."

MongoDB, like CouchDB, is a document store. It doesn't offer REST out of the box, but it's based on JSON-style storage. It has map/reduce as well, but also offers a rich suite of query APIs, much like SQL. This is the main sweet spot of MongoDB, and it plays very well with people coming from a SQL background. MongoDB also offers master/slave replication and has been working towards autosharding-based scalability and failover support.
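
A small sketch of that SQL-ish query feel, using the pymongo driver against a local mongod ..

    from pymongo import MongoClient

    db = MongoClient()["blog"]   # assumes mongod running on the default port

    db.posts.insert_one({"title": "NOSQL Movement", "tags": ["nosql"], "votes": 42})

    # rich, SQL-like queries over JSON-shaped documents
    for post in db.posts.find({"votes": {"$gt": 10}}).sort("votes", -1).limit(5):
        print(post["title"])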

There are quite a few other data stores that offer solutions to problems you face in everyday application design: caching, worker queues requiring atomic push/pop operations, processing activity streams, logging data and so on. Redis and Tokyo Cabinet are nice fits for such use cases. You can think of Redis as memcached backed by a persistent key-value store. It's single-threaded, uses non-blocking IO and is blazing fast. Besides everyday key/value storage, Redis also offers lists and sets, with atomic operations on each of them. Pick the one that fits your bill best.
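
The worker-queue and counter use cases, for instance, come down to a handful of commands. A sketch with the redis-py client, assuming a Redis server on the default local port ..

    import redis

    r = redis.Redis()

    # atomic counter, e.g. page views or votes
    r.incr("page:views")

    # worker queue: producers LPUSH jobs, workers block on BRPOP
    r.lpush("jobs", "resize:image:42")
    _, job = r.brpop("jobs")

    # sets with atomic membership operations, e.g. unique visitors
    r.sadd("visitors:today", "user:99")
    print(r.scard("visitors:today"))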

Another interesting aspect is the interoperability between these data stores. Riak, for example, offers pluggable data backends - possibly we can have CouchDB as the data backend for Riak (can we?). Possibly we will also see a Cassandra backend for Neo4j. It's extremely heartening to see that each of these communities has a deep sense of cooperation in making the entire ecosystem more meaningful and thriving.