Thursday, January 21, 2010

A new way to think of Data Storage for your Enterprise Application

A couple of posts earlier I had blogged about a real life case study of one of our projects where we are using a SQL store (Oracle) and a NoSQL store (MongoDB) in combination over a message based backbone. MongoDB was used to cater to a very specific subset of the application functionality, where we felt it was a better fit than a traditional RDBMS. This hybrid architecture of data organization is turning out to be an increasingly attractive option today, with more and more specialized persistent storage structures being developed.

In many applications we need to process graph data structures, and Neo4J can be a viable option for this. You can keep your mainstream data storage in an RDBMS and use Neo4J only for the subset of functionality that needs graph data structures. If you need to sync back to your main storage, use messaging as the transport to talk back to your relational database.
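As an illustration, here's a minimal sketch in Scala of what that sync path could look like using Scala actors. The event type, the table and the JDBC details are all hypothetical - the point is only the shape of the interaction ..

import java.sql.DriverManager
import scala.actors.Actor
import scala.actors.Actor._

// hypothetical event, published whenever the graph store gets updated
case class RuleUpdated(id: Long, body: String)

// an actor that consumes graph-side updates and syncs them back
// to the relational system of record
val rdbmsSync: Actor = actor {
  val conn = DriverManager.getConnection("jdbc:oracle:thin:@//dbhost:1521/app")
  loop {
    react {
      case RuleUpdated(id, body) =>
        val stmt = conn.prepareStatement("update rules set body = ? where id = ?")
        stmt.setString(1, body)
        stmt.setLong(2, id)
        stmt.executeUpdate()
        stmt.close()
    }
  }
}

// the Neo4J facing service fires the event after a successful graph write:
// rdbmsSync ! RuleUpdated(42L, updatedRuleBody)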

Using multiple data stores along with asynchronous messaging is one of the options that looks very potent today. Drizzle has its entire replication system based on a RabbitMQ transport. And using AMQP messaging, Drizzle replicates data to a host of key/value stores like Voldemort, MemcacheDB and Cassandra.
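To get a feel for the shape of such a transport, here's a hedged sketch in Scala using the RabbitMQ Java client - the exchange name and the event encoding are my own illustrations, not Drizzle's actual replication protocol ..

import com.rabbitmq.client.ConnectionFactory

// publish every change event to a fanout exchange; each store-specific
// consumer (Voldemort, MemcacheDB, Cassandra ..) binds its own queue
val factory = new ConnectionFactory
factory.setHost("localhost")
val connection = factory.newConnection
val channel = connection.createChannel

channel.exchangeDeclare("replication", "fanout")

def replicate(event: String) {
  channel.basicPublish("replication", "", null, event.getBytes("UTF-8"))
}

replicate("insert into t1 values (42, 'foo')")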

If we agree that messaging is going to be one of the most dominant paradigms in shaping application architectures, why not go one level up and look at some higher level abstractions for message based programming? Erlang programmers have been using the actor model for many years now and have demonstrated all the good qualities that the model embodies. Inspired by Erlang, Scala also offers a similar model on the JVM. In an earlier post I had discussed how we can use the actor model in Scala to scale out messaging applications using a RabbitMQ transport.

Now with the developing ecosystem of polyglot storage, we can use the same model of actor based communication as the backbone for integrating the multiple data storage options that you may plug in to your application. Have specific clients front the storage they need to work with, and use messaging to sync that up with the main storage backend so that you have a consistent system of record. Have a look at the following diagram, which may not look that unreal today. You have a host of options that bring your data closer to the way you process it in your domain model, be it document oriented, graph based, key/value based or simple POJO based across a data grid like Terracotta.

[Diagram: document, graph and key/value stores and a data grid, each fronted by its own clients, syncing with the mainstream RDBMS over a messaging backbone]

When you have a bunch of architectural components loosely connected through a messaging infrastructure, you have a world of options for managing the interactions between them. In fact your options open up even more when you get to interact with data shaped the way you would like it to be. You can now think in terms of a data model aligned with the model of your domain. You know that once your rule base gets updated in Neo4J, it will be synced up with the backend storage through some other service that makes the whole system eventually consistent.

In a future post I will explore some of the options that a higher order middleware service like Akka can add to your stack. With Akka providing abstractions like transactors, pluggable persistence and out of the box integration modules for AMQP, there are a number of ways you can think of modularizing your application's domain model and storage. You can use a peer-to-peer distributed actor based communication model that sets up synchronization with your databases, or you can use an AMQP based transport to do the same, much like what Drizzle does for replication. But that's some food for thought for yet another future post.

Sunday, January 10, 2010

A Case for Orthogonality in Design

In Learning, using and designing command paradigms, John M. Carroll introduces the notion of lexical congruence. When you design a language, one of the things that you do is lexicalization of the domain. If we extrapolate the concept to software design in general, we go through the same process with our domain model. We identify artifacts, or lexemes, and choose to name them appropriately so that the names are congruent with the semantics of the domain. This notion of lexical congruence is the essence of having good mnemonics for your domain language. I found the reference to Carroll's work in Testing the principle of orthogonality in language design, which discusses the same issue of organizing your language around an optimal set of orthogonal semantic concepts. This blog post tries to relate the same concepts of orthogonality to designing domain models, using the power that the newer languages of today offer.

The complexity of a modeling language depends on the number of lexemes, their congruence with the domain concepts being modeled, and the number of ways you can combine them to form higher order lexemes. The more decoupled these lower level lexemes are, the easier they are to compose. When overlapping concepts get modeled as part of your lexemes, mixing them is not easy - you need to squeeze in special boundary conditions as part of the composition logic. Making your concepts independent yet composable makes your design orthogonal.

The first time I came across the concept of orthogonality in design, and consciously appreciated the power of unification that it brings to your model, was through Andrei Alexandrescu's idea of policy based design, which he evangelized in his book Modern C++ Design and in the making of the Loki library. You have orthogonal policies that are themselves reusable as independent abstractions, and at the same time you can use the language infrastructure to combine them when composing your higher order model. Consider this C++ example from Andrei's book ..

template
<
  class T,
  template <class> class CheckingPolicy,
  template <class> class ThreadingModel
>
class SmartPtr;


CheckingPolicy enforces constraints that need to be satisfied by the pointee object. The ThreadingModel abstraction defines the concurrency semantics. These two concerns are not related to each other in any way, but you can use the power of C++ templates to plug in appropriate behaviors for each of them when composing your own custom type of SmartPtr ..

typedef SmartPtr<Widget, NoChecking, SingleThreaded>
  WidgetPtr;


This is orthogonal design: you have a minimal set of lexemes to model otherwise unrelated concerns, and you use the power of C++ templates to evolve a larger abstraction by composing them together. The policies themselves are independent and can be applied to construct arbitrary families of abstractions.

The crux of the idea is that you have m concepts that you can use with n types. There's no static relationship between the concepts and the types - that's what makes orthogonality an extensible concept. Consider Haskell typeclasses ..

class Eq a where 
  (==) :: a -> a -> Bool


The above typeclass defines the concept of equality. It's parameterized on the type and defines the constraint that the type has to define an equality operator in order to qualify itself as an instance of the Eq typeclass. The actual type is left open, which gives the typeclass unbounded extensibility.

For integers, we can do

instance Eq Integer where 
  x == y =  x `integerEq` y


For floats we can have ..

instance Eq Float where 
  x == y =  x `floatEq` y


We can define Eq for any custom data type, even recursive types like Tree ..

data Tree a = Leaf a | Branch (Tree a) (Tree a)

instance (Eq a) => Eq (Tree a) where 
  Leaf a         == Leaf b          =  a == b
  (Branch l1 r1) == (Branch l2 r2)  =  (l1==l2) && (r1==r2)
  _              == _               =  False


Haskell typeclasses, like C++ templates, help implement orthogonality in abstractions through a form of parametric polymorphism. Programming languages in general offer facilities to promote orthogonal modeling of abstractions - of course, how far you can go depends on the power of abstraction that the language itself offers.
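Scala can encode the same typeclass idea through implicits. Here's a minimal sketch - the names are mine, not from any library - that keeps the concept and the types just as loosely related ..

// the concept, as a trait parameterized on the type
trait Eq[A] {
  def eqv(x: A, y: A): Boolean
}

object Eq {
  // instances are supplied as implicit values, open for extension
  implicit object IntEq extends Eq[Int] {
    def eqv(x: Int, y: Int) = x == y
  }
  implicit def listEq[A](implicit e: Eq[A]): Eq[List[A]] = new Eq[List[A]] {
    def eqv(xs: List[A], ys: List[A]) =
      xs.length == ys.length && xs.zip(ys).forall { case (a, b) => e.eqv(a, b) }
  }
}

// a generic function constrained by the concept, not by a base class
def contains[A](xs: List[A], x: A)(implicit e: Eq[A]): Boolean =
  xs.exists(e.eqv(_, x))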

Let's consider a real world scenario. We have an abstraction named Address, modeled as a case class in Scala ..

case class Address(houseNo: Int, street: String, 
                   city: String, state: String, zip: String)


There can be many contexts in which you would like to use the Address abstraction. Consider the printing of shipping labels, which needs your address rendered in some specific label format, as per the following trait ..

trait LabelMaker {
  def toLabel: String
}


Note that printing addresses in the form of labels is not one of the primary concerns of your Address abstraction, hence it makes no sense to model it as one of the methods of the class. It's only in some situations that we may need an Address to print itself in the form of a label, as per the specification mandated by LabelMaker.

One other concern is sorting. You may need to have your addresses sorted by zip code before submitting them to your Printer module for shipping. Sorting may be required in combination with label printing or on its own - these two are orthogonal concerns that should never have any dependency between themselves within your abstraction.

Depending on your use case, you can decide to compose your Address abstraction as

case class Address(houseNo: Int, street: String, 
  city: String, state: String, zip: String)
  extends Ordered[Address] with LabelMaker {
  def compare(that: Address) = zip compareTo that.zip   // sort by zip code
  def toLabel = "%d-%s, %s, %s-%s".format(houseNo, street, city, state, zip)
}


which makes your Address abstraction statically coupled with the other two.
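With the static mixin in place, sorting comes for free wherever you have a collection of addresses - a hypothetical usage ..

// assuming addresses: List[Address], fetched from some repository
val sortedForPrinter = addresses.sortWith(_ < _)   // ordered by zip code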

Or you may choose to make the composition based on individual objects, which keeps the base abstraction free from any static coupling.

val a = new Address(..) with LabelMaker {
  override def toLabel = {
    //..
  }
}


As an alternative you can also choose to implement an implicit conversion from Address using Scala views ..

object Address {
  implicit def AddressToLabelMaker(addr: Address) = new LabelMaker {
    def toLabel =
      "%d-%s, %s, %s-%s".format(
        addr.houseNo, addr.street, addr.city, addr.state, addr.zip)
  }
}
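Since the conversion lives in the companion object of Address, it's in the implicit scope wherever an Address needs to act as a LabelMaker. A hypothetical usage (the address values are made up) ..

def printLabel(lm: LabelMaker) = println(lm.toLabel)

val addr = Address(12, "Tulip Road", "Palo Alto", "CA", "95070")
printLabel(addr)    // the view converts Address => LabelMaker
addr.toLabel        // or invoke the LabelMaker contract directly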


Whatever the implementation, note that we are not polluting the basic Address abstraction with concerns that are orthogonal to it. Our model, which is our design language, treats orthogonal concerns as separate lexemes and encourages the user to compose them non-invasively.

Sunday, January 03, 2010

Pragmatics of Impurity

James Hague, a long time Erlanger, drives home a point or two regarding purity of paradigms in a couple of his latest blog posts. Here's his take on being effective with pure functional languages ..

"My real position is this: 100% pure functional programing doesn't work. Even 98% pure functional programming doesn't work. But if the slider between functional purity and 1980s BASIC-style imperative messiness is kicked down a few notches--say to 85%--then it really does work. You get all the advantages of functional programming, but without the extreme mental effort and unmaintainability that increases as you get closer and closer to perfectly pure."

Purity is not necessarily pragmatic. In my last blog post I tangentially touched upon the notion of purity while discussing how a *hybrid* SQL-NoSQL database stack can be effective for large application deployments. Be it with programming languages, databases or any other paradigm of computation, we need the right balance of purity and pragmatism.

Clojure introduced transients. Rich Hickey says in the rationale .. "If a pure function mutates some local data in order to produce an immutable return value, is that ok?". Transients in Clojure allow localized mutation when initializing or transforming a large persistent data structure. The mutation is seen only by the code that does the transformation - the client gets back a value for immutable use that can be shared. In no way does this invalidate the benefits that immutability brings to reasoning about Clojure programs. It's good to see Rich Hickey being flexible and pragmatic, even at the cost of injecting that little impurity into his creation.
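Scala's collection builders embody the same pragmatics (minus Clojure's persistent data structure machinery): mutate locally, publish an immutable value. A rough sketch of the idea ..

import scala.collection.mutable.ListBuffer

// the mutation is confined to the function; callers only ever
// see the immutable List that comes out
def squares(n: Int): List[Int] = {
  val buf = new ListBuffer[Int]
  for (i <- 1 to n) buf += i * i
  buf.toList
}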

Just like this little compromise (and big pragmatism) with the purity of persistent data structures, Clojure made a similar compromise with laziness by introducing chunked sequences, which amortize the overhead associated with lazy sequences. These are design decisions taken consciously by a language creator who values pragmatism over purity.

Enough has already been said about the virtues of purity in functional languages. Believe me, 99% of the programming world does not even care about purity. They do what works best for them, and hybrid languages are mostly the ones that find the sweetest spots. Clojure is as impure as Scala, considering that both allow side-effecting through mutable references and uncontrolled IO. Even Erlang has uncontrolled IO and a mutable process dictionary, though the latter's use is often frowned upon within the community. The important point is that all of them have proved useful to programmers at large.

Why do creators infuse impurity into their languages? Why isn't every language created as pure as Haskell? Well, it's mostly related to the larger goals that the language targets. Lisp started as an incarnation of the lambda calculus in the hands of John McCarthy and became the first significant language promoting the purely applicative model of programming without side-effects. Later on it added the impurities of mutation constructs based on the von Neumann architecture of the machines on which Lisp was implemented; the obvious reason was improved performance over purely functional constructs. Scala and Clojure both decided to target the JVM as their primary runtime platform - hence both languages are susceptible to the pitfalls of impurity that the JVM offers. Both of them decided to inherit all the impurities that Java has.

Consider the module system of Scala. You can compose modules using traits, with deferred concrete definitions of types and objects. You can even compose mutually recursive modules using lazy vals, somewhat similar to what Newspeak and some dialects of ML offer. But because you have decided to bite the Java pill, you can also wreak havoc through shared mutable state in the top level object that you compose. In his post titled A Ban on Imports, Gilad Bracha discusses all the evil effects that an accessible global namespace can bring to the modularity aspects of your code. Newspeak is being designed to be pure in this respect, with all dependencies kept abstract and plugged together explicitly as part of configuring the module. Scala is impure in this respect - it allows imports to bring the world into your module definitions, but at the same time it opens up all possibilities of sharing the huge ecosystem that the Java community has built over the years. You can rightfully choose to be pure in Scala, but that's not enforced by the language.
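To make the module story concrete, here's a minimal sketch of trait based composition in Scala - the module names are illustrative, not from any framework ..

// modules as traits, with the dependency deferred
trait LoggingModule {
  def log(msg: String): Unit               // abstract: no binding yet
}

trait StorageModule { self: LoggingModule =>
  def save(record: String) {
    log("saving " + record)                // uses the logging module
    //.. persist the record
  }
}

// dependencies get plugged together only at configuration time
object AppConfig extends StorageModule with LoggingModule {
  def log(msg: String) = println(msg)
}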

When we talk about impurity in languages, it's mostly about how they handle side-effects and mutable state. And Haskell has a completely different take on this aspect from what we discussed with Lisp, Scala or Clojure. You have to use monads in Haskell for any side-effecting operation, and people with a taste for the finer things in life are absolutely fine with that. You cannot just stick a printf into your program for debugging; you need to bring the whole computation into the IO monad and then do the print. The Haskell philosophy looks at a program as a model of mathematical functions where side-effects are also implemented in a functional way. This makes reasoning and optimization by the compiler much easier - you can make your pure Haskell code run as fast as C code. But you need to think differently. Pragmatic? What do you think?

Gilad Bracha is planning to implement pure subsets of Newspeak. It will be really exciting to see languages which are pure, functional (note: not purely functional) and object-oriented at the same time. He observes in his post that "(t)he world is slowly digesting the idea that object-oriented and functional programming are not contradictory concepts. They are orthogonal, and can be arranged to be rather complementary." This is an interesting trend where we can see families of languages built around the same philosophy but differing in aspects of purity. You need to be pragmatic to choose, and even mix, them depending on your requirements.