Sunday, April 06, 2014

Functional Patterns in Domain Modeling - Immutable Aggregates and Functional Updates

In the last post I looked at a pattern that enforces constraints to ensure domain objects honor the domain rules. But what exactly is a domain object ? What should be the granularity of an object that my solution model should expose so that it makes sense to a domain user ? After all, the domain model should speak the language of the domain. We may have a cluster of entities modeling various concepts of the domain. But only some of them can be published as abstractions to the user of the model. The others can be treated as implementation artifacts and are best hidden under the covers of the published ones.

An aggregate in domain driven design is a published abstraction that provides a single point of interaction to a specific domain concept. Considering the classes I introduced in the last post, an Order is an aggregate. It encapsulates the details that an Order is composed of in the real world (well, only barely in this example, which is only for illustration purposes :-)).

Note an aggregate can consist of other aggregates - e.g. we have a Customer instance within an Order. Eric Evans in his book on Domain Driven Design provides an excellent discussion of what constitutes an Aggregate.

Functional Updates of Aggregates with Lens


This is not a post about Aggregates and how they fit in the realm of domain driven design. In this post I will talk about how to use some patterns to build immutable aggregates. Immutable data structures offer a lot of advantages, so build your aggregates ground up as immutable objects. Algebraic Data Type (ADT) is one of the patterns to build immutable aggregate objects. Primarily coming from the domain of functional programming, ADTs offer powerful techniques of pattern matching that help you match values against patterns and bind variables to successful matches. In Scala we use case classes as ADTs that give immutable objects out of the box ..

case class Order(orderNo: String, orderDate: Date, customer: Customer, 
  lineItems: Vector[LineItem], shipTo: ShipTo, netOrderValue: Option[BigDecimal] = None, 
  status: OrderStatus = Placed)

Like all good aggregates, we need to provide a single point of interaction to users. Of course we can access all properties using accessors of case classes. But what about updates ? We can update the orderNo of an order like this ..

val o = Order( .. )
o.copy(orderNo = newOrderNo)

which gives us a copy of the original order with the new order no. We don't mutate the original order. But anybody having some knowledge of Scala will realize that this becomes pretty clunky when we have to deal with nested object updation. e.g in the above case, ShipTo is defined as follows ..

case class Address(number: String, street: String, city: String, zip: String)
case class ShipTo(name: String, address: Address)

So, here you go in order to update the zip code of a ShipTo ..

val s = ShipTo("abc", Address("123", "Monroe Street", "Denver", "80233"))
s.copy(address = s.address.copy(zip = "80231"))

Not really pleasing and can go off bounds in comprehensibility pretty soon.

In our domain model we use an abstraction called a Lens for updating Aggregates. In very layman's terms, a lens is an encapsulated get and set combination. The get extracts a small part from a larger whole, while the set transforms the larger abstraction with a smaller part taken as a parameter.

case class Lens[A, B](get: A => B, set: (A, B) => A)

This is a naive definition of a Lens in Scala. Sophisticated lens designs go a long way to ensure proper abstraction and composition. scalaz provides one such implementation out of the box that exploits the similarity in structure between the get and the set to generalize the lens definition in terms of another abstraction named Store. As it happens so often in functional programming, Store happens to abstract yet another pattern called the Comonad. You can think of a Comonad as the inverse of a Monad. But in case you are more curious, and have wondered how lenses form "the Coalgebras for the Store Comonad", have a look at the 2 papers here and here.

Anyway for us mere domain modelers, we will use the Lens implementation as in scalaz .. here's a lens that helps us update the OrderStatus within an Order ..

val orderStatus = Lens.lensu[Order, OrderStatus] (
  (o, value) => o.copy(status = value),
  _.status
)

and use it as follows ..

val o = Order( .. )
orderStatus.set(o, Placed)

will change the status field of the Order to Placed. Let's have a look at some of the compositional properties of a lens which help us write readable code for functionally updating nested structures.

Composition of Lenses

First let's define some individual lenses ..

// lens for updating a ShipTo of an Order
val orderShipTo = Lens.lensu[Order, ShipTo] (
  (o, sh) => o.copy(shipTo = sh),
  _.shipTo
)

// lens for updating an address of a ShipTo
val shipToAddress = Lens.lensu[ShipTo, Address] (
  (sh, add) => sh.copy(address = add),
  _.address
)

// lens for updating a city of an address
val addressToCity = Lens.lensu[Address, String] (
  (add, c) => add.copy(city = c),
  _.city
)

And now we compose them to define a lens that directly updates the city of a ShipTo belonging to an Order ..

// compositionality FTW
def orderShipToCity = orderShipTo andThen shipToAddress andThen addressToCity

Now updating a city of a ShipTo in an Order is as simple and expressive as ..

val o = Order( .. )
orderShipToCity.set(o, "London")

The best part of using such compositional data structures is that it makes your domain model implementation readable and expressive to the users of your API. And yet your aggregate remains immutable.

Let's look at another use case when the nested object is a collection. scalaz offers partial lenses that you can use for such composition. Here's an example where we build a lens that updates the value member within a LineItem of an Order. A LineItem is defined as ..

case class LineItem(item: Item, quantity: BigDecimal, value: Option[BigDecimal] = None, 
  discount: Option[BigDecimal] = None)

and an Order has a collection of LineItems. Let's define a lens that updates the value within a LineItem ..

val lineItemValue = Lens.lensu[LineItem, Option[BigDecimal]] (
  (l, v) => l.copy(value = v),
  _.value
)

and then compose it with a partial lens that helps us update a specific item within a vector. Note how we convert our lineItemValue lens to a partial lens using the unary operator ~ ..

// a lens that updates the value in a specific LineItem within an Order 
def lineItemValues(i: Int) = ~lineItemValue compose vectorNthPLens(i)

Now we can use this composite lens to functionally update the value field of each of the items in a Vector of LineItems using some specific business rules ..

(0 to lis.length - 1).foldLeft(lis) {(s, i) => 
  val li = lis(i)
  lineItemValues(i).set(s, unitPrice(li.item).map(_ * li.quantity)).getOrElse(s)
}

In this post we saw how we can handle aggregates functionally and without any in-place mutation. This keeps the model pure and helps us implement domain models that has sane behavior even in concurrent settings without any explicit use of locks and semaphores. In the next post we will take a look at how we can use such compositional structures to make the domain model speak the ubiquitous language of the domain - another pattern recommended by Eric Evans in domain driven design.

Monday, March 31, 2014

Functional Patterns in Domain Modeling - The Specification Pattern

When you model a domain, you model its entities and behaviors. As Eric Evans mentions in his book Domain Driven Design, the focus is on the domain itself. The model that you design and implement must speak the ubiquitous language so that the essence of the domain is not lost in the myriads of incidental complexities that your implementation enforces. While being expressive the model needs to be extensible too. And when we talk about extensibility, one related attribute is compositionality.

Functions compose more naturally than objects and In this post I will use functional programming idioms to implement one of the patterns that form the core of domain driven design - the Specification pattern, whose most common use case is to implement domain validation. Eric's book on DDD says regarding the Specification pattern ..
It has multiple uses, but one that conveys the most basic concept is that a SPECIFICATION can test any object to see if it satisfies the specified criteria.
A specification is defined as a predicate, whereby business rules can be combined by chaining them together using boolean logic. So there's a concept of composition and we can talk about Composite Specification when we talk about this pattern. Various literature on DDD implement this using the Composite design pattern so commonly implemented using class hierarchies and composition. In this post we will use function composition instead.

Specification - Where ?

One of the very common confusions that we have when we design a model is where to keep the validation code of an aggregate root or any entity, for that matter.
  • Should we have the validation as part of the entity ? No, it makes the entity bloated. Also validations may vary based on some context, while the core of the entity remains the same.
  • Should we have validations as part of the interface ? May be we consume JSON and build entities out of it. Indeed some validations can belong to the interface and don't hesitate to put them there.
  • But the most interesting validations are those that belong to the domain layer. They are business validations (or specifications), which Eric Evans defines as something that "states a constraint on the state of another object". They are business rules which the entity needs to honor in order to proceed to the next stage of processing.
We consider the following simple example. We take an Order entity and the model identifies the following domain "specifications" that a new Order must satisfy before being thrown into the processing pipeline:

  1. it must be a valid order obeying the constraints that the domain requires e.g. valid date, valid no of line items etc.
  2. it must be approved by a valid approving authority - only then it proceeds to the next stage of the pipeline
  3. customer status check must be passed to ensure that the customer is not black-listed
  4. the line items from the order must be checked against inventory to see if the order can be fulfilled
These are the separate steps that are to be done in sequence by the order processing pipeline as pre-order checks before the actual order is ready for fulfilment. A failure in any of them takes the order out of the pipeline and the process stops there. So the model that we will design needs to honor the sequence as well as check all constraints that need to be done as part of every step.

An important point to note here is that none of the above steps mutate the order - so every specification gets a copy of the original Order object as input, on which it checks some domain rules and determines if it's suitable to be passed to the next step of the pipeline.

Jumping on to the implementation ..

Let's take down some implementation notes from what we learnt above ..

  • The Order can be an immutable entity at least for this sequence of operations
  • Every specification needs an order, can we can pull some trick out of our hat which prevents this cluttering of API by passing an Order instance to every specification in the sequence ?
  • Since we plan to use functional programming principles, how can we model the above sequence as an expression so that our final result still remains composable with the next process of order fulfilment (which we will discuss in a future post) ?
  • All these functions look like having similar signatures - we need to make them compose with each other
Before I present more of any explanation or theory, here are the basic building blocks which will implement the notes that we took after talking to the domain experts ..

type ValidationStatus[S] = \/[String, S]

type ReaderTStatus[A, S] = ReaderT[ValidationStatus, A, S]

object ReaderTStatus extends KleisliInstances with KleisliFunctions {
  def apply[A, S](f: A => ValidationStatus[S]): ReaderTStatus[A, S] = kleisli(f)
}

ValidationStatus defines the type that we will return from each of the functions. It's either some status S or an error string that explains what went wrong. It's actually an Either type (right biased) as implemented in scalaz.

One of the things which we thought will be cool is to avoid repeating the Order parameter for every method when we invoke the sequence. And one of the idioamtic ways of doing it is to use the Reader monad. But here we already have a monad - \/ is a monad. So we need to stack them together using a monad transformer. ReaderT does this job and ReaderTStatus defines the type that somehow makes our life easier by combining the two of them.

The next step is an implementation of ReaderTStatus, which we do in terms of another abstraction called Kleisli. We will use the scalaz library for this, which implements ReaderT in terms of Kleisli. I will not go into the details of this implementation - in case you are curious, refer to this excellent piece by Eugene.

So, how does one sample specification look like ?

Before going into that, here are some basic abstractions, grossly simplified only for illustration purposes ..

// the base abstraction
sealed trait Item {
  def itemCode: String
}

// sample implementations
case class ItemA(itemCode: String, desc: Option[String], 
  minPurchaseUnit: Int) extends Item
case class ItemB(itemCode: String, desc: Option[String], 
  nutritionInfo: String) extends Item

case class LineItem(item: Item, quantity: Int)

case class Customer(custId: String, name: String, category: Int)

// a skeleton order
case class Order(orderNo: String, orderDate: Date, customer: Customer, 
  lineItems: List[LineItem])

And here's a specification that checks some of the constraints on the Order object ..

// a basic validation
private def validate = ReaderTStatus[Order, Boolean] {order =>
  if (order.lineItems isEmpty) left(s"Validation failed for order $order") 
  else right(true)
}

It's just for illustration and does not contain much domain rules. The important part is how we use the above defined types to implement the function. Order is not an explicit argument to the function - it's curried. The function returns a ReaderTStatus, which itself is a monad and hence allows us to sequence in the pipeline with other specifications. So we get the requirement of sequencing without breaking out of the expression oriented programming style.

Here are a few other specifications based on the domain knowledge that we have gathered ..

private def approve = ReaderTStatus[Order, Boolean] {order =>
  right(true)
}

private def checkCustomerStatus(customer: Customer) = ReaderTStatus[Order, Boolean] {order =>
  right(true)
}

private def checkInventory = ReaderTStatus[Order, Boolean] {order =>
  right(true)
}

Wiring them together

But how do we wire these pieces together so that we have the sequence of operations that the domain mandates and yet all goodness of compositionality in our model ? It's actually quite easy since we have already done the hard work of defining the appropriate types that compose ..

Here's the isReadyForFulfilment method that defines the composite specification and invokes all the individual specifications in sequence using for-comprehension, which, as you all know does the monadic bind in Scala and gives us the final expression that needs to be evaluated for the Order supplied.

def isReadyForFulfilment(order: Order) = {
  val s = for {
    _ <- validate
    _ <- approve
    _ <- checkCustomerStatus(order.customer)
    c <- checkInventory
  } yield c
  s(order)
}

So we have the monadic bind implement the sequencing without breaking the compositionality of the abstractions. In the next instalment we will see how this can be composed with the downstream processing of the order that will not only read stuff from the entity but mutate it too, of course in a functional way.

Thursday, January 23, 2014

A Sketch as the Query Model of an EventSourced System

In my last post I discussed the count-min sketch data structure that can be used to process data streams using sub-linear space. In this post I will continue with some of my thoughts on how count-min sketches can be used in a typical event sourced application architecture. An event sourcing system typically has a query model which provides a read only view of how all the events are folded to provide a coherent view of the system. I have seen applications where the query model is typically rendered from a relational database. And the queries can take a lot of time to be successfully processed and displayed to the user if the data volume is huge. And when we are talking about Big Data, this is not a very uncommon use case.

Instead of rendering the query from the RDBMS, quite a few types of them can be rendered from a count-min sketch using sub-linear space. Consider the use case where you need to report the highest occuring user-ids in a Twitter stream. The stream is continuous, huge and non ending and you get to see each item once. So you get each item from where you parse out the user-id occurring in it and update the sketch. So each entry of the sketch contains the frequency of the user-id that hashes to that slot. And we can take the minimum of all the slots to which a user-id hashes to, in order to get the frequency of that user-id. The details of how this works can be found in my last post.

Consider the case where we need to find the heavy-hitters - those user-ids whose frequency exceeds a pre-determined threshold. For that, in addition to the sketch we can also maintain a data structure like heap or tree where we update the top-k heavy hitters. When a user-id appears, we update the sketch, get its estimated frequency from the sketch and if it exceeds the threshold, also record it in the data structure. So at any point in time we can probe this accessary data structure to get the current heavy-hitters. Spark examples contain a sample implementation of this heavy hitters query from a Twitter stream using the CountMinSketchMonoid of Algebird.

Can this be a viable approach of implementing the query model in an event sourced system if the use case fits the approximation query approach ? It can be faster, relatively cheap in space and can prove to be responsive enough to be displayed in dashboards in the form of charts or graphs.

Sunday, January 19, 2014

Count-Min Sketch - A Data Structure for Stream Mining Applications

In today's age of Big Data, streaming is one of the techniques for low latency computing. Besides the batch processing infrastructure of map/reduce paradigm, we are seeing a plethora of ways in which streaming data is processed at near real time to cater to some specific kinds of applications. Libraries like Storm, Samza and Spark belong to this genre and are starting to get their share of user base in the industry today.

This post is not about Spark, Storm or Samza. It's about a data structure which is one of the relatively new entrants in the domain of stream processing, which is simple to implement, but has already proved to be of immense use in serving a certain class of queries over huge streams of data. I have been doing some readings about the application of such structures and thought of sharing them with the readers of my blog.

Using Sublinear Space


Besides data processing, these tools also support data mining over streams that include serving specialized queries over data using limited space and time. Ok, so once we store all data as they come we can always serve queries with O(n) space. But since we are talking about huge data streams, it may not even be possible to run algorithms on the full set of data - it simply will be too expensive. Even if we have the entire set of data in a data warehouse, the processing of the entire data set may take time and consume resources that we cannot afford to have, considering the fee charged under the evolving models of using the platform-as-a-service within the cloud based infrastructure. Also the fact that these algorithms will be working on data streams, there's a high likelihood that they will get to see these data only in a single pass. The bottom line is that we need to have algorithms that work on sub-linear space.

Working on sublinear space implies that we don't get to store or see all data - hence an obvious conclusion from this will be the fact that we also don't get to deliver an accurate answer to some queries. We rely on some approximation techniques and deliver an accuracy with a reasonably high probability bound. We don't store all data, instead we store a lossy compressed representation of the data and deliver user queries from this subset instead of the entire set.

One widely used technique for storing a subset of data is through Random Sampling, where the data stored is selected through some stochastic mechanism. There are various ways to determine which data we select for storing and how we build the estimator for querying the data. There are pros and cons with this approach, but it's one of the simplest ways to do approximation based queries on streaming data.

There are a few other options like Histograms and Wavelet based synopses. But one of the most interesting data structures that have been developed in recent times is the Sketch, which uses summary based techniques for delivering approximation queries, gets around the typical problems that sampling techniques have and are highly parallelizable in practice.

An important class of sketch is one where the sketch vector (which is the summary information) is a linear transform of the input vector. So if we model the input as a vector we can multiply it by a sketch matrix to obtain the sketch vector that contains the synopses data that we can use for serving approximation queries. Here's a diagrammatic representation of the sketch as a linear transform of the input data.



Count-Min Sketch


One of the most popular forms of the sketch data structure is the Count-Min Sketch introduced by Muthukrishnan and Cormode in 2003. The idea is quite simple and the data structure is based on probabilistic algorithms to serve various types of queries on streaming data. The data structure is parameterized by two factors - ε and δ, where the error in answering the query is within a factor of ε with probability δ. So you can tune these parameters based on the space that you can afford and accordingly amortize the accuracy of results that the data structure can serve you.

Consider this situation where you have a stream of data (typically modeled as a vector) like updates to stock quotes in a financial processing system arriving continuously that you need to process and report statistical queries on a real time basis.

  • We model the data stream as a vector a[1 .. n] and the updates received at time t are of the form (it, ct) which mean that the stock quote for a[it] has been incremented by ct. There are various models in which this update can appear as discussed in Data Streams: Algorithms and Applications by Muthukrishnan which includes negative updates as well and the data structure can be tuned to handle each of these variants.


  • The core of the data structure is a 2 dimensional array count[w, d] that stores the synopses of the original vector and which is used to report results of queries using approximation techniques. Hence the total space requirement of the data structure is (w * d). We can bound each of w and d in terms of our parameters ε and δ and control the level of accuracy that we want our data structure to serve.


  • The data structure uses hashing techniques to process these updates and report queries using sublinear space. So assume we have d pairwise-independent hash functions {h1 .. hd} that hash each of our inputs to the range (1 .. w). Just for the more curious mind, pairwise independence is a method to construct a universal hash family, a technique that ensures lower number of collisions in the hash implementation.


  • When an update (it, ct) comes for the stream, we hash a[it] through each of the hash functions h1 .. hd and increment each of the w entries in the array that they hash to.



  • for i = 1 to d
      v = h(i)(a[it]) // v is between 1 and w
      count[i, v] += ct // increment the cell count by ct
    end

    At any point in time if we want to know the approximate value of an element a[i] of the vector a, we can get it from computing the minimum of all values in each of the d cells of count where i hashes to. This can be proved formally. But the general intuition is that since we are using hash functions there's always a possibility of multiple i's colliding on to the same cell and contributing additively to the value of the cell. Hence the minimum among all hash values is the closest candidate to give us the correct result for the query.


    The figure above shows the processing of the updates in a Count-Min sketch. This is typically called the Point Query that returns an approximation of a[i]. Similarly we can use a Count-Min sketch to get approximation queries for ranges which is typically a summation over multiple point queries. Another interesting application is to serve inner product queries where the data structure is used to query inner products of 2 vectors, a typical application of this being the estimation of join sizes in relational query processing. The paper Statistical Analysis of Sketch Estimators gives all details of how to use sketching as a technique for this.

    Count-Min sketches have some great properties which make them a very useful data structure when processing distributed streams. They have associativity properties and can be modelled as monoids and hence terribly performant in a distributed environment where you can parallelize sketch operations. In a future post I will discuss some implementation techniques and how we can use count-min sketches to serve some useful applications over data streams. Meanwhile Twitter's algebird and ClearSpring's stream-lib offer implementations of Count-Min sketch and various other data structures applicable for stream mining applications.



    Wednesday, July 24, 2013

    Scala Redis client goes non blocking : uses Akka IO

    scala-redis is getting a new non blocking version based on a kernel implemented with the new Akka IO. The result is that all APIs are non blocking and return a Future. We are trying to keep the API as close to the blocking version as possible. But bits rot and some of the technical debt need to be repaid. We have cleaned up some of the return types which had unnecessary Option[] wrappers, made some improvements and standardizations on the API type signatures and also working on making the byte array manipulation faster using akka.util.ByteString at the implementation level. We also have plans of using the Akka IO pipeline for abstracting the various stages of handling Redis protocol request and response.

    As of today we have quite a bit ready for review by the users. The APIs may change a bit here and there, but the core APIs are up there. There are a few areas which have not yet been implemented like PubSub or clustering. Stay tuned for more updates on this blog .. Here are a few code snippets that demonstrate the usage of the APIs ..

    Non blocking get/set

    @volatile var callbackExecuted = false
    
    val ks = (1 to 10).map(i => s"client_key_$i")
    val kvs = ks.zip(1 to 10)
    
    val sets: Seq[Future[Boolean]] = kvs map {
      case (k, v) => client.set(k, v)
    }
    
    val setResult = Future.sequence(sets) map { r: Seq[Boolean] =>
      callbackExecuted = true
      r
    }
    
    callbackExecuted should be (false)
    setResult.futureValue should contain only (true)
    callbackExecuted should be (true)
    
    callbackExecuted = false
    val gets: Seq[Future[Option[Long]]] = ks.map { k => client.get[Long](k) }
    val getResult = Future.sequence(gets).map { rs =>
      callbackExecuted = true
      rs.flatten.sum
    }
    
    callbackExecuted should be (false)
    getResult.futureValue should equal (55)
    callbackExecuted should be (true)
    

    Composing through sequential combinators

    val key = "client_key_seq"
    val values = (1 to 100).toList
    val pushResult = client.lpush(key, 0, values:_*)
    val getResult = client.lrange[Long](key, 0, -1)
    
    val res = for {
      p <- pushResult.mapTo[Long]
      if p > 0
      r <- getResult.mapTo[List[Long]]
    } yield (p, r)
    
    val (count, list) = res.futureValue
    count should equal (101)
    list.reverse should equal (0 to 100)
    

    Error handling using Promise Failure

    val key = "client_err"
    val v = client.set(key, "value200")
    v.futureValue should be (true)
    
    val x = client.lpush(key, 1200)
    val thrown = evaluating { x.futureValue } should produce [TestFailedException]
    thrown.getCause.getMessage should equal ("ERR Operation against a key holding the wrong kind of value")
    
    Feedbacks welcome, especially on the APIs and their usage. All code are in Github with all tests in the test folder. Jisoo Park (@guersam) has been doing an awesome job contributing a lot to all the goodness that's there in the repo. Thanks a lot for all the help ..

    Monday, July 22, 2013

    The Realm of Racket is an enjoyable read



    There are many ways to write a programming language book. You can start introducing the syntax and semantics of the language in a naturally comprehensible sequence of complexity and usage. Or you can choose to introduce the various features of the language with real world examples using the standard librray that the language offers. IIRC Accelerated C++ by Andrew Koenig and Barbara Moo takes this route. I really loved this approach and enjoyed reading the book.

    Of course Matthias Felleisen is known for a third way of teaching a language - the fun way. The Little Schemer and The Seasoned Schemer have introduced a novel way of learning a language. The Realm of Racket follows a similar style of teaching the latest descendant of Lisp, one game at a time. The implementation of every game introduces the idioms and language features with increasing degrees of complexity. There's a nice progression which helps understanding the complex features of the language building upon the already acquired knowledge of the simpler ones in earlier chapters.

    The book begins with a history of the Racket programming language and how it evolved as a descendant of Scheme, how it makes programming fun and how it can be used successfully as an introductory language to students aspiring to learn programming. Then it starts with Getting Started with Dr Racket for the impatient programmer and explains the IDE that will serve as your companion for the entire duration of your playing around with the book.

    Every chapter introduces some of the new language features and either develops a new game or builds on some improvement of a game developed earlier. This not only demonstrates a real world usage of the syntax and semantics of the language but makes the programmer aware of how the various features interact as a whole to build complex abstractions out of simpler ones. The book also takes every pain to defer the complexity of the features to the right point so that the reader is not burdened upfront. e.g. Lambdas are introduced only when the authors have introduced all basics of programming with functions and recursion. Mutants are introduced only after teaching the virtues of immutablity. For loops and comprehensions appear only when the book has introduced all list processing functions like folds, maps and filters. And then the book goes into great depth explaining why the language has so many variants of the for loop like for/list, for/fold, for*, for/first, for/last etc. In this entire discussion of list processing, for loops etc., I would love to see a more detailed discussion on sequence in the book. Sequence abstracts a large number of data types, but, much like Clojure it introduces a new way of API design - a single sequence to rule them all. API designers would surely like to have more of this sauce as part of their repertoire. Racket's uniform way of handling sequence is definitely a potent model of abstraction as compared to Scheme or other versions of Lisp.

    The games developed progress in complexity and we can see the powers of the language being put to great use when the authors introduce lazy evaluation and memoized computations and use them to improve the Dice of Doom. Then the authors introduce distributed game development which is the final frontier that the book covers. It's truly an enjoyable ride through the entire series.

    The concluding chapter talks about some of the advanced features like classes, objects and meta-programming. Any Lisp book will be incomplete without a discussion of macros and language development. But I think the authors have done well to defer these features till the end. Considering the fact that this is a book for beginning to learn the language this sounded like a sane measure.

    However, as a programmer experienced in other languages and wanting to take a look at Racket, I would have loved to see some coverage on testing. Introducing a bit on testing practices, maybe a unit testing library would have made the book more complete.

    The style of writing of this book has an underlying tone of humor and simplicity, which makes it a thoroughly enjoyable read. The use of illustrations and comics take away the monotony of learning the prosaics of a language. And the fact that Racket is a simple enough language makes this combination with pictures very refreshing.

    On the whole, as a book introducing a language, The Realm of Racket is a fun read. I enjoyed reading it a lot and recommend it without reservations for your bookshelf.



    Monday, June 03, 2013

    Endo is the new fluent API

    I tweeted this over the weekend .. My last two blog posts have been about endomorphisms and how it combines with the other functional structures to help you write expressive and composable code. In A DSL with an Endo - monoids for free, endos play with Writer monad and implement a DSL for a sequence of activities through monoidal composition. And in An exercise in Refactoring - Playing around with Monoids and Endomorphisms, I discuss a refactoring exercise that exploits the monoid of an endo to make composition easier. Endomorphisms help you lift your computation into a data type that gives you an instance of a monoid. And the mappend operation of the monoid is the function composition. Hence once you have the Endo for your type defined, you get a nice declarative syntax for the operations that you want to compose, resulting in a fluent API. Just a quick recap .. endomorphisms are functions that map a type on to itself and offer composition over monoids. Given an endomorphism we can define an implicit monoid instance ..
    implicit def endoInstance[A]: Monoid[Endo[A]] = new Monoid[Endo[A]] {
      def append(f1: Endo[A], f2: => Endo[A]) = f1 compose f2
      def zero = Endo.idEndo
    }
    
    I am not going into the details of this, which I discussed at length in my earlier posts. In this article I will sum up with yet another use case for making fluent APIs using the monoid instance of an Endo. Consider an example from the domain of securities trading, where a security trade goes through a sequence of transformations in its lifecycle through the trading process .. Here's a typical Trade model (very very trivialified for demonstration) ..
    sealed trait Instrument
    case class Security(isin: String, name: String) extends Instrument
    
    case class Trade(refNo: String, tradeDate: Date, valueDate: Option[Date] = None, 
      ins: Instrument, principal: BigDecimal, net: Option[BigDecimal] = None, 
      status: TradeStatus = CREATED)
    
    Modeling a typical lifecycle of a trade is complex. But for illustration, let's consider these simple ones which need to executed on a trade in sequence ..
    1. Validate the trade
    2. Assign value date to the trade, which will ideally be the settlement date
    3. Enrich the trade with tax/fees and net trade value
    4. Journalize the trade in books
    Each of the functions take a Trade and return a copy of the Trade with some attributes modified. A naive way of doing that will be as follows ..
    def validate(t: Trade): Trade = //..
    
    def addValueDate(t: Trade): Trade = //..
    
    def enrich(t: Trade): Trade = //..
    
    def journalize(t: Trade): Trade = //..
    
    and invoke these methods in sequence while modeling the lifecycle. Instead we try to make it more composable and lift the function Trade => Trade within the Endo ..
    type TradeLifecycle = Endo[Trade]
    
    and here's the implementation ..
    // validate the trade: business logic elided
    def validate: TradeLifecycle = 
      ((t: Trade) => t.copy(status = VALIDATED)).endo
    
    // add value date to the trade (for settlement)
    def addValueDate: TradeLifecycle = 
      ((t: Trade) => t.copy(valueDate = Some(t.tradeDate), status = VALUE_DATE_ADDED)).endo
    
    // enrich the trade: add taxes and compute net value: business logic elided
    def enrich: TradeLifecycle = 
      ((t: Trade) => t.copy(net = Some(t.principal + 100), status = ENRICHED)).endo
    
    // journalize the trade into book: business logic elided
    def journalize: TradeLifecycle = 
      ((t: Trade) => t.copy(status = FINALIZED)).endo
    
    Now endo has an instance of Monoid defined by scalaz and the mappend of Endo is function composition .. Hence here's our lifecycle model using the holy monoid of endo ..
    def doTrade(t: Trade) =
      (journalize |+| enrich |+| addValueDate |+| validate).apply(t)
    
    It's almost the specification that we listed above in numbered bullets. Note the inside out sequence that's required for the composition to take place in proper order.

    Why not plain old composition ?


    A valid question. The reason - abstraction. Abstracting the composition within types helps you compose the result with other types, as we saw in my earlier blog posts. In one of them we built larger abstractions using the Writer monad with Endo and in the other we used the mzero of the monoid as a fallback during composition thereby avoiding any special case branch statements.

    One size doesn't fit all ..


    The endo and its monoid compose beautifully and gives us a domain friendly syntax that expresses the business functionality ina nice succinct way. But it's not a pattern which you can apply everywhere where you need to compose a bunch of domain behaviors. Like every idiom, it has its shortcomings and you need different sets of solutions in your repertoire. For example the above solution doesn't handle any of the domain exceptions - what if the validation fails ? With the above strategy the only way you can handle this situation is to throw exceptions from validate function. But exceptions are side-effects and in functional programming there are more cleaner ways to tame the evil. And for that you need different patterns in practice. More on that in subsequent posts ..