Sunday, September 27, 2009

The Thrush combinator in Scala

In his book To Mock a Mockingbird, Raymond Smullyan teaches combinatory logic using songbirds in a forest. He derives important results combining various combinators, all using the birds of the enchanted forest. Combinators are an effective tool in designing abstractions with functional programming principles. They are reusable units, make you code very concise, without losing on the expressivity. More than the implementation, the names can go a long way in establishing a common vocabulary of programming idioms and techniques. In an earlier post, I talked about Kestrel, and its implementation in Scala for handling side-effects in an abstraction.

In this post, I look at Thrush, a permuting combinator. A Thrush is defined by the following condition: Txy = yx. Thrush reverses the order of evaluation. Raganwald talks about Thrush and its implementations in Ruby in an excellent post quite some time back.

Why would you want to reverse the order of evaluation in a computation ? Well, if you value readability of your code, and have been cringing at how a mix of expression oriented pipelines and nested function calls can make your code less readable, then a Thrush may be for you.

Consider the Ruby example that Raganwald discusses in his first example of Thrush implementation.

lambda { |x| x * x }.call((1..100).select(&:odd?).inject(&:+))

The argument to the proc is a pipeline expression that reads nicely from left to right : get the list of numbers from 1 to 100, select the odd ones and add them up. What the proc does is it finds the square of its input number. But the proc invocation is a function call, which, though is the last in sequence to be executed, has to be the first one that you read. You can find the Ruby implementation in Raganwald's blog that transforms the above code to a left-to-right pipeline expression using Thrush.

Let's try to see how we can do the same in Scala ..

In Scala, I can write the above as ..

((x: Int) => (* x))((1 to 100).filter(% 2 != 0).foldLeft(0)(_+_))

Almost the same as the Ruby code above, and has the similar drawback in readability.

Let's define the Scala Thrush combinator ..

case class Thrush[A](x: A) {
  def into[B](g: A => B): B = {
    g(x)
  }
}


Immediately we can write the above invocation as ..

Thrush((1 to 100)
  .filter(% 2 != 0)
  .foldLeft(0)(+ _))
  .into((x: Int) => x * x)


A very simple combinator, a permuting one that pushes the function to where it belongs in the line of readability. If you want to be more succinct, commit the sin of defining an implicit for your use case ..

implicit def int2Thrush(x: Int) = Thrush(x)

and immediately the above transforms to ..

(1 to 100)
  .filter(% 2 != 0)
  .foldLeft(0)(+ _)
  .into((x: Int) => x * x)


Does it read better ?

In fact with this implicit definition, you can go chaining into all the way ..

(1 to 100)
  .filter(% 2 != 0)
  .foldLeft(0)(+ _)
  .into((x: Int) => x * x)
  .into(* 2)


While designing domain APIs that need to be expressive, this technique can often come very handy. Here's an example that uses the Thrush combinator to ensure a clean pipeline expression flowing into a piece of DSL.

//..
accounts.filter(_ belongsTo "John S.") 
        .map(_.calculateInterest)
        .filter(> threshold)
        .foldLeft(0)(+ _)
        .into {x: Int =>
          updateBooks journalize(Ledger.INTEREST, x)
        }
//..

Sunday, September 13, 2009

Misconstrued Language Similarity Considered Harmful

Very frequently I come across posts of the form Language X for Language Y programmers. It's not that there is anything wrong with them, but, more often than not, the underlying tone of such posts is to highlight some apparent (and often misconstrued) similarities between the two languages.

Objects in Java and processes in Erlang have some similarity in the sense that both of them abstract some state of your application. But that's where the similarity ends - the rest is all gaping differences. Objects in a class oriented OO language like Java and processes in a concurrency oriented language like Erlang are miles apart in philosophy, usage, implementation and granularity. Read Joe Armstrong's Why OO sucks for details. But for the purpose of this post, the following statement from Joe suffices to nip in the bud any logic that claims objects in Java are similar to processes in Erlang ..

"As Erlang became popular we were often asked "Is Erlang OO" - well, of course the true answer was "No of course not" - but we didn't to say this out loud - so we invented a serious of ingenious ways of answering the question that were designed to give the impression that Erlang was (sort of) OO (If you waved your hands a lot) but not really (If you listened to what we actually said, and read the small print carefully)."

Similarity breeds Contentment

It's always comforting to have a base of similarity. It's human nature to force find a similar base and transform one's thought process with respect to it. With programming languages, it works only if the two languages share the same philosophy and implement similar ideas. But you can never learn a new language by pitching it against your own favorite language and identifying apparent similarities. It is these apparent similarities that tend to influence how you think of the idioms of the new language and you will be misled into believing and practising something that will lead you to the path of antipatterns. Consider static typing and dynamic typing - a debate that has possibly been beaten to death. But when you are making the change, learn to think in the new paradigm. It's foolish to think in terms of concrete types in a dynamically typed setting. Think in terms of the contracts that the abstraction implemnents and organize your tests around them. Ola Bini wrote a fantastic post on this in response to a twitter discussion that originated from me.

The starting point should always be the philosophy and the philosophical differences that the two languages imbibe. Haskell typeclasses may seem similar in many respects to polymorphism in object-oriented languages. In fact it is more similar to parametric polymorphism, while the most dominant form of polymorphism in OO is subtype polymorphism. And Haskell being a functional language does not support subtyping.

This post relates pattern matching in Erlang and conditionals in Java in the same vein. It's true both of them offer some form of dispatch in program control. But the more significant and idiomatic difference that matters in this context is between the conditional statements in Java and expressions in the functional setting of Erlang. It's this expression based programming that influences the way you structure your code in Erlang. Instead of highlighting upfront the similarity of both constructs as means of flow control, emphasize on the difference in thought process that gives your program a different geometry. And finally, the most important use of pattern matching is in programming with algebraic data types, a whole new idiom that gets unveiled.

Ok, if you want to write a post on language X for prospective programmers of language Y, go ahead and highlight the differences in philosophy and idioms. And then try to implement your favorite feature from language Y in X. Misconstrued similarities often bias programmers new to language X the wrong way.

Sunday, September 06, 2009

Side-effects with Kestrel in Scala

Consider the following piece of logic that we frequently come across in codebases ..

  1. val x = get an instance, either create it or find it

  2. manipulate x with post-creation activities (side-effects)

  3. use x


Step 2 is only for some side-effecting operations, maybe on the instance itself or for some other purposes like logging, registering, writing to database etc. While working on the serialization framework sjson, I have been writing pieces of code that follows exactly the above pattern to create Scala objects out of JSON structures. Now if you notice the above 3 steps, step 2 looks like being a part of the namespace which calls the function that sets up x. But logically step 2 needs to be completed before we can use x. Which means that step 2 is also a necessary piece of logic that needs to be completed before we hand over the constructed instance x to the calling context.

One option of course is to make Step 2 a part of the function in Step 1. But this is not always feasible, since Step 2 needs access to the context of the caller namespace.

Let's look at an idiom in Scala that expresses the above behavior more succinctly and leads us into implementing one of the most popularly used objects in combinatory logic.

Consider the following method in Scala that creates a new instance of a class with the arguments passed to it ..


def newInstance[T](args: Array[AnyRef])(implicit m: Manifest[T]): T = {
  val constructor = 
    m.erasure.getDeclaredConstructors.first
  constructor.newInstance(args: _*).asInstanceOf[T]
}



I can use it like ..

newInstance[Person](Array("ghosh", "debasish"))

for a class Person defined as ..

case class Person(lastName: String, firstName: String)

It's often the case that I would like to have some operations on the new instance after its creation which will be pure side-effects. It may or may not mutate the new instance, but will be done in the context of the new instance. I can very well do that like ..

val persons = new ListBuffer[Person]()
val p = newInstance[Person](Array("ghosh", "debasish"))
persons += p  // register to the list
persons.foreach(mail("New member has joined: " + p)) // new member mail to all
//.. other stuff


It works perfectly .. but we can make the code more expressive if we can somehow be explicit about the context of the block of code that needs to go with every new instance of Person being created. Maybe something like ..

newInstance[Person](Array("ghosh", "debasish")) { p =>
  persons += p
  persons.foreach(mail("New member has joined: " + p))
  //.. other stuff
}


This clearly indicates that the side-effecting steps of adding to the global list of persons or sending out a mail to every member is also part of the creation process of the new person. The effect is the same as the earlier example, only that it delineates the context more clearly. Though at the end of it all, it returns only the instance that it creates.

Consider another example of a good old Java bean ..

class Address {
  private String street;
  private String houseNumber;
  private String city;
  private String zip;

  //..
  //.. getters and setters
}


Working with a reflection based library it's not uncommon to see code that instantiates the bean using the default constructor and then allow clients to set the instance up with custom values .. something like ..

var p = Person(..)
val pAddress =
  newInstance[Address](null) {=>
    a.setStreet("Market Street")
    a.setHouseNumber("B/102")
    a.setCity("San Francisco")
    a.setZip("98032")
    p.address = a
    p.mail("Your address has been changed to: " + a)
  }


Once again the block is only for side-effects, which can contain lots of other custom codes that depends on the caller's context. Make it more concise, DSLish using the object import syntax of Scala ..

var p = Person(..)
val pAddress =
  newInstance[Address](null) {=>
    import a._
    setStreet("Market Street")
    setHouseNumber("B/102")
    setCity("San Francisco")
    setZip("98032")
    p.address = a
    p.mail("Your address has been changed to: " + a)
  }


Looks like a piece of idiom that can be effective as part of your programming repertoire. Here is the version of newInstance that allows you to make the above happen ..


def newInstance[T](args: Array[AnyRef])(op: T => Unit)(implicit m: Manifest[T]): T = {
  val constructor = 
    m.erasure.getDeclaredConstructors.first
  val v = constructor.newInstance(args: _*).asInstanceOf[T]
  op(v)
  v
}



Looking carefully at newInstance I realized that it is actually the Kestrel combinator that Raymond Smullyan explains so eloquently in his amazing book To Mock a Mockingbird. A bird K is called a Kestrel if for any birds x and y, (Kx)y = x. And that's exactly what's happening with newInstance. It does everything you pass onto the block, but ultimately returns the new instance that it creates. A nice way to plug in some side-effects. Reg has blogged about Kestrels in Ruby - the tap method in Ruby 1.9 and returning in Rails. These small code pieces may not be as ceremonious as the popularly used design patterns, but just as effective and provide deep insights into how our code needs to be structured for expressivity. Next time you discover any such snippet that you find useful for sharing, feel free to write a few lines of blog about it .. the community will love it ..