Monday, March 04, 2013

An exercise in Refactoring - Playing around with Monoids and Endomorphisms

A language is powerful when it offers sufficient building blocks for library design and adequate syntactic sugar that helps build expressive syntax on top of the lower level APIs that the library publishes. In this post I will discuss an exercise in refactoring while trying to raise the level of abstraction of a modeling problem.

Consider the following modeling problem that I recently discussed in one of the Scala training sessions. It's simple but offers ample opportunities to explore how we can raise the level of abstraction in designing the solution model. We will start with an imperative solution and then incrementally work on raising the level of abstraction to make the final code functional and more composable.

A Problem of Transformation ..

The problem is to compute the salary of a person through composition of multiple salary components calculated based on some percentage of other components. It's a problem of applying repeated transformations to a pipeline of successive computations - hence it can be generalized as a case study in function composition. But with some constraints as we will see shortly.

Let's say that the salary of a person is computed as per the following algorithm :

  1. basic = the basic component of his salary
  2. allowances = 20% of basic
  3. bonus = 10% of (basic + allowances)
  4. tax = 30% of (basic + allowances + bonus)
  5. surcharge = 10% of (basic + allowances + bonus - tax)
Note that the computation process starts with a basic salary, computes successive components taking the input from the previous computation of the pipeline. But there's a catch though, which makes the problem a bit more interesting from the modleing perspective. Not all components of the salary are mandatory - of course the basic is mandatory. Hence the final components of the salary will be determined by a configuration object which can be like the following ..

// an item = true means the component should be activated in the computation
case class SalaryConfig(
  surcharge: Boolean    = true, 
  tax: Boolean          = true, 
  bonus: Boolean        = true,
  allowance: Boolean    = true 
)

So when we compute the salary we need to take care of this configuration object and activate the relevant components for calculation.

A Function defines a Transformation ..

Let's first translate the above components into separate Scala functions ..

// B = basic + 20%
val plusAllowance = (b: Double) => b * 1.2

// C = B + 10%
val plusBonus = (b: Double) => b * 1.1

// D = C - 30%
val plusTax = (b: Double) => 0.7 * b

// E = D - 10%
val plusSurcharge = (b: Double) => 0.9 * b

Note that every function computes the salary up to the stage which will be fed to the next component computation. So the final salary is really the chained composition of all of these functions in a specific order as determined by the above stated algorithm.

But we need to selectively activate and deactivate the components depending on the SalaryConfig passed. Here's the version that comes straight from the imperative mindset ..

The Imperative Solution ..

// no abstraction, imperative, using var
def computeSalary(sc: SalaryConfig, basic: Double) = {
  var salary = basic
  if (sc.allowance) salary = plusAllowance(salary)
  if (sc.bonus) salary = plusBonus(salary)
  if (sc.tax) salary = plusTax(salary)
  if (sc.surcharge) salary = plusSurcharge(salary)
  salary
}

Straight, imperative, mutating (using var) and finally rejected by our functional mindset.

Thinking in terms of Expressions and Composition ..

Think in terms of expressions (not statements) that compose. We have functions defined above that we could compose together and get the result. But, but .. the config, which we somehow need to incorporate as part of our composable expressions.

So direct composition of functions won't work because we need some conditional support to take care of the config. How else can we have a chain of functions to compose ?

Note that all of the above functions for computing the components are of type (Double => Double). Hmm .. this means they are endomorphisms, which are functions that have the same argument and return type - "endo" means "inside" and "morphism" means "transformation". So an endomorphism maps a type on to itself. Scalaz defines it as ..

sealed trait Endo[A] {
  /** The captured function. */
  def run: A => A
  //..
}

But the interesting part is that there's a monoid instance for Endo and the associative append operation of the monoid for Endo is function composition. That seems mouthful .. so let's dissect what we just said ..

As you all know, a monoid is defined as "a semigroup with an identity", i.e.

trait Monoid[A] {
  def append(m1: A, m2: A): A
  def zero: A
}

and append has to be associative.

Endo forms a monoid where zero is the identity endomorphism and append composes the underlying functions. Isn't that what we need ? Of course we need to figure out how to sneak in those conditionals ..

implicit def endoInstance[A]: Monoid[Endo[A]] = new Monoid[Endo[A]] {
  def append(f1: Endo[A], f2: => Endo[A]) = f1 compose f2
  def zero = Endo.idEndo
}

But we need to append the Endo only if the corresponding bit in SalaryConfig is true. Scala allows extending a class with custom methods and scalaz gives us the following as an extension method on Boolean ..

/**
 * Returns the given argument if this is `true`, otherwise, the zero element 
 * for the type of the given argument.
 */
final def ??[A](a: => A)(implicit z: Monoid[A]): A = b.valueOrZero(self)(a)

That's exactly what we need to have the following implementation of a functional computeSalary that uses monoids on Endomorphisms to compose our functions of computing the salary components ..

// compose using mappend of endomorphism
def computeSalary(sc: SalaryConfig, basic: Double) = {
  val e = 
    sc.surcharge ?? plusSurcharge.endo     |+|
    sc.tax ?? plusTax.endo                 |+|
    sc.bonus ?? plusBonus.endo             |+| 
    sc.allowance ?? plusAllowance.endo 
  e run basic
}

More Generalization - Abstracting over Types ..

We can generalize the solution further and abstract upon the type that represents the collection of component functions. In the above implementation we are picking each function individually and doing an append on the monoid. Instead we can abstract over a type constructor that allows us to fold the append operation over a collection of elements.

Foldable[] is an abstraction which allows its elements to be folded over. Scalaz defines instances of Foldable[] typeclass for List, Vector etc. so we don't care about the underlying type as long as it has an instance of Foldable[]. And Foldable[] has a method foldMap that makes a Monoid out of every element of the Foldable[] using a supplied function and then folds over the structure using the append function of the Monoid.

trait Foldable[F[_]]  { self =>
  def foldMap[A,B](fa: F[A])(f: A => B)(implicit F: Monoid[B]): B
  //..
}

In our example, f: A => B is the endo function and the append is the append of Endo which composes all the functions that form the Foldable[] structure. Here's the version using foldMap ..

def computeSalary(sc: SalaryConfig, basic: Double) = {
  val components = 
    List((sc.surcharge, plusSurcharge), 
         (sc.tax, plusTax), 
         (sc.bonus, plusBonus),
         (sc.allowance, plusAllowance)
    )
  val e = components.foldMap(e => e._1 ?? e._2.endo)
  e run basic
}

This is an exercise which discusses how to apply transformations on values when you need to model endomorphisms. Instead of thinking in terms of generic composition of functions, we exploited the types more, discovered that our tranformations are actually endomorphisms. And then applied the properties of endomorphism to model function composition as monoidal appends. The moment we modeled at a higher level of abstraction (endomorphism rather than native functions), we could use the zero element of the monoid as the composable null object in the sequence of function transformations.

In case you are interested I have the whole working example in my github repo.

Friday, February 15, 2013

A DSL with an Endo - monoids for free

When we design a domain model, one of the issues that we care about is abstraction of implementation from the user level API. Besides making the published contract simple, this also decouples the implementation and allows post facto optimization to be done without any impact on the user level API.

Consider a class like the following ..

// a sample task in a project
case class Task(name: String) 

// a project with a list of tasks & dependencies amongst the
// various tasks
case class Project(name: String, 
                   startDate: java.util.Date, 
                   endDate: Option[java.util.Date] = None, 
                   tasks: List[Task] = List(), 
                   deps: List[(Task, Task)] = List())

We can always use the algebraic data type definition above to add tasks and dependencies to a project. Besides being cumbersome as a user level API, it also is a way to program too close to the implementation. The user is coupled to the fact that we use a List to store tasks, making it difficult to use any alternate implementation in the future. We can offer a Builder like OO interface with fluent APIs, but that also adds to the verbosity of implementation, makes builders mutable and is generally more difficult to compose with other generic functional abstractions.

Ideally we should be having a DSL that lets users create projects and add tasks and dependencies to them.

In this post I will discuss a few functional abstractions that will stay behind from the user APIs, and yet provide the compositional power to wire up the DSL. This is a post inspired by this post which discusses a similar DSL design using Endo and Writers in Haskell.

Let's address the issues one by one. We need to accumulate tasks that belong to the project. So we need an abstraction that helps in this accumulation e.g. concatenation in a list, or in a set or in a Map .. etc. One abstraction that comes to mind is a Monoid that gives us an associative binary operation between two objects of a type that form a monoid.

trait Monoid[T] {
  def append(m1: T, m2: T): T
  def zero: T
}

A List is a monoid with concatenation as the append. But since we don't want to expose the concrete data structure to the client API, we can talk in terms of monoids.

The other data structure that we need is some form of an abstraction that will offer us the writing operation into the monoid. A Writer monad is an example of this. In fact the combination of a Writer and a Monoid is potent enough to have such a DSL in the making. Tony Morris used this combo to implement a logging functionality ..

for {
  a <- k withvaluelog ("starting with " + _)
  b <- (a + 7) withlog "adding 7"
  c <- (b * 3).nolog
  d <- c.toString.reverse.toInt withvaluelog ("switcheroo with " + _)
  e <- (d % 2 == 0) withlog "is even?"
} yield e
We could use this same technique here. But we have a problem - Project is not a monoid and we don't have a definition of zero for a Project that we can use to make it a Monoid. Is there something that would help us get a monoid from Project i.e. allow us to use Project in a monoid ?

Enter Endo .. an endomorphism which is simply a function that takes an argument of type T and returns the same type. In Scala, we can state this as ..
sealed trait Endo[A] {
  // The captured function
  def run: A => A
  //..
}
Scalaz defines Endo[A] and provides a lot of helper functions and syntactic sugars to use endomorphisms. Among its other properties, Endo[A] provides a natural monoid and allows us to use A in a Monoid. In other words, endomorphisms of A form a monoid under composition. In our case we can define an Endo[Project] as a function that takes a Project and returns a Project. We can then use it with a Writer (as above) and implement the accumulation of tasks within a Project.

Exercise: Implement Tony Morris' logger without side-effects using an Endo.

Here's how we would like to accumulate tasks in our DSL ..
for {
  _ <- task("Study Customer Requirements")
  _ <- task("Analyze Use Cases")
  a <- task("Develop code")
} yield a


Let's define a function that adds a Task to a Project ..
// add task to a project
val withTask = (t: Task, p: Project) => p.copy(tasks = t :: p.tasks)


and use this function to define the DSL API task which makes an Endo[Project] and passes it as a Monoid to the Writer monad. In the following snippet, (p: Project) => withTask(t, p) is a mapping from Project => Project, which gets converted to an Endo and then passed to the Writer monad for adding to the task list of the Project.
def task(n: String): Writer[Endo[Project], Task] = {
  val t = Task(n)
  for {
    _ <- tell(((p: Project) => withTask(t, p)).endo)
  } yield t
}

The DSL snippet above is a monad comprehension. Let's add some more syntax to the DSL by defining dependencies of a Project. That's also a mapping from one Project state to another and can be realized using a similar function like withTask ..
// add project dependency
val withDependency = (t: Task, on: Task, p: Project) => 
  p.copy(deps = (t, on) :: p.deps)

.. and define a function dependsOn to our DSL that allows the user to add the explicit dependencies amongst tasks. But this time instead of making it a standalone function we will make it a method of the class Task. This is only for getting some free syntactic sugar in the DSL. Here's the modified Task ADT ..
case class Task(name: String) {
  def dependsOn(on: Task): Writer[Endo[Project], Task] = {
    for {
      _ <- tell(((p: Project) => withDependency(this, on, p)).endo)
    } yield this
  }
}
Finally we define the last API of our DSL that glues together the building of the Project and the addition of tasks and dependencies without directly coupling the user to some of the underlying implementation artifacts.
def project(name: String, startDate: Date)(e: Writer[Endo[Project], Task]) = {
  val p = Project(name, startDate)
  e.run._1(p)
}
And we can finally create a Project along with tasks and dependencies using our DSL ..
project("xenos", now) {
  for {
    a <- task("study customer requirements")
    b <- task("analyze usecases")
    _ <- b dependsOn a
    c <- task("design & code")
    _ <- c dependsOn b
    d <- c dependsOn a
  } yield d
}
In case you are interested I have the whole working example in my github repo.

Friday, February 01, 2013

Modular Abstractions in Scala with Cakes and Path Dependent Types

I have been trying out various options of implementing the Cake pattern in Scala, considered to be one of the many ways of doing dependency injection without using any additional framework. There are other (more functional) ways of doing the same thing, one of which I blogged about before and also talked about in a NY Scala meetup. But I digress ..

Call it DI or not, the Cake pattern is one of the helpful techniques to implement modular abstractions in Scala. You weave your abstract components (aka traits), layering on the dependencies and commit to implementations only at the end of the world. I was trying to come up with an implementation that does not use self type annotations. It's not that I think self type annotations are kludgy or anything but I don't find them used elsewhere much besides the Cake pattern. And of course mutually recursive self annotations are a code smell that makes your system anti-modular.

In the following implementation I use path dependent types, which have become a regular feature in Scala 2.10. Incidentally it was there since long back under the blessings of an experimental feature, but has come out in public only in 2.10. The consequence is that instead of self type annotations or inheritance I will be configuring my dependencies using composition.

Let me start with some basic abstractions of a very simple domain model. The core component that I will build is a service that reports the portfolio of clients as a balance. The example has been simplified for illustration purposes - the actual real life model has a much more complex implementation.

A Portfolio is a collection of Balances. A Balance is a position of an Account in a specific Currency as on a particular Date. Expressing this in simple terms, we have the following traits ..

// currency
sealed trait Currency
case object USD extends Currency
case object EUR extends Currency
case object AUD extends Currency

//account
case class Account(no: String, name: String, openedOn: Date, status: String)

trait BalanceComponent {
  type Balance

  def balance(amount: Double, currency: Currency, asOf: Date): Balance
  def inBaseCurrency(b: Balance): Balance
}

The interesting point to note is that the actual type of Balance has been abstracted in BalanceComponent, since various services may choose to use various representations of a Balance. And this is one of the layers of the Cake that we will mix finally ..

Just a note for the uninitiated, a base currency is typically considered the domestic currency or accounting currency. For accounting purposes, a firm may use the base currency to represent all profits and losses. So we may have some service or component that would like to have the balances reported in base currency.

trait Portfolio {
  val bal: BalanceComponent
  import bal._

  def currentPortfolio(account: Account): List[Balance]
} 

Portfolio uses the abstract BalanceComponent and does not commit to any specific implementation. And the Balance in the return type of the method currentPortfolio is actually a path dependent type, made to look nice through the object import syntax.

Now let's have some standalone implementations of the above components .. we are still not there yet to mix the cake ..

// report balance as a TUPLE3 - simple
trait SimpleBalanceComponent extends BalanceComponent {
  type Balance = (Double, Currency, Date)

  override def balance(amount: Double, currency: Currency, asOf: Date) = 
    (amount, currency, asOf)
  override def inBaseCurrency(b: Balance) = 
    ((b._1) * baseCurrencyFactor.get(b._2).get, baseCurrency, b._3)
}

// report balance as an ADT
trait CustomBalanceComponent extends BalanceComponent {
  type Balance = BalanceRep

  // balance representation
  case class BalanceRep(amount: Double, currency: Currency, asOf: Date)

  override def balance(amount: Double, currency: Currency, asOf: Date) = 
    BalanceRep(amount, currency, asOf)
  override def inBaseCurrency(b: Balance) = 
    BalanceRep((b.amount) * baseCurrencyFactor.get(b.currency).get, baseCurrency, b.asOf)
}

And a sample implementation of ClientPortfolio that adds logic without yet commiting to any concrete type for the BalanceComponent.

trait ClientPortfolio extends Portfolio {
  val bal: BalanceComponent
  import bal._

  override def currentPortfolio(account: Account) = {
    //.. actual impl will fetch from database
    List(
      balance(1000, EUR, Calendar.getInstance.getTime),
      balance(1500, AUD, Calendar.getInstance.getTime)
    )
  }
}

Similar to ClientPortfolio, we can have multiple implementations of Portfolio reporting that reports balances in various forms. So our cake has started taking shape. We have the Portfolio component and the BalanceComponent already weaved in without any implementation. Let's add yet another layer to the mix, maybe for fun - a decorator for the Portfolio.

We add Auditing as a component which can decorate *any* Portfolio component and report the balance of an account in base currency. Note that Auditing needs to abstract implementations of BalanceComponent as well as Portfolio since the idea is to decorate any Portfolio component using any of the underlying BalanceComponent implementations.

Many cake implementations use self type annotations (or inheritance) for this. I will be using composition and path dependent types.

trait Auditing extends Portfolio {
  val semantics: Portfolio
  val bal: semantics.bal.type
  import bal._

  override def currentPortfolio(account: Account) = {
    semantics.currentPortfolio(account) map inBaseCurrency
  }
}

Note how the Auditing component uses the same Balance implementation as the underlying decorated Portfolio component, enforced through path dependent types.

And we have reached the end of the world without yet committing to any implementation of our components .. But now let's do that and get a concrete service instantiated ..

object SimpleBalanceComponent extends SimpleBalanceComponent
object CustomBalanceComponent extends CustomBalanceComponent

object ClientPortfolioAuditService1 extends Auditing {
  val semantics = new ClientPortfolio { val bal = SimpleBalanceComponent }
  val bal: semantics.bal.type = semantics.bal
}

object ClientPortfolioAuditService2 extends Auditing {
  val semantics = new ClientPortfolio { val bal = CustomBalanceComponent }
  val bal: semantics.bal.type = semantics.bal
}

Try out in your Repl and see how the two services behave the same way abstracting away all implementations of components from the user ..

scala> ClientPortfolioAuditService1.currentPortfolio(Account("100", "dg", java.util.Calendar.getInstance.getTime, "a"))
res0: List[(Double, com.redis.cake.Currency, java.util.Date)] = List((1300.0,USD,Thu Jan 31 12:58:35 IST 2013), (1800.0,USD,Thu Jan 31 12:58:35 IST 2013))

scala> ClientPortfolioAuditService2.currentPortfolio(Account("100", "dg", java.util.Calendar.getInstance.getTime, "a"))
res1: List[com.redis.cake.ClientPortfolioAuditService2.bal.Balance] = List(BalanceRep(1300.0,USD,Thu Jan 31 12:58:46 IST 2013), BalanceRep(1800.0,USD,Thu Jan 31 12:58:46 IST 2013))

The technique discussed above is inspired from the paper Polymoprhic Embedding of DSLs. I have been using this technique for quite some time and I have discussed a somewhat similar implementation in my book DSLs In Action while discussing internal DSL design in Scala.

And in case you are interested in the full code, I have uploaded it on my Github.

Monday, January 14, 2013

A language and its interpretation - Learning free monads

I have been playing around with free monads of late and finding them more and more useful in implementing separation of concerns between pure data and its interpretation. Monads generally don't compose. But if you restrict monads to a particular form, then you can define a sum type that composes. In the paper Data Types a la carte, Wouter Swierstra describes this form as
data Term f a =
    Pure a
    | Impure (f (Term f a)) 
These monads consist of either pure values or an impure effect, constructed using f. When f is a functor, Term f is a monad. And in this case, Term f is the free monad being the left adjoint to the forgetful functor f.

I am not going into the details of what makes a monad free. I once asked this question on google+ and Edward Kmett came up with a beautiful explanation. So instead of trying to come up with a half assed version of the same, have a look at Ed's response here .

In short, we can say that a free monad is the freeest object possible that's still a monad.

Composition

Free monads compose and help you build larger abstractions which are pure data and yet manage to retain all properties of a monad. Hmm .. this sounds interesting because now we can not only build abstractions but make them extensible through composition by clients using the fact that it's still a monad.

A free monad is pure data, not yet interpreted, as we will see shortly. You can pass it to a separate interpreter (possibly multiple interpreters) which can do whatever you feel like with the structure. Your free monad remains pure while all impurities can be put inside your interpreters.

And so we interpret ..

In this post I will describe how I implemented an interpreter for a Joy like concatenative language that uses the stack for its computation. Of course it's just a prototype and addresses an extremely simplified subset, but you get the idea. The basic purpose is to explore the power of free monads and come up with something that can potentially be extended to a full blown implementation of the language.

When designing an interpreter there's always the risk of conflating the language along with the concerns of interpreting it. Here we will have the 2 completely decoupled by designing the core language constructs as free monads. Gabriel Gonzalez has written a series of blog posts [1,2,3] on the use of free monads which contain details of their virtues and usage patterns. My post is just an account of my learning experience. In fact after I wrote the post, I discovered a similar exercise done for embedding Forth in Haskell - so I guess I'm not the first one to learn free monads using a language interpreter.

Let's start with some code, which is basically a snippet of Joy like code that will run within our Haskell interpreter ..
p :: Joy ()
p = do push 5
       push 6
       add
       incr
       add2
       square
       cube
       end

This is our wish list and at the end of the post we will see if we can interpret this correctly and reason about some aspects of this code. Let's not bother much about the details of the above snippet. The important point is that if we fire it up in ghci, we will see that it's pure data!
*Joy> p
Free (Push 5 (Free (Push 6 (Free (Add (Free (Push 1 (Free (Add (Free (Push 1 (Free (Add (Free (Push 1 (Free (Add (Free (Dup (Free (Mult (Free (Dup (Free (Dup (Free (Mult (Free (Mult (Free End))))))))))))))))))))))))))))))
*Joy> 

We haven't yet executed anything of the above code. It's completely free to be interpreted and possibly in multiple ways. So, we have achieved this isolation - that the data is freed from the interpreter. You want to develop a pretty printer for this data - go for it. You want to apply semantics and give an execution model based on Joy - do it.

Building the pure data (aka Builder)

Let's first define the core operators of the language ..
data JoyOperator cont = Push Int cont
                      | Add      cont 
                      | Mult     cont 
                      | Dup      cont 
                      | End          
                      deriving (Show, Functor)

The interesting piece is the derivation of the Functor, which is required for implementing the forgetful functor adjoint of the free monad. Keeping the technical mumbo jumbo aside, free monads are just a general way of turning functors into monads. So if we have a core operator f as a functor, we can get a free monad Free f out of it. And knowing something is a free monad helps you transform transform an operation over the monad (the monad homomorphism) into an operation over the functor (functor homomorphism). We will see how this helps later ..

The other point to note is that all operators take a continuation argument that points to the next operation in the chain. End is the terminal symbol and we plan to ignore anything that the user enters after an End.

Push takes an Int and pushes into the stack, Add pops the top 2 elements of the stack and pushes the sum, Mult does the same for multiplication and Dup duplicates the top element of the stack. End signifies the end of program.

Next we define the free monad over JoyOperator by using the Free data constructor, defined as part of Control.Monad.Free ..

data Free f a = Pure a | Free (f (Free f a))

-- | The free monad over JoyOperator
type Joy = Free JoyOperator

And then follow it up with some of the definitions of Joy operators as operations over free monads. Note that liftF lifts an operator (which is a Functor) into the context of a free monad. liftF has the following type ..

liftF :: Functor f => f a -> Free f a

As a property, a free moand has a forgetful functor as its left adjoint. The unlifting from the monad to the functor is given by the retract function ..

retract :: Monad f => Free f a -> f a

and needless to say

retract . liftF = id

-- | Push an integer to the stack
push :: Int -> Joy ()
push n = liftF $ Push n ()

-- | Add the top two numbers of the stack and push the sum
add :: Joy ()
add = liftF $ Add ()

.. and this can be done for all operators that we wish to support.

Not only this, we can also combine the above operators and build newer ones. Remember we are working with monads and hence the *do* notation based sequencing comes for free ..

-- | This combinator adds 1 to a number. 
incr :: Joy ()
incr = do {1; add}

-- | This combinator increments twice
add2 :: Joy ()
add2 = do {incr; incr}

-- | This combinator squares a number
square :: Joy ()
square = do {dup; mult}

Now we can have a composite program which sequences through the core operators as well as the ones we derive from them. And that's what we posted as our first example snippet of a target program.

An Interpreter (aka Visitor)

Once we have the pure data part done, let's try and build an interpreter that does the actual execution based on the semantics that we defined on the operators.

-- | Run a joy program. Result is either an Int or an error
runProgram :: Joy n -> Either JoyError Int
runProgram program = joy [] program
  where joy stack (Free (Push v cont))         = joy (v : stack) cont
        joy (a : b : stack) (Free (Add cont))  = joy (a + b : stack) cont
        joy (a : b : stack) (Free (Mult cont)) = joy (a * b : stack) cont
        joy (a : stack) (Free (Dup cont))      = joy (a : a : stack) cont
        joy _ (Free Add {})                    = Left NotEnoughParamsOnStack
        joy _ (Free Dup {})                    = Left NotEnoughParamsOnStack
        joy [] (Free End)                      = Left NotEnoughParamsOnStack
        joy [result] (Free End)                = Right result
        joy _ (Free End)                       = Left NotEmptyOnEnd
        joy _ Pure {}                          = Left NoEnd

runProgram is the interpreter that takes a free monad as its input. Its implementation is quite trivial - it just matches on the recursive structure of the data and pushes the appropriate results on the stack. Now if we run our program p using the above interpreter we get the correct result ..

*Joy> runProgram p
Right 7529536
*Joy> 

Equational Reasoning

Being Haskell and being pure, we obviously can prove some bits and pieces of our program as mathematical equations. At the beginning I said that End is the end of the program and anything after End needs to be ignored. What happens if we do the following ..

runProgram $ do {push 5; incr; incr; end; incr; incr}

If you have guessed correctly we get Right 7, which means that all operations after end have been ignored. But can we prove that our program indeed does this ? The following is a proof that end indeed ends our program. Consider some operation m follows end ..

end >> m

-- definition of end
= liftF $ End >> m

-- m >> m' = m >>= \_ -> m'
= liftF $ End >>= \_ -> m

-- definition of liftF
= Free End >>= \_ -> m

-- Free m >>= f = Free (fmap (>>= f) m)
= Free (fmap (>>= \_ -> m) End)

-- fmap f End = End
= Free End

-- liftF f = Free f (f is a functor)
= liftF End

-- definition of End
= end

So this shows that any operation we do after end is never executed.

Improving the bind

Free monads offer improved modularity by allowing us to separate the building of pure data from the (possibly) impure concerns of interpreting it with the external world. But a naive implementation leads to a penalty in runtime efficiency as Janis Voigtlander discusses in his paper Asymptotic Improvement of Computations over Free Monads. And the Haskell's implementation of free monads had this inefficiency where the asymptotic complexity of substitution was quadratic because of left associative bind. Edward Kmett implemented the Janis trick and engineered a solution that gets over this inefficiency. I will discuss this in a future post. And if you would like to play around with the interpreter, the source is there in my github.

Thursday, January 03, 2013

strict : recursion :: non-strict : co-recursion

Consider the very popular algorithm that uses a tail recursive call to implement a map over a List. Here's the implementation in F#

let map converter l =
  let rec loop acc = function
    | [] -> acc
    | hd :: tl -> loop (converter hd :: acc) tl
  List.rev (loop [] l)

Scala is also a statically typed functional programming language, though it uses a completely different trick and mutability to implement map over a sequence. Let's ignore it for the time being and imagine we are being a bonafide functional programmer.

The above code uses a very common idiom of accumulate-and-reverse to implement a tail recursive algorithm. Though Scala stdlib does not use this technique and uses a mutable construct for this implementation, we could have done the same thing in Scala as well ..

Both Scala and F# are languages with strict evaluation semantics as the default. What would a similar tail recursive Haskell implementation look like ?

map' f xs = reverse $ go [] xs
    where
    go accum [] = accum
    go accum (x:xs) = go (f x : accum) xs

Let's now have a look at the actual implementation of map in Haskell prelude ..

map :: (a -> b) -> [a] -> [b]
map _ []     = []
map f (x:xs) = f x : map f xs

Whoa! It's not a tail recursion and instead a body recursive implementation. Why is that ? We have been taught that tail recursion is the holy grail since it takes constant stack space and all ..

In a strict language the above implementation of map will be a bad idea since it uses linear stack space. On the other hand the initial implementation is tail recursive. But Haskell has non-strict semantics and hence the last version of map consumes only one element of the input and yields one element of the output. The earlier versions consume the whole input before yielding any output.

In a lazy language what you need is to make your algorithm co-recursive. And this needs co-data. Understanding co-data or co-recursion needs a brief background of co-induction and how it differs from induction.

When we define a list in Haskell as

data [a] = [] | a : [a]

it means that the set "List of a" is the smallest set such that [] is in the List of a, and if [a] is in the List of a and a is in a, then a : [a] is in List of a. This is the inductive definition os a List and we can use recursion to implement various properties of a List. Also once we have a recursive definition, we can use structural induction to prove various properties of the data structure.

If an inductive definition on data gives us the smallest set, a co-inductive definition on co-data gives us the largest set. Let's define a Stream in Haskell ..

data Stream a = Cons a (Stream a)

First to note - unlike the definition of a List, there's no special case for empty Stream. Hence no base case unlike the inductive definition above. And a "Stream of a" is the largest set consisting of a pair of an "a" and a "Stream of a". Which means that Stream is an infinite data structure i.e. the range (or co-domain) of a Stream is infinite. And we implement properties on co-inductive data structures using co-recursive programs. A co-recursive program is defined as one whose range is a type defined recursively as the greatest solution of some equation.

In terms of a little bit of mathematical concepts, if we model types as sets, then an infinite list of integers is the greatest set X for which there is a bijection X ≅ ℤ x X and a program that generates such a list is a co-recursive program. As its dual, the type of a finite list is the least set X for which X ≅ 1 + (ℤ x X), where 1 is a singleton set and + is the disjoint union of sets. Any program which consumes such an input is a recursive program. (Note the dualities in italics).

Strictly speaking, the co-domain does not necessarily have to be infinite. If you have read my earlier post on initial algebra, co-recursion recursively defines functions whose co-domain is a final data type, dual to the way that ordinary recursion recursively defines functions whose domain is an initial data type.

In case of primitive recursion (with the List data type above), you always frame the recursive step to operate on data which gets reduced in size in subsequent steps of recursive calls. It's not so with co-recursion since you may be working with co-data and infinite data structures. Since in Haskell, a List can also be infinite, using the above co-recursive definition of map, we can have

head $ map (1+) [1..]

which invokes map on an infinite list of integers. And here the co-recursive steps of map operate successively on sets of data which are not less than the earlier set. So, it's not tail recursion that makes an efficient implementation in Haskell, you need to make the co-recursive call within the application of a constructor.

As the first post of the new year, that's all friends. Consider this as the notes from someone learning the principles of co-inductive definitions. Who knew co-recursion is so interesting ? It will be more interesting once we get down into the proof techniques for co-recursive programs. But that's for a future post ..

Saturday, December 29, 2012

The Best of 2012 - Looking Back

The year 2012 was a bit different for me in terms of readings and enlightenment. While once again Scala proved to be the most dominant language that I used throughout the year, I also had some serious productive time and mind share to learning machine learning and related topics. Online courses at Coursera were of immense help in this regard and proved to be the catalyst of inspiration. Being in India, staying within the comfort zones of my happy home, I could attend the classes of professors like Daphne Koller, Martin Odersky and Geoffrey Hinton. Thanks a lot Coursera for this enlightening experience ..

online courses that I attended ..



the best of the papers that I read ..


I usually read lots of papers on my subjects of interest. The total number are too many to list here. But here are a few of them, mostly from the domain of programming languages and type theory. In this regard I must confess that I am not a programming language theory person. But I sincerely believe that you need to know some theory to make sense of its implementation. That's the reason I think programmers (yes. serious programmers using a decent programming language with a static type system) will do better to learn a bit of category theory.

As I mentioned at the beginning, currently my programming language of choice is Scala and I have been exploring how to write better code with Scala's multi-paradigm capabilities. Though I tend to use the functional power of Scala more than its object oriented features, there are quite a few idioms that I use where I find its object oriented capabilities useful. Scala 2.10 will now be released any time and I am even more excited to try out many of the new features that it will offer.

Ok .. here are some of the papers that I enjoyed reading in 2012 ..


a few good books ..


This is also a different list that what I expected at the beginning of the year. It's mostly due to my focus on machine learning and AI related topics that I took up in course of my journey.


3 conferences in a year - my best so far ..


This was my second consecutive PhillyETE and I spoke on functional approach to domain modeling. In case you missed it earlier, here's the presentation up on slideshare. I combined my PhillyETE trip with a trip to London to attend ScalaDays 2012. It was a great experience listening to 3 keynotes by Guy Steele, Simon Peyton Jones and Martin Odersky. I also got to meet a number of my friends on Twitter who rock the Scala community .. it was so nice getting to know the faces behind the twitter handles. Later in the year I also spoke at QCon NY on the topic of domain modeling in the functional world.

One of the conferences that I plan to attend in 2013 is The Strange Loop. It has been one of the awesomest conferences so far and I hate to be tracking it only on Twitter every year. Keeping fingers crossed ..

some plans for 2013 (Not Resolutions though!)


In no specific order ..

  • Do more Haskell - Haskell is surely in for some big renaissance. I just spent a couple of hours of very productive time watching Edward Kmett explain his new lens library.
  • Continue Scala - obviously .. needs no explanation ..
  • Explore more of machine learning and work on at least one concrete project. I am now reading 2 solid books on ML theory - as I mentioned above I feel a bit uncomfortable learning implementations without any of the theory that drive them. While both of these books assume a solid background of mathematics and statistics (which I don't have), I am trying to crawl through them, greedily accumulating necessary math background as much as I can. Wish I could take a sabbatical leave for 1 year and finish both of them.

Guess that's all .. See you all on the other side of 2013. Happy holidays and have a gorgeous 2013.

Monday, October 01, 2012

Towards better refactoring support in IDEs for functional programming


A couple of days back I was thinking how we could improve the state of IDEs and let them give rich feedbacks to users focusing on code improvement and refactoring. When you write Java programs, IntelliJ IDEA is possibly the best IDE you can have with an extensive set of refactoring tools. But most of these are targeted towards reducing boilerplates and do refactor based on structural changes only (e.g. replace constructor with builder or use interfaces wherever possible etc.). Can we improve the state of the art when we are using a semantically richer language like Scala or Haskell ? How about suggesting some idioms that relate to the paradigm of functional programming and explore some of the best practices suggested by the masters of the trade ?

I always find many programmers use Option.get in Scala when they should be using higher order structures instead. Here's an example of the anti-pattern ..

if oName.isDefined oName.get.toUpperCase else "Name not present"

which should ideally be refactored to

oName.map(_.toUpperCase).getOrElse("Name not present")

The advantage with the latter is that it's more composable and can be pipelined to other higher order functions down the line without introducing any temporary variables.

Many people also use pattern matching for destructuring an Option ..

oName match {
  case Some(n) => n.toUpperCase
  case _ => "Name not present"
}

This is also not what idioms espouse and is almost equivalent to explicit null checking.

Can an IDE suggest such refactoring ? I tweeted about my thoughts and almost immediately Mirko, who is doing some great stuff on the Scala IDE, added this to his list of ideas. Nice!

Here's another idea. How about suggesting some program transformations for functional programming ? I know Scala is not a pure functional language and we don't have effect tracking. I have seen people use side-effecting functions in map, which, I know is a strict no-no. But again, the compiler doesn't complain. Still we can have the IDE suggest some refactorings assuming the programmer uses purity as per recommendations. After all, these are suggestions only and can enlighten a programmer with some of the realizations of how to model based on the best practices.

Consider this ..

(-1000000 to 1000000).map(_ + 1).filter(_ > 0)

Here the map is executed on all the elements and then filter processes the whole output. One option to optimize will be to defer the computations using view ..

(-1000000 to 1000000).view.map(_ + 1).filter(_ > 0)

But this doesn't reduce the load of computation on map. It just prevents the creation of a temporary list as an output of the map. We can apply a transformation called filter promotion that does the filtering first so that map can operate on a potentially smaller list. Of course this assumes that both functions are *pure* so that operations can be reordered - but that's what we should anyway ensure while doing functional programming .. right ? Here's the transformation ..

(-1000000 to 1000000).filter(((_: Int) > 0) compose ((_: Int) + 1)).map(_ + 1)

which can be further simplified into ..

(-1000000 to 1000000).filter((_: Int) >= 0).map(_ + 1)

.. and this has same computation load on filter but potentially less load on map, since it now has to process a filtered list.

Can we have such transformations available in the IDE ? I am a faithful user of vim, but with such enrichments I may consider going back to an IDE. TYhis really has no limits. Consider Wadler's free theorems. Many times we tend to write functions whose arguments are more specialized in types than what's required. Generalization of such functions using polymorphic arguments can lead to better abstraction and verification through free theorems. An IDE can suggest all these and may also help generate properties for property based testing using tools like QuickCheck and ScalaCheck. But that's discussion for another day ..

P.S. Simon Thompson's Haskell The Craft of Functional Programming 3rd edition discusses lots of similar transformations in various chapters that discuss reasoning or Haskell programs.