Sunday, January 03, 2010

Pragmatics of Impurity

James Hague, a long time Erlanger, drives home a point or two regarding purity of paradigms in a couple of his latest blog posts. Here's his take on being effective with pure functional languages ..

"My real position is this: 100% pure functional programing doesn't work. Even 98% pure functional programming doesn't work. But if the slider between functional purity and 1980s BASIC-style imperative messiness is kicked down a few notches--say to 85%--then it really does work. You get all the advantages of functional programming, but without the extreme mental effort and unmaintainability that increases as you get closer and closer to perfectly pure."

Purity is not necessarily pragmatic. In my last blog post I also tangentially touched upon the notion of purity while discussing how a *hybrid* model of SQL-NoSQL database stack can be effective for large application deployments. Be it with programming languages or with databases or any other paradigms of computation, we need to have the right balance of purity and pragmatism.

Clojure introduced transients. Rich Hickey says in the rationale .. "If a pure function mutates some local data in order to produce an immutable return value, is that ok?". Transients in Clojure allow localized mutation in initializing or transforming a large persistent data structure. This mutation will only be seen by the code that does the transformation - the client gets back a version for immutable use that can be shared. In no way does this invalidate the benefits that immutability brings in reasoning of Clojure programs. It's good to see Rich Hickey being flexible and pragmatic at the expense of injecting that little impurity into his creation.

Just like the little compromise (and big pragmatism) with the purity of persistent data structures, Clojure also made a similar compromise with laziness by introducing chunked sequences that optimize the overhead associated with lazy sequences. These are design decisions that have been taken consciously by the creator of the language that values pragmatism over purity.

Enough has already been said about the virtues of purity in functional languages. Believe me, 99% of the programming world does not even care for purity. They do what works best for them and hybrid languages are mostly the ones that find the sweetest spots. Clojure is as impure as Scala is, considering the fact that both allow side-effecting with mutable references and uncontrolled IO. Even Erlang has uncontrolled IO and a mutable process dictionary, though its use is often frowned upon within the community. The important point is that all of them have proved to be useful to programmers at large.

Why do creators infuse impurity into their languages ? Why aren't every language created as pure as Haskell is ? Well, it's mostly related to a larger thought that the language often targets to. Lisp started as an incarnation of the lambda calculus under the tutelage of John McCarthy and became the first significant language promoting the purely applicative model of programming without side-effects. Later on it added the impurities of mutation constructs based on the von Neumann architecture of the machines where Lisp was implemented. The obvious reason was to get an improved performance over purely functional constructs. Scala and Clojure both decided to go for the JVM as the primary runtime platform - hence both languages are susceptible to the pitfalls of impurity that JVM offers. Both of them decided to inherit all the impurities that Java has.

Consider the module system of Scala. You can compose modules using traits with deferred concrete definitions of types and objects. You can even compose mutually recursive modules using lazy vals, somewhat similar to what Newspeak and some dialects of ML offer. But because you have decided to bite the Java pill, you can also wreak havoc through shared mutable state at the top level object that you compose. In his post titled A Ban on Imports Gilad Bracha discusses all evil effects that an accessible global namespace can bring to the modularity aspects of your code. Newspeak is being designed as pure in this respect, with all dependencies being abstract and need to be plugged together explicitly as part of configuring the module. Scala is impure in this respect, allows imports to bring in the world on to your module definitions, but at the same time opens up all possibilities of sharing the huge ecosystem that the Java community has built over the years. You can rightfully choose to be pure in Scala, but that's not enforced by the language.

When we talk about impurity in languages, it's mostly related to how it handles side-effects and mutable state. And Haskell has a completely different take on this aspect than what we discussed with Lisp, Scala or Clojure. You have to use monads in Haskell towards any side-effecting operation. And people with a taste for finer things in life are absolutely fine with that. You cannot just stick in a printf to your program for debugging. You need to return the whole stuff within an IO monad and then do a print. The Haskell philosophy looks at a program as a model of mathematical functions where side-effects are also implemented in a functional way. This makes reasoning and optimization by the compiler much easier - you can make your pure Haskell code run as fast as C code. But you need to think differently. Pragmatic ? What do you think ?

Gilad Bracha is planning to implement pure subsets of Newspeak. It will be really exciting to get to see languages which are pure, functional (note: not purely functional) and object-oriented at the same time. He observes in his post that (t)he world is slowly digesting the idea that object-oriented and functional programming are not contradictory concepts. They are orthogonal, and can be arranged to be rather complementary. This is an interesting trend where we can see families of languages built around the same philosophy but differing in aspects of purity. You need to be pragmatic to choose and even mix them depending on your requirements.

4 comments:

gwern said...

> You cannot just stick in a printf to your program for debugging. You need to return the whole stuff within an IO monad and then do a print.

Actually, you can do that - Debug.Trace.trace :: String -> a -> a.

It uses the usual purity escape-hatch, System.IO.Unsafe.unsafePerformIO, and it works like it says on the tin: 'foo x = trace x (x++".txt")' etc.

> It's good to see Rich Hickey being flexible and pragmatic at the expense of injecting that little impurity into his creation.

Transients are fine, but Hickey is sacrificing purity for pragmatism and is receiving neither:
"Transients do not support the persistent interface of the source data structure. assoc, conj etc will all throw exceptions, because transients are not persistent. Thus you cannot accidentally leak a transient into a context requiring a persistent."

Yes, you can't leak them - he's traded runtime errors for exceptions.

On the other hand, Haskell can give you the ST monad*.

Inside of it you can operate & mutate as you please - with no worries about accidentally using a function which in this context only will throw exceptions - and then return a value to the outside, a value which appears pure & referentially transparent and which is in fact guaranteed to be so by the type system. Purity *and* pragmatism.

* http://www.haskell.org/ghc/docs/6.10.2/html/libraries/base/Control-Monad-ST.html

Alex Boisvert said...

Great post! Purity is not a panacea. Thank you for this insightful exploration of pragmatic programming language trade-offs.

Ingo said...

I think purity will pay off when it comes to (automatic) parallelization.

Having only read only values, for example, is great: one needs no access synchronization at all.

Unknown said...

@Ingo
For cases where you need parallelization, you can preach purity through idioms and best practices even if your language does support impurity. Despite the fact that u have transients in Clojure you can very well localize their usage (as Rich Hickey has done in Clojure lib) and prevent them from appearing in code that need to be parallelized.