Monday, July 19, 2010

sjson: Now offers Type Class based JSON Serialization in Scala

sjson's serialization APIs have so far been based on reflection. The advantage was that the API was remarkably easy to use, while the heavy lifting was done underneath by the reflection-based implementation.

However, we need to remember that there's a big difference between the richness of type information that a JSON structure carries and that which a Scala object can have. Unless you preserve the type information as part of your serialization protocol when going from Scala to JSON, a lossless round trip becomes tricky, and in some cases extremely difficult. And on the JVM, type erasure makes it almost impossible to reconstruct some serialized JSON structures back into the corresponding original Scala objects.
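As a quick, hypothetical illustration of the erasure problem (this snippet is not part of sjson), consider two lists with different element types: after erasure they share the same runtime class, so a reflection-based deserializer has no way of telling which element type it should rebuild.

// both lists erase to the same runtime class on the JVM
val xs: List[Int] = List(1, 2, 3)
val ys: List[String] = List("1", "2", "3")
xs.getClass == ys.getClass   // true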

From version 0.7, sjson offers a JSON serialization protocol that does not use reflection, in addition to the original one. This is useful in the sense that the user gets to define his own protocol for serializing custom objects to JSON. Whatever you did with annotations in the reflection-based JSON serialization, you can now express by defining a custom protocol.

sjson's type class based serialization is inspired by the excellent sbinary by David MacIver (currently maintained by Mark Harrah); it uses the same protocol and even steals many of the implementation artifacts.

For an introduction to the basics of type classes, their implementation in Scala, and how type class based serialization protocols can be designed in Scala, refer to the blog posts I wrote a few weeks back on the subject.


JSON Serialization of built-in types

Here’s a sample session at the REPL that uses the default serialization protocol of sjson ..

scala> import sjson.json._
import sjson.json._

scala> import DefaultProtocol._
import DefaultProtocol._

scala> val str = "debasish"
str: java.lang.String = debasish

scala> import JsonSerialization._
import JsonSerialization._

scala> tojson(str)
res0: dispatch.json.JsValue = "debasish"

scala> fromjson[String](res0)
res1: String = debasish


Now consider a generic data type List in Scala. Here’s how the protocol works ..

scala> val list = List(10, 12, 14, 18)
list: List[Int] = List(10, 12, 14, 18)

scala> tojson(list)
res2: dispatch.json.JsValue = [10, 12, 14, 18]

scala> fromjson[List[Int]](res2)
res3: List[Int] = List(10, 12, 14, 18)

Define your Class and Custom Protocol

In the last section we saw how the default, type class based protocols are used for serialization of standard data types. If you have your own class, you can define a custom protocol for its JSON serialization.

Consider a case class in Scala that defines a Person abstraction. But before we look into how it serializes into JSON and back, here's the generic serialization protocol in sjson :-

trait Writes[T] {
  def writes(o: T): JsValue
}

trait Reads[T] {
  def reads(json: JsValue): T
}

trait Format[T] extends Writes[T] with Reads[T]

Format[] is the type class that specifies the contract for serialization. For your own abstraction you need to provide an implementation of the Format[] type class. Let's do that for Person within a specific Scala module. In case you don't remember the role that modules play in type class based design in Scala: they let you select the appropriate instance through imports and the static type checking that the language offers, something you don't get in Haskell, where instances are global.

object Protocols {
  // person abstraction
  case class Person(lastName: String, firstName: String, age: Int)

  // protocol definition for person serialization
  object PersonProtocol extends DefaultProtocol {
    import dispatch.json._
    import JsonSerialization._

    implicit object PersonFormat extends Format[Person] {
      def reads(json: JsValue): Person = json match {
        case JsObject(m) =>
          Person(fromjson[String](m(JsString("lastName"))), 
            fromjson[String](m(JsString("firstName"))), fromjson[Int](m(JsString("age"))))
        case _ => throw new RuntimeException("JsObject expected")
      }

      def writes(p: Person): JsValue =
        JsObject(List(
          (tojson("lastName").asInstanceOf[JsString], tojson(p.lastName)), 
          (tojson("firstName").asInstanceOf[JsString], tojson(p.firstName)), 
          (tojson("age").asInstanceOf[JsString], tojson(p.age)) ))
    }
  }
}

Note that the implementation of the protocol uses the dispatch-json library from Nathan Hamblen. Basically the methods writes and reads define how the JSON serialization will be done for my Person object. Now we can fire up a Scala REPL and see it in action :-

scala> import sjson.json._
import sjson.json._

scala> import Protocols._
import Protocols._

scala> import PersonProtocol._
import PersonProtocol._

scala> val p = Person("ghosh", "debasish", 20)
p: sjson.json.Protocols.Person = Person(ghosh,debasish,20)

scala> import JsonSerialization._
import JsonSerialization._

scala> tojson[Person](p)         
res1: dispatch.json.JsValue = {"lastName" : "ghosh", "firstName" : "debasish", "age" : 20}

scala> fromjson[Person](res1)
res2: sjson.json.Protocols.Person = Person(ghosh,debasish,20)

We get serialization of the object into a JSON structure and then back to the object itself. The methods tojson and fromjson are part of a Scala module that picks up the type class instances as implicits. Here's how we define it ..

object JsonSerialization {
  def tojson[T](o: T)(implicit tjs: Writes[T]): JsValue = {
    tjs.writes(o)
  }

  def fromjson[T](json: JsValue)(implicit fjs: Reads[T]): T = {
    fjs.reads(json)
  }
}
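This also explains the earlier List example: tojson(list) compiles only because DefaultProtocol brings a Format instance for lists into scope, built out of the Format of the element type. Here's a rough sketch of what such an instance could look like (the actual definition inside sjson's DefaultProtocol may differ in detail):

// sketch only: a Format for List[T], assembled from the element's Format
implicit def listFormat[T](implicit fmt: Format[T]): Format[List[T]] =
  new Format[List[T]] {
    def writes(ts: List[T]): JsValue = JsArray(ts.map(t => tojson(t)(fmt)))
    def reads(json: JsValue): List[T] = json match {
      case JsArray(ts) => ts.map(t => fromjson[T](t)(fmt))
      case _ => throw new RuntimeException("JsArray expected")
    }
  }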

Verbose?

Sure .. you have to do a lot of stuff to define the protocol for your class. But if you have a case class, sjson has some out-of-the-box magic that lets you do away with all the verbosity. Once again Scala's type system comes to the rescue.

Let’s see how the protocol can be extended for your custom classes using a much less verbose API, which applies only to case classes. Here’s a session at the REPL ..

scala> case class Shop(store: String, item: String, price: Int)
defined class Shop

scala> object ShopProtocol extends DefaultProtocol {
     |   implicit val ShopFormat: Format[Shop] = 
     |       asProduct3("store", "item", "price")(Shop)(Shop.unapply(_).get)
     |   }
defined module ShopProtocol

scala> import ShopProtocol._
import ShopProtocol._

scala> val shop = Shop("Shoppers Stop", "dress material", 1000)
shop: Shop = Shop(Shoppers Stop,dress material,1000)

scala> import JsonSerialization._
import JsonSerialization._

scala> tojson(shop)
res4: dispatch.json.JsValue = {"store" : "Shoppers Stop", "item" : "dress material", "price" : 1000}

scala> fromjson[Shop](res4)
res5: Shop = Shop(Shoppers Stop,dress material,1000)

If you are curious about what goes on behind the asProduct3 method, feel free to peek into the source code.
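For the impatient, here's a rough sketch of the shape such a combinator could take inside a protocol trait like DefaultProtocol; the actual asProduct3 in sjson may differ in its details, but the idea is the same: build a Format for a 3-field product out of the field names and the Formats of the component types.

// sketch only, not the actual sjson source
def asProduct3[S, T1, T2, T3](f1: String, f2: String, f3: String)
    (apply: (T1, T2, T3) => S)(unapply: S => Product3[T1, T2, T3])
    (implicit bin1: Format[T1], bin2: Format[T2], bin3: Format[T3]): Format[S] =
  new Format[S] {
    def writes(s: S): JsValue = {
      val p = unapply(s)
      JsObject(List(
        (JsString(f1), tojson(p._1)(bin1)),
        (JsString(f2), tojson(p._2)(bin2)),
        (JsString(f3), tojson(p._3)(bin3))))
    }
    def reads(js: JsValue): S = js match {
      case JsObject(m) =>
        apply(fromjson[T1](m(JsString(f1)))(bin1),
              fromjson[T2](m(JsString(f2)))(bin2),
              fromjson[T3](m(JsString(f3)))(bin3))
      case _ => throw new RuntimeException("JsObject expected")
    }
  }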

Tuesday, July 06, 2010

Refactoring into Scala Type Classes

A couple of weeks back I wrote about type class implementation in Scala using implicits. Type classes allow you to model orthogonal concerns of an abstraction without hardwiring them into the abstraction itself. This moves the bloat out of the core abstraction into separate, independent class structures. Very recently I refactored Akka actor serialization and gained some real insights into the benefits of using type classes. This post is a field report of that refactoring.

Inheritance and traits looked good ..

.. but only initially. Jonas Boner and I had some cool discussions on serializable actors, where the design we came up with looked as follows ..

trait SerializableActor extends Actor 
trait StatelessSerializableActor extends SerializableActor

trait StatefulSerializerSerializableActor extends SerializableActor {
  val serializer: Serializer
  //..
}

trait StatefulWrappedSerializableActor extends SerializableActor {
  def toBinary: Array[Byte]
  def fromBinary(bytes: Array[Byte])
}

// .. and so on 

All these traits couple the concern of serializability far too tightly with the core Actor implementation. And with the various forms of serializable actors, we were clearly running out of class names. One piece of wisdom that the GoF Patterns book taught us is that when you struggle to name your classes in an inheritance hierarchy, you're definitely doing it wrong! Look out for other ways to separate the concerns more meaningfully.

With Type Classes ..

We took the serialization stuff out of the core Actor abstraction into a separate type class.

/**
 * Type class definition for Actor Serialization
 */
trait FromBinary[T <: Actor] {
  def fromBinary(bytes: Array[Byte], act: T): T
}

trait ToBinary[T <: Actor] {
  def toBinary(t: T): Array[Byte]
}

// client needs to implement Format[] for the respective actor
trait Format[T <: Actor] extends FromBinary[T] with ToBinary[T]

We define 2 type classes FromBinary[T <: Actor] and ToBinary[T <: Actor] that the client needs to implement in order to make actors serializable. And we package them together as yet another trait Format[T <: Actor] that combines both of them.

Next we define a separate module that publishes APIs to serialize actors that use these type class implementations ..

/**
 * Module for actor serialization
 */
object ActorSerialization {

  def fromBinary[T <: Actor](bytes: Array[Byte])
    (implicit format: Format[T]): ActorRef = //..

  def toBinary[T <: Actor](a: ActorRef)
    (implicit format: Format[T]): Array[Byte] = //..

  //.. implementation
}

Note that these type classes are passed as implicit arguments that the Scala compiler will pick up from the surrounding lexical scope. Here's a sample test case which implements the above strategy ..

Here's a sample actor with encapsulated state. Note that we no longer have the incidental complexity of the actor having to inherit from any specialized Actor class ..

class MyActor extends Actor {
  var count = 0

  def receive = {
    case "hello" =>
      count = count + 1
      self.reply("world " + count)
  }
}

and the client implements the type class for protocol buffer based serialization and packages it as a Scala module ..

object BinaryFormatMyActor {
  implicit object MyActorFormat extends Format[MyActor] {
    def fromBinary(bytes: Array[Byte], act: MyActor) = {
      val p = Serializer.Protobuf
                        .fromBinary(bytes, Some(classOf[ProtobufProtocol.Counter]))
                        .asInstanceOf[ProtobufProtocol.Counter]
      act.count = p.getCount
      act
    }
    def toBinary(ac: MyActor) =
      ProtobufProtocol.Counter.newBuilder.setCount(ac.count).build.toByteArray
  }
}

We have a test snippet that uses the above type class implementation ..

import ActorSerialization._
import BinaryFormatMyActor._

val actor1 = actorOf[MyActor].start
(actor1 !! "hello").getOrElse("_") should equal("world 1")
(actor1 !! "hello").getOrElse("_") should equal("world 2")

val bytes = toBinary(actor1)
val actor2 = fromBinary(bytes)
actor2.start
(actor2 !! "hello").getOrElse("_") should equal("world 3")

Note that the state is correctly serialized by toBinary and subsequently de-serialized by fromBinary, so we get back the updated value of the actor's state.
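It's also worth seeing how the implicits get wired up at the call site. Since BinaryFormatMyActor is imported, the compiler supplies the Format instance for us; conceptually the following two calls are equivalent (just an illustration, you only write the first form):

// the compiler fills in MyActorFormat from the imported module
toBinary(actor1)
toBinary(actor1)(MyActorFormat)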

This refactoring has made the core actor implementation much cleaner by moving the concerns of serialization to a separate abstraction. The client code also becomes cleaner, in the sense that the actor definition does not include details of how the actor state is serialized. Scala's power of implicit arguments and executable modules made this type class based implementation possible.