If you are serializing data, then using the reader means serializing to a 
string (by printing your data) and reading it back from a string.  It 
should be obvious that marshalling data to and from strings is neither 
fast nor compact compared to binary alternatives.  Using the reader is 
nice for certain applications (tagged literals help), and using a real 
serializer is necessary for others.
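
To make that concrete, a minimal round trip through the reader looks 
roughly like this (tagged literals like #inst come along for free):

    ;; print the data to a string, then read it back
    (def data {:id 42 :tags #{:a :b} :ts #inst "2012-07-01"})

    (def s     (pr-str data))      ;; data -> string
    (def data' (read-string s))    ;; string -> data

    (= data data')                 ;; => true
    ;; read-string trusts its input, so don't feed it untrusted strings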

Nippy looks pretty clean and I like that it doesn't have the baggage of 
Carbonite's dependency on Kryo. [I wrote Carbonite.]

Looking at the code, I would guess that it suffers quite a bit on 
performance and serialization size based on my experiences with Carbonite. 
 Some key ideas for improvement:

1) buffer reuse - reusing a cached ThreadLocal byte[] when converting from 
data to bytes is a nice trick that most serialization libs make possible, 
but I don't think that's possible in the current code (sketch below).
2) transients while deserializing collections (in coll-thaw!) - while 
benchmarking, I found this to be the only way to get anywhere close to 
Java serialization performance (sketch below).
3) more efficient primitive writing - this is the main reason I wrapped 
Kryo - there are lots of well-known tricks for writing chars, ints, etc. 
Thrift, Protobuf, Kryo, and others have made these pretty standard, and 
they make a huge difference vs the basic DataOutputStream/DataInputStream 
serialization (sketch below).
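
For (1), the kind of thing I have in mind is roughly this - a hypothetical 
sketch, not Nippy's or Carbonite's actual code, where write-data! stands 
in for whatever actually writes the bytes:

    ;; keep one output buffer per thread and reuse it across calls,
    ;; instead of allocating a fresh byte[] on every freeze
    (def ^ThreadLocal thread-buffer
      (proxy [ThreadLocal] []
        (initialValue [] (java.io.ByteArrayOutputStream. (* 64 1024)))))

    (defn freeze-with-reused-buffer ^bytes [write-data! data]
      (let [^java.io.ByteArrayOutputStream baos (.get thread-buffer)
            out (java.io.DataOutputStream. baos)]
        (.reset baos)          ;; rewind, but keep the already-grown backing array
        (write-data! out data)
        (.flush out)
        (.toByteArray baos)))  ;; a single copy out at the very end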
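
For (2), a hypothetical sketch of thawing a vector through a transient, 
with read-item standing in for the per-element reader:

    ;; build the collection as a transient and only make it persistent
    ;; once every element has been read
    (defn thaw-vector [read-item ^java.io.DataInputStream in]
      (let [n (.readInt in)]
        (loop [i 0, acc (transient [])]
          (if (< i n)
            (recur (inc i) (conj! acc (read-item in)))
            (persistent! acc)))))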
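
And for (3), the standard trick is zig-zag plus variable-length encoding, 
as in Protocol Buffers and Kryo - a hypothetical sketch for 32-bit ints:

    ;; zig-zag maps -1, 1, -2, 2, ... to 1, 2, 3, 4, ... so small negative
    ;; numbers stay small; then write 7 bits per byte with a continuation
    ;; bit, so small values take 1-2 bytes instead of a fixed 4
    (defn write-varint32 [^java.io.DataOutputStream out i]
      (let [z (bit-xor (bit-shift-left (long i) 1)
                       (bit-shift-right (long i) 31))]
        (loop [v z]
          (if (< v 0x80)
            (.writeByte out (int v))
            (do (.writeByte out (int (bit-or (bit-and v 0x7f) 0x80)))
                (recur (bit-shift-right v 7)))))))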

I did a lot of benchmarking vs standard Java serialization (which is 
already supported on all Clojure data types).  Despite its well-known 
issues, Java serialization is so deeply hacked into the language (some 
people call it Java's greatest design mistake) that it is actually really 
fast.  For small pieces of data it is incredibly bloated, but for larger 
graphs that bloat amortizes out a bit if you have repeated object 
references.
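
For reference, the baseline I'm talking about is just plain 
ObjectOutputStream/ObjectInputStream over Clojure data, something like:

    (import '[java.io ByteArrayOutputStream ByteArrayInputStream
                      ObjectOutputStream ObjectInputStream])

    (defn java-freeze ^bytes [obj]
      (let [baos (ByteArrayOutputStream.)]
        (with-open [oos (ObjectOutputStream. baos)]
          (.writeObject oos obj))
        (.toByteArray baos)))

    (defn java-thaw [^bytes bs]
      (with-open [ois (ObjectInputStream. (ByteArrayInputStream. bs))]
        (.readObject ois)))

    (java-thaw (java-freeze {:a [1 2 3] :b "hello"}))
    ;; => {:a [1 2 3], :b "hello"}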

I found the use of transients when deserializing collections to be the 
only way I could get anywhere close to Java serialization performance - 
they are critical.  Carbonite's output is definitely smaller, especially 
for small amounts of data, and for our uses that's actually really 
important, so it was still a win overall.

The other major issue that drove the creation of Carbonite was that Java 
serialization of LazySeqs could easily blow the stack by pushing every 
cons onto the call stack.  This is a trivial issue to solve in any 
Clojure-aware serializer (sketch below).  For that reason alone, we needed 
a better solution.
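
One straightforward fix is to walk the seq iteratively instead of letting 
each cons recurse - a hypothetical sketch (not Carbonite's actual code), 
with write-item standing in for the per-element writer:

    ;; freeze a (possibly lazy) seq with constant stack depth
    (defn freeze-seq [write-item ^java.io.DataOutputStream out xs]
      (let [xs (seq xs)]
        (.writeInt out (count xs))  ;; realizes the seq on the heap, not the stack
        (doseq [x xs]               ;; one loop, no per-cons recursion
          (write-item out x))))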

If you want to steal ideas from Carbonite, please feel free. :) 
 https://github.com/revelytix/carbonite

Alex
