Hi all,

A few weeks back I was benchmarking the fastjson2 serialization library for
a Java pipeline I was working on, and a colleague asked me to benchmark its
binary JSON serialization against Rows for fun. We didn't analyze different
data shapes and sizes in any depth, but on this one workload binary JSON
(tuple representation) outperformed SchemaCoder on throughput by ~11x for
serialization and ~5x for deserialization. RowCoder also outperformed
SchemaCoder on throughput, by ~1.3x for serialization and ~1.7x for
deserialization. Note that every benchmark ran at millions of ops/sec in
this quick test, so all of these are already fast in absolute terms.

I'm sure there are lessons to learn from other serialization libraries, but
I'd table that for now. The fact that RowCoder beats SchemaCoder suggests
that much of the remaining cost is in the intermediate hop to/from Row, not
in the encoding itself. The low-hanging fruit would be to skip that hop and
instead generate custom SchemaCoders that serialize the user type directly
into, and deserialize it directly from, the Row format.
I'd be happy to pick this up at some point in the new year, but first I'd
like to hear this group's thoughts.
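
To make the proposal concrete, here is a rough sketch in plain Java that
does not use Beam's actual classes. encodeViaRow mimics the two-hop path
(user type to a generic row, then a generic encoder that dispatches on each
field's runtime type), while encodeDirect is what a generated coder could do:
write the fields straight into the same bytes. User, encodeViaRow, encodeRow,
encodeDirect and decodeDirect are all hypothetical names, and the
writeUTF/writeLong layout is a toy stand-in, not RowCoder's real wire format.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DirectEncodingSketch {

  // Hypothetical user type standing in for a POJO that a SchemaCoder handles.
  record User(String name, long id) {}

  // Two hops: user type -> generic row (Object[]) -> bytes.
  static byte[] encodeViaRow(User u) {
    Object[] row = {u.name(), u.id()}; // extra allocation (and boxing) per element
    return encodeRow(row);
  }

  // Generic row encoder: must inspect each field's runtime type.
  static byte[] encodeRow(Object[] row) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      for (Object field : row) {
        if (field instanceof String s) {
          out.writeUTF(s);
        } else if (field instanceof Long l) {
          out.writeLong(l);
        } else {
          throw new IllegalArgumentException("unsupported field: " + field);
        }
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // One hop: what a generated coder could emit for User, writing the
  // fields directly in the same toy layout with no intermediate row.
  static byte[] encodeDirect(User u) {
    ByteArrayOutputStream bytes = new ByteArrayOutputStream();
    try (DataOutputStream out = new DataOutputStream(bytes)) {
      out.writeUTF(u.name());
      out.writeLong(u.id());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return bytes.toByteArray();
  }

  // Matching direct decoder: reads fields straight into the user type.
  static User decodeDirect(byte[] data) {
    try (DataInputStream in = new DataInputStream(new ByteArrayInputStream(data))) {
      return new User(in.readUTF(), in.readLong());
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  public static void main(String[] args) {
    User u = new User("alice", 42L);
    byte[] viaRow = encodeViaRow(u);
    byte[] direct = encodeDirect(u);
    System.out.println("identical bytes: " + Arrays.equals(viaRow, direct));
    System.out.println("round trip ok: " + decodeDirect(direct).equals(u));
  }
}
```

In this toy, both paths produce identical bytes (15 for User("alice", 42L):
a 2-byte length prefix, 5 bytes of name, 8 bytes of id), so the direct path
stays compatible with the row path's encoding while skipping the per-element
row allocation and the runtime type dispatch.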

Regards,

Steve