Hi all,

A few weeks back I was benchmarking the fastjson2 serialization library for a Java pipeline I was working on, and a colleague asked me to benchmark binary JSON serialization against Rows for fun. We didn't do any extensive analysis across different shapes and sizes, but on this workload, serialization to binary JSON (tuple representation) outperformed SchemaCoder on throughput by ~11x on serialization and ~5x on deserialization. Additionally, RowCoder outperformed SchemaCoder on throughput by ~1.3x on serialization and ~1.7x on deserialization. Note that every benchmark in this quick test measured in the millions of ops/sec, so performance is already excellent in absolute terms.

I'm sure there's plenty to learn from other serialization libraries, but I'd table that for now. The low-hanging fruit would be to skip the intermediate hop to/from Row in SchemaCoder and instead generate custom SchemaCoders that serialize directly into, or deserialize directly from, the Row format. I'd be happy to pick this up at some point in the new year, but would first like to get some thoughts from this group.

Regards,
Steve
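
P.S. To make the "skip the hop" idea concrete, here is a toy sketch in plain Java. It does not use Beam's API or Beam's actual Row wire format: `Purchase` is a hypothetical user type, an `Object[]` stands in for a Row, and the `DataOutputStream` layout is an assumption chosen only for illustration. The point is that a coder generated for the specific type can emit the same bytes as the generic path without first materializing the intermediate row.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.Arrays;

public class DirectEncodingSketch {
    // Hypothetical user type; stands in for any schema-backed POJO.
    public static final class Purchase {
        final long userId;
        final String item;
        final double price;

        public Purchase(long userId, String item, double price) {
            this.userId = userId;
            this.item = item;
            this.price = price;
        }
    }

    // Two-hop path, step 1: copy the object into a generic row (boxes the primitives).
    public static Object[] toRow(Purchase p) {
        return new Object[] {p.userId, p.item, p.price};
    }

    // Two-hop path, step 2: encode the generic row by inspecting each field's type.
    public static byte[] encodeRow(Object[] row) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            for (Object field : row) {
                if (field instanceof Long) {
                    out.writeLong((Long) field);
                } else if (field instanceof String) {
                    out.writeUTF((String) field);
                } else if (field instanceof Double) {
                    out.writeDouble((Double) field);
                } else {
                    throw new IllegalArgumentException("Unsupported field type: " + field);
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static byte[] encodeViaRow(Purchase p) {
        return encodeRow(toRow(p));
    }

    // Direct path: what a generated, type-specific coder could do instead.
    // Same byte layout, but no intermediate row, no boxing, no per-field type checks.
    public static byte[] encodeDirect(Purchase p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeLong(p.userId);
            out.writeUTF(p.item);
            out.writeDouble(p.price);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        Purchase p = new Purchase(42L, "widget", 9.99);
        // Both paths produce identical bytes, so the wire format is unchanged.
        System.out.println(Arrays.equals(encodeViaRow(p), encodeDirect(p)));
    }
}
```

Because both paths produce the same bytes in this sketch, existing readers of the encoded data would be unaffected; only the in-memory work per element changes. Whether the real gain approaches the RowCoder numbers above would need to be measured.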