Out of curiosity, did you add a warmup period before benchmarking? The schema
and row coders do codegen, so the first use is very slow, but subsequent
uses should be much faster. I recommend running any test for a warmup
period before starting to measure.
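
To illustrate what I mean by a warmup period, here is a minimal sketch in
plain Java. The class name WarmupBenchmark, the method opsPerSecond, and the
string-encoding workload are all hypothetical placeholders, not Beam APIs;
swap in whatever encode/decode call you are actually measuring. It only shows
the shape of the idea: run the task unmeasured for a while so one-time costs
(codegen, class loading, JIT compilation) are paid first, then time it.

```java
import java.nio.charset.StandardCharsets;
import java.util.function.Supplier;

public class WarmupBenchmark {
    // Results are written here so the JIT cannot discard the work as dead code.
    public static volatile Object sink;

    // Hypothetical helper: runs `task` unmeasured for about warmupMillis,
    // then returns the throughput (ops/sec) observed over about measureMillis.
    public static double opsPerSecond(Supplier<?> task, long warmupMillis, long measureMillis) {
        // Warmup phase: results are discarded and nothing is timed.
        long warmupEnd = System.nanoTime() + warmupMillis * 1_000_000L;
        while (System.nanoTime() < warmupEnd) {
            sink = task.get();
        }

        // Measurement phase: count completed operations in the time window.
        long ops = 0;
        long start = System.nanoTime();
        long end = start + measureMillis * 1_000_000L;
        while (System.nanoTime() < end) {
            sink = task.get();
            ops++;
        }
        long elapsedNanos = System.nanoTime() - start;
        return ops * 1e9 / elapsedNanos;
    }

    public static void main(String[] args) {
        // Placeholder workload (an assumption for illustration only);
        // replace with the coder encode/decode call under test.
        Supplier<byte[]> task = () -> "example row".getBytes(StandardCharsets.UTF_8);

        double cold = opsPerSecond(task, 0, 200);     // no warmup
        double warm = opsPerSecond(task, 2_000, 200); // 2 s warmup first
        System.out.printf("cold: %.0f ops/sec, warm: %.0f ops/sec%n", cold, warm);
    }
}
```

A hand-rolled loop like this is only a rough check. A dedicated harness such
as JMH handles warmup iterations and forking for you and is probably the
better choice for numbers you intend to compare.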

On Fri, Dec 1, 2023, 9:13 AM Steven van Rossum via dev <dev@beam.apache.org>
wrote:

> Hi all,
>
> I was benchmarking the fastjson2 serialization library a few weeks back
> for a Java pipeline I was working on and was asked by a colleague to
> benchmark binary JSON serialization against Rows for fun. We didn't do any
> extensive analysis across different shapes and sizes, but the finding on
> this workload was that serialization to binary JSON (tuple representation)
> outperformed the SchemaCoder on throughput by ~11x on serialization and ~5x
> on deserialization. Additionally, RowCoder outperformed SchemaCoder on
> throughput by ~1.3x on serialization and ~1.7x on deserialization. Note
> that all benchmarks measured in the millions of ops/sec for this quick
> test, so this is already excellent performance obviously.
>
> I'm sure there's stuff to learn from other serialization libraries, but
> I'd table that for now. The low-hanging-fruit improvement would be to skip
> that intermediate hop to/from Row and instead generate custom SchemaCoders
> to serialize directly into or deserialize from the Row format.
> I'd be happy to pick this up at some point in the new year, but would just
> like to get some thoughts from this group.
>
> Regards,
>
> Steve
>
