> Out of curiosity, did you add a warmup time before benchmarking? Schema and 
> row coder does codegen, so the first usage is very slow, but subsequent 
> usages should be much faster. I recommend running any test for a warmup 
> period before starting to measure.

Yep, I poked at this using JMH with 3 warmup iterations and 5
measurement iterations. The SchemaCoder is constructed at setup and
encode is invoked once to ensure code generation for the underlying
RowCoder has completed before the iterations begin.

One minor thing I noticed when I was looking at this yesterday is that
an optional field's Coder is wrapped by NullableCoder, but RowCoder
skips encoding null values entirely and marks their absence in the
field presence BitSet that precedes all field data. It seems like
wrapping optional fields with NullableCoder is a tad redundant, since
either no bytes are written at all or the first byte is always set to
1 to mark the presence of data. That extra byte also makes it
impossible to relax a field from required to optional, even though
Row could support doing so with the field presence BitSet alone.
Although that's a backwards-incompatible schema change, I can see how
some users may expect it to be supported if they interact with
sources/sinks that allow required-to-optional relaxation, like
BigQuery.
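
To make the redundancy concrete, here is a rough sketch in plain Java.
It is a simplified model I made up for illustration, not Beam's actual
wire format or API: the class name RowEncodingSketch, the encode
method, the one-byte length prefix, and the string-only fields are all
assumptions. It only mirrors the two behaviours described above: null
fields appear solely in the presence BitSet, and a non-null optional
field gets a leading marker byte that is always 1.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.BitSet;

// Hypothetical, simplified model of the encoding discussed above.
// This is NOT Beam's RowCoder; names and layout are illustrative only.
final class RowEncodingSketch {

    // Encodes string fields as: [bitset length][presence bits][field data...].
    // Null fields write no data at all; they are only absent from the BitSet.
    // Assumes each value and the BitSet are shorter than 128 bytes.
    static byte[] encode(String[] fields, boolean[] optional) {
        BitSet present = new BitSet(fields.length);
        for (int i = 0; i < fields.length; i++) {
            if (fields[i] != null) {
                present.set(i);
            }
        }

        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] bits = present.toByteArray();
        out.write(bits.length);
        out.write(bits, 0, bits.length);

        for (int i = 0; i < fields.length; i++) {
            if (fields[i] == null) {
                continue; // absence is already recorded in the BitSet
            }
            if (optional[i]) {
                // NullableCoder-style marker: always 1 here, because null
                // values never reach this point.
                out.write(1);
            }
            byte[] value = fields[i].getBytes(StandardCharsets.UTF_8);
            out.write(value.length);
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        String[] values = {"a", "b"};
        byte[] asRequired = encode(values, new boolean[] {false, false});
        byte[] asOptional = encode(values, new boolean[] {true, true});
        System.out.println("required: " + asRequired.length + " bytes");
        System.out.println("optional: " + asOptional.length + " bytes");
    }
}
```

In this model the same two values encode to different bytes depending
only on whether the fields are declared optional, which is why data
written under the required schema can't be read back after the fields
are relaxed. Without the marker byte the two encodings would match and
the BitSet alone would carry the null information.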
