> Out of curiosity, did you add a warmup time before benchmarking? Schema and
> row coder does codegen, so the first usage is very slow, but subsequent
> usages should be much faster. I recommend running any test for a warmup
> period before starting to measure.
Yep, I poked at this using JMH with 3 warmup iterations and 5 measurement iterations. The SchemaCoder is constructed at setup, and encode is invoked once there so that code generation for the underlying RowCoder has completed before the first iteration begins.

One minor thing I noticed when I was looking at this yesterday: an optional field's Coder is wrapped by NullableCoder, but RowCoder already skips encoding null values entirely and marks their absence in the field presence BitSet that precedes all field data. That makes the NullableCoder wrapping a tad redundant, since for an optional field either no bytes are written at all (the value is null) or NullableCoder's first byte is always 1 to mark the presence of data. It also means this field encoding makes it impossible to relax a field from required to optional, because the optional encoding carries that extra marker byte and the required encoding does not, even though Row could support the relaxation using the field presence BitSet alone. Although that's a backwards-incompatible schema change, I can see how some users may expect support for it if they interact with sources/sinks that allow relaxing a field from required to optional, like BigQuery.
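To make both points concrete, here is a toy sketch in Java. It does not use Beam at all: `encodeRow` and `decodeFirstField` are hypothetical helpers over a deliberately simplified layout (one byte whose bit i marks field i as null, then the non-null values as 8-byte longs, at most 8 fields), with a NullableCoder-style 0/1 marker byte in front of each present optional value. This is an assumption for illustration, not RowCoder's actual wire format.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.util.BitSet;

public class RowEncodingSketch {

    // Hypothetical, simplified row layout (NOT Beam's real wire format):
    // one byte whose bit i is set when field i is null, followed by the
    // values of the non-null fields in order. Supports at most 8 fields.
    static byte[] encodeRow(Long[] values, boolean[] optional) {
        BitSet nullFields = new BitSet();
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                nullFields.set(i);
            }
        }
        byte[] bits = nullFields.toByteArray();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        out.write(bits.length == 0 ? 0 : bits[0]);
        for (int i = 0; i < values.length; i++) {
            if (values[i] == null) {
                continue; // null fields contribute no bytes; the bit set records them
            }
            if (optional[i]) {
                out.write(1); // NullableCoder-style marker; always 1 because nulls were skipped
            }
            byte[] value = ByteBuffer.allocate(Long.BYTES).putLong(values[i]).array();
            out.write(value, 0, value.length);
        }
        return out.toByteArray();
    }

    // Reads field 0 using the reader's view of whether that field is optional.
    static Long decodeFirstField(byte[] encoded, boolean readerThinksOptional) {
        ByteBuffer in = ByteBuffer.wrap(encoded);
        byte nullBits = in.get();
        if ((nullBits & 1) != 0) {
            return null; // the bit set says field 0 is null
        }
        if (readerThinksOptional && in.get() == 0) {
            return null; // a 0 marker byte is read as "null"
        }
        return in.getLong();
    }

    public static void main(String[] args) {
        Long[] row = {42L, null};
        byte[] writtenOptional = encodeRow(row, new boolean[] {true, true});
        byte[] writtenRequired = encodeRow(row, new boolean[] {false, true});
        System.out.println(writtenOptional.length); // 10: bit set + marker + 8-byte long
        System.out.println(writtenRequired.length); // 9: bit set + 8-byte long
        System.out.println(decodeFirstField(writtenOptional, true)); // 42
        // Relaxing field 0 to optional on the read side: the long's first byte
        // (0x00) is consumed as the marker, so a present value decodes as null.
        System.out.println(decodeFirstField(writtenRequired, true)); // null
    }
}
```

In this toy layout the marker byte on a present optional value can only ever be 1, so it carries no information beyond what the bit set already says. And a reader that treats a formerly required field as optional consumes the first byte of the value as the marker, so a present 42 comes back as null.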