> Meaning BSON I presume? What do you mean by "tuple representation"?
> (One downside of JSON is that the field names are redundantly stored
> in each record, so even if you save on CPU it may hurt on the network
> due to the greater data sizes).

Yes, I meant BSON. By tuple (or array) representation I mean
serializing each record as an array of its field values, in schema
order, so the field names are not stored in the serialized result.
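
To make the size difference concrete, here is a minimal sketch in
plain Java. It is not BSON and not any particular library's wire
format; the record (id, name, score) and the class name
TupleEncodingSketch are made up for illustration. It only shows that
writing values positionally drops the per-record field names.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical example record: (long id, String name, double score).
final class TupleEncodingSketch {

    // Map-style: every value is preceded by its field name.
    static byte[] encodeWithNames(long id, String name, double score) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeUTF("id");
            out.writeLong(id);
            out.writeUTF("name");
            out.writeUTF(name);
            out.writeUTF("score");
            out.writeDouble(score);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Tuple-style: values only, in an order both sides agree on
    // out of band (for example via a shared schema).
    static byte[] encodeAsTuple(long id, String name, double score) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(bytes);
            out.writeLong(id);
            out.writeUTF(name);
            out.writeDouble(score);
            out.flush();
            return bytes.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The saving per record is exactly the encoded field names, so it
matters most for records with many small fields.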

> Sounds like there's a lot of room for improvement! One downside of
> Rows is that they can't (IIRC) store (and encode/decode) unboxed
> representations of their primitive field types. This alone would be
> good to solve, but as mentioned you could probably also skip a Row
> intermediate altogether for encoding/decoding.

If Row were an interface, you could generate a POJO at runtime from a
Schema and have it implement that interface. I'm not sure that
improves anything when it comes to serialization, though, since you'd
still use some function with a field index parameter to retrieve
values from the Row instance. It could be that the deserialized
instance takes up less space in memory.
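
Roughly what I have in mind, as a hand-written sketch. IndexedRow and
UserRow are hypothetical names, not Beam's Row API, and UserRow stands
in for a class that would really be generated at runtime from the
Schema.

```java
// Hypothetical interface standing in for "Row as an interface".
interface IndexedRow {
    int getFieldCount();

    // Generic access by field index, as a schema-driven coder would use.
    Object getValue(int index);
}

// Stand-in for a class generated at runtime from a schema with
// fields (id: long, name: String, score: double).
final class UserRow implements IndexedRow {
    // Primitives are stored unboxed in the instance.
    private final long id;
    private final String name;
    private final double score;

    UserRow(long id, String name, double score) {
        this.id = id;
        this.name = name;
        this.score = score;
    }

    @Override
    public int getFieldCount() {
        return 3;
    }

    @Override
    public Object getValue(int index) {
        switch (index) {
            case 0:
                return id; // boxed to Long on every call
            case 1:
                return name;
            case 2:
                return score; // boxed to Double on every call
            default:
                throw new IndexOutOfBoundsException("no field at index " + index);
        }
    }
}
```

The fields are stored unboxed, which is where the memory saving would
come from, but a coder going through getValue(int) still pays for
boxing on each access.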

Mapping out the RowCoderGenerator result into a specialized Coder for
the POJO I was benchmarking improved serialization throughput by
~2.2x and deserialization throughput by ~1.6x.
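
For illustration only, the shape of such a specialized coder looks
something like the sketch below. This is not the code I benchmarked;
Reading and ReadingCoderSketch are hypothetical, and in Beam the
encode/decode pair would live in a Coder subclass rather than in
static methods.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;

// Hypothetical POJO with primitive fields.
final class Reading {
    final long sensorId;
    final double value;

    Reading(long sensorId, double value) {
        this.sensorId = sensorId;
        this.value = value;
    }
}

final class ReadingCoderSketch {

    // Reads the fields directly: no Row intermediate, no boxing.
    static void encode(Reading r, OutputStream os) throws IOException {
        DataOutputStream out = new DataOutputStream(os);
        out.writeLong(r.sensorId);
        out.writeDouble(r.value);
        out.flush();
    }

    // Builds the POJO straight from the stream, in the same field order.
    static Reading decode(InputStream is) throws IOException {
        DataInputStream in = new DataInputStream(is);
        long sensorId = in.readLong();
        double value = in.readDouble();
        return new Reading(sensorId, value);
    }

    // Convenience helper: encode to a byte array and decode it again.
    static Reading roundTrip(Reading r) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            encode(r, bytes);
            return decode(new ByteArrayInputStream(bytes.toByteArray()));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```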
