Hey All,

While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed that C++ and Java differ in how they handle sparse unions. I haven't found anything in the format spec that says which one is correct, so I wanted to check with the wider community.
C++ (and the integration tests) see sparse unions as:

    name
    count
    VALIDITY[]
    TYPE_ID[]
    children[]

and Java as:

    name
    count
    TYPE[]
    children[]

The precise names may only matter for JSON reading/writing in the integration tests, so I will ignore TYPE vs. TYPE_ID for now. The big difference, however, is that C++ has a validity buffer and Java doesn't. My understanding is that the validity buffer is technically redundant (0 type == NULL), so I can see why Java would omit it.

My question is then: which language is 'correct'? I suppose the actual language implementation is not entirely relevant here; 'correct' instead refers to what the canonical IPC schema for a sparse union should be.

Best,
Ryan
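
P.S. To make the redundancy argument concrete, here is a rough Python sketch (not taken from either implementation). It assumes, per my understanding above and not per anything confirmed in the spec, that type id 0 marks a null slot. The names NULL_TYPE_ID, validity_from_type_ids and layouts_agree are made up for illustration, and the buffers are plain lists rather than real Arrow buffers.

```python
from typing import List

# Assumption from this email (not confirmed by the spec): type id 0 means NULL.
NULL_TYPE_ID = 0


def validity_from_type_ids(type_ids: List[int]) -> List[int]:
    """Derive per-slot validity (1 = valid, 0 = null) from the type ids alone.

    This is what a reader of the Java-style layout (no VALIDITY buffer)
    would have to do under the assumption above.
    """
    return [0 if type_id == NULL_TYPE_ID else 1 for type_id in type_ids]


def layouts_agree(validity: List[int], type_ids: List[int]) -> bool:
    """Check that a C++-style VALIDITY buffer carries no extra information,
    i.e. that it matches what the type ids already imply."""
    return validity == validity_from_type_ids(type_ids)


# Hypothetical four-slot sparse union; slot 1 is null.
type_ids = [1, 0, 2, 1]
cpp_validity = [1, 0, 1, 1]

derived = validity_from_type_ids(type_ids)
consistent = layouts_agree(cpp_validity, type_ids)
```

If the assumption holds, the two layouts describe the same data. If it does not (for example, if a slot can have a non-zero type id and still be null), then the validity buffer carries real information and the Java layout would be missing it.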