On 19/05/2020 at 13:43, Ryan Murray wrote:
> Hey All,
>
> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> that there is a difference between C++ and Java in the way Sparse Unions
> are handled. I haven't seen in the format spec which one is correct, so I
> wanted to check with the wider community.
>
> C++ (and the integration tests) see sparse unions as:
> name
> count
> VALIDITY[]
> TYPE_ID[]
> children[]
>
> and Java as:
> name
> count
> TYPE[]
> children[]
>
> The precise names may only be important for JSON reading/writing in the
> integration tests, so I will ignore TYPE/TYPE_ID for now. However, the big
> difference is that Java doesn't have a validity buffer and C++ does. My
> understanding is that technically the validity buffer is redundant (0 type
> == NULL), so I can see why Java would omit it. My question is then: which
> language is 'correct'?

Union type ids are logical, so 0 could very well be a valid type id. You
can't assume that type 0 means a null entry.

Regards

Antoine.
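
To illustrate the point about logical type ids, here is a minimal sketch in
plain Python. It is a toy model, not the Arrow API: the names SparseUnion,
type_codes, get and build_example are hypothetical, and the optional validity
list simply mirrors the C++ layout quoted above. It shows a union whose first
child happens to be registered under type id 0, so a slot with type id 0
holds a real value and nullness has to come from somewhere else.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SparseUnion:
    """Toy model of a sparse union column (hypothetical, not the Arrow API)."""

    type_codes: List[int]       # logical type id assigned to each child, in child order
    children: List[List[Any]]   # sparse: every child is as long as type_ids
    type_ids: List[int]         # per-slot logical type id
    validity: Optional[List[bool]] = None  # per-slot validity, as in the C++ layout

    def get(self, i: int) -> Any:
        # Nullness is decided by the validity list, never by the type id.
        if self.validity is not None and not self.validity[i]:
            return None
        # Map the logical type id back to the physical child position.
        child_index = self.type_codes.index(self.type_ids[i])
        return self.children[child_index][i]


def build_example() -> SparseUnion:
    ints = [1, 0, 3, 0]          # child registered under type id 0
    strs = ["", "b", "", ""]     # child registered under type id 5
    return SparseUnion(
        type_codes=[0, 5],
        children=[ints, strs],
        type_ids=[0, 5, 0, 0],
        validity=[True, True, True, False],
    )


u = build_example()
values = [u.get(i) for i in range(len(u.type_ids))]
print(values)  # [1, 'b', 3, None]
```

Slots 0, 2 and 3 all carry type id 0, yet only slot 3 is null. Reading
"type id 0" as "null" would wrongly discard the values 1 and 3.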