Hi Ryan,

In addition to the limitations mentioned above, another one is that only one column of each type can participate in the union.
There are some old threads on these differences on the mailing list that should be searchable.

Thanks,
Micah

On Tue, May 19, 2020 at 6:44 AM Antoine Pitrou <anto...@python.org> wrote:

>
> Also, you may want to run the integration tests and inspect the
> generated JSON file for union data; it will probably be informative
> (look for type ids).
>
> Regards
>
> Antoine.
>
>
> On 19/05/2020 at 15:38, Ryan Murray wrote:
> > Thanks for the clarification! Next time I will read the whole
> > document ;-)
> >
> > On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> As explained in the comment below:
> >> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >>
> >> On 19/05/2020 at 14:14, Ryan Murray wrote:
> >>> Thanks Antoine,
> >>>
> >>> Can you just clarify what you mean by 'type ids are logical'? In my mind
> >>> type ids are strongly coupled to the types and their order in Schema.fbs
> >>> [1]. Do you mean that the order there is only a convention and we can't
> >>> assume that 0 === Null?
> >>>
> >>> Best,
> >>> Ryan
> >>>
> >>> [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> >>>
> >>> On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:
> >>>
> >>>>
> >>>> On 19/05/2020 at 13:43, Ryan Murray wrote:
> >>>>> Hey All,
> >>>>>
> >>>>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> >>>>> that there is a difference between C++ and Java in the way Sparse Unions
> >>>>> are handled. I haven't seen in the format spec which one is correct, so I
> >>>>> wanted to check with the wider community.
> >>>>>
> >>>>> C++ (and the integration tests) see sparse unions as:
> >>>>> name
> >>>>> count
> >>>>> VALIDITY[]
> >>>>> TYPE_ID[]
> >>>>> children[]
> >>>>>
> >>>>> and Java as:
> >>>>> name
> >>>>> count
> >>>>> TYPE[]
> >>>>> children[]
> >>>>>
> >>>>> The precise names may only be important for JSON reading/writing in the
> >>>>> integration tests so I will ignore TYPE/TYPE_ID for now. However, the big
> >>>>> difference is that Java doesn't have a validity buffer and C++ does. My
> >>>>> understanding is that technically the validity buffer is redundant (0 type
> >>>>> == NULL) so I can see why Java would omit it. My question is then: which
> >>>>> language is 'correct'?
> >>>>
> >>>> Union type ids are logical, so 0 could very well be a valid type id.
> >>>> You can't assume that type 0 means a null entry.
> >>>>
> >>>> Regards
> >>>>
> >>>> Antoine.
> >>>>
> >>>
> >>
> >
>
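
To make the "type ids are logical" point from the thread concrete, here is a toy Python sketch. It uses plain lists rather than the Arrow API, and the class name, field names and the ids 0 and 7 are all made up for illustration. It assumes the C++-style layout described above (a validity buffer next to the type ids) and only shows that a slot with type id 0 can hold a perfectly valid value.

```python
from dataclasses import dataclass
from typing import Any, List, Optional


@dataclass
class SparseUnionSketch:
    """Toy model of a sparse union. Hypothetical; not the Arrow API."""

    child_names: List[str]
    type_codes: List[int]      # logical type id assigned to each child, by position
    type_ids: List[int]        # one logical type id per slot
    children: List[List[Any]]  # sparse layout: every child spans the full length
    validity: Optional[List[bool]] = None  # separate null marker, as in the C++ layout

    def value(self, i: int) -> Any:
        # Nullness comes from the validity buffer, never from the type id.
        if self.validity is not None and not self.validity[i]:
            return None
        # Map the logical type id back to the position of the child.
        child_index = self.type_codes.index(self.type_ids[i])
        return self.children[child_index][i]


union = SparseUnionSketch(
    child_names=["ints", "strs"],
    type_codes=[0, 7],                 # id 0 is simply the id chosen for "ints"
    type_ids=[0, 7, 0],
    children=[[10, 0, 30], ["", "b", ""]],
    validity=[True, True, False],
)

values = [union.value(i) for i in range(3)]
print(values)
```

Slots 0 and 2 both carry type id 0, but only slot 2 is null, and that is decided by the validity entry alone. Without a validity buffer, this sketch would have no way to tell the two apart from the type ids.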