Thanks for the clarification! Next time I will read the whole document ;-)

On Tue, May 19, 2020 at 2:38 PM Antoine Pitrou <anto...@python.org> wrote:

>
> As explained in the comment below:
> https://github.com/apache/arrow/blob/master/format/Schema.fbs#L91
>
> Regards
>
> Antoine.
>
>
> On 19/05/2020 at 14:14, Ryan Murray wrote:
> > Thanks Antoine,
> >
> > Can you just clarify what you mean by 'type ids are logical'? In my mind
> > type ids are strongly coupled to the types and their order in Schema.fbs
> > [1]. Do you mean that the order there is only a convention and we can't
> > assume that 0 === Null?
> >
> > Best,
> > Ryan
> >
> > [1] https://github.com/apache/arrow/blob/master/format/Schema.fbs#L235
> >
> > On Tue, May 19, 2020 at 2:04 PM Antoine Pitrou <anto...@python.org> wrote:
> >
> >>
> >> On 19/05/2020 at 13:43, Ryan Murray wrote:
> >>> Hey All,
> >>>
> >>> While working on https://issues.apache.org/jira/browse/ARROW-1692 I noticed
> >>> that C++ and Java handle Sparse Unions differently. I haven't found in the
> >>> format spec which one is correct, so I wanted to check with the wider
> >>> community.
> >>>
> >>> C++ (and the integration tests) see sparse unions as:
> >>> name
> >>> count
> >>> VALIDITY[]
> >>> TYPE_ID[]
> >>> children[]
> >>>
> >>> and Java as:
> >>> name
> >>> count
> >>> TYPE[]
> >>> children[]
> >>>
> >>> The precise names may only matter for JSON reading/writing in the
> >>> integration tests, so I will ignore TYPE/TYPE_ID for now. However, the big
> >>> difference is that Java doesn't have a validity buffer and C++ does. My
> >>> understanding is that technically the validity buffer is redundant (0 type
> >>> == NULL), so I can see why Java would omit it. My question is then: which
> >>> language is 'correct'?
> >>
> >> Union type ids are logical, so 0 could very well be a valid type id.
> >> You can't assume that type 0 means a null entry.
> >>
> >> Regards
> >>
> >> Antoine.
> >>
> >
>
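
To illustrate the point that union type ids are logical, here is a minimal
sketch in plain Python (not the Arrow API). It assumes a hypothetical sparse
union of (int32, utf8) whose schema assigns the type ids 5 and 0, following
the typeIds indirection described in the Schema.fbs comment linked above. The
function name, the list-based "buffers", and the use of None for a null child
slot are all illustrative assumptions, not a statement about which buffer
layout the format mandates.

```python
from typing import Any, List, Sequence


def decode_sparse_union(
    type_ids: Sequence[int],
    type_buffer: Sequence[int],
    children: Sequence[Sequence[Any]],
) -> List[Any]:
    """Resolve each slot of a sparse union to the value of its selected child.

    type_ids[i] is the logical id that the schema assigns to child i
    (hypothetical helper, plain lists stand in for Arrow buffers).
    """
    # Map each logical type id to the physical child index.
    id_to_child = {type_id: index for index, type_id in enumerate(type_ids)}
    result = []
    for slot, type_id in enumerate(type_buffer):
        # In a sparse union every child has the same length as the union,
        # so the slot index is used directly in the selected child.
        child = children[id_to_child[type_id]]
        result.append(child[slot])
    return result


# Assumed schema: child 0 (int32) has type id 5, child 1 (utf8) has type id 0.
type_ids = [5, 0]
ints = [1, None, 3, None]
strs = [None, "a", None, None]
type_buffer = [5, 0, 5, 0]

decoded = decode_sparse_union(type_ids, type_buffer, [ints, strs])
print(decoded)
```

In this sketch, type id 0 selects the utf8 child. Slot 1 carries type id 0 and
holds a valid string, while slot 3 also carries type id 0 and is null, so the
type id alone cannot tell a reader whether an entry is null.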
