I also meant Algebraic Data Type, not Abstract Data Type (too many
acronyms).

On Tue, 2 Apr 2024 at 13:28, Antoine Pitrou <anto...@python.org> wrote:

>
> Thanks. The Arrow spec does support multiple union members with the same
> type, but not all implementations do. The C++ implementation should
> support it, though to my surprise we do not seem to have any tests for it.
>
> If the Java implementation doesn't, then you can probably open an issue
> for it (and even submit a PR if you would like to tackle it).
>
> I've also opened https://github.com/apache/arrow/issues/40947 to create
> integration tests for this.
>
> Regards
>
> Antoine.
>
>
> On 02/04/2024 at 13:19, Finn Völkel wrote:
> >> Can you explain what ADT means ?
> >
> > Sorry about that. ADT stands for Abstract Data Type. What do I mean by an
> > ADT style vector?
> >
> > Let's take an example from the project I am on. We have an `op` union
> > vector with three child vectors `put`, `delete`, `erase`. `delete` and
> > `erase` have the same type but represent different things.
> >
> > On Tue, 2 Apr 2024 at 13:16, Steve Kim <chairm...@gmail.com> wrote:
> >
> >> Thank you for asking this question. I have the same question.
> >>
> >> I noted a similar problem in the C++/Python implementation:
> >> https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
> >>
> >> On Tue, Apr 2, 2024, 04:30 Finn Völkel <f...@juxt.pro> wrote:
> >>
> >>> Hi,
> >>>
> >>> my question primarily concerns the union layout described at
> >>> https://arrow.apache.org/docs/format/Columnar.html#union-layout
> >>>
> >>> There are two ways to use unions:
> >>>
> >>>     - polymorphic vectors (world 1)
> >>>     - ADT style vectors (world 2)
> >>>
> >>> In world 1 you have a vector that stores different types. In the ADT
> >>> world you could have multiple child vectors with the same type but
> >>> different type ids in the union type vector. The difference is apparent
> >>> if you want to use two BigIntVectors as children, which is not possible
> >>> in world 1. World 1 is a subset of world 2.
> >>>
> >>> The spec (to my understanding) doesn’t explicitly forbid world 2, but
> >>> the implementation we have been using (Java) assumes world 1 (a union
> >>> having only ONE child of each type). We sometimes use unions in the ADT
> >>> style, which has led to problems down the road.
> >>>
> >>> Could someone clarify what the specification allows and what it doesn’t
> >>> allow? Could we tighten the specification after that clarification?
> >>>
> >>> Best, Finn
> >>>
> >>
> >
>
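
To make the ADT-style case from the thread concrete, here is a minimal
sketch of a dense union laid out as the columnar spec describes it: a type
ids buffer, an offsets buffer, and one child per union member. It is plain
Python and not the API of any Arrow implementation. The `Child` and
`DenseUnion` classes, the type labels, and the sample values are all
hypothetical, and the sketch assumes each type id equals the child's
position in the list.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Child:
    name: str          # field name of the union member, e.g. "delete"
    arrow_type: str    # type label only (hypothetical, for illustration)
    values: List[Any]  # the child's own values


@dataclass
class DenseUnion:
    children: List[Child]  # assumption: list position == type id
    type_ids: List[int]    # one entry per union slot, selects the child
    offsets: List[int]     # one entry per union slot, index into that child

    def get(self, i: int) -> Tuple[str, Any]:
        """Resolve slot i through its type id, never through its type."""
        child = self.children[self.type_ids[i]]
        return child.name, child.values[self.offsets[i]]


# An `op` union with three members. `delete` and `erase` share a type
# (labelled "int64" here as an assumption) but have different type ids.
op = DenseUnion(
    children=[
        Child("put", "struct", [{"id": 1, "doc": "a"}]),
        Child("delete", "int64", [7, 9]),
        Child("erase", "int64", [7]),
    ],
    type_ids=[0, 1, 2, 1],
    offsets=[0, 0, 0, 1],
)

decoded = [op.get(i) for i in range(len(op.type_ids))]

# Looking a child up by type alone is ambiguous in world 2.
int64_children = [c.name for c in op.children if c.arrow_type == "int64"]
```

Slots 1 and 2 hold the same value of the same type, and only the type id
tells a `delete` from an `erase`. Any code that picks a child by its type
alone, as the world 1 assumption does, cannot tell the two apart.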
