I also meant Algebraic Data Type, not Abstract Data Type (too many acronyms).
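
To make the ADT case concrete, here is a rough sketch of the `op` example quoted below, written as a plain-Python model of a dense union rather than with any Arrow library. Everything in it is illustrative: the `Child` and `DenseUnion` classes, the `lookup_child_by_type` helper, the child value types (`utf8`, `int64`) and the sample values are assumptions, and using a child's list position as its type id is a simplification. The point is only that slots are resolved through the type ids, so `delete` and `erase` stay distinct even though they share a type, while a lookup keyed on the value type (the world 1 assumption) becomes ambiguous.

```python
from dataclasses import dataclass
from typing import Any, List, Tuple


@dataclass
class Child:
    name: str        # field name, e.g. "delete"
    value_type: str  # logical type of the child; may repeat across children
    values: List[Any]


@dataclass
class DenseUnion:
    children: List[Child]  # simplification: list position doubles as type id
    type_ids: List[int]    # one entry per slot: which child holds the value
    offsets: List[int]     # one entry per slot: index into that child

    def get(self, i: int) -> Tuple[str, Any]:
        """Return (child name, value) for slot i, resolved by type id."""
        child = self.children[self.type_ids[i]]
        return child.name, child.values[self.offsets[i]]


def lookup_child_by_type(union: DenseUnion, value_type: str) -> Child:
    """World 1 style lookup: only valid if each value type occurs once."""
    matches = [c for c in union.children if c.value_type == value_type]
    if len(matches) != 1:
        raise ValueError(
            f"{len(matches)} children have type {value_type!r}; "
            "resolve by type id instead"
        )
    return matches[0]


# Hypothetical data: the value types and values are made up for illustration.
op = DenseUnion(
    children=[
        Child("put", "utf8", ["doc-a"]),
        Child("delete", "int64", [7, 9]),
        Child("erase", "int64", [7]),  # same type as "delete", different meaning
    ],
    type_ids=[0, 1, 2, 1],
    offsets=[0, 0, 0, 1],
)

rows = [op.get(i) for i in range(len(op.type_ids))]
```

Reading the slots back through `get` keeps `delete` and `erase` apart, whereas `lookup_child_by_type(op, "int64")` raises because two children match. That is roughly the kind of problem we ran into.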
On Tue, 2 Apr 2024 at 13:28, Antoine Pitrou <anto...@python.org> wrote:

>
> Thanks. The Arrow spec does support multiple union members with the same
> type, but not all implementations do. The C++ implementation should
> support it, though to my surprise we do not seem to have any tests for it.
>
> If the Java implementation doesn't, then you can probably open an issue
> for it (and even submit a PR if you would like to tackle it).
>
> I've also opened https://github.com/apache/arrow/issues/40947 to create
> integration tests for this.
>
> Regards
>
> Antoine.
>
>
> Le 02/04/2024 à 13:19, Finn Völkel a écrit :
> >> Can you explain what ADT means ?
> >
> > Sorry about that. ADT stands for Abstract Data Type. What do I mean by an
> > ADT style vector?
> >
> > Let's take an example from the project I am on. We have an `op` union
> > vector with three child vectors `put`, `delete`, `erase`. `delete` and
> > `erase` have the same type but represent different things.
> >
> > On Tue, 2 Apr 2024 at 13:16, Steve Kim <chairm...@gmail.com> wrote:
> >
> >> Thank you for asking this question. I have the same question.
> >>
> >> I noted a similar problem in the c++/python implementation:
> >> https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
> >>
> >> On Tue, Apr 2, 2024, 04:30 Finn Völkel <f...@juxt.pro> wrote:
> >>
> >>> Hi,
> >>>
> >>> my question primarily concerns the union layout described at
> >>> https://arrow.apache.org/docs/format/Columnar.html#union-layout
> >>>
> >>> There are two ways to use unions:
> >>>
> >>> - polymorphic vectors (world 1)
> >>> - ADT style vectors (world 2)
> >>>
> >>> In world 1 you have a vector that stores different types. In the ADT world
> >>> you could have multiple child vectors with the same type but different type
> >>> ids in the union type vector. The difference is apparent if you want to use
> >>> two BigIntVectors as children which doesn't exist in world 1. World 1 is a
> >>> subset of world 2.
> >>>
> >>> The spec (to my understanding) doesn’t explicitly forbid world 2, but the
> >>> implementation we have been using (Java) has been making the assumption of
> >>> being in world 1 (a union only having ONE child of each type). We sometimes
> >>> use union in the ADT style which has led to problems down the road.
> >>>
> >>> Could someone clarify what the specification allows and what it doesn’t
> >>> allow? Could we tighten the specification after that clarification?
> >>>
> >>> Best, Finn
> >>>
> >>
> >
>