Thanks @Antoine/@Weston - we've raised an issue [1] for this in Arrow Java, as suggested.
Cheers, James

[1]: https://github.com/apache/arrow/issues/40951

On Tue, 2 Apr 2024 at 14:29, Finn Völkel <f...@juxt.pro> wrote:
> @weston I think my mentioning of ADT was a mistake. I am just thinking of
> sum types (https://en.wikipedia.org/wiki/Tagged_union), and I should have
> called them that.
> You are thinking of a product type, which is better represented by a
> StructVector with nullable child vectors.
>
> @antoine Thanks for the clarification.
>
> On Tue, 2 Apr 2024 at 14:47, Weston Pace <weston.p...@gmail.com> wrote:
>
> > Wouldn't support for ADT require expressing more than 1 type id per
> > record? In other words, if `put` has type id 1, `delete` has type id 2,
> > and `erase` has type id 3, then there is no way to express that something
> > is (for example) both type id 1 and type id 3, because you can only have
> > one type id per record.
> >
> > If that understanding is correct, then it seems you can always encode
> > world 2 into world 1 by exhaustively listing out the combinations. In
> > other words, `put` is the LSB, `delete` is bit 2, and `erase` is bit 3,
> > and you have:
> >
> > 7 - put/delete/erase
> > 6 - delete/erase
> > 5 - erase/put
> > 4 - erase
> > 3 - put/delete
> > 2 - delete
> > 1 - put
> >
> > On Tue, Apr 2, 2024 at 4:36 AM Finn Völkel <f...@juxt.pro> wrote:
> >
> > > I also meant Algebraic Data Type, not Abstract Data Type (too many
> > > acronyms).
> > >
> > > On Tue, 2 Apr 2024 at 13:28, Antoine Pitrou <anto...@python.org> wrote:
> > >
> > > >
> > > > Thanks. The Arrow spec does support multiple union members with the
> > > > same type, but not all implementations do. The C++ implementation
> > > > should support it, though to my surprise we do not seem to have any
> > > > tests for it.
> > > >
> > > > If the Java implementation doesn't, then you can probably open an
> > > > issue for it (and even submit a PR if you would like to tackle it).
> > > >
> > > > I've also opened https://github.com/apache/arrow/issues/40947 to
> > > > create integration tests for this.
> > > >
> > > > Regards
> > > >
> > > > Antoine.
> > > >
> > > >
> > > > On 02/04/2024 at 13:19, Finn Völkel wrote:
> > > > >> Can you explain what ADT means?
> > > > >
> > > > > Sorry about that. ADT stands for Abstract Data Type. What do I mean
> > > > > by an ADT-style vector?
> > > > >
> > > > > Let's take an example from the project I am on. We have an `op`
> > > > > union vector with three child vectors: `put`, `delete` and `erase`.
> > > > > `delete` and `erase` have the same type but represent different
> > > > > things.
> > > > >
> > > > > On Tue, 2 Apr 2024 at 13:16, Steve Kim <chairm...@gmail.com> wrote:
> > > > >
> > > > >> Thank you for asking this question. I have the same question.
> > > > >>
> > > > >> I noted a similar problem in the C++/Python implementation:
> > > > >> https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
> > > > >>
> > > > >> On Tue, Apr 2, 2024, 04:30 Finn Völkel <f...@juxt.pro> wrote:
> > > > >>
> > > > >>> Hi,
> > > > >>>
> > > > >>> My question primarily concerns the union layout described at
> > > > >>> https://arrow.apache.org/docs/format/Columnar.html#union-layout
> > > > >>>
> > > > >>> There are two ways to use unions:
> > > > >>>
> > > > >>> - polymorphic vectors (world 1)
> > > > >>> - ADT-style vectors (world 2)
> > > > >>>
> > > > >>> In world 1 you have a vector that stores different types. In the
> > > > >>> ADT world you could have multiple child vectors with the same type
> > > > >>> but different type ids in the union type vector. The difference
> > > > >>> becomes apparent if you want to use two BigIntVectors as children,
> > > > >>> which is not possible in world 1. World 1 is a subset of world 2.
> > > > >>>
> > > > >>> The spec (to my understanding) doesn’t explicitly forbid world 2,
> > > > >>> but the implementation we have been using (Java) has been making
> > > > >>> the assumption of being in world 1 (a union only having ONE child
> > > > >>> of each type). We sometimes use unions in the ADT style, which has
> > > > >>> led to problems down the road.
> > > > >>>
> > > > >>> Could someone clarify what the specification allows and what it
> > > > >>> doesn’t allow? Could we tighten the specification after that
> > > > >>> clarification?
> > > > >>>
> > > > >>> Best, Finn

--
*James Henderson*
XTDB Head of Engineering at *JUXT*
Mobile +44 (0) 780 4321 777
Email j...@juxt.pro
Website https://juxt.pro
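To make the world 1 / world 2 distinction in this thread concrete, here is a minimal sketch in plain Python of how a dense union resolves each slot, loosely following the union layout in the Arrow columnar format (a types buffer of 8-bit type ids, an offsets buffer pointing into the selected child, and a type-id-to-child mapping). It does not use any Arrow library: `DenseUnion`, `one_child_per_type` and `lookup_by_type` are hypothetical names, and the child types in the `op` example (`utf8` for `put`, `int64` for `delete` and `erase`) are assumptions for illustration, since the thread only says that `delete` and `erase` share a type.

```python
from array import array
from dataclasses import dataclass
from typing import Any, Dict, List, Tuple


@dataclass
class DenseUnion:
    """Toy model of an Arrow dense union (hypothetical, not an Arrow API).

    type_ids:         one 8-bit type id per slot (the "types" buffer)
    offsets:          one offset per slot into the child chosen by its type id
    children:         child name -> (logical type name, child values)
    type_id_to_child: the union's type-id mapping, type id -> child name
    """

    type_ids: array
    offsets: array
    children: Dict[str, Tuple[str, List[Any]]]
    type_id_to_child: Dict[int, str]

    def __len__(self) -> int:
        return len(self.type_ids)

    def get(self, i: int) -> Tuple[str, Any]:
        # The type id, not the child's logical type, selects the child,
        # so two children of the same type remain distinguishable.
        child = self.type_id_to_child[self.type_ids[i]]
        _, values = self.children[child]
        return child, values[self.offsets[i]]


def one_child_per_type(u: DenseUnion) -> bool:
    """World-1 check: True if no two children share a logical type."""
    types = [t for t, _ in u.children.values()]
    return len(types) == len(set(types))


def lookup_by_type(u: DenseUnion) -> Dict[str, str]:
    """Hypothetical world-1 shortcut: index children by logical type.

    With duplicate types, later children silently overwrite earlier ones.
    """
    return {t: name for name, (t, _) in u.children.items()}


# Hypothetical `op` union: `delete` and `erase` share the int64 type.
op = DenseUnion(
    type_ids=array("b", [1, 2, 3, 2]),  # "b": signed 8-bit, like Arrow's int8 type ids
    offsets=array("i", [0, 0, 0, 1]),   # "i": C int, 32-bit on mainstream platforms
    children={
        "put": ("utf8", ["doc-a"]),
        "delete": ("int64", [10, 11]),
        "erase": ("int64", [20]),
    },
    type_id_to_child={1: "put", 2: "delete", 3: "erase"},
)

print([op.get(i) for i in range(len(op))])
# [('put', 'doc-a'), ('delete', 10), ('erase', 20), ('delete', 11)]
print(one_child_per_type(op))  # False
print(lookup_by_type(op))      # {'utf8': 'put', 'int64': 'erase'}
```

Because each slot is resolved through its type id, `delete` and `erase` stay distinct even though they share a type. The `lookup_by_type` helper shows, in general terms, how indexing children by type (the one-child-per-type assumption) can silently collapse them into one; it is not a claim about how the Java implementation is actually written.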