Thanks. The Arrow spec does support multiple union members with the same
type, but not all implementations do. The C++ implementation should
support it, though to my surprise we do not seem to have any tests for it.
If the Java implementation doesn't, then you can probably open an issue
for it (and even submit a PR if you would like to tackle it).
I've also opened https://github.com/apache/arrow/issues/40947 to create
integration tests for this.
Regards
Antoine.
On 02/04/2024 at 13:19, Finn Völkel wrote:
Can you explain what ADT means?
Sorry about that. ADT stands for Abstract Data Type. What do I mean by an
ADT style vector?
Let's take an example from the project I am on. We have an `op` union
vector with three child vectors `put`, `delete`, `erase`. `delete` and
`erase` have the same type but represent different things.
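
To make the `op` example concrete, here is a minimal pure-Python sketch of a dense union layout (a type-id list plus an offsets list over the children). It does not use any Arrow library. The `Child` and `DenseUnion` classes, the `int64` type of `delete` and `erase`, the sample values, and the choice to use the child index as the type id are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Child:
    name: str       # field name, e.g. "delete"
    type_name: str  # logical type of the child (illustrative string only)
    values: list    # the child's own values


@dataclass
class DenseUnion:
    children: list  # in this sketch the child index doubles as the type id
    type_ids: list  # one entry per slot: which child the slot belongs to
    offsets: list   # one entry per slot: index into the selected child

    def get(self, i):
        """Return (child name, value) for slot i."""
        child = self.children[self.type_ids[i]]
        return child.name, child.values[self.offsets[i]]


# `delete` and `erase` share a type (assumed int64 here) but are
# distinct children with distinct type ids.
op = DenseUnion(
    children=[
        Child("put", "struct", [{"id": 1, "doc": "a"}]),
        Child("delete", "int64", [7]),
        Child("erase", "int64", [7, 9]),
    ],
    type_ids=[0, 1, 2, 2],
    offsets=[0, 0, 0, 1],
)

rows = [op.get(i) for i in range(len(op.type_ids))]
for row in rows:
    print(row)
```

Slots 1 and 2 hold the same value with the same type, and only the type id tells a `delete` apart from an `erase`.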
On Tue, 2 Apr 2024 at 13:16, Steve Kim <chairm...@gmail.com> wrote:
Thank you for asking this question. I have the same question.
I noted a similar problem in the C++/Python implementation:
https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
On Tue, Apr 2, 2024, 04:30 Finn Völkel <f...@juxt.pro> wrote:
Hi,
my question primarily concerns the union layout described at
https://arrow.apache.org/docs/format/Columnar.html#union-layout
There are two ways to use unions:
- polymorphic vectors (world 1)
- ADT style vectors (world 2)
In world 1 you have a vector that stores different types. In the ADT world
you could have multiple child vectors with the same type but different type
ids in the union type vector. The difference is apparent if you want to use
two BigIntVectors as children, which isn't possible in world 1. World 1 is a
subset of world 2.
The spec (to my understanding) doesn’t explicitly forbid world 2, but the
implementation we have been using (Java) has been making the assumption of
being in world 1 (a union only having ONE child of each type). We sometimes
use unions in the ADT style, which has led to problems down the road.
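
As a rough illustration of why the "one child per type" assumption breaks ADT style unions, the sketch below contrasts a lookup keyed by type with one keyed by type id. This is hypothetical Python, not the Java implementation's actual code; the child names come from the `op` example in this thread, and the type names and type ids are assumed.

```python
# (name, type) pairs for the union's children; the types are assumed.
children = [("put", "struct"), ("delete", "int64"), ("erase", "int64")]

# World 1 assumption: at most one child per type, so key by type.
by_type = {}
for type_id, (name, type_name) in enumerate(children):
    # A later child with the same type silently replaces the earlier one.
    by_type[type_name] = type_id

# World 2: key by type id, so children sharing a type stay distinct.
by_type_id = {type_id: name for type_id, (name, _) in enumerate(children)}

print(by_type)
print(by_type_id)
```

With the type-keyed lookup, `delete` is no longer reachable, because `erase` took over the `int64` entry; the lookup keyed by type id keeps all three children.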
Could someone clarify what the specification allows and what it doesn’t
allow? Could we tighten the specification after that clarification?
Best, Finn