Algebraic Data Types (Sums and Products) are very abstract. This means
they don't fully specify a concrete/physical layout [1]: different
physical layouts can match the same algebraic definition. As an
in-memory data format specification, Arrow doesn't and shouldn't
rigidly specify concretization r
Thanks @Antoine/@Weston - we've raised an issue [1] for the same in Arrow
Java as suggested.
Cheers,
James
[1]: https://github.com/apache/arrow/issues/40951
On Tue, 2 Apr 2024 at 14:29, Finn Völkel wrote:
@weston I think my mention of ADT was a mistake. I am just thinking of
sum types (https://en.wikipedia.org/wiki/Tagged_union), which is what I
should have called them.
You are thinking of a product type which is better represented by a
StructVector with nullable child vectors.
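
To make the distinction concrete, here is a minimal sketch in plain Python
(no Arrow library involved) of the same four records laid out first as a
dense union and then as a struct with nullable children. The child names and
type ids come from the `op` example in this thread; the value types, the
sample values, and the helper names `union_get` and `struct_get` are
assumptions made up for illustration.

```python
# Plain-Python model of the two layouts; this is not the Arrow API.
# Assumed for illustration: `put` holds strings, `delete`/`erase` hold ints.

# Sum type as a dense union: exactly one type id and one offset per record.
TYPE_NAMES = {1: "put", 2: "delete", 3: "erase"}
type_ids = [1, 2, 3, 1]   # which child each record belongs to
offsets = [0, 0, 0, 1]    # index of the record's value inside that child
union_children = {
    "put": ["doc-a", "doc-b"],
    "delete": [7],
    "erase": [9],
}


def union_get(i):
    """Return (variant name, value) for record i of the dense union."""
    name = TYPE_NAMES[type_ids[i]]
    return name, union_children[name][offsets[i]]


# Product type as a struct: every child has a (possibly null) slot per record.
struct_children = {
    "put": ["doc-a", None, None, "doc-b"],
    "delete": [None, 7, None, None],
    "erase": [None, None, 9, None],
}


def struct_get(i):
    """Return all child values for record i of the struct."""
    return {name: column[i] for name, column in struct_children.items()}


print(union_get(1))
print(struct_get(1))
```

In the union layout each record carries exactly one type id, so it is exactly
one of the variants. The struct layout has a slot in every child for every
record, so nothing in the layout itself prevents several children from being
non-null at once.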
@antoine Thank
Wouldn't support for ADT require expressing more than 1 type id per
record? In other words, if `put` has type id 1, `delete` has type id 2,
and `erase` has type id 3 then there is no way to express something is (for
example) both type id 1 and type id 3 because you can only have one type id
per re
I also meant Algebraic Data Type, not Abstract Data Type (too many
acronyms).
On Tue, 2 Apr 2024 at 13:28, Antoine Pitrou wrote:
Thanks. The Arrow spec does support multiple union members with the same
type, but not all implementations do. The C++ implementation should
support it, though to my surprise we do not seem to have any tests for it.
If the Java implementation doesn't, then you can probably open an issue
for
> Can you explain what ADT means?
Sorry about that. ADT stands for Abstract Data Type. What do I mean by an
ADT style vector?
Let's take an example from the project I am on. We have an `op` union
vector with three child vectors `put`, `delete`, `erase`. `delete` and
`erase` have the same type bu
Thank you for asking this question. I have the same question.
I noted a similar problem in the C++/Python implementation:
https://github.com/apache/arrow/issues/19157#issuecomment-1528037394
On Tue, Apr 2, 2024, 04:30 Finn Völkel wrote:
> Hi,
>
> my question primarily concerns the union layout
Can you explain what ADT means?
On 02/04/2024 at 11:31, Finn Völkel wrote:
Hi,
my question primarily concerns the union layout described at
https://arrow.apache.org/docs/format/Columnar.html#union-layout
There are two ways to use unions:
- polymorphic vectors (world 1)
- ADT st