Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Felipe Oliveira Carvalho
Algebraic Data Types (Sums and Products) are very abstract. This means they don't fully specify a concrete/physical layout [1]: different physical layouts can match the same algebraic definition. As an in-memory data format specification, Arrow doesn't and shouldn't rigidly specify concretization r
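To illustrate the point that different physical layouts can match the same algebraic definition, here is a minimal sketch that encodes one sequence of sum-type values in two ways, modeled loosely on the dense and sparse union layouts in the Arrow columnar format. Plain Python lists stand in for Arrow buffers, and the member names `int` and `str` and all function names are hypothetical, chosen only for this illustration.

```python
# Sketch only: Python lists stand in for Arrow buffers; all names are hypothetical.
NAMES = ["int", "str"]                           # union members, type id = list index
VALUES = [("int", 5), ("str", "a"), ("int", 9)]  # the logical sum-type values


def encode_dense(values, names):
    """Dense style: a type id and an offset per row; children hold only their own values."""
    type_ids, offsets = [], []
    children = {name: [] for name in names}
    for name, value in values:
        type_ids.append(names.index(name))
        offsets.append(len(children[name]))  # position inside the selected child
        children[name].append(value)
    return type_ids, offsets, children


def encode_sparse(values, names):
    """Sparse style: a type id per row; every child is as long as the union itself."""
    type_ids = []
    children = {name: [] for name in names}
    for name, value in values:
        type_ids.append(names.index(name))
        for child in names:
            # Children that were not selected for this row get a placeholder slot.
            children[child].append(value if child == name else None)
    return type_ids, children


def decode_dense(type_ids, offsets, children, names):
    return [(names[t], children[names[t]][o]) for t, o in zip(type_ids, offsets)]


def decode_sparse(type_ids, children, names):
    return [(names[t], children[names[t]][i]) for i, t in enumerate(type_ids)]


dense = encode_dense(VALUES, NAMES)
sparse = encode_sparse(VALUES, NAMES)
```

Both encodings decode back to the same logical values even though the child lists differ in length and content.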

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread James Henderson
Thanks @Antoine/@Weston - we've raised an issue [1] for the same in Arrow Java as suggested. Cheers, James [1]: https://github.com/apache/arrow/issues/40951 On Tue, 2 Apr 2024 at 14:29, Finn Völkel wrote: > @weston I think my mentioning of ADT was a mistake. I am just thinking of > sum types

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
@weston I think my mentioning of ADT was a mistake. I am just thinking of sum types (https://en.wikipedia.org/wiki/Tagged_union), which I should have named differently. You are thinking of a product type which is better represented by a StructVector with nullable child vectors. @antoine Thank

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Weston Pace
Wouldn't support for ADT require expressing more than 1 type id per record? In other words, if `put` has type id 1, `delete` has type id 2, and `erase` has type id 3 then there is no way to express something is (for example) both type id 1 and type id 3 because you can only have one type id per re
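The one-type-id-per-record constraint can be sketched as follows. The type ids (1, 2, 3) are the ones given in the message above; the row data and the helper `member_of` are hypothetical. An `array("b")` stands in for the union's buffer of 8-bit signed type ids.

```python
from array import array

# Type ids as given in the message: put=1, delete=2, erase=3.
TYPE_IDS = {"put": 1, "delete": 2, "erase": 3}
ID_TO_NAME = {type_id: name for name, type_id in TYPE_IDS.items()}

# One signed 8-bit slot per row, so each row selects exactly one union member.
type_ids = array("b")
for member in ["put", "erase", "delete"]:  # hypothetical rows
    type_ids.append(TYPE_IDS[member])


def member_of(row):
    """Return the single union member selected by the given row."""
    return ID_TO_NAME[type_ids[row]]
```

A row that is both `put` and `erase` cannot be written here, because its slot holds a single integer.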

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
I also meant Algebraic Data Type, not Abstract Data Type (too many acronyms). On Tue, 2 Apr 2024 at 13:28, Antoine Pitrou wrote: > > Thanks. The Arrow spec does support multiple union members with the same > type, but not all implementations do. The C++ implementation should > support it, though

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Antoine Pitrou
Thanks. The Arrow spec does support multiple union members with the same type, but not all implementations do. The C++ implementation should support it, though to my surprise we do not seem to have any tests for it. If the Java implementation doesn't, then you can probably open an issue for

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Finn Völkel
> Can you explain what ADT means ? Sorry about that. ADT stands for Abstract Data Type. What do I mean by an ADT style vector? Let's take an example from the project I am on. We have an `op` union vector with three child vectors `put`, `delete`, `erase`. `delete` and `erase` have the same type bu
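A minimal sketch of the `op` union described above, written as a dense-style union in plain Python. The class `OpUnion` is hypothetical and is not the Arrow Java API. The value types are assumptions made for illustration: `delete` and `erase` both carry an integer key, and `put` carries a string.

```python
# Sketch only: a dense-style union with three named children.
# Assumed value types (not from the thread): put -> str, delete/erase -> int.
class OpUnion:
    def __init__(self, child_names):
        self.child_names = list(child_names)
        self.children = [[] for _ in self.child_names]  # one value list per child
        self.type_ids = []  # which child each row selects
        self.offsets = []   # position of the row's value inside that child

    def append(self, child_name, value):
        type_id = self.child_names.index(child_name)
        self.type_ids.append(type_id)
        self.offsets.append(len(self.children[type_id]))
        self.children[type_id].append(value)

    def get(self, row):
        type_id = self.type_ids[row]
        return self.child_names[type_id], self.children[type_id][self.offsets[row]]


op = OpUnion(["put", "delete", "erase"])
op.append("put", "doc-a")
op.append("delete", 7)
op.append("erase", 7)
rows = [op.get(i) for i in range(3)]
```

Rows 1 and 2 hold the same value of the same type, yet they stay distinguishable because the type id, rather than the value type, identifies the member.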

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Steve Kim
Thank you for asking this question. I have the same question. I noted a similar problem in the C++/Python implementation: https://github.com/apache/arrow/issues/19157#issuecomment-1528037394 On Tue, Apr 2, 2024, 04:30 Finn Völkel wrote: > Hi, > > my question primarily concerns the union layout

Re: [Format][Union] polymorphic vectors vs ADT style vectors

2024-04-02 Thread Antoine Pitrou
Can you explain what ADT means? On 02/04/2024 at 11:31, Finn Völkel wrote: Hi, my question primarily concerns the union layout described at https://arrow.apache.org/docs/format/Columnar.html#union-layout There are two ways to use unions: - polymorphic vectors (world 1) - ADT st