Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-18 Thread Wes McKinney
On Tue, Jun 18, 2019 at 6:38 AM Ben Kietzman wrote: > > I don't understand the utility of this indirection and to me it seems more > natural to remove Union.typeIds and state that the type buffer will always > contain the index of the corresponding child array. > > One use case for Union.typeIds i

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-18 Thread Ben Kietzman
I don't understand the utility of this indirection and to me it seems more natural to remove Union.typeIds and state that the type buffer will always contain the index of the corresponding child array. One use case for Union.typeIds is simplification of dropping types from the union. However perfo
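As a rough illustration of the "dropping types" use case mentioned above, the sketch below compares the two layouts. It is not the Arrow implementation: the function names, the three-child union, and the type ids [0, 4, 7] are hypothetical, and it assumes no slot in the type buffer refers to the child being dropped.

```python
def drop_child_with_type_ids(type_ids, children, type_buffer, dropped_id):
    """Drop a child when the type buffer stores type ids (indirection)."""
    if dropped_id in type_buffer:
        raise ValueError("child is still referenced by the type buffer")
    keep = [i for i, tid in enumerate(type_ids) if tid != dropped_id]
    # Only the metadata shrinks; the type buffer is returned untouched.
    return [type_ids[i] for i in keep], [children[i] for i in keep], type_buffer


def drop_child_without_type_ids(children, type_buffer, dropped_index):
    """Drop a child when the type buffer stores child indices directly."""
    if dropped_index in type_buffer:
        raise ValueError("child is still referenced by the type buffer")
    new_children = children[:dropped_index] + children[dropped_index + 1:]
    # Every index above the dropped child shifts down, so the buffer is rewritten.
    new_buffer = [i - 1 if i > dropped_index else i for i in type_buffer]
    return new_children, new_buffer


# Hypothetical union of (int64, utf8, float64) where utf8 is never used.
children = ["int64", "utf8", "float64"]

with_ids = drop_child_with_type_ids([0, 4, 7], children, [0, 0, 7, 7], dropped_id=4)
without_ids = drop_child_without_type_ids(children, [0, 0, 2, 2], dropped_index=1)
```

With the indirection the type buffer survives unchanged; without it, each entry above the dropped index has to be rewritten.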

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Wes McKinney
example sparse union:
types: (int64, utf8)
type_ids: [0, 4]
type buffer: [0, 0, 0, 4, 4, 4]
child 0: [1, 2, 3, --, --, --]
child 1: [--, --, --, 'foo', 'bar', 'baz']

example dense union:
types: (int64, utf8)
type_ids: [0, 4]
type buffer: [0, 0, 0, 4, 4, 4]
offsets buffer: [0, 1, 2, 0, 1, 2]
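The two layouts in this example can be read back with a short Python sketch. It is only an illustration, not code from any Arrow implementation: `resolve_union` is a hypothetical helper, `None` stands in for the `--` placeholders, and the dense children are an assumption because the preview is cut off before listing them.

```python
def resolve_union(type_ids, type_buffer, children, offsets=None):
    """Return the logical values of a union array as a Python list."""
    # Map each type id found in the type buffer back to a child index.
    child_index = {tid: i for i, tid in enumerate(type_ids)}
    values = []
    for slot, tid in enumerate(type_buffer):
        child = children[child_index[tid]]
        # Sparse: every child is as long as the union, so index by slot.
        # Dense: the offsets buffer gives the position inside the child.
        pos = slot if offsets is None else offsets[slot]
        values.append(child[pos])
    return values


type_ids = [0, 4]
type_buffer = [0, 0, 0, 4, 4, 4]

sparse_children = [
    [1, 2, 3, None, None, None],
    [None, None, None, "foo", "bar", "baz"],
]
sparse = resolve_union(type_ids, type_buffer, sparse_children)

# Assumed dense children: compact arrays holding only the used values.
dense_children = [[1, 2, 3], ["foo", "bar", "baz"]]
dense = resolve_union(
    type_ids, type_buffer, dense_children, offsets=[0, 1, 2, 0, 1, 2]
)
```

Under these assumptions both calls yield the same logical sequence; the only difference is whether a slot's position comes from the slot number or from the offsets buffer.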

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Antoine Pitrou
Le 17/06/2019 à 22:46, Wes McKinney a écrit : > https://github.com/apache/arrow/blob/master/format/Schema.fbs#L88 > > "optionally typeIds provides an indirection between the child offset > and the type id for each child typeIds[offset] is the id used in the > type vector" Does this mean typeIds

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Wes McKinney
https://github.com/apache/arrow/blob/master/format/Schema.fbs#L88 "optionally typeIds provides an indirection between the child offset and the type id for each child typeIds[offset] is the id used in the type vector" On Mon, Jun 17, 2019 at 12:26 PM Ben Kietzman wrote: > > Somewhat related: > >

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Ben Kietzman
Somewhat related: Could we clarify the expected content of the type_ids buffer of union arrays? Layout.rst seems to indicate these should be indices of the corresponding child array, but the C++ implement

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Micah Kornfield
Sounds good. Sorry I got distracted with some other stuff but should be getting back to this soonish On Monday, June 17, 2019, Wes McKinney wrote: > I'd already moved the Union issues to 1.0.0 so we are all good there > > On Mon, Jun 17, 2019 at 10:18 AM Wes McKinney wrote: > > > > I'm also +1

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Wes McKinney
I'd already moved the Union issues to 1.0.0 so we are all good there On Mon, Jun 17, 2019 at 10:18 AM Wes McKinney wrote: > > I'm also +1 for generalized unions as we currently have specified. The > objections from the Java users seems to be mostly on the basis of > performance in the union-of-pr

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-17 Thread Wes McKinney
I'm also +1 for generalized unions as we currently have specified. The objections from the Java users seem to be mostly on the basis of performance in the union-of-primitives case -- that's an implementation-specific issue, so if Java needs to have a "GeneralizedDenseUnionVector" or something to h

Re: [Discuss][Format][Java] Finalizing Union Types

2019-06-09 Thread Ravindra Pindikura
On Sat, May 25, 2019 at 12:29 PM Micah Kornfield wrote: > Thanks for the responses, I've clipped the questions and provided responses > inline. > > is the proposal that both cpp & java will support only option 2 ? > > I guess 1 is a subset of 2 anyway. > > CPP already supports option 2. I would

Re: [Discuss][Format][Java] Finalizing Union Types

2019-05-24 Thread Micah Kornfield
Thanks for the responses, I've clipped the questions and provided responses inline. is the proposal that both cpp & java will support only option 2 ? > I guess 1 is a subset of 2 anyway. CPP already supports option 2. I would like to make CPP and java compatible, in a way that this acceptable fo

Re: [Discuss][Format][Java] Finalizing Union Types

2019-05-24 Thread Antoine Pitrou
I don't understand the limitation to different types, so +1 for generalized unions. That said, I don't think it's high-priority either. Regards Antoine. Le 24/05/2019 à 04:17, Micah Kornfield a écrit : > I'd like to bump this thread, to see if anyone has any comments. If nobody > objects I

Re: [Discuss][Format][Java] Finalizing Union Types

2019-05-24 Thread Ravindra Pindikura
Micah, Couple of questions inline : On Tue, May 21, 2019 at 10:21 AM Micah Kornfield wrote: > In the past [1] there hasn't been agreement on the final requirements for > union types. > > Briefly the two approaches that are currently advocated: > 1. Limit unions to only contain one field of e

Re: [Discuss][Format][Java] Finalizing Union Types

2019-05-23 Thread Micah Kornfield
I'd like to bump this thread, to see if anyone has any comments. If nobody objects I will try to start implementing the changes next week. Thanks, Micah On Mon, May 20, 2019 at 9:37 PM Micah Kornfield wrote: > In the past [1] there hasn't been agreement on the final requirements for > union ty

[Discuss][Format][Java] Finalizing Union Types

2019-05-20 Thread Micah Kornfield
In the past [1] there hasn't been agreement on the final requirements for union types. Briefly the two approaches that are currently advocated: 1. Limit unions to only contain one field of each individual type (e.g. you can't have two separate int32 fields). Java takes this approach. 2. General