Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-28 Thread Wes McKinney
I'm going to start a vote about this tomorrow if there are no more comments.
On Thu, Jun 25, 2020 at 5:41 PM Wes McKinney wrote:
>
> I updated the PR to fix some issues with my edits that Antoine pointed
> out. I can start working on a C++ patch to implement the C++ changes
> in the n

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-25 Thread Wes McKinney
I updated the PR to fix some issues with my edits that Antoine pointed out. I can start working on a C++ patch to implement the C++ changes in the next few days if that helps. Given the time urgency of deciding what to do on this, it would be helpful if anyone else could express opinions. I see one

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Wes McKinney
I drafted the specification changes that would be associated with the union changes:
https://github.com/apache/arrow/pull/7535
I'll start a separate discussion about incrementing the MetadataVersion, since that must be discussed independently. Please take a look.
On Wed, Jun 24, 2020 at 3:50 PM We

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Wes McKinney
I should also add that we could (with some effort) use the MetadataVersion V4/V5 indicator to offer backward compatibility for old serialized union data.
In any case, if there is consensus about this, we would need to have a vote and get busy with implementing and testing the changes. I could assis

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Wes McKinney
On Wed, Jun 24, 2020 at 1:07 PM Francois Saint-Jacques wrote:
>
> OTOH,
>
> how do we handle NullType -> UnionType cast conversion? Do we
> require some convention like the first children ArrayData null bitmap
> to be set and all tags set to 0?
Sure, that sounds like a reasonable implementation s

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Francois Saint-Jacques
OTOH, how do we handle NullType -> UnionType cast conversion? Do we require some convention like the first children ArrayData null bitmap to be set and all tags set to 0?
François
On Wed, Jun 24, 2020 at 1:09 PM Antoine Pitrou wrote:
>
>
> On 24/06/2020 at 18:34, Wes McKinney wrote:
> > On We

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Antoine Pitrou
On 24/06/2020 at 18:34, Wes McKinney wrote:
> On Wed, Jun 24, 2020 at 11:08 AM Antoine Pitrou wrote:
>>
>>
>> On 24/06/2020 at 16:57, Wes McKinney wrote:
>>> hi folks,
>>>
>>> As discussed on the recent GitHub PR [1], as a means of reconciling
>>> the long-standing cross-implementation incom

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Wes McKinney
On Wed, Jun 24, 2020 at 11:08 AM Antoine Pitrou wrote:
>
>
> On 24/06/2020 at 16:57, Wes McKinney wrote:
> > hi folks,
> >
> > As discussed on the recent GitHub PR [1], as a means of reconciling
> > the long-standing cross-implementation incompatibilities with Union
> > types, it's been proposed

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Antoine Pitrou
On 24/06/2020 at 16:57, Wes McKinney wrote:
> hi folks,
>
> As discussed on the recent GitHub PR [1], as a means of reconciling
> the long-standing cross-implementation incompatibilities with Union
> types, it's been proposed to remove the top-level validity bitmap from
> the Union data layout

Re: [DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Jacques Nadeau
Per my comments on the PR, I also think this is preferred. I believe we will avoid the potential for validity inconsistency and simplify construction of union data in most cases.
On Wed, Jun 24, 2020, 7:58 AM Wes McKinney wrote:
> hi folks,
>
> As discussed on the recent GitHub PR [1], as a mean

[DISCUSS] Removing top-level validity bitmap from Union type

2020-06-24 Thread Wes McKinney
hi folks,
As discussed on the recent GitHub PR [1], as a means of reconciling the long-standing cross-implementation incompatibilities with Union types, it's been proposed to remove the top-level validity bitmap from the Union data layout and let validity be determined exclusively by the child arr
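
For readers skimming the thread, here is a rough illustration of what "no top-level validity bitmap" could mean for a dense union: whether a slot is null is decided only by the child that the slot's type id selects. This is a plain-Python sketch, not Arrow's implementation or API; the `Child` and `DenseUnion` classes and their field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Child:
    """Hypothetical stand-in for a child array: values plus its own validity bits."""
    values: List[object]
    validity: List[bool]


@dataclass
class DenseUnion:
    """Hypothetical dense union that carries no top-level validity bitmap."""
    type_ids: List[int]    # which child each slot refers to
    offsets: List[int]     # position of the slot's value inside that child
    children: List[Child]

    def is_valid(self, i: int) -> bool:
        # Validity is read only from the selected child's own bitmap.
        child = self.children[self.type_ids[i]]
        return child.validity[self.offsets[i]]

    def get(self, i: int) -> Optional[object]:
        # Return the slot's value, or None when the selected child marks it null.
        if not self.is_valid(i):
            return None
        child = self.children[self.type_ids[i]]
        return child.values[self.offsets[i]]


ints = Child(values=[1, 0, 3], validity=[True, False, True])
strs = Child(values=["a", ""], validity=[True, False])
union = DenseUnion(
    type_ids=[0, 1, 0, 1, 0],
    offsets=[0, 0, 1, 1, 2],
    children=[ints, strs],
)
decoded = [union.get(i) for i in range(len(union.type_ids))]
```

With a single source of truth per slot, there is no way for a union-level bit and a child-level bit to disagree, which is the inconsistency concern raised in the replies.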