Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Ben Kietzman
+1 (non binding) On Tue, Jun 30, 2020, 00:24 Wes McKinney wrote: > +1 (binding) > > On Mon, Jun 29, 2020 at 11:09 PM Micah Kornfield > wrote: > > > > +1 (binding) (I had a couple of nits on language, that I put in the PR > > > > On Mon, Jun 29, 2020 at 2:24 PM Wes McKinney > wrote: > > > > > H

Re: [VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0 release

2020-06-29 Thread Ben Kietzman
+1 (non binding) On Tue, Jun 30, 2020, 00:25 Wes McKinney wrote: > +1 (binding) > > On Mon, Jun 29, 2020 at 10:49 PM Micah Kornfield > wrote: > > > > +1 (binding) > > > > On Mon, Jun 29, 2020 at 2:43 PM Wes McKinney > wrote: > > > > > Hi, > > > > > > As discussed on the mailing list [1], in or

Re: [VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0 release

2020-06-29 Thread Wes McKinney
+1 (binding) On Mon, Jun 29, 2020 at 10:49 PM Micah Kornfield wrote: > > +1 (binding) > > On Mon, Jun 29, 2020 at 2:43 PM Wes McKinney wrote: > > > Hi, > > > > As discussed on the mailing list [1], in order to demarcate the > > pre-1.0.0 and post-1.0.0 worlds, and to allow the > > forward-compat

Re: [VOTE] Permitting unsigned integers for Arrow dictionary indices

2020-06-29 Thread Wes McKinney
+1 (binding) On Mon, Jun 29, 2020 at 11:11 PM Ben Kietzman wrote: > > +1 (non binding) > > On Mon, Jun 29, 2020, 18:00 Wes McKinney wrote: > > > Hi, > > > > As discussed on the mailing list [1], it has been proposed to allow > > the use of unsigned dictionary indices (which is already technicall

Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Wes McKinney
+1 (binding) On Mon, Jun 29, 2020 at 11:09 PM Micah Kornfield wrote: > > +1 (binding) (I had a couple of nits on language, that I put in the PR > > On Mon, Jun 29, 2020 at 2:24 PM Wes McKinney wrote: > > > Hi, > > > > As discussed on the mailing list [1], it has been proposed to remove > > the v

Re: [VOTE] Permitting unsigned integers for Arrow dictionary indices

2020-06-29 Thread Ben Kietzman
+1 (non binding) On Mon, Jun 29, 2020, 18:00 Wes McKinney wrote: > Hi, > > As discussed on the mailing list [1], it has been proposed to allow > the use of unsigned dictionary indices (which is already technically > possible in our metadata serialization, but not allowed according to > the langu

Re: [VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Micah Kornfield
+1 (binding) (I had a couple of nits on language that I put in the PR) On Mon, Jun 29, 2020 at 2:24 PM Wes McKinney wrote: > Hi, > > As discussed on the mailing list [1], it has been proposed to remove > the validity bitmap buffer from Union types in the columnar format > specification and inste

Re: [VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0 release

2020-06-29 Thread Micah Kornfield
+1 (binding) On Mon, Jun 29, 2020 at 2:43 PM Wes McKinney wrote: > Hi, > > As discussed on the mailing list [1], in order to demarcate the > pre-1.0.0 and post-1.0.0 worlds, and to allow the > forward-compatibility-protection changes we are making to actually > work (i.e. so that libraries can r

Re: [VOTE] Permitting unsigned integers for Arrow dictionary indices

2020-06-29 Thread Micah Kornfield
+1 (binding) On Mon, Jun 29, 2020 at 3:00 PM Wes McKinney wrote: > Hi, > > As discussed on the mailing list [1], it has been proposed to allow > the use of unsigned dictionary indices (which is already technically > possible in our metadata serialization, but not allowed according to > the langu

Re: Error while selecting columns from hierarchical parquet file

2020-06-29 Thread Micah Kornfield
It sounds like it should be possible. Could you perhaps open a JIRA with a more complete example, including the schema? Thanks, Micah On Thu, Jun 25, 2020 at 4:10 PM Rafael Ladeira wrote: > Hi, > > Is it possible to read just selected columns from a dataframe with > hierarchical levels in the column

[VOTE] Permitting unsigned integers for Arrow dictionary indices

2020-06-29 Thread Wes McKinney
Hi, As discussed on the mailing list [1], it has been proposed to allow the use of unsigned dictionary indices (which is already technically possible in our metadata serialization, but not allowed according to the language of the columnar specification), with the following caveats: * Unless part

Re: [DISCUSS] Ongoing LZ4 problems with Parquet files

2020-06-29 Thread Wes McKinney
On Thu, Jun 25, 2020 at 3:31 AM Antoine Pitrou wrote: > > > Le 25/06/2020 à 00:02, Wes McKinney a écrit : > > hi folks, > > > > (cross-posting to dev@arrow and dev@parquet since there are > > stakeholders in both places) > > > > It seems there are still problems at least with the C++ implementatio

[VOTE] Increment MetadataVersion in Schema.fbs from V4 to V5 for 1.0.0 release

2020-06-29 Thread Wes McKinney
Hi, As discussed on the mailing list [1], in order to demarcate the pre-1.0.0 and post-1.0.0 worlds, and to allow the forward-compatibility-protection changes we are making to actually work (i.e. so that libraries can recognize that they have received data with a feature that they do not support),

[VOTE] Removing validity bitmap from Arrow union types

2020-06-29 Thread Wes McKinney
Hi, As discussed on the mailing list [1], it has been proposed to remove the validity bitmap buffer from Union types in the columnar format specification and instead let value validity be determined exclusively by constituent arrays of the union. One of the primary motivations for this is to simp

Re: Arrow for low-latency streaming of small batches?

2020-06-29 Thread Wes McKinney
On Fri, Jun 26, 2020 at 8:56 AM Chris Osborn wrote: > > Yes, it would be quite feasible to preallocate a region large enough for > several thousand rows for each column, assuming I read from that region while > it's still filling in. When that region is full, I could either allocate a > new big

Re: Deep copy for ArrayData,Array, Table in C++ API

2020-06-29 Thread Wes McKinney
On Mon, Jun 29, 2020 at 9:33 AM Radu Teodorescu wrote: > > Yes, > I am set for what I need at the moment but since I went for a deepish dive > into the current API, and this has been a recurring use case over the year I > would extend a few proposals, for expanding Take: > 1. Add support for pac

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

2020-06-29 Thread Wes McKinney
Thanks David. Indeed it seems that exposing IpcWriteOptions is going to be critical here. I'd like to avoid an "environment variable" workaround at the C++ level, instead only providing such things in e.g. Python, like we did for the alignment patch. On Mon, Jun 29, 2020 at 9:30 AM David Li wrote: >

Re: Deep copy for ArrayData,Array, Table in C++ API

2020-06-29 Thread Radu Teodorescu
Yes, I am set for what I need at the moment but since I went for a deepish dive into the current API, and this has been a recurring use case over the year I would extend a few proposals, for expanding Take: 1. Add support for packed indices - three avenues: a) expand Datum.Kind to allow f

Re: [DISCUSS] Incrementing Arrow MetadataVersion from V4 to V5 for 1.0.0 release

2020-06-29 Thread David Li
This would cause compatibility issues for Flight servers/clients between versions as well. The situation is a little worse since IpcWriteOptions isn't exposed and so you can't control what version you write. But just exposing them in lieu of a full negotiation (which we should start thinking about)