Re: [DISCUSS] IPC buffer layout for Null type

2019-09-20 Thread Wes McKinney
Thanks. I committed it and opened some new JIRA issues that have been attached to https://issues.apache.org/jira/browse/ARROW-1636 On Fri, Sep 20, 2019 at 4:37 AM Micah Kornfield wrote: > > I think committing as is, is the better of the two options. > > On Thu, Sep 19, 2019 at 12:35 PM Wes McKin

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-20 Thread Micah Kornfield
I think committing as is, is the better of the two options. On Thu, Sep 19, 2019 at 12:35 PM Wes McKinney wrote: > OK, my preference, therefore, would be to rebase and merge my patch > without bothering with backwards compatibility code. The situations > where there would be an issue are fairly

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-19 Thread Wes McKinney
OK, my preference, therefore, would be to rebase and merge my patch without bothering with backwards compatibility code. The situations where there would be an issue are fairly esoteric. https://github.com/apache/arrow/pull/5287 On Thu, Sep 19, 2019 at 2:29 PM Antoine Pitrou wrote: > > > Well, t

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-19 Thread Antoine Pitrou
Well, this is an incompatible IPC change, so ideally it should be done now, not later. Regards Antoine. On Thu, 19 Sep 2019 14:08:37 -0500 Wes McKinney wrote: > I'm concerned about rushing through any patch for this for 0.15.0, but > each release with the status quo increases the risk of ma

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-19 Thread Wes McKinney
I'm concerned about rushing through any patch for this for 0.15.0, but each release with the status quo increases the risk of making changes. Thoughts? On Fri, Sep 6, 2019 at 12:59 PM Wes McKinney wrote: > > On Fri, Sep 6, 2019 at 12:57 PM Micah Kornfield wrote: > > > > > > > > We can't because

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-06 Thread Wes McKinney
On Fri, Sep 6, 2019 at 12:57 PM Micah Kornfield wrote: > > > > > We can't because the buffer layout is not transmitted -- implementations > > make assumptions about what Buffer values correspond to each field. The > > only thing we could do to signal the change would be to increase the > > metadat

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-06 Thread Micah Kornfield
> > We can't because the buffer layout is not transmitted -- implementations > make assumptions about what Buffer values correspond to each field. The > only thing we could do to signal the change would be to increase the > metadata version from V4 to V5. If we do this within 0.15.0 we could infer
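A minimal sketch of the problem described in the quoted text, not taken from any Arrow implementation: a reader slices the flat buffer list of a record batch using per-type buffer counts that it simply assumes. The count tables, the `assign_buffers` helper, and the buffer labels below are all hypothetical, and the "new" convention assumes the proposal is for Null to write no buffers at all.

```python
# Hypothetical per-type buffer counts. Only the Null entry differs:
# "old" mirrors the two length-0 placeholder buffers written by C++,
# "new" assumes the proposed layout writes no buffers for Null.
BUFFERS_OLD = {"null": 2, "int32": 2, "utf8": 3}
BUFFERS_NEW = {"null": 0, "int32": 2, "utf8": 3}


def assign_buffers(field_types, buffers, counts):
    """Slice a flat buffer list into per-field groups using assumed counts.

    Returns the per-field groups and a flag saying whether every buffer
    in the list was consumed.
    """
    assigned = []
    pos = 0
    for type_name in field_types:
        n = counts[type_name]
        assigned.append(buffers[pos:pos + n])
        pos += n
    return assigned, pos == len(buffers)


# A batch written under the old convention: a Null column, then an int32 column.
fields = ["null", "int32"]
written_old = [
    "null.placeholder0",
    "null.placeholder1",
    "int32.validity",
    "int32.values",
]

old_view, old_ok = assign_buffers(fields, written_old, BUFFERS_OLD)
new_view, new_ok = assign_buffers(fields, written_old, BUFFERS_NEW)

print(old_view, old_ok)
print(new_view, new_ok)
```

In this sketch a reader using the new counts hands the Null placeholders to the int32 column and leaves two buffers unread. Nothing in the message itself tells the reader which convention the writer used, which is why the only available signal would be something like the metadata version.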

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-06 Thread Wes McKinney
On Fri, Sep 6, 2019, 12:08 PM Antoine Pitrou wrote: > > Null can also come up when converting a column with only NA values in a > CSV file. I don't remember for sure, but I think the same can happen > with JSON files as well. > > Can't we accept both forms when reading? It sounds like it should

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-06 Thread Antoine Pitrou
Null can also come up when converting a column with only NA values in a CSV file. I don't remember for sure, but I think the same can happen with JSON files as well. Can't we accept both forms when reading? It sounds like it should be reasonably easy. Regards Antoine. On 06/09/2019 at 17:36

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-06 Thread Wes McKinney
hi Micah, Null wouldn't come up that often in practice. It could happen when converting from pandas, for example:

In [8]: df = pd.DataFrame({'col1': np.array([np.nan] * 10, dtype=object)})
In [9]: t = pa.table(df)
In [10]: t
Out[10]:
pyarrow.Table
col1: null
metadata
{b'pandas': b'{"ind

Re: [DISCUSS] IPC buffer layout for Null type

2019-09-05 Thread Micah Kornfield
Hi Wes and others, I don't have a sense of where Null arrays get created in the existing code base. Also, do you think it is worth the effort to make this backwards compatible? We could in theory tie the buffer count to having the continuation value for alignment. The one area where I'm slightly conc

[DISCUSS] IPC buffer layout for Null type

2019-09-05 Thread Wes McKinney
hi folks, One of the as-yet-untested (in integration tests) parts of the columnar specification is the Null layout. In C++ we additionally implemented this by writing two length-0 "placeholder" buffers in the RecordBatch data header, but since the Null layout has no memory allocated nor any buffer