[jira] [Created] (ARROW-827) [Python] Variety of Parquet improvements to support Dask integration

2017-04-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-827:
Summary: [Python] Variety of Parquet improvements to support Dask integration
Key: ARROW-827
URL: https://issues.apache.org/jira/browse/ARROW-827
Project: Apache Arrow

[jira] [Created] (ARROW-826) Compilation error on Mac with -DARROW_PYTHON=on

2017-04-14 Thread Robert Nishihara (JIRA)
Robert Nishihara created ARROW-826:
Summary: Compilation error on Mac with -DARROW_PYTHON=on
Key: ARROW-826
URL: https://issues.apache.org/jira/browse/ARROW-826
Project: Apache Arrow
Issue

Re: pyarrow using socket as sink for StreamWriter

2017-04-14 Thread Bryan Cutler
Made a Jira for this issue https://issues.apache.org/jira/browse/ARROW-822

On Apr 13, 2017 11:31 AM, "Bryan Cutler" wrote:
> Hi Devs,
>
> What is the recommended way to use the pyarrow StreamWriter to write to a
> socket? I've the following:
>
> - Use socket directly and get "TypeError: Unable

Re: Say no to zero length batches...

2017-04-14 Thread Jacques Nadeau
If I'm the sole voice on this perspective, I'll concede the point. I didn't even catch the increase in allowed record batch sizes as part of ARROW-661 and ARROW-679. :( I'm of split mind of the thoughts there: - We need more applications so making sure that we have the features available to supp

Re: Say no to zero length batches...

2017-04-14 Thread Wes McKinney
It seems like we could address these concerns by adding alternate write/read APIs that do the dropping (on write) / skipping (on load) automatically, so it doesn't have to bubble up into application logic.

On Fri, Apr 14, 2017 at 7:56 PM, Wes McKinney wrote:
> > Since Arrow already requires a ba
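
The "dropping on write" idea in this message could look roughly like the sketch below. This is only an illustration, not an existing pyarrow API: `drop_empty` and `write_nonempty` are hypothetical names, and plain Python lists stand in for record batches so the example runs without Arrow installed.

```python
from typing import Callable, Iterable, Iterator, TypeVar

Batch = TypeVar("Batch")


def drop_empty(batches: Iterable[Batch],
               num_rows: Callable[[Batch], int] = len) -> Iterator[Batch]:
    """Yield only the batches that hold at least one row."""
    for batch in batches:
        if num_rows(batch) > 0:
            yield batch


def write_nonempty(write_batch: Callable[[Batch], None],
                   batches: Iterable[Batch]) -> int:
    """Hypothetical writer wrapper: drop empty batches on write.

    Returns the number of batches actually written.
    """
    written = 0
    for batch in drop_empty(batches):
        write_batch(batch)
        written += 1
    return written


# Plain lists stand in for record batches (an assumption for illustration).
sink = []
count = write_nonempty(sink.append, [[1, 2, 3], [], [4]])
```

The same filter could be applied on the read side to skip empty batches on load, which would keep the zero-length case out of application logic either way.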

Re: Say no to zero length batches...

2017-04-14 Thread Wes McKinney
> Since Arrow already requires a batch to be no larger than 2^16-1 records in size, it won't map 1:1 to an arbitrary construct.

This is only true of some Arrow applications (e.g. Drill), which is why I think this is an application-level concern. In ARROW-661 and ARROW-679, we modified the metadata

Re: Say no to zero length batches...

2017-04-14 Thread Jacques Nadeau
To Jason's comments: Data and control flow should be separate. Schema (a.k.a. a head-type message) is already defined separate from a batch of records. I'm all for a termination message as well from a stream perspective. (I don't think it makes sense to couple record batch size to termination-- I'

[jira] [Created] (ARROW-825) [Python] Generalize pyarrow.from_pylist to accept any object implementing the PySequence protocol

2017-04-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-825:
Summary: [Python] Generalize pyarrow.from_pylist to accept any object implementing the PySequence protocol
Key: ARROW-825
URL: https://issues.apache.org/jira/browse/ARROW-825

Re: Say no to zero length batches...

2017-04-14 Thread Ted Dunning
Speaking as a relative outsider, having the boundary cases for a transfer protocol be MORE restrictive than the senders and receivers is asking for boundary bugs. In this case, both the senders and receiver think that the boundary is 0 (empty lists, empty data frames, 0 results from a database). H

Re: Say no to zero length batches...

2017-04-14 Thread Jason Altekruse
I'm with Wes on this one. A bunch of systems have constructs that deal with zero length collections, lists, iterators, etc. These are established patterns that everyone knows they need to handle the empty case. Forcing applications to create an unnecessary protocol complexity of a special sentinel

[jira] [Created] (ARROW-824) Date and Time Vectors should reflect timezone-less semantics

2017-04-14 Thread Julien Le Dem (JIRA)
Julien Le Dem created ARROW-824:
Summary: Date and Time Vectors should reflect timezone-less semantics
Key: ARROW-824
URL: https://issues.apache.org/jira/browse/ARROW-824
Project: Apache Arrow

Re: Say no to zero length batches...

2017-04-14 Thread Wes McKinney
Here is valid pyarrow code that works right now:

import pyarrow as pa
rb = pa.RecordBatch.from_arrays([
    pa.from_pylist([1, 2, 3]),
    pa.from_pylist(['foo', 'bar', 'baz'])
], names=['one', 'two'])
batches = [rb, rb.slice(0, 0)]
stream = pa.InMemoryOutputStream()
writer = pa.StreamWriter(s

Say no to zero length batches...

2017-04-14 Thread Jacques Nadeau
Hey All, I had a quick comment on ARROW-783 that Wes responded to and I wanted to elevate the conversation here for a moment. My suggestion there was that we should disallow zero-length batches. Wes thought that should be an application level concern. I wanted to see what others thought. My gen

Re: Arrow 0.3 release timeline

2017-04-14 Thread Julien Le Dem
I reviewed the currently pending PRs on the Java side. I opened 2 PRs for the open Java JIRAs from the list: ARROW-777, ARROW-720

On Fri, Apr 14, 2017 at 12:55 PM, Julien Le Dem wrote:
> I'm looking through them
>
> On Fri, Apr 14, 2017 at 9:26 AM, Wes McKinney wrote:
>
>> hi all,
>>
>> I'm w

[jira] [Created] (ARROW-823) [Python] Devise a means to serialize arrays of arbitrary Python objects in Arrow IPC messages

2017-04-14 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-823:
Summary: [Python] Devise a means to serialize arrays of arbitrary Python objects in Arrow IPC messages
Key: ARROW-823
URL: https://issues.apache.org/jira/browse/ARROW-823

Re: Arrow 0.3 release timeline

2017-04-14 Thread Julien Le Dem
I'm looking through them

On Fri, Apr 14, 2017 at 9:26 AM, Wes McKinney wrote:
> hi all,
>
> I'm working to close out the remaining Python and C++ stuff we wanted
> to get in to 0.3 for the sake of other Python projects that want to
> use Arrow.
>
> There are 8 patches up that touch the Java code

[jira] [Created] (ARROW-822) [Python] StreamWriter fails to open with socket as sink

2017-04-14 Thread Bryan Cutler (JIRA)
Bryan Cutler created ARROW-822:
Summary: [Python] StreamWriter fails to open with socket as sink
Key: ARROW-822
URL: https://issues.apache.org/jira/browse/ARROW-822
Project: Apache Arrow
Issue

Re: Arrow 0.3 release timeline

2017-04-14 Thread Wes McKinney
hi all, I'm working to close out the remaining Python and C++ stuff we wanted to get in to 0.3 for the sake of other Python projects that want to use Arrow. There are 8 patches up that touch the Java codebase. If we can get all these closed out then I think we should be able to cut a release cand

[jira] [Created] (ARROW-821) Extra file _table_api.h generated during Python build process

2017-04-14 Thread Phillip Cloud (JIRA)
Phillip Cloud created ARROW-821:
Summary: Extra file _table_api.h generated during Python build process
Key: ARROW-821
URL: https://issues.apache.org/jira/browse/ARROW-821
Project: Apache Arrow