[jira] [Created] (ARROW-5116) [Rust] move kernel related files under compute/kernels

2019-04-03 Thread Chao Sun (JIRA)
Chao Sun created ARROW-5116: --- Summary: [Rust] move kernel related files under compute/kernels Key: ARROW-5116 URL: https://issues.apache.org/jira/browse/ARROW-5116 Project: Apache Arrow Issue Type:

[jira] [Created] (ARROW-5115) [JS] Implement the Vector Builders

2019-04-03 Thread Paul Taylor (JIRA)
Paul Taylor created ARROW-5115: -- Summary: [JS] Implement the Vector Builders Key: ARROW-5115 URL: https://issues.apache.org/jira/browse/ARROW-5115 Project: Apache Arrow Issue Type: New Feature

[jira] [Created] (ARROW-5114) Test for cross-version Flight compatibility

2019-04-03 Thread David Li (JIRA)
David Li created ARROW-5114: --- Summary: Test for cross-version Flight compatibility Key: ARROW-5114 URL: https://issues.apache.org/jira/browse/ARROW-5114 Project: Apache Arrow Issue Type: Improvemen

Re: [RESULT][VOTE] Release Apache Arrow 0.13.0 - RC4

2019-04-03 Thread Wes McKinney
hi folks -- I mistakenly omitted some of the new committers since 0.12.0 from the announcement blog post. http://arrow.apache.org/blog/2019/04/02/0.13.0-release/ We actually added 4 committers since 0.12.0 (Micah, Sun, Paddy, and Ravindra). I'll update the blog post a bit later today. Please acce

Re: [RESULT][VOTE] Release Apache Arrow 0.13.0 - RC4

2019-04-03 Thread Wes McKinney
Some other docs still need to be updated: * GLib * Java * JavaScript The documentation generation script and Docker setup are still broken per https://issues.apache.org/jira/browse/ARROW-4309 On Tue, Apr 2, 2019 at 12:54 PM Wes McKinney wrote: > > I'll update the main docs site today > > On Mon,

[jira] [Created] (ARROW-5113) [C++][Flight] Unit tests in C++ for DoPut

2019-04-03 Thread Wes McKinney (JIRA)
Wes McKinney created ARROW-5113: --- Summary: [C++][Flight] Unit tests in C++ for DoPut Key: ARROW-5113 URL: https://issues.apache.org/jira/browse/ARROW-5113 Project: Apache Arrow Issue Type: Impr

Re: round-trip tests for Arrow files

2019-04-03 Thread Wes McKinney
hi Sebastien, The integration tests indeed should work (they are run in the release verification script [1]), so something is either wrong with your C++ build or your environment if integration_test.py fails. It would be great to get Go into the integration tests to have proof that the impleme

round-trip tests for Arrow files

2019-04-03 Thread Sebastien Binet
hi there, I am working on the deserialization support for the Go backend. at this point, I have (I think) primitive and binary/string arrays working with a simple Arrow file I created like so: import pyarrow as pa data = [ pa.array([1, 2, 3, None, 5], type="i4"), pa.array(['foo', 'bar', '

Re: [VOTE] Add new DurationInterval Type to Arrow Format

2019-04-03 Thread Jacques Nadeau
Yes, copy and paste error: +1 to add the new type (binding) On Wed, Apr 3, 2019 at 8:36 AM Wes McKinney wrote: > +1 (binding) to add the new type > > On Wed, Apr 3, 2019 at 10:35 AM Micah Kornfield > wrote: > > > > +1 (non-binding). > > > > P.S. Copy and paste error on the plus 1 option from t

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

2019-04-03 Thread Jacques Nadeau
> > Can you explain what you call "splits"? > Per Wes's comments, FlightItineraries inside FlightGetInfo. also is it possible a service have tons of flights? Yes > if so, some kind of > pagination need to be done here? Criteria and the stream interface should be sufficient. We need to work

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

2019-04-03 Thread ming zhang
IMHO, list should just show what a service has; it does not need to provide detailed information about each one. We already have a separate method to fetch details about a flight. Optionally we could change GetFlightInfo(FlightDescriptor) returns (FlightGetInfo) {} to take a stream as input so someone could fetch a bat

[jira] [Created] (ARROW-5112) [Go] implement writing arrays to Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5112: -- Summary: [Go] implement writing arrays to Arrow file Key: ARROW-5112 URL: https://issues.apache.org/jira/browse/ARROW-5112 Project: Apache Arrow Issue Ty

[jira] [Created] (ARROW-5111) [Go] implement reading list arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5111: -- Summary: [Go] implement reading list arrays from Arrow file Key: ARROW-5111 URL: https://issues.apache.org/jira/browse/ARROW-5111 Project: Apache Arrow I

[jira] [Created] (ARROW-5110) [Go] implement reading struct arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5110: -- Summary: [Go] implement reading struct arrays from Arrow file Key: ARROW-5110 URL: https://issues.apache.org/jira/browse/ARROW-5110 Project: Apache Arrow

[jira] [Created] (ARROW-5109) [Go] implement reading binary/string arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5109: -- Summary: [Go] implement reading binary/string arrays from Arrow file Key: ARROW-5109 URL: https://issues.apache.org/jira/browse/ARROW-5109 Project: Apache Arrow

[jira] [Created] (ARROW-5108) [Go] implement reading primitive arrays from Arrow file

2019-04-03 Thread Sebastien Binet (JIRA)
Sebastien Binet created ARROW-5108: -- Summary: [Go] implement reading primitive arrays from Arrow file Key: ARROW-5108 URL: https://issues.apache.org/jira/browse/ARROW-5108 Project: Apache Arrow

Arrow sync call 12:00 US/Eastern, 16:00 UTC

2019-04-03 Thread Wes McKinney
hi folks The regularly scheduled twice-monthly sync call starts in a few minutes https://meet.google.com/vtm-teks-phx I am conflicted again today and unable to join, but I will look out for any notes and follow up discussions. - Wes

Re: [VOTE] Proposed changes to Arrow Flight protocol

2019-04-03 Thread Bryan Cutler
+1 (non-binding) On Wed, Apr 3, 2019 at 7:52 AM Jacques Nadeau wrote: > I'm +1 to all four (binding) > > On Wed, Apr 3, 2019 at 1:56 AM Antoine Pitrou wrote: > > > > > > > Le 03/04/2019 à 02:05, Wes McKinney a écrit : > > > Hi, > > > > > > David Li has proposed to make the following additions o

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

2019-04-03 Thread Wes McKinney
hi Jacques, I agree with you -- I had this concern also during implementation that the query plans have to be generated both in ListFlights and GetFlightInfo. Antoine -- I think "splits" here means the pieces of a distributed dataset. So if a Flight is spread across multiple hosts, then when you

Re: [VOTE] Add new DurationInterval Type to Arrow Format

2019-04-03 Thread Wes McKinney
+1 (binding) to add the new type On Wed, Apr 3, 2019 at 10:35 AM Micah Kornfield wrote: > > +1 (non-binding). > > P.S. Copy and paste error on the plus 1 option from the flight vote? > > On Wednesday, April 3, 2019, Jacques Nadeau wrote: > > > I'd like to propose a change to the Arrow format to

Re: [VOTE] Add new DurationInterval Type to Arrow Format

2019-04-03 Thread Micah Kornfield
+1 (non-binding). P.S. Copy and paste error on the plus 1 option from the flight vote? On Wednesday, April 3, 2019, Jacques Nadeau wrote: > I'd like to propose a change to the Arrow format to support a new duration > type. Details below. Threads on mailing list around discussion. > > > // An ab

Re: [Flight] Question about C++ implementation

2019-04-03 Thread Antoine Pitrou
Hi David, Thanks for the clarification. I agree about not exposing Protobuf in the public API. But it's a pity it forces us to such manual data wrangling... Regards Antoine. Le 03/04/2019 à 17:12, David Li a écrit : > Hi Antoine, > > The ToProto/FromProto methods convert between Protobuf

Re: [Flight] Question about C++ implementation

2019-04-03 Thread David Li
Hi Antoine, The ToProto/FromProto methods convert between Protobuf structs and Flight-specific structs. They aren't actually parsing or serializing anything. While you could argue for just using the Protobuf structs directly, I think there are a few reasons not to: - We don't want to expose Protobuf in
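The distinction drawn in this message (converting between two in-memory structs, as opposed to parsing or serializing bytes) can be sketched as follows. This is a minimal illustration in Python, not the code in src/arrow/flight/internal.cc: `WireDescriptor`, `PublicDescriptor`, `DescriptorType` and its numeric values, and the `to_proto`/`from_proto` functions are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List


@dataclass
class WireDescriptor:
    """Hypothetical stand-in for a Protobuf-generated message (plain fields)."""
    type: int = 0
    cmd: bytes = b""
    path: List[str] = field(default_factory=list)


class DescriptorType(Enum):
    # Illustrative values only; not taken from the real service definition.
    PATH = 1
    CMD = 2


@dataclass
class PublicDescriptor:
    """Hypothetical public-API struct; no wire-format types appear in it."""
    type: DescriptorType
    cmd: bytes = b""
    path: List[str] = field(default_factory=list)


def to_proto(d: PublicDescriptor) -> WireDescriptor:
    # Field-by-field copy between in-memory structs; nothing is serialized.
    return WireDescriptor(type=d.type.value, cmd=d.cmd, path=list(d.path))


def from_proto(w: WireDescriptor) -> PublicDescriptor:
    # Reject type values the public enum does not know about.
    try:
        kind = DescriptorType(w.type)
    except ValueError:
        raise ValueError(f"unknown descriptor type: {w.type}") from None
    return PublicDescriptor(type=kind, cmd=w.cmd, path=list(w.path))
```

The cost Antoine points out is visible even in this toy: every field has to be copied by hand in both directions. What the copy buys is that callers only ever see `PublicDescriptor`.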

Re: [DISCUSS][Format] Time Interval Changes

2019-04-03 Thread Micah Kornfield
Sgtm, I think a PMC member needs to kick it off? On Wednesday, April 3, 2019, Wes McKinney wrote: > Agreed > > On Wed, Apr 3, 2019 at 9:53 AM Jacques Nadeau wrote: > > > > Option 1 sounds good to me. Let's take to a vote. > > > > On Tue, Apr 2, 2019 at 8:53 PM Micah Kornfield > wrote: > >> > >

Re: [DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

2019-04-03 Thread Antoine Pitrou
On Wed, 3 Apr 2019 08:01:25 -0700 Jacques Nadeau wrote: > Right now, the ListFlights method returns a stream of FlightGetInfo (to be > renamed FlightInfo). This actually turns out to be quite expensive in many > cases since splits have to be generated. I'd like to propose changing this > method to

[Flight] Question about C++ implementation

2019-04-03 Thread Antoine Pitrou
Hello, Why do we have parsing / unparsing implementations in src/arrow/flight/internal.cc? I assumed gRPC / protobuf would give this to us for free. Instead it seems we have to write it ourselves? Regards Antoine.

[DISCUSS] Change Flight ListFlights return value to stream of FlightDescriptor

2019-04-03 Thread Jacques Nadeau
Right now, the ListFlights method returns a stream of FlightGetInfo (to be renamed FlightInfo). This actually turns out to be quite expensive in many cases since splits have to be generated. I'd like to propose changing this method to return a stream of FlightDescriptors instead. What do people thi
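The cost argument in this proposal can be illustrated with a small sketch. It is written in Python and is not the Flight API: `ToyService`, `Descriptor`, `Info`, `list_infos`, `list_descriptors`, `get_info`, and `_plan_splits` are hypothetical names, and "planning" is reduced to a counter so the difference between the two listing styles is visible.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical stand-in for a FlightDescriptor: just a path."""
    path: str


@dataclass
class Info:
    """Hypothetical stand-in for FlightGetInfo: descriptor plus its splits."""
    descriptor: Descriptor
    splits: List[str]


class ToyService:
    def __init__(self, datasets: Dict[str, int]):
        # Maps dataset path -> number of splits it would be planned into.
        self._datasets = datasets
        self.plans_generated = 0  # counts calls to the expensive step

    def _plan_splits(self, path: str) -> List[str]:
        # Stands in for the expensive work of generating splits.
        self.plans_generated += 1
        return [f"{path}/part-{i}" for i in range(self._datasets[path])]

    def list_infos(self) -> Iterator[Info]:
        # Listing style described as current: every flight pays for planning.
        for path in self._datasets:
            yield Info(Descriptor(path), self._plan_splits(path))

    def list_descriptors(self) -> Iterator[Descriptor]:
        # Proposed listing style: cheap, no splits are generated.
        for path in self._datasets:
            yield Descriptor(path)

    def get_info(self, descriptor: Descriptor) -> Info:
        # Splits are planned only for the flight the client asks about.
        return Info(descriptor, self._plan_splits(descriptor.path))
```

In this sketch, listing three datasets with `list_infos` runs the planning step three times, while `list_descriptors` followed by a single `get_info` call runs it once.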

Re: How to understand this comment

2019-04-03 Thread Wes McKinney
On Wed, Apr 3, 2019 at 9:56 AM Jacques Nadeau wrote: > > > > > "To consume the whole flight, generally all endpoints must be consumed" > > > > I actually think the problem is we changed some names at one point and this > comment is a bit behind. This should probably read that "To consume the > who

[VOTE] Add new DurationInterval Type to Arrow Format

2019-04-03 Thread Jacques Nadeau
I'd like to propose a change to the Arrow format to support a new duration type. Details below. Threads on mailing list around discussion. // An absolute length of time unrelated to any calendar artifacts. For the purposes /// of Arrow Implementations, adding this value to a Timestamp ("t1") nai

Re: [DISCUSS][Format] Time Interval Changes

2019-04-03 Thread Wes McKinney
Agreed On Wed, Apr 3, 2019 at 9:53 AM Jacques Nadeau wrote: > > Option 1 sounds good to me. Let's take to a vote. > > On Tue, Apr 2, 2019 at 8:53 PM Micah Kornfield wrote: >> >> Based on the discussion so far, my attempt at concrete Schema proposals >> below. Jacques I think summarizes what w

Re: How to understand this comment

2019-04-03 Thread Jacques Nadeau
> > "To consume the whole flight, generally all endpoints must be consumed" > I actually think the problem is we changed some names at one point and this comment is a bit behind. This should probably read that "To consume the whole flight all Flight Itineraries must be consumed. Any endpoint liste

Re: [DISCUSS][Format] Time Interval Changes

2019-04-03 Thread Jacques Nadeau
Option 1 sounds good to me. Let's take to a vote. On Tue, Apr 2, 2019 at 8:53 PM Micah Kornfield wrote: > Based on the discussion so far, my attempt at concrete Schema proposals > below. Jacques I think summarizes what we've discussed, apologies if > I've misunderstood. Wes would Option 1 wo

Re: [VOTE] Proposed changes to Arrow Flight protocol

2019-04-03 Thread Jacques Nadeau
I'm +1 to all four (binding) On Wed, Apr 3, 2019 at 1:56 AM Antoine Pitrou wrote: > > > Le 03/04/2019 à 02:05, Wes McKinney a écrit : > > Hi, > > > > David Li has proposed to make the following additions or changes > > to the Flight gRPC service definition [1] and general design, as > explained

Re: How to understand this comment

2019-04-03 Thread Wes McKinney
hi, On Wed, Apr 3, 2019 at 8:19 AM ming zhang wrote: > > but then the comment is not right? since not all flights will be consumed? > > "To consume the whole > > > >* flight, all endpoints must be consumed." > I think we might be splitting hairs a little bit. Would it help if we added "gener

Re: How to understand this comment

2019-04-03 Thread Jacques Nadeau
The current model is that there is a fixed number of itineraries. The available endpoints could theoretically include multiple transports. Your example, where there is a variable number of itineraries depending on protocol, is not currently supported. In that case, I would suggest that the list includ

Re: How to understand this comment

2019-04-03 Thread ming zhang
but then the comment is not right? since not all flights will be consumed? "To consume the whole > > >* flight, all endpoints must be consumed." if we introduce an itinerary concept, we have a complete story and mental model. something like message FlightGetInfo { // schema of the dataset a

[jira] [Created] (ARROW-5107) [Release] Validate non-RC source and binary artifacts

2019-04-03 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5107: -- Summary: [Release] Validate non-RC source and binary artifacts Key: ARROW-5107 URL: https://issues.apache.org/jira/browse/ARROW-5107 Project: Apache Arrow

[jira] [Created] (ARROW-5106) ARROW-5094: [Packaging] [C++/Python] Add conda package verification scripts

2019-04-03 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5106: -- Summary: ARROW-5094: [Packaging] [C++/Python] Add conda package verification scripts Key: ARROW-5106 URL: https://issues.apache.org/jira/browse/ARROW-5106 Project

[jira] [Created] (ARROW-5105) ARROW-5094: [Packaging] [Python] Add Wheel verification scripts

2019-04-03 Thread Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5105: -- Summary: ARROW-5094: [Packaging] [Python] Add Wheel verification scripts Key: ARROW-5105 URL: https://issues.apache.org/jira/browse/ARROW-5105 Project: Apache Arr

[jira] [Created] (ARROW-5104) [Python/C++] Schema for empty tables include index column as integer

2019-04-03 Thread Florian Jetter (JIRA)
Florian Jetter created ARROW-5104: - Summary: [Python/C++] Schema for empty tables include index column as integer Key: ARROW-5104 URL: https://issues.apache.org/jira/browse/ARROW-5104 Project: Apache

[jira] [Created] (ARROW-5103) Segfault when using chunked_array.to_pandas on array different types (edge case)

2019-04-03 Thread Artem KOZHEVNIKOV (JIRA)
Artem KOZHEVNIKOV created ARROW-5103: Summary: Segfault when using chunked_array.to_pandas on array different types (edge case) Key: ARROW-5103 URL: https://issues.apache.org/jira/browse/ARROW-5103

Re: C++ and Python size problems with Arrow 0.13.0

2019-04-03 Thread Krisztián Szűcs
This is what the wheel contains before running auditwheel: -rwxr-xr-x 1 root root 128K Apr 3 09:02 libarrow_boost_filesystem.so -rwxr-xr-x 1 root root 128K Apr 3 09:02 libarrow_boost_filesystem.so.1.66.0 -rwxr-xr-x 1 root root 1.2M Apr 3 09:02 libarrow_boost_regex.so -rwxr-xr-x 1 root root

Re: [VOTE] Proposed changes to Arrow Flight protocol

2019-04-03 Thread Antoine Pitrou
Le 03/04/2019 à 02:05, Wes McKinney a écrit : > Hi, > > David Li has proposed to make the following additions or changes > to the Flight gRPC service definition [1] and general design, as explained in > greater detail in the linked Google Docs document [2]. Arrow > Flight is an in-development m

[jira] [Created] (ARROW-5102) [C++] Reduce header dependencies

2019-04-03 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5102: - Summary: [C++] Reduce header dependencies Key: ARROW-5102 URL: https://issues.apache.org/jira/browse/ARROW-5102 Project: Apache Arrow Issue Type: Wish

Re: [Discuss][Format] Arrow Flight URI scheme proposal

2019-04-03 Thread Antoine Pitrou
I can. Regards Antoine. Le 03/04/2019 à 02:32, Wes McKinney a écrit : > I started a vote for the other Flight discussion thread, which will > close on Friday. Since I'm about to leave on vacation can Antoine or > Jacques run the vote for this one? > > Thanks > > On Tue, Apr 2, 2019 at 7:07

Re: C++ and Python size problems with Arrow 0.13.0

2019-04-03 Thread Antoine Pitrou
Le 03/04/2019 à 02:23, Wes McKinney a écrit : > > $ ll Library/lib/ > total 741796 > -rw-r--r-- 1 wesm wesm 1507048 Mar 27 23:34 arrow.lib > -rw-r--r-- 1 wesm wesm 76184 Mar 27 23:35 arrow_python.lib > -rw-r--r-- 1 wesm wesm 61322082 Mar 27 23:36 arrow_python_static.lib > -rw-r--r-- 1 wes

Re: C++ and Python size problems with Arrow 0.13.0

2019-04-03 Thread Krisztián Szűcs
On Wed, Apr 3, 2019 at 2:24 AM Wes McKinney wrote: > hi folks, > > I see that the arrow-cpp conda packages for Windows have ballooned in size > to nearly 140 megabytes for RC4 > > > https://bintray.com/apache/arrow/python-rc/0.13.0-rc4#files/python-rc/0.13.0-rc4 > > Looking at one of these packages i

[jira] [Created] (ARROW-5101) [Packaging] Avoid bundling static libraries in Windows conda packages

2019-04-03 Thread Antoine Pitrou (JIRA)
Antoine Pitrou created ARROW-5101: - Summary: [Packaging] Avoid bundling static libraries in Windows conda packages Key: ARROW-5101 URL: https://issues.apache.org/jira/browse/ARROW-5101 Project: Apache