Re: [Discuss] Format additions to Arrow for sparse data and data integrity

2019-07-09 Micah Kornfield
Hi Jacques, > That's quite interesting. Can you share more about the use case? Sorry, I realized I missed answering this. We are still investigating, so the initial diagnosis might be off. The use case is a data transfer application: reading data at rest, translating it to Arrow and sending it o

Re: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

2019-07-09 Micah Kornfield
Hi Eric, Short answer: I think your understanding matches what I was proposing. Longer answer below. > So, for example, we release library v1.0.0 in a few months and then library v2.0.0 a few months after that. In v2.0.0, C++, Python, and Java didn't make any breaking API changes from 1.0.0. Bu

[jira] [Created] (ARROW-5897) [Java] Remove duplicated logic in MapVector

2019-07-09 Liya Fan (JIRA)
Liya Fan created ARROW-5897: Summary: [Java] Remove duplicated logic in MapVector Key: ARROW-5897 URL: https://issues.apache.org/jira/browse/ARROW-5897 Project: Apache Arrow Issue Type: Improvemen

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Wes McKinney
Thanks for the feedback. I just posted a PR that removes the class in the C++ and Python libraries; hopefully this will help with the discussion. That I was able to do it in less than a day should be good evidence that the abstraction may be superfluous: https://github.com/apache/arrow/pull/4841

Re: Spark and Arrow Flight

2019-07-09 Wes McKinney
Hi Ryan, have you thought about developing this inside Apache Arrow? On Tue, Jul 9, 2019, 5:42 PM Bryan Cutler wrote: > Great, thanks Ryan! I'll take a look > On Tue, Jul 9, 2019 at 3:31 PM Ryan Murray wrote: > > Hi Bryan, > > I have an implementation of option #3 nearly ready for a PR.

[jira] [Created] (ARROW-5896) [C#] Array Builders should take an initial capacity in their constructors

2019-07-09 Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5896: Summary: [C#] Array Builders should take an initial capacity in their constructors Key: ARROW-5896 URL: https://issues.apache.org/jira/browse/ARROW-5896 Project: Apache
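
The usual motivation for an initial-capacity parameter is to avoid repeated buffer reallocation and copying when the final size is known up front. The sketch below shows that effect with a growable fixed-width buffer that doubles when full and counts its reallocations. `Int32BufferBuilder`, its default capacity of 8 and its doubling policy are hypothetical and say nothing about how the C# builders actually grow.

```python
import struct


class Int32BufferBuilder:
    """Hypothetical builder for little-endian int32 values (not the C# API)."""

    WIDTH = 4  # bytes per int32 value

    def __init__(self, capacity: int = 8):
        # Preallocate room for `capacity` values (at least one, so doubling works).
        self._buffer = bytearray(max(capacity, 1) * self.WIDTH)
        self._length = 0
        self.reallocations = 0

    @property
    def capacity(self) -> int:
        return len(self._buffer) // self.WIDTH

    def append(self, value: int) -> None:
        if self._length == self.capacity:
            # Buffer is full: allocate twice the space and copy the old bytes.
            # This copy is the cost a caller-supplied initial capacity avoids.
            new_buffer = bytearray(len(self._buffer) * 2)
            new_buffer[: len(self._buffer)] = self._buffer
            self._buffer = new_buffer
            self.reallocations += 1
        struct.pack_into("<i", self._buffer, self._length * self.WIDTH, value)
        self._length += 1

    def build(self) -> bytes:
        # Return only the bytes that hold appended values.
        return bytes(self._buffer[: self._length * self.WIDTH])


default = Int32BufferBuilder()               # starts with room for 8 values
presized = Int32BufferBuilder(capacity=100)  # caller knows the final size
for i in range(100):
    default.append(i)
    presized.append(i)
print(default.reallocations, presized.reallocations)  # 4 0
```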

Re: Spark and Arrow Flight

2019-07-09 Bryan Cutler
Great, thanks Ryan! I'll take a look. On Tue, Jul 9, 2019 at 3:31 PM Ryan Murray wrote: > Hi Bryan, > I have an implementation of option #3 nearly ready for a PR. I will mention you when I publish it. > The working prototype for the Spark connector is here: https://github.com/rymurr/fligh

Re: Spark and Arrow Flight

2019-07-09 Ryan Murray
Hi Bryan, I have an implementation of option #3 nearly ready for a PR. I will mention you when I publish it. The working prototype for the Spark connector is here: https://github.com/rymurr/flight-spark-source. It technically works (and is very fast!); however, the implementation is pretty dodgy an

Re: Spark and Arrow Flight

2019-07-09 Bryan Cutler
I'm in favor of option #3 also, but I'm not sure what the best thing to do with the existing FlightInfo response is. I'm definitely interested in connecting Spark with Flight. Can you share more details of your work, or is it planned to be open sourced? Thanks, Bryan On Tue, Jul 2, 2019 at 3:35 AM Ant

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Tim Swast
FWIW, I found the Column class to be confusing in Python. It felt redundant / unneeded for actually creating Tables. On Tue, Jul 9, 2019 at 11:19 AM Wes McKinney wrote: > On Tue, Jul 9, 2019 at 1:14 PM Antoine Pitrou wrote: > > On 08/07/2019 at 23:17, Wes McKinney wrote: > > > I'm

RE: [Discuss] Compatibility Guarantees and Versioning Post "1.0.0"

2019-07-09 Eric Erhardt
Just to be sure I fully understand the proposal: for the Library Version, we are going to increment the MAJOR version on every normal release, and increment the MINOR version if we need to release a patch/bug-fix type of release. Since SemVer allows API-breaking changes on MAJOR versions, t
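
Eric's reading of the proposal can be made concrete with a small sketch. It models only the two rules he restates (a normal release bumps MAJOR; a patch/bug-fix release bumps MINOR) and assumes versions are plain `MAJOR.MINOR.PATCH` strings whose lower fields reset to 0, following the usual SemVer convention. `next_library_version` is a hypothetical helper, not part of any Arrow release tooling.

```python
def next_library_version(current: str, release_kind: str) -> str:
    """Hypothetical helper for the versioning rules as restated above.

    release_kind:
      "normal" -> bump MAJOR, e.g. 1.0.0 -> 2.0.0
      "patch"  -> bump MINOR, e.g. 2.0.0 -> 2.1.0
    Lower fields are reset to 0 (assumed SemVer-style convention).
    """
    major, minor, _patch = (int(part) for part in current.split("."))
    if release_kind == "normal":
        return f"{major + 1}.0.0"
    if release_kind == "patch":
        return f"{major}.{minor + 1}.0"
    raise ValueError(f"unknown release kind: {release_kind!r}")


# A normal release, then a bug-fix release, then another normal release.
version = "1.0.0"
for kind in ["normal", "patch", "normal"]:
    version = next_library_version(version, kind)
    print(kind, "->", version)
```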

Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-09 Wes McKinney
Hi Eric, of course! On Tue, Jul 9, 2019, 4:03 PM Eric Erhardt wrote: > Can we propose getting changes other than Python or Parquet related ones into this release? > For example, I found a critical issue in the C# implementation that, if possible, I'd like to get included in a patch release.

RE: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-09 Eric Erhardt
Can we propose getting changes other than Python or Parquet related ones into this release? For example, I found a critical issue in the C# implementation that, if possible, I'd like to get included in a patch release. https://github.com/apache/arrow/pull/4836 Eric -----Original Message----- From

[jira] [Created] (ARROW-5895) [Python] New version stores timestamps as epoch ms instead of ISO timestamp string

2019-07-09 John Wilson (JIRA)
John Wilson created ARROW-5895: Summary: [Python] New version stores timestamps as epoch ms instead of ISO timestamp string Key: ARROW-5895 URL: https://issues.apache.org/jira/browse/ARROW-5895 Project:
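
The issue title contrasts two encodings of the same instant: an ISO 8601 string and an integer count of milliseconds since the Unix epoch. The excerpt does not show where the timestamps are stored, so the sketch below only illustrates converting between the two forms using the standard `datetime` module, with no Arrow APIs. The function names are hypothetical, and treating naive timestamps as UTC is an assumption.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def iso_to_epoch_ms(iso: str) -> int:
    """Hypothetical helper: ISO 8601 string -> milliseconds since the epoch."""
    dt = datetime.fromisoformat(iso)
    if dt.tzinfo is None:
        # Assumption: a timestamp without an offset is interpreted as UTC.
        dt = dt.replace(tzinfo=timezone.utc)
    # Integer division of timedeltas avoids float rounding errors.
    return (dt - EPOCH) // timedelta(milliseconds=1)


def epoch_ms_to_iso(ms: int) -> str:
    """Hypothetical helper: milliseconds since the epoch -> ISO 8601 (UTC)."""
    return (EPOCH + timedelta(milliseconds=ms)).isoformat()


ms = iso_to_epoch_ms("2019-07-09T00:00:00")
print(ms)                   # 1562630400000
print(epoch_ms_to_iso(ms))  # 2019-07-09T00:00:00+00:00
```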

[jira] [Created] (ARROW-5894) libgandiva.so.14 is exporting libstdc++ symbols

2019-07-09 Zhuo Peng (JIRA)
Zhuo Peng created ARROW-5894: Summary: libgandiva.so.14 is exporting libstdc++ symbols Key: ARROW-5894 URL: https://issues.apache.org/jira/browse/ARROW-5894 Project: Apache Arrow Issue Type: Bug

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Wes McKinney
On Tue, Jul 9, 2019 at 1:14 PM Antoine Pitrou wrote: > On 08/07/2019 at 23:17, Wes McKinney wrote: > > I'm concerned about continuing to maintain the Column class as it's spilling complexity into computational libraries and bindings alike. > > The Python Column class for example

[jira] [Created] (ARROW-5893) [C++] Remove arrow::Column class from C++ library

2019-07-09 Wes McKinney (JIRA)
Wes McKinney created ARROW-5893: Summary: [C++] Remove arrow::Column class from C++ library Key: ARROW-5893 URL: https://issues.apache.org/jira/browse/ARROW-5893 Project: Apache Arrow Issue Ty

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Antoine Pitrou
On 08/07/2019 at 23:17, Wes McKinney wrote: > I'm concerned about continuing to maintain the Column class as it's spilling complexity into computational libraries and bindings alike. > The Python Column class for example mostly forwards method calls to the underlying ChunkedArray

[jira] [Created] (ARROW-5892) [C++][Gandiva] Support function aliases

2019-07-09 Prudhvi Porandla (JIRA)
Prudhvi Porandla created ARROW-5892: Summary: [C++][Gandiva] Support function aliases Key: ARROW-5892 URL: https://issues.apache.org/jira/browse/ARROW-5892 Project: Apache Arrow Issue Type

[jira] [Created] (ARROW-5891) [C++][Gandiva] Remove duplicates in function registries

2019-07-09 Prudhvi Porandla (JIRA)
Prudhvi Porandla created ARROW-5891: Summary: [C++][Gandiva] Remove duplicates in function registries Key: ARROW-5891 URL: https://issues.apache.org/jira/browse/ARROW-5891 Project: Apache Arrow
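
ARROW-5892 and ARROW-5891 both concern how functions are registered. The sketch below shows one way the two ideas could fit together, assuming a registry keyed by canonical name and argument types. Aliases resolve to the canonical entry, so the same signature never needs a second registration, and a true duplicate is rejected. `FunctionRegistry` and its methods are hypothetical and do not reflect Gandiva's actual C++ classes.

```python
class FunctionRegistry:
    """Hypothetical registry: exactly one entry per (canonical name, arg types)."""

    def __init__(self):
        self._functions = {}  # (canonical_name, arg_types) -> implementation
        self._aliases = {}    # alias -> canonical_name

    def register(self, name, arg_types, impl, aliases=()):
        key = (name, tuple(arg_types))
        if key in self._functions:
            # Reject duplicates instead of silently keeping two entries.
            raise ValueError(f"duplicate registration: {key}")
        self._functions[key] = impl
        for alias in aliases:
            existing = self._aliases.get(alias)
            if existing is not None and existing != name:
                raise ValueError(f"alias {alias!r} already maps to {existing!r}")
            self._aliases[alias] = name

    def lookup(self, name, arg_types):
        # Resolve an alias to its canonical name, then look up the signature.
        canonical = self._aliases.get(name, name)
        return self._functions[(canonical, tuple(arg_types))]


registry = FunctionRegistry()
registry.register("add", ["int32", "int32"], lambda a, b: a + b, aliases=["plus"])
# "plus" and "add" resolve to the same single entry.
print(registry.lookup("plus", ["int32", "int32"])(2, 3))  # 5
```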

[jira] [Created] (ARROW-5890) [C++][Python] Support ExtensionType arrays in more kernels

2019-07-09 Joris Van den Bossche (JIRA)
Joris Van den Bossche created ARROW-5890: Summary: [C++][Python] Support ExtensionType arrays in more kernels Key: ARROW-5890 URL: https://issues.apache.org/jira/browse/ARROW-5890 Project: Apac

[jira] [Created] (ARROW-5889) [Python][C++] Parquet backwards compat for timestamps without timezone broken

2019-07-09 Florian Jetter (JIRA)
Florian Jetter created ARROW-5889: Summary: [Python][C++] Parquet backwards compat for timestamps without timezone broken Key: ARROW-5889 URL: https://issues.apache.org/jira/browse/ARROW-5889 Project

[jira] [Created] (ARROW-5888) [Python][C++] Parquet write metadata not roundtrip safe for timezone timestamps

2019-07-09 Florian Jetter (JIRA)
Florian Jetter created ARROW-5888: Summary: [Python][C++] Parquet write metadata not roundtrip safe for timezone timestamps Key: ARROW-5888 URL: https://issues.apache.org/jira/browse/ARROW-5888 Proje

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Francois Saint-Jacques
I'm also +1 on removing this class. François On Tue, Jul 9, 2019 at 10:57 AM Uwe L. Korn wrote: > This sounds fine to me, thus I'm +1 on removing this class. > On Tue, Jul 9, 2019, at 2:11 PM, Wes McKinney wrote: > > Yes, the schema would be the point of truth for the Field. The ChunkedArray

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Wes McKinney
I'll try to spend a little time soon refactoring to see how disruptive the change would be, and also to help persuade others of the benefits. On Tue, Jul 9, 2019 at 9:57 AM Uwe L. Korn wrote: > This sounds fine to me, thus I'm +1 on removing this class. > On Tue, Jul 9, 2019, at 2:11 PM, W

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Uwe L. Korn
This sounds fine to me, thus I'm +1 on removing this class. On Tue, Jul 9, 2019, at 2:11 PM, Wes McKinney wrote: > Yes, the schema would be the point of truth for the Field. The ChunkedArray type would have to be validated against the schema types as with RecordBatch > On Tue, Jul 9, 2019, 2:

[jira] [Created] (ARROW-5887) [C#] ArrowStreamWriter writes FieldNodes in wrong order

2019-07-09 Eric Erhardt (JIRA)
Eric Erhardt created ARROW-5887: Summary: [C#] ArrowStreamWriter writes FieldNodes in wrong order Key: ARROW-5887 URL: https://issues.apache.org/jira/browse/ARROW-5887 Project: Apache Arrow Is
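
For context, the Arrow IPC format describes a record batch's fields as a flat list of field nodes produced by a depth-first, pre-order traversal of the schema: each parent's node comes before its children's, and siblings keep their schema order. The sketch below shows that ordering using a hypothetical minimal `Field` type. It is not the C# library's classes and says nothing about where the C# writer actually went wrong.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Field:
    """Hypothetical minimal schema field: a name plus nested child fields."""
    name: str
    children: List["Field"] = field(default_factory=list)


def field_node_order(schema_fields: List[Field]) -> List[str]:
    """Return field names in depth-first pre-order (field-node order)."""
    order = []

    def visit(f: Field) -> None:
        order.append(f.name)      # the parent's node first (pre-order)...
        for child in f.children:  # ...then each child subtree, left to right
            visit(child)

    for top_level in schema_fields:
        visit(top_level)
    return order


schema = [
    Field("a"),
    Field("b", [Field("b.x"), Field("b.y", [Field("b.y.item")])]),
    Field("c"),
]
print(field_node_order(schema))  # ['a', 'b', 'b.x', 'b.y', 'b.y.item', 'c']
```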

[jira] [Created] (ARROW-5886) [Python][Packaging] Manylinux1/2010 compliance issue with libz

2019-07-09 Krisztian Szucs (JIRA)
Krisztian Szucs created ARROW-5886: Summary: [Python][Packaging] Manylinux1/2010 compliance issue with libz Key: ARROW-5886 URL: https://issues.apache.org/jira/browse/ARROW-5886 Project: Apache Arro

[jira] [Created] (ARROW-5885) Support optional arrow components via extras_require

2019-07-09 George Sakkis (JIRA)
George Sakkis created ARROW-5885: Summary: Support optional arrow components via extras_require Key: ARROW-5885 URL: https://issues.apache.org/jira/browse/ARROW-5885 Project: Apache Arrow Iss
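
In setuptools, `extras_require` maps extra names to additional requirement lists. Users then opt in with `pip install package[extra]`, and pip resolves the extras itself. The sketch below only illustrates the shape such a mapping might take and which requirements a selection of extras would pull in. Every extra name and requirement string is hypothetical, not a list of actual Arrow components or dependencies, and `requirements_for` is an illustrative helper.

```python
# Hypothetical values for illustration only. In setup.py these would be passed
# as setuptools.setup(..., install_requires=BASE_REQUIRES,
#                     extras_require=EXTRAS_REQUIRE).
BASE_REQUIRES = ["example-core-dep>=1.0"]
EXTRAS_REQUIRE = {
    "parquet": ["example-parquet-dep>=1.0"],
    "flight": ["example-rpc-dep>=1.0"],
}
# A convenience extra that pulls in every optional component.
EXTRAS_REQUIRE["all"] = sorted({req for reqs in EXTRAS_REQUIRE.values() for req in reqs})


def requirements_for(extras):
    """Illustrative helper: base requirements plus those of the chosen extras."""
    reqs = list(BASE_REQUIRES)
    for extra in extras:
        for req in EXTRAS_REQUIRE[extra]:
            if req not in reqs:
                reqs.append(req)
    return reqs


print(requirements_for(["flight"]))
```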

Re: [DISCUSS] Need for 0.14.1 release due to Python package problems, Parquet forward compatibility problems

2019-07-09 Wes McKinney
On Tue, Jul 9, 2019 at 12:02 AM Sutou Kouhei wrote: > Hi, > > If the problems can be resolved quickly, I should think we could cut an RC for 0.14.1 by the end of this week. The RC could either be cut from a maintenance branch or out of master -- any thoughts about this (cutting fro

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Wes McKinney
Yes, the schema would be the point of truth for the Field. The ChunkedArray type would have to be validated against the schema types as with RecordBatch. On Tue, Jul 9, 2019, 2:54 AM Uwe L. Korn wrote: > Hello Wes, > where do you intend the Field object to live then? Would this be part of the
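
The design Wes describes can be sketched without the Column wrapper. The Table's schema holds each Field, each column is a bare chunked array, and each column's type is checked against the schema at construction, as is done for RecordBatch. In this sketch, types are modeled as plain strings, and `Table`, `Field` and `ChunkedArray` are hypothetical stand-ins, not the C++ or pyarrow classes.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Field:
    """Hypothetical field: name and logical type, owned by the schema."""
    name: str
    type: str


@dataclass
class ChunkedArray:
    """Hypothetical chunked array: a type plus a list of value chunks."""
    type: str
    chunks: List[list]


class Table:
    """Hypothetical table: schema fields plus bare chunked-array columns."""

    def __init__(self, schema: List[Field], columns: List[ChunkedArray]):
        if len(schema) != len(columns):
            raise ValueError("schema and columns differ in length")
        for f, col in zip(schema, columns):
            # The schema is the point of truth; each column must agree with it.
            if col.type != f.type:
                raise TypeError(f"column {f.name!r}: expected {f.type}, got {col.type}")
        self.schema = schema
        self.columns = columns

    def column(self, name: str) -> ChunkedArray:
        # Names come from the schema, so no per-column Field wrapper is needed.
        for f, col in zip(self.schema, self.columns):
            if f.name == name:
                return col
        raise KeyError(name)


table = Table([Field("x", "int64")], [ChunkedArray("int64", [[1, 2], [3]])])
print(table.column("x").chunks)  # [[1, 2], [3]]
```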

[jira] [Created] (ARROW-5884) [Java] Fix the get method of StructVector

2019-07-09 Liya Fan (JIRA)
Liya Fan created ARROW-5884: Summary: [Java] Fix the get method of StructVector Key: ARROW-5884 URL: https://issues.apache.org/jira/browse/ARROW-5884 Project: Apache Arrow Issue Type: Bug

[jira] [Created] (ARROW-5883) [Java] Support Dictionary Encoding for List type

2019-07-09 Ji Liu (JIRA)
Ji Liu created ARROW-5883: Summary: [Java] Support Dictionary Encoding for List type Key: ARROW-5883 URL: https://issues.apache.org/jira/browse/ARROW-5883 Project: Apache Arrow Issue Type: Improveme
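
The excerpt does not say which level of a list column ARROW-5883 would dictionary-encode. The sketch below follows one possible reading: it encodes the lists' child values, leaving the list offsets as they are and replacing each child value with an index into a dictionary of distinct values. Lists are modeled as Python lists rather than Arrow vectors, nulls are ignored, and the function names are hypothetical.

```python
def dictionary_encode_list_values(lists):
    """Hypothetical encoder for the child values of a list column.

    Returns (offsets, indices, dictionary): offsets[k]..offsets[k + 1] delimit
    list k within `indices`, as in Arrow's list layout, and each index points
    into `dictionary`, which holds every distinct value once.
    """
    dictionary = []
    index_of = {}
    offsets = [0]
    indices = []
    for values in lists:
        for v in values:
            if v not in index_of:
                index_of[v] = len(dictionary)
                dictionary.append(v)
            indices.append(index_of[v])
        offsets.append(len(indices))
    return offsets, indices, dictionary


def decode(offsets, indices, dictionary):
    """Invert the encoding above."""
    return [
        [dictionary[i] for i in indices[offsets[k]:offsets[k + 1]]]
        for k in range(len(offsets) - 1)
    ]


column = [["a", "b"], ["b"], [], ["a", "c"]]
offsets, indices, dictionary = dictionary_encode_list_values(column)
print(offsets, indices, dictionary)  # [0, 2, 3, 3, 5] [0, 1, 1, 0, 2] ['a', 'b', 'c']
```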

[jira] [Created] (ARROW-5882) [C++][Gandiva] Throw error if divisor is 0 in integer mod functions

2019-07-09 Prudhvi Porandla (JIRA)
Prudhvi Porandla created ARROW-5882: Summary: [C++][Gandiva] Throw error if divisor is 0 in integer mod functions Key: ARROW-5882 URL: https://issues.apache.org/jira/browse/ARROW-5882 Project: Apa
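
The requested behavior is to raise an error when the divisor is 0 instead of computing a result. The sketch below assumes C/C++-style truncated remainder semantics, where the result takes the sign of the dividend. These semantics differ from Python's floor-based `%` for negative operands (`-7 % 3` is `2` in Python but `-1` in C++). `checked_mod` and `DivideByZeroError` are hypothetical names, not Gandiva's API.

```python
class DivideByZeroError(ArithmeticError):
    """Hypothetical error type standing in for the engine's error reporting."""


def checked_mod(dividend: int, divisor: int) -> int:
    """Integer remainder that rejects a zero divisor (illustrative sketch)."""
    if divisor == 0:
        raise DivideByZeroError("divide by zero error")
    remainder = abs(dividend) % abs(divisor)
    # C/C++ `%` truncates toward zero, so the remainder has the dividend's sign.
    return -remainder if dividend < 0 else remainder


print(checked_mod(-7, 3))  # -1
```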

Re: [DISCUSS][C++] Evaluating the arrow::Column C++ class

2019-07-09 Uwe L. Korn
Hello Wes, where do you intend the Field object to live then? Would this be part of the schema of the Table object? Uwe On Mon, Jul 8, 2019, at 11:18 PM, Wes McKinney wrote: > hi folks, > For some time now I have been uncertain about the utility provided by the arrow::Column C++ class. Fund

[jira] [Created] (ARROW-5881) [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits

2019-07-09 Liya Fan (JIRA)
Liya Fan created ARROW-5881: Summary: [Java] Provide functionalities to efficiently determine if a validity buffer has completely 1 bits/0 bits Key: ARROW-5881 URL: https://issues.apache.org/jira/browse/ARROW-5881
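
Arrow validity bitmaps use least-significant-bit numbering: value i is valid when bit i % 8 of byte i // 8 is set. An efficient check can therefore compare whole bytes against 0xFF or 0x00 and mask only the final partial byte, rather than testing bits one at a time. The sketch below works on a plain `bytes` buffer and ignores padding bits past `length`. The function names are hypothetical; this is not the Java API the issue asks for.

```python
def _split(length: int):
    """Number of whole bytes, leftover bits, and a mask for the leftover bits."""
    full_bytes, tail_bits = divmod(length, 8)
    tail_mask = (1 << tail_bits) - 1  # low `tail_bits` bits set (0 if none)
    return full_bytes, tail_bits, tail_mask


def all_bits_set(bitmap: bytes, length: int) -> bool:
    """True if the first `length` validity bits are all 1 (all values valid)."""
    full_bytes, tail_bits, tail_mask = _split(length)
    if any(b != 0xFF for b in bitmap[:full_bytes]):
        return False
    return tail_bits == 0 or (bitmap[full_bytes] & tail_mask) == tail_mask


def all_bits_unset(bitmap: bytes, length: int) -> bool:
    """True if the first `length` validity bits are all 0 (all values null)."""
    full_bytes, tail_bits, tail_mask = _split(length)
    if any(b != 0x00 for b in bitmap[:full_bytes]):
        return False
    return tail_bits == 0 or (bitmap[full_bytes] & tail_mask) == 0


# 11 values: one full byte plus 3 bits; the high 5 bits of byte 1 are padding.
print(all_bits_set(bytes([0xFF, 0b0000_0111]), 11))    # True
print(all_bits_unset(bytes([0x00, 0b1111_1000]), 11))  # True
```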