Re: Parquet to Arrow in Java

2019-08-04 Thread Micah Kornfield
Hi Anoop, I think a contribution would be welcome. There was a recent discussion thread on what would be expected from new "readers" for Arrow data in Java [1]. I think it's worth reading through, but my recollections of the highlights are: 1. A short design sketch in the JIRA that will track the

[jira] [Created] (ARROW-6131) [C++] Optimize the Arrow UTF-8-string-validation

2019-08-04 Thread Yuqi Gu (JIRA)
Yuqi Gu created ARROW-6131: -- Summary: [C++] Optimize the Arrow UTF-8-string-validation Key: ARROW-6131 URL: https://issues.apache.org/jira/browse/ARROW-6131 Project: Apache Arrow Issue Type: Improv

Re: [VOTE] Adopt FORMAT and LIBRARY SemVer-based version schemes for Arrow 1.0.0 and beyond

2019-08-04 Thread Jacques Nadeau
Looks good. +1 from me. Thanks for driving this to conclusion.

On Wed, Jul 31, 2019, 12:04 PM Bryan Cutler wrote:
> +1 (non-binding)
>
> On Wed, Jul 31, 2019 at 8:59 AM Uwe L. Korn wrote:
>
> > +1 from me.
> >
> > I really like the separate versions
> >
> > Uwe
> >
> > On Tue, Jul 30, 2019, at

[jira] [Created] (ARROW-6130) [Release] Use 0.15.0 as the next release

2019-08-04 Thread Sutou Kouhei (JIRA)
Sutou Kouhei created ARROW-6130: --- Summary: [Release] Use 0.15.0 as the next release Key: ARROW-6130 URL: https://issues.apache.org/jira/browse/ARROW-6130 Project: Apache Arrow Issue Type: Impro

Re: Parquet to Arrow in Java

2019-08-04 Thread Anoop Johnson
Thanks for the response, Micah. I could implement this and contribute to Arrow Java. To help me get started, are there any pointers on how the C++ or Rust implementations currently read Parquet into Arrow? Are they reading Parquet row-by-row and building Arrow batches, or are there better ways of imp

[jira] [Created] (ARROW-6129) Row_groups duplicate Rows

2019-08-04 Thread albertoramon (JIRA)
albertoramon created ARROW-6129: --- Summary: Row_groups duplicate Rows Key: ARROW-6129 URL: https://issues.apache.org/jira/browse/ARROW-6129 Project: Apache Arrow Issue Type: Bug Compon