Re: [RESULT] [VOTE] Accept donation of Arrow Ruby bindings

2018-05-18 Thread Kouhei Sutou
Hi, Thanks all! What should I do as the next action? Is https://github.com/apache/arrow/pull/1990#issuecomment-388199184 helpful? Thanks, -- kou In "[RESULT] [VOTE] Accept donation of Arrow Ruby bindings" on Thu, 17 May 2018 07:41:53 +0900, Wes McKinney wrote: > With 5 binding +1 votes

C++: Covariant Return Types for Array::type()

2018-05-18 Thread joshuastorck
I've put together a proposal for using covariant return types in Array::type() in the C++ library. I wanted to get some feedback before putting together a PR in case it's too controversial or would require too much refactoring of the code: https://docs.google.com/document/d/14mLO9uNIcrb-yTj_byB

Re: Refactoring the Rust API

2018-05-18 Thread Wes McKinney
hi Andy, I gave a read through the Rust implementation. I have never programmed in Rust (hope to change that someday!), so some of the programming constructs are lost on me, but I focused on the Arrow columnar questions. My notes / questions follow. cheers, Wes ## High level comments * Do you pla

Re: PyArrow and Parquet DELTA_BINARY_PACKED

2018-05-18 Thread Feras Salim
Hi Wes, The raw file in CSV is about a gig. Gzipped is about 50 MB and the most I could compress it with Parquet V1 was 21 MB and V2 (same settings) about 25 MB. It's quite surprising that it changes how the data is encoded between versions, given that Uwe said "The only difference between the two v

Re: Continuous benchmarking setup

2018-05-18 Thread Wes McKinney
I know the tool we are using for Python benchmarks is Python-specific -- it would be interesting to see if there's a way to ingest benchmark output (as JSON or some other output) from other programming languages. On Mon, May 14, 2018 at 8:56 AM, Brian Hulette wrote: > Is anyone aware of a way we

Re: PyArrow and Parquet DELTA_BINARY_PACKED

2018-05-18 Thread Wes McKinney
hi Feras, How large are the files? For small files, differences in metadata could impact the file size more significantly. I would be surprised if this were the case with larger files, though (I'm not sure what fraction of a column chunk consists of data page headers vs. actual data in practice)

Re: New Arrow PMC Member: Siddharth Teotia

2018-05-18 Thread Bryan Cutler
Congratulations Sidd! On Thu, May 17, 2018 at 11:28 AM, Wes McKinney wrote: > The Project Management Committee (PMC) for Apache Arrow has invited > Siddharth Teotia to become a PMC member and we are pleased to announce > that he has accepted. > > Congratulations and welcome, Sidd! >

Rust IPC and Integration Testing

2018-05-18 Thread Andy Grove
Hi, Now that the refactor I've been working on has been merged, the next priority for me personally with the Rust implementation is getting IPC and integration testing working. Unfortunately the official Flatbuffers Rust version is not available yet and my recent attempts at contacting the author