Re: A couple of questions about pyarrow.parquet

2019-05-23 Thread Uwe L. Korn
Hello Ted, regarding predicate pushdown in Python, have a look at my unfinished PR at https://github.com/apache/arrow/pull/2623. Work on it stopped because we were missing a native filter in Arrow. The requirements for that have now been implemented, and we could probably reactivate the PR. Uwe On S

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Ted Gooch
Thanks Micah and Wes. Definitely interested in the *Predicate Pushdown* and *Schema inference, schema-on-read, and schema normalization* sections. On Fri, May 17, 2019 at 12:47 PM Wes McKinney wrote: > Please see also > > > https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Wes McKinney
Please see also https://docs.google.com/document/d/1bVhzifD38qDypnSjtf8exvpP3sSB5x_Kw9m-n66FB2c/edit?usp=drivesdk and the prior mailing list discussion. I will comment in more detail on the other items later. On Fri, May 17, 2019, 2:44 PM Micah Kornfield wrote: > I can't help on the first question.

Re: A couple of questions about pyarrow.parquet

2019-05-17 Thread Micah Kornfield
I can't help on the first question. Regarding push-down predicates, there is an open JIRA [1] to do just that. [1] https://issues.apache.org/jira/browse/PARQUET-473