The Parquet C++ implementation already exposes all the information required to implement predicate pushdown, such as per-row-group column statistics. The experimental Dataset API already makes use of this for Parquet files. See [1] for an example of the API, or [2] for real-life usage with the nyc-tlc taxi dataset. The code that implements predicate pushdown is in [3].
[1] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/dataset_test.cc#L289-L409
[2] https://github.com/apache/arrow/blob/master/cpp/examples/arrow/dataset-parquet-scan-example.cc
[3] https://github.com/apache/arrow/blob/master/cpp/src/arrow/dataset/file_parquet.cc

On Fri, Nov 15, 2019 at 1:08 AM Micah Kornfield <emkornfi...@gmail.com> wrote:
>
> #1. If there isn't a JIRA, I would guess no one is working on it. (Note: I
> would expect at least the initial work to be in a Parquet JIRA item, and
> this is probably a discussion for that mailing list.)
> #2. There are some open PRs to expose the Parquet reader through JNI to Java
> [1].
> #3. It's possible Dremio has some code that does this. I'm not sure what
> the current status of predicate pushdown in the C++ code base is.
>
>
> [1] https://github.com/apache/arrow/pull/5719
>
>
> On Wed, Nov 13, 2019 at 6:05 PM Chang Chen <baibaic...@gmail.com> wrote:
>
> > Hi,
> >
> > I am trying to find documentation about the current status of parquet-cpp.
> > I googled it, but I didn't find any useful information.
> >
> > Here is what I am concerned about:
> > #1 Column indexes (https://issues.apache.org/jira/browse/PARQUET-1201):
> > the corresponding Java implementation already supported them last year,
> > though it wasn't pushed to the repo.
> > #2 A vectorized column reader interface that can be integrated into Java.
> > #3 A better predicate pushdown algorithm, the feature illustrated here:
> > https://www.dremio.com/webinars/columnar-roadmap-apache-parquet-and-arrow/
> >
> > Thanks
> >