Re: Future of Iceberg Parquet Reader

2019-05-29 Thread Ryan Blue
Am I correct that performance is the main reason to have a custom reader in Iceberg? Are there any other purposes? A common question I get is why not improve parquet-mr instead of writing a new reader? I know that almost every system that cares about performance has its own reader, but why so? Per

Re: Future of Iceberg Parquet Reader

2019-05-28 Thread Wes McKinney
On Tue, May 28, 2019 at 11:19 AM Daniel Weeks wrote:
>
> Hey Anton,
>
> #1) Part of the reason Iceberg has a custom reader is to help resolve some of
> the Iceberg-specific aspects of how Parquet files are read (e.g. column
> resolution by id, Iceberg expressions). Also, it's been a struggle to

Re: Future of Iceberg Parquet Reader

2019-05-28 Thread Daniel Weeks
Hey Anton, #1) Part of the reason Iceberg has a custom reader is to help resolve some of the Iceberg-specific aspects of how Parquet files are read (e.g. column resolution by id, Iceberg expressions). Also, it's been a struggle to get agreement on a good vectorized API. I don't believe the objec

Re: Future of Iceberg Parquet Reader

2019-05-28 Thread Wes McKinney
Hi Anton, On point #5, I would suggest doing the work either in Apache Arrow or in the Parquet Java project -- we are developing both Parquet C++ and Rust codebases within the apache/arrow repository, so I think you would find an active community there. I know that there has been a lot of interest

Future of Iceberg Parquet Reader

2019-05-28 Thread Anton Okolnychyi
Hi, I see more and more questions around the Iceberg Parquet reader. I think it would be useful to have a thread that clarifies all open questions and explains the long-term plan. 1. Am I correct that performance is the main reason to have a custom reader in Iceberg? Are there any other purposes?