Am I correct that performance is the main reason to have a custom reader in
Iceberg? Are there any other purposes? A common question I get is why not
improve parquet-mr instead of writing a new reader? I know that almost
every system that cares about performance has its own reader, but why so?
Per
On Tue, May 28, 2019 at 11:19 AM Daniel Weeks wrote:
>
> Hey Anton,
>
> #1) Part of the reason Iceberg has a custom reader is to help resolve some of
> the Iceberg-specific aspects of how Parquet files are read (e.g. column
> resolution by ID, Iceberg expressions). Also, it's been a struggle to
Hey Anton,
#1) Part of the reason Iceberg has a custom reader is to help resolve some
of the Iceberg-specific aspects of how Parquet files are read (e.g. column
resolution by ID, Iceberg expressions). Also, it's been a struggle to get
agreement on a good vectorized API. I don't believe the objec
Hi Anton,
On point #5, I would suggest doing the work either in Apache Arrow or
in the Parquet Java project -- we are developing both Parquet C++ and
Rust codebases within the apache/arrow repository so I think you would
find an active community there. I know that there has been a lot of
interest
Hi,
I see more and more questions about the Iceberg Parquet reader. I think it would
be useful to have a thread that clarifies all open questions and explains the
long-term plan.
1. Am I correct that performance is the main reason to have a custom reader in
Iceberg? Are there any other purposes?