Iceberg reading Parquet files to Arrow format

Mike Zhang Fri, 04 Feb 2022 21:01:45 -0800

I am reading the Iceberg code for the Parquet read path and see that
Parquet files are read into Arrow format first. I wonder how much
performance gain we get from doing that. Take a Spark application using
Iceberg as an example. If the Parquet file were read directly into
Spark RDD records, shouldn’t that be faster than Parquet->Arrow->Spark
record? Since Iceberg converts to Arrow first today, there must be
some benefit to that, so I feel I am missing something. Can somebody
help explain?
