Reading data from Iceberg table into Apache Arrow in Java

Mayur Srivastava Thu, 11 Feb 2021 07:22:52 -0800

Hi,

We have an existing time series data access service built on Arrow/Flight. It
uses the Apache Arrow format to write and read data (using time range queries)
against a bespoke table backend on top of S3-compatible storage.

We are trying to replace our bespoke table backend with Iceberg tables. To
integrate with Iceberg, we are using the Iceberg core+data+parquet modules
directly to write and read data. Note that our service cannot use the Spark
route to write or read the data. Our current Iceberg reader integration uses
IcebergGenerics.read(table).select(...).where(...).build() to iterate through
the data row-by-row. This read path is potentially slow because every row has
to be converted into an Arrow VectorSchemaRoot. Instead, we want a vectorized
read path that directly returns an Arrow VectorSchemaRoot through a callback,
or Arrow record batches as the result set.
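
The row-to-columnar conversion described above can be sketched roughly as
follows. This is only an illustration using plain Java arrays: Row and
ColumnBatch are hypothetical stand-ins, not the Iceberg Record or Arrow
VectorSchemaRoot classes, and the two-column (timestamp, value) schema and the
batch size are assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

class RowToColumnBatcher {

    /** Hypothetical stand-in for one record produced by a row-by-row reader. */
    static final class Row {
        final long timestamp;
        final double value;

        Row(long timestamp, double value) {
            this.timestamp = timestamp;
            this.value = value;
        }
    }

    /** Hypothetical stand-in for a columnar batch: one array per column. */
    static final class ColumnBatch {
        final long[] timestamps;
        final double[] values;

        ColumnBatch(long[] timestamps, double[] values) {
            this.timestamps = timestamps;
            this.values = values;
        }

        int rowCount() {
            return timestamps.length;
        }
    }

    /** Drains the row iterator into column batches of at most batchSize rows. */
    static List<ColumnBatch> toBatches(Iterator<Row> rows, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<ColumnBatch> batches = new ArrayList<>();
        long[] ts = new long[batchSize];
        double[] vs = new double[batchSize];
        int n = 0;
        while (rows.hasNext()) {
            // Every row is touched once and each field is copied into its column.
            Row r = rows.next();
            ts[n] = r.timestamp;
            vs[n] = r.value;
            n++;
            if (n == batchSize) {
                batches.add(new ColumnBatch(Arrays.copyOf(ts, n), Arrays.copyOf(vs, n)));
                n = 0;
            }
        }
        if (n > 0) {
            // Flush the final, partially filled batch.
            batches.add(new ColumnBatch(Arrays.copyOf(ts, n), Arrays.copyOf(vs, n)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            rows.add(new Row(1000L + i, i * 0.5));
        }
        for (ColumnBatch b : toBatches(rows.iterator(), 2)) {
            System.out.println(b.rowCount() + " rows: "
                + Arrays.toString(b.timestamps) + " " + Arrays.toString(b.values));
        }
    }
}
```

The per-row field copy inside the loop is the work a vectorized reader would
avoid, since it would fill the column buffers directly while decoding the
Parquet pages.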

I have noticed that Iceberg already has an Arrow module:
https://github.com/apache/iceberg/tree/master/arrow/src/main/java/org/apache/iceberg/arrow.
I have also looked into https://github.com/apache/iceberg/issues/9 and
https://github.com/apache/iceberg/milestone/2. However, I'm not sure about the
current status of vectorized reader support. I'm also not sure how this Arrow
module is used to perform a vectorized read when executing a query on an
Iceberg table through the core/data/parquet libraries.

I have a few questions regarding the Vectorized reader/Arrow support:

1. Is it possible to run a vectorized read on an Iceberg table to return
data in Arrow format using a non-Spark reader in Java?

2. Is there an example of reading data in Arrow format from an Iceberg
table?

3. Is the Spark read path completely vectorized? I ask to find out whether
we can borrow from the vectorized Spark reader, or move code from it into the
Iceberg core library.

Let me know if you have any questions for me.

Thanks,

Mayur
