Hi Y'all

As discussed in the last community sync, we are beginning to gather up
folks who are interested in various efforts for Iceberg V4. To that end,
I'd like to use this thread as a gathering point for folks interested in
the metadata file format shift to Parquet. I wrote a quick abstract to
describe the purpose of this group.

Following this, I'll be working on a full design document. If someone
already has one in progress, please let us know and we can start
discussing and working on it there.

*Abstract: Parquet as Metadata File Format*

Currently the Iceberg SDK and Spec use Avro files for all Manifest Lists
and Manifests. The row-oriented format was selected because it was
assumed that most metadata would be read in its entirety. This has
seldom turned out to be the case, and the ability to read individual
elements of the metrics would be very useful for query planning. To
address this, we propose switching the underlying manifest format from
Avro to Parquet. In V4, existing Avro files would still be readable, but
all new metadata files would be written in Parquet.
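
To make the motivation concrete, here is a rough, standard-library-only
sketch of why a columnar layout helps when a planner needs only one
field. It is not Iceberg, Avro, or Parquet code: the entry fields and
helper names are hypothetical, and JSON blobs stand in for the real
encodings.

```python
import json

# Hypothetical manifest entries. Field names are illustrative only and are
# not taken from the Iceberg spec.
entries = [
    {
        "file_path": f"s3://bucket/data/{i}.parquet",
        "record_count": 1000 + i,
        "lower_bound": i * 10,
        "upper_bound": i * 10 + 9,
    }
    for i in range(100)
]


def to_row_layout(entries):
    """Row-oriented stand-in: one serialized blob per entry."""
    return [json.dumps(e).encode() for e in entries]


def to_column_layout(entries):
    """Column-oriented stand-in: one serialized blob per field."""
    fields = entries[0].keys()
    return {f: json.dumps([e[f] for e in entries]).encode() for f in fields}


def read_field_from_rows(rows, field):
    """Every row must be decoded in full to extract a single field."""
    values = [json.loads(r)[field] for r in rows]
    bytes_read = sum(len(r) for r in rows)
    return values, bytes_read


def read_field_from_columns(columns, field):
    """Only the blob for the requested field is touched."""
    blob = columns[field]
    return json.loads(blob), len(blob)


rows = to_row_layout(entries)
columns = to_column_layout(entries)

row_values, row_bytes = read_field_from_rows(rows, "upper_bound")
col_values, col_bytes = read_field_from_columns(columns, "upper_bound")

print(f"row layout read {row_bytes} bytes, column layout read {col_bytes} bytes")
```

Both reads return the same values, but the column-oriented read touches
only the bytes for the one field it needs. Real Parquet adds encodings,
statistics, and row groups on top of this basic idea.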
