Hi everyone,

Like Russell’s recent note, I’m starting a thread to connect those of us
who are interested in changing Iceberg’s metadata in v4 so that, in most
cases, committing a change requires writing only one additional metadata
file.

*Idea: One-file commits*

The current Iceberg metadata structure requires writing at least one
manifest and a new manifest list to produce a new snapshot. The goal of
this work is to add flexibility by allowing the manifest list layer to
store entries for data files and delete files directly. As a result, only
one file write would be needed before committing the new snapshot. This
work will also explore:

   - Avoiding small manifests that must be read in parallel and later
   compacted (metadata maintenance changes)
   - Extending metadata skipping to use aggregated column ranges that are
   compatible with geospatial data (manifest metadata)
   - Using soft deletes to avoid rewriting existing manifests (metadata DVs)
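
To make the one-file idea concrete, here is a toy Python model that counts
the metadata files written per commit under each layout. It is only a
sketch: the class names, the inline_entries field, and both commit
functions are hypothetical and are not part of the Iceberg spec or any
Iceberg library.

```python
from dataclasses import dataclass, field


@dataclass
class Manifest:
    """Toy stand-in for a manifest: a file that lists data/delete files."""
    entries: list[str]


@dataclass
class ManifestList:
    """Toy stand-in for the manifest list at the root of a snapshot."""
    manifests: list[Manifest] = field(default_factory=list)
    # Hypothetical field: file entries stored directly in the root file.
    inline_entries: list[str] = field(default_factory=list)


def commit_current(parent: ManifestList, new_files: list[str]) -> tuple[ManifestList, int]:
    """Current layout: write a new manifest, then a new manifest list."""
    manifest = Manifest(entries=list(new_files))
    root = ManifestList(
        manifests=parent.manifests + [manifest],
        inline_entries=list(parent.inline_entries),
    )
    return root, 2  # files written: one manifest + one manifest list


def commit_one_file(parent: ManifestList, new_files: list[str]) -> tuple[ManifestList, int]:
    """Proposed layout (assumed): new entries go straight into the root file."""
    root = ManifestList(
        manifests=list(parent.manifests),
        inline_entries=parent.inline_entries + list(new_files),
    )
    return root, 1  # files written: only the new root file


if __name__ == "__main__":
    root, writes = commit_current(ManifestList(), ["a.parquet"])
    print("current layout writes:", writes)
    root, writes = commit_one_file(root, ["b.parquet"])
    print("one-file layout writes:", writes)
```

In this model the existing manifests are carried forward untouched, so the
only new write for a small commit is the root file itself.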

If you’re interested in these problems, please reply!

Ryan
