Thanks for kicking this thread off, Ryan. I'm interested in helping out here! I've been working on a proposal in this area, and it would be great to collaborate and exchange ideas with others, since I think a lot of people are interested in solving this problem.

Thanks,
Amogh Jahagirdar

On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:

> Hi everyone,
>
> Like Russell’s recent note, I’m starting a thread to connect those of us
> that are interested in the idea of changing Iceberg’s metadata in v4 so
> that in most cases committing a change only requires writing one additional
> metadata file.
>
> *Idea: One-file commits*
>
> The current Iceberg metadata structure requires writing at least one
> manifest and a new manifest list to produce a new snapshot. The goal of
> this work is to allow more flexibility by allowing the manifest list layer
> to store data and delete files. As a result, only one file write would be
> needed before committing the new snapshot. In addition, this work will also
> try to explore:
>
> - Avoiding small manifests that must be read in parallel and later
>   compacted (metadata maintenance changes)
> - Extend metadata skipping to use aggregated column ranges that are
>   compatible with geospatial data (manifest metadata)
> - Using soft deletes to avoid rewriting existing manifests (metadata
>   DVs)
>
> If you’re interested in these problems, please reply!
>
> Ryan