Hello, I am interested in contributing to this effort.

Thanks,
Namratha

On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com> wrote:

> Thanks for kicking this thread off Ryan, I'm interested in helping out
> here! I've been working on a proposal in this area and it would be great to
> collaborate with different folks and exchange ideas here, since I think a
> lot of people are interested in solving this problem.
>
> Thanks,
> Amogh Jahagirdar
>
> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>
>> Hi everyone,
>>
>> Like Russell’s recent note, I’m starting a thread to connect those of us
>> that are interested in the idea of changing Iceberg’s metadata in v4 so
>> that in most cases committing a change only requires writing one additional
>> metadata file.
>>
>> *Idea: One-file commits*
>>
>> The current Iceberg metadata structure requires writing at least one
>> manifest and a new manifest list to produce a new snapshot. The goal of
>> this work is to allow more flexibility by allowing the manifest list layer
>> to store data and delete files. As a result, only one file write would be
>> needed before committing the new snapshot. In addition, this work will also
>> try to explore:
>>
>> - Avoiding small manifests that must be read in parallel and later
>>   compacted (metadata maintenance changes)
>> - Extend metadata skipping to use aggregated column ranges that are
>>   compatible with geospatial data (manifest metadata)
>> - Using soft deletes to avoid rewriting existing manifests (metadata
>>   DVs)
>>
>> If you’re interested in these problems, please reply!
>>
>> Ryan
>>