I am interested in this idea and looking forward to collaboration.

Thanks,
Huang-Hsiang

> On Jun 2, 2025, at 10:14 AM, namratha mk <nmk...@gmail.com> wrote:
> 
> Hello,
> 
> I am interested in contributing to this effort. 
> 
> Thanks,
> Namratha
> 
> On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com> wrote:
>> Thanks for kicking this thread off Ryan, I'm interested in helping out here! 
>> I've been working on a proposal in this area and it would be great to 
>> collaborate with different folks and exchange ideas here, since I think a 
>> lot of people are interested in solving this problem.
>> 
>> Thanks,
>> Amogh Jahagirdar
>> 
>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>> Hi everyone,
>>> 
>>> Like Russell’s recent note, I’m starting a thread to connect those of us 
>>> that are interested in the idea of changing Iceberg’s metadata in v4 so 
>>> that in most cases committing a change only requires writing one additional 
>>> metadata file.
>>> 
>>> Idea: One-file commits
>>> 
>>> The current Iceberg metadata structure requires writing at least one 
>>> manifest and a new manifest list to produce a new snapshot. The goal of 
>>> this work is to allow more flexibility by allowing the manifest list layer 
>>> to store data and delete files. As a result, only one file write would be 
>>> needed before committing the new snapshot. In addition, this work will also 
>>> try to explore:
>>> 
>>> - Avoiding small manifests that must be read in parallel and later compacted 
>>>   (metadata maintenance changes)
>>> - Extending metadata skipping to use aggregated column ranges that are 
>>>   compatible with geospatial data (manifest metadata)
>>> - Using soft deletes to avoid rewriting existing manifests (metadata DVs)
>>> 
>>> If you’re interested in these problems, please reply!
>>> 
>>> Ryan
>>> 
