Re: [DISCUSS] v4 - One file commits

Ryan Blue Fri, 30 May 2025 10:51:47 -0700

> Does it make sense to take the metadata JSON file into consideration as
> well? Currently it is just a large JSON string containing all snapshots.
> Since it is also on the critical path of a commit, I'm not sure if we can
> explore incremental semantics on it together with manifest list files to
> reduce the commit overhead.


No, table-level metadata and the content metadata tree are separate
concepts. We’ve addressed table-level metadata and the snapshot list by
introducing change-based commits in the REST protocol. This work aims to
make updates to the content metadata tree (manifest lists and manifests)
more lightweight.
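As a rough illustration of the difference: a change-based commit sends only the delta (the new snapshot and the branch update) plus requirements that the server checks, rather than rewriting a JSON document that holds every snapshot. The following is a simplified, hypothetical Python model. The requirement and update names are loosely modeled on the REST catalog spec, but `apply_commit` and the dictionary shapes are illustrative only.

```python
import copy


def apply_commit(metadata, requirements, updates):
    """Check requirements against current metadata, then apply updates to a copy.

    Hypothetical, simplified model of a change-based commit: the request
    carries only the changes, never the full snapshot list.
    """
    for req in requirements:
        if req["type"] != "assert-ref-snapshot-id":
            raise ValueError(f"unsupported requirement: {req['type']}")
        current = metadata["refs"].get(req["ref"])
        if current != req["snapshot-id"]:
            raise ValueError(
                f"conflict: {req['ref']} is at {current}, expected {req['snapshot-id']}"
            )

    # Requirements passed; apply the changes without mutating the input.
    new_metadata = copy.deepcopy(metadata)
    for upd in updates:
        if upd["action"] == "add-snapshot":
            new_metadata["snapshots"].append(upd["snapshot"])
        elif upd["action"] == "set-snapshot-ref":
            new_metadata["refs"][upd["ref-name"]] = upd["snapshot-id"]
        else:
            raise ValueError(f"unsupported update: {upd['action']}")
    return new_metadata


# Table with a long history; the commit request has the same size
# no matter how many snapshots already exist.
metadata = {"snapshots": [{"snapshot-id": i} for i in range(1, 1001)],
            "refs": {"main": 1000}}
requirements = [{"type": "assert-ref-snapshot-id", "ref": "main", "snapshot-id": 1000}]
updates = [
    {"action": "add-snapshot", "snapshot": {"snapshot-id": 1001}},
    {"action": "set-snapshot-ref", "ref-name": "main", "snapshot-id": 1001},
]
new_metadata = apply_commit(metadata, requirements, updates)
print(len(new_metadata["snapshots"]), new_metadata["refs"]["main"])  # 1001 1001
```

Replaying the same request against `new_metadata` fails the requirement check, which is how a stale writer detects a conflict in this model.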

On Thu, May 29, 2025 at 6:37 PM Gang Wu wrote:

> This is a long-awaited discussion!
>
> BTW, does it make sense to take metadata json file into consideration as
> well? Currently it is just a large json string containing all snapshots.
> Since it is also on the critical path of a commit, I'm not sure if we can
> explore incremental semantics on it together with manifest list files to
> reduce the commit overhead.
>
> Best,
> Gang
>
> On Fri, May 30, 2025 at 7:10 AM Steven Wu wrote:
>
>> This will be great for users. The metadata can self-adapt: start with a
>> single compacted file, and as the table grows in size, adapt to a tree or
>> linked structure.
>>
>> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer wrote:
>>
>>> I’m also super excited about this idea
>>>
>>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar wrote:
>>>
>>>> Thanks for kicking this thread off, Ryan! I'm interested in helping out
>>>> here. I've been working on a proposal in this area, and it would be great
>>>> to collaborate with others and exchange ideas, since I think a lot of
>>>> people are interested in solving this problem.
>>>>
>>>> Thanks,
>>>> Amogh Jahagirdar
>>>>
>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue wrote:
>>>>
>>>>> Hi everyone,
>>>>>
>>>>> Like Russell’s recent note, I’m starting a thread to connect those of
>>>>> us who are interested in the idea of changing Iceberg’s metadata in v4 so
>>>>> that, in most cases, committing a change only requires writing one
>>>>> additional metadata file.
>>>>>
>>>>> *Idea: One-file commits*
>>>>>
>>>>> The current Iceberg metadata structure requires writing at least one
>>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>>> this work is to add flexibility by allowing the manifest list layer to
>>>>> store data and delete files directly. As a result, only one file write
>>>>> would be needed before committing the new snapshot. In addition, this
>>>>> work will also try to explore:
>>>>>
>>>>>    - Avoiding small manifests that must be read in parallel and later
>>>>>    compacted (metadata maintenance changes)
>>>>>    - Extending metadata skipping to use aggregated column ranges that
>>>>>    are compatible with geospatial data (manifest metadata)
>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>    (metadata DVs)
>>>>>
>>>>> If you’re interested in these problems, please reply!
>>>>>
>>>>> Ryan
>>>>>
>>>>
