This is a long-awaited discussion!

BTW, does it make sense to take the metadata JSON file into consideration as
well? Currently it is just one large JSON string containing all snapshots.
Since it is also on the critical path of a commit, I wonder whether we can
explore incremental semantics for it, together with the manifest list files,
to reduce the commit overhead.

Best,
Gang

On Fri, May 30, 2025 at 7:10 AM Steven Wu <stevenz...@gmail.com> wrote:

> This will be great for users. Metadata can self-adapt: start with a single
> compacted file, and as the table grows in size, the metadata can adapt to a
> tree or linked structure.
>
> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer <russell.spit...@gmail.com>
> wrote:
>
>> I’m also super excited about this idea.
>>
>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com>
>> wrote:
>>
>>> Thanks for kicking this thread off, Ryan. I'm interested in helping out
>>> here! I've been working on a proposal in this area and it would be great to
>>> collaborate with different folks and exchange ideas here, since I think a
>>> lot of people are interested in solving this problem.
>>>
>>> Thanks,
>>> Amogh Jahagirdar
>>>
>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>
>>>> Hi everyone,
>>>>
>>>> Like Russell’s recent note, I’m starting a thread to connect those of
>>>> us who are interested in the idea of changing Iceberg’s metadata in v4 so
>>>> that in most cases committing a change only requires writing one additional
>>>> metadata file.
>>>>
>>>> *Idea: One-file commits*
>>>>
>>>> The current Iceberg metadata structure requires writing at least one
>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>> this work is to add flexibility by letting the manifest list layer
>>>> store data files and delete files. As a result, only one file write
>>>> would be needed before committing the new snapshot. This work will also
>>>> try to explore:
>>>>
>>>>    - Avoiding small manifests that must be read in parallel and later
>>>>    compacted (metadata maintenance changes)
>>>>    - Extending metadata skipping to use aggregated column ranges that are
>>>>    compatible with geospatial data (manifest metadata)
>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>    (metadata DVs)
>>>>
>>>> If you’re interested in these problems, please reply!
>>>>
>>>> Ryan
>>>>
>>>
