I am also super excited about the idea! I would love to contribute.

On Thu, May 29, 2025 at 6:54 PM Yufei Gu <flyrain...@gmail.com> wrote:

>> BTW, does it make sense to take the metadata JSON file into consideration
>> as well? Currently it is just a large JSON string containing all snapshots.
>> Since it is also on the critical path of a commit, I'm not sure if we can
>> explore incremental semantics on it together with manifest list files to
>> reduce the commit overhead.
>
>
> For the metadata.json file, the REST APIs already provide incremental-style
> updates via a variety of table update requests. The community is also
> working on lifting the requirement for a mandatory physical metadata.json
> file in storage, in which case the REST catalog no longer has to deal with
> file IO. Metadata.json could live in a key-value store, an RDBMS, or even
> just in memory.
>
> Yufei
>
>
> On Thu, May 29, 2025 at 6:35 PM Gang Wu <ust...@gmail.com> wrote:
>
>> This is a long-awaited discussion!
>>
>> BTW, does it make sense to take the metadata JSON file into consideration
>> as well? Currently it is just a large JSON string containing all snapshots.
>> Since it is also on the critical path of a commit, I'm not sure if we can
>> explore incremental semantics on it together with manifest list files to
>> reduce the commit overhead.
>>
>> Best,
>> Gang
>>
>> On Fri, May 30, 2025 at 7:10 AM Steven Wu <stevenz...@gmail.com> wrote:
>>
>>> This will be great for users. Metadata can self-adapt: start with a
>>> single compacted file, and as the table grows in size, the metadata can
>>> adapt to a tree or linked structure.
>>>
>>> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer <
>>> russell.spit...@gmail.com> wrote:
>>>
>>>> I’m also super excited about this idea
>>>>
>>>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com>
>>>> wrote:
>>>>
>>>>> Thanks for kicking this thread off Ryan, I'm interested in helping out
>>>>> here! I've been working on a proposal in this area and it would be great
>>>>> to collaborate with different folks and exchange ideas here, since I
>>>>> think a lot of people are interested in solving this problem.
>>>>>
>>>>> Thanks,
>>>>> Amogh Jahagirdar
>>>>>
>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>
>>>>>> Hi everyone,
>>>>>>
>>>>>> Like Russell’s recent note, I’m starting a thread to connect those of
>>>>>> us that are interested in the idea of changing Iceberg’s metadata in v4
>>>>>> so that in most cases committing a change only requires writing one
>>>>>> additional metadata file.
>>>>>>
>>>>>> *Idea: One-file commits*
>>>>>>
>>>>>> The current Iceberg metadata structure requires writing at least one
>>>>>> manifest and a new manifest list to produce a new snapshot. The goal of
>>>>>> this work is to allow more flexibility by allowing the manifest list
>>>>>> layer to store data and delete files. As a result, only one file write
>>>>>> would be needed before committing the new snapshot. In addition, this
>>>>>> work will also try to explore:
>>>>>>
>>>>>>    - Avoiding small manifests that must be read in parallel and
>>>>>>    later compacted (metadata maintenance changes)
>>>>>>    - Extending metadata skipping to use aggregated column ranges that
>>>>>>    are compatible with geospatial data (manifest metadata)
>>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>>    (metadata DVs)
>>>>>>
>>>>>> If you’re interested in these problems, please reply!
>>>>>>
>>>>>> Ryan
>>>>>>
>>>>>
