Looking forward to when Iceberg can move on a bit from its name and handle
slightly faster data. I'm also interested in following along, if I can!

> Do we plan to store these files in columnar format as well?
>
Is that the other thread?
https://lists.apache.org/thread/phdo75zmt8j9r44ngd7vdhtxqq63yxsp

Thanks
Szehon


On Thu, May 29, 2025 at 10:42 PM Péter Váry <peter.vary.apa...@gmail.com>
wrote:

> Count me in!
> Do we plan to store these files in columnar format as well?
>
> On Fri, May 30, 2025, 04:00 Prashant Singh <prashant010...@gmail.com>
> wrote:
>
>> I am also super excited about the idea ! I would love to contribute.
>>
>> On Thu, May 29, 2025 at 6:54 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>>> BTW, does it make sense to take metadata json file into consideration as
>>>> well? Currently it is just a large json string containing all snapshots.
>>>> Since it is also on the critical path of a commit, I'm not sure if we can
>>>> explore incremental semantics on it together with manifest list files to
>>>> reduce the commit overhead.
>>>
>>>
>>> For the metadata.json file, the REST APIs already provide incremental-style
>>> updates via a variety of table update requests. The community is also
>>> working on lifting the requirement for a physical metadata.json file in
>>> storage, in which case the REST catalog doesn't have to deal with file IO
>>> anymore. Metadata.json could live within a key-value store, an RDBMS, or
>>> even just in memory.
>>>
>>> Yufei
>>>
>>>
>>> On Thu, May 29, 2025 at 6:35 PM Gang Wu <ust...@gmail.com> wrote:
>>>
>>>> This is a long-awaited discussion!
>>>>
>>>> BTW, does it make sense to take metadata json file into consideration
>>>> as well? Currently it is just a large json string containing all snapshots.
>>>> Since it is also on the critical path of a commit, I'm not sure if we can
>>>> explore incremental semantics on it together with manifest list files to
>>>> reduce the commit overhead.
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>> On Fri, May 30, 2025 at 7:10 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> This will be great for users. Metadata can self-adapt: start with a
>>>>> single compacted file and, as the table grows in size, adapt to a tree
>>>>> or linked structure.
>>>>>
>>>>> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer <
>>>>> russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> I’m also super excited about this idea
>>>>>>
>>>>>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for kicking this thread off Ryan, I'm interested in helping
>>>>>>> out here! I've been working on a proposal in this area and it would be
>>>>>>> great to collaborate with different folks and exchange ideas here,
>>>>>>> since I think a lot of people are interested in solving this problem.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amogh Jahagirdar
>>>>>>>
>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Like Russell’s recent note, I’m starting a thread to connect those
>>>>>>>> of us that are interested in the idea of changing Iceberg’s metadata
>>>>>>>> in v4 so that in most cases committing a change only requires writing
>>>>>>>> one additional metadata file.
>>>>>>>>
>>>>>>>> *Idea: One-file commits*
>>>>>>>>
>>>>>>>> The current Iceberg metadata structure requires writing at least
>>>>>>>> one manifest and a new manifest list to produce a new snapshot. The
>>>>>>>> goal of this work is to allow more flexibility by allowing the manifest
>>>>>>>> list layer to store data and delete files. As a result, only one file
>>>>>>>> write would be needed before committing the new snapshot. In addition,
>>>>>>>> this work will also try to explore:
>>>>>>>>
>>>>>>>>    - Avoiding small manifests that must be read in parallel and
>>>>>>>>    later compacted (metadata maintenance changes)
>>>>>>>>    - Extending metadata skipping to use aggregated column ranges that
>>>>>>>>    are compatible with geospatial data (manifest metadata)
>>>>>>>>    - Using soft deletes to avoid rewriting existing manifests
>>>>>>>>    (metadata DVs)
>>>>>>>>
>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>
>>>>>>>> Ryan
>>>>>>>>
>>>>>>>
