Looking forward to when Iceberg can move on a bit from its name to handle slightly faster data. I'm interested in following along as well, if I can!
> Do we plan to store these files in columnar format as well?

Is that the other thread?
https://lists.apache.org/thread/phdo75zmt8j9r44ngd7vdhtxqq63yxsp

Thanks
Szehon

On Thu, May 29, 2025 at 10:42 PM Péter Váry <peter.vary.apa...@gmail.com> wrote:

> Count me in!
> Do we plan to store these files in columnar format as well?
>
> On Fri, May 30, 2025, 04:00 Prashant Singh <prashant010...@gmail.com> wrote:
>
>> I am also super excited about the idea! I would love to contribute.
>>
>> On Thu, May 29, 2025 at 6:54 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>
>>>> BTW, does it make sense to take metadata json file into consideration
>>>> as well? Currently it is just a large json string containing all
>>>> snapshots. Since it is also on the critical path of a commit, I'm not
>>>> sure if we can explore incremental semantics on it together with
>>>> manifest list files to reduce the commit overhead.
>>>
>>> For the metadata.json file, the REST APIs already provide an
>>> incremental-style update via a variety of table update requests. The
>>> community is also working on lifting the requirement for a mandatory
>>> physical metadata.json file in storage, in which case the REST catalog
>>> doesn't have to deal with file IO anymore. Metadata.json could live
>>> within a key-value store, an RDBMS, or even just in memory.
>>>
>>> Yufei
>>>
>>> On Thu, May 29, 2025 at 6:35 PM Gang Wu <ust...@gmail.com> wrote:
>>>
>>>> This is a long-awaited discussion!
>>>>
>>>> BTW, does it make sense to take metadata json file into consideration
>>>> as well? Currently it is just a large json string containing all
>>>> snapshots. Since it is also on the critical path of a commit, I'm not
>>>> sure if we can explore incremental semantics on it together with
>>>> manifest list files to reduce the commit overhead.
>>>>
>>>> Best,
>>>> Gang
>>>>
>>>> On Fri, May 30, 2025 at 7:10 AM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> This will be great for users. Metadata can self-adapt. Start with
>>>>> one compacted file. As the table grows in size, the metadata can
>>>>> adapt to a tree or linked structure.
>>>>>
>>>>> On Thu, May 29, 2025 at 3:44 PM Russell Spitzer
>>>>> <russell.spit...@gmail.com> wrote:
>>>>>
>>>>>> I’m also super excited about this idea
>>>>>>
>>>>>> On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> Thanks for kicking this thread off Ryan, I'm interested in helping
>>>>>>> out here! I've been working on a proposal in this area and it
>>>>>>> would be great to collaborate with different folks and exchange
>>>>>>> ideas here, since I think a lot of people are interested in
>>>>>>> solving this problem.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Amogh Jahagirdar
>>>>>>>
>>>>>>> On Thu, May 29, 2025 at 2:25 PM Ryan Blue <rdb...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi everyone,
>>>>>>>>
>>>>>>>> Like Russell’s recent note, I’m starting a thread to connect
>>>>>>>> those of us that are interested in the idea of changing Iceberg’s
>>>>>>>> metadata in v4 so that in most cases committing a change only
>>>>>>>> requires writing one additional metadata file.
>>>>>>>>
>>>>>>>> *Idea: One-file commits*
>>>>>>>>
>>>>>>>> The current Iceberg metadata structure requires writing at least
>>>>>>>> one manifest and a new manifest list to produce a new snapshot.
>>>>>>>> The goal of this work is to allow more flexibility by allowing
>>>>>>>> the manifest list layer to store data and delete files. As a
>>>>>>>> result, only one file write would be needed before committing the
>>>>>>>> new snapshot. In addition, this work will also try to explore:
>>>>>>>>
>>>>>>>> - Avoiding small manifests that must be read in parallel and
>>>>>>>>   later compacted (metadata maintenance changes)
>>>>>>>> - Extending metadata skipping to use aggregated column ranges
>>>>>>>>   that are compatible with geospatial data (manifest metadata)
>>>>>>>> - Using soft deletes to avoid rewriting existing manifests
>>>>>>>>   (metadata DVs)
>>>>>>>>
>>>>>>>> If you’re interested in these problems, please reply!
>>>>>>>>
>>>>>>>> Ryan
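
For anyone following along, here is a toy sketch of the one-file commit idea as described in the thread above. It is only an in-memory model to make the write counts concrete, not Iceberg code and not a proposed design: the class names (`DataFile`, `ManifestRef`, `Snapshot`), the functions, the `.meta` file names, and the `max_inline` threshold are all made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass(frozen=True)
class DataFile:
    """A data file entry (hypothetical, heavily simplified)."""
    path: str
    record_count: int


@dataclass(frozen=True)
class ManifestRef:
    """Pointer to a separate manifest; its contents are kept in memory for this toy."""
    path: str
    files: Tuple[DataFile, ...]


@dataclass
class Snapshot:
    """The root metadata file of a snapshot (the 'manifest list' layer)."""
    snapshot_id: int
    entries: List[Union[DataFile, ManifestRef]]


def commit_two_files(snap_id, parent_entries, new_files):
    """Current layout: write a manifest, then a new manifest list (2 metadata writes)."""
    manifest = ManifestRef(f"manifest-{snap_id}.meta", tuple(new_files))
    return Snapshot(snap_id, list(parent_entries) + [manifest]), 2


def commit_one_file(snap_id, parent_entries, new_files):
    """Idea from the thread: the root file stores new data files directly (1 write)."""
    return Snapshot(snap_id, list(parent_entries) + list(new_files)), 1


def compact(snapshot, max_inline):
    """Assumed maintenance step: fold inline entries into one manifest once they pile up."""
    inline = [e for e in snapshot.entries if isinstance(e, DataFile)]
    if len(inline) <= max_inline:
        return snapshot
    refs = [e for e in snapshot.entries if isinstance(e, ManifestRef)]
    folded = ManifestRef(f"manifest-compacted-{snapshot.snapshot_id}.meta", tuple(inline))
    return Snapshot(snapshot.snapshot_id, refs + [folded])


def all_files(snapshot):
    """Resolve every data file reachable from the root, inline or via a manifest."""
    out = []
    for entry in snapshot.entries:
        if isinstance(entry, ManifestRef):
            out.extend(entry.files)
        else:
            out.append(entry)
    return out
```

Both commit paths expose the same set of data files to a reader; the difference is only how many metadata files are written before the snapshot can be committed. The `compact` step is one guess at how a table could start with a single flat file and grow into a tree as it gets larger.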