Re: [DISCUSS] v4 - One file commits

2025-07-22 Thread Russell Spitzer
I think this is a great way forward. Starting out with this much parallel development shows that we have a lot of consensus already :) On Tue, Jul 22, 2025 at 12:42 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > Hey folks, just following up on this. It looks like our proposal and the > proposal

Re: [DISCUSS] v4 - One file commits

2025-07-22 Thread Amogh Jahagirdar
Hey folks, just following up on this. It looks like our proposal and the proposal that @Russell Spitzer shared are pretty aligned. I was just chatting with Russell about this, and we think it'd be best to combine both proposals and have a singular large effort on this. I can also set up a focused

Re: [DISCUSS] v4 - One file commits

2025-07-14 Thread Amogh Jahagirdar
Hey Russell, Thanks for sharing the proposal! A few of us (Ryan, Dan, Anoop and I) have also been working on a proposal for an adaptive metadata tree structure as part of enabling more efficient one file commits. From a read of the summary, it's great to see that we're thinking along the same line

Re: [DISCUSS] v4 - One file commits

2025-07-14 Thread Russell Spitzer
Hey y'all! We (Yi Fang, Steven Wu and myself) wanted to share some of the thoughts we had on how one-file commits could work in Iceberg. This is pretty much just a high-level overview of the concepts we think we need and how Iceberg would behave. We haven't gone very far into the actual implementa

Re: [DISCUSS] v4 - One file commits

2025-07-02 Thread John Zhuge
Very excited about the idea! On Wed, Jul 2, 2025 at 1:17 PM Anoop Johnson wrote: > I'm very interested in this initiative. Micah Kornfield and I presented > on > high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit, > w

Re: [DISCUSS] v4 - One file commits

2025-07-02 Thread Anoop Johnson
I'm very interested in this initiative. Micah Kornfield and I presented on high-throughput ingestion for Iceberg tables at the 2024 Iceberg Summit, which leveraged Google infrastructure like Colossus for efficient appends. This new proposal

Re: [DISCUSS] v4 - One file commits

2025-06-13 Thread Jagdeep Sidhu
Hi everyone, I am new to the Iceberg community but would love to participate in these discussions to reduce the number of file writes, especially for small writes/commits. Thank you! -Jagdeep On Thu, Jun 5, 2025 at 4:02 PM Anurag Mantripragada wrote: > We have been hitting all the metadata pro

Re: [DISCUSS] v4 - One file commits

2025-06-05 Thread Anurag Mantripragada
We have been hitting all the metadata problems you mentioned, Ryan. I’m on-board to help however I can to improve this area. ~ Anurag Mantripragada > On Jun 3, 2025, at 2:22 AM, Huang-Hsiang Cheng > wrote: > > I am interested in this idea and looking forward to collaboration. > > Thanks, >

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread Huang-Hsiang Cheng
I am interested in this idea and looking forward to collaboration. Thanks, Huang-Hsiang > On Jun 2, 2025, at 10:14 AM, namratha mk wrote: > > Hello, > > I am interested in contributing to this effort. > > Thanks, > Namratha > > On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread namratha mk
Hello, I am interested in contributing to this effort. Thanks, Namratha On Thu, May 29, 2025 at 1:36 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > Thanks for kicking this thread off Ryan, I'm interested in helping out > here! I've been working on a proposal in this area and it would be great

Re: [DISCUSS] v4 - One file commits

2025-06-02 Thread Steve Loughran
So this'll cut down on the number of manifest files read, won't it? And so improve query planning? Does anyone have an estimate of what benefit this is likely to have in production deployments? On Thu, 29 May 2025 at 21:25, Ryan Blue wrote: > Hi everyone, > > Like Russell’s recent note, I’m starting a threa

Re: [DISCUSS] v4 - One file commits

2025-06-01 Thread Anas-ur-Rasheed Khan
Hey guys, I am an aspiring committer and am super interested in this change. I use Iceberg at reasonably high levels of concurrency in my pipelines, and would love to just be a fly on the wall in discussions (hopefully more). Best, AK On Fri, 30 May 2025 at 6:51 PM, Ryan Blue wrote: > does it

Re: [DISCUSS] v4 - One file commits

2025-05-30 Thread Ryan Blue
does it make sense to take metadata json file into consideration as well? Currently it is just a large json string containing all snapshots. Since it is also on the critical path of a commit, I’m not sure if we can explore incremental semantics on it together with manifest list files to reduce the

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Szehon Ho
Looking forward to when Iceberg can move on a bit from its name to handle slightly faster data. Interested as well to follow along, if I can! Do we plan to store these files in columnar format as well? > Is that the other thread? https://lists.apache.org/thread/phdo75zmt8j9r44ngd7vdhtxqq63yxsp Tha

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Péter Váry
Count me in! Do we plan to store these files in columnar format as well? On Fri, May 30, 2025, 04:00 Prashant Singh wrote: > I am also super excited about the idea! I would love to contribute. > > On Thu, May 29, 2025 at 6:54 PM Yufei Gu wrote: > >> BTW, does it make sense to take metadata json

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Yufei Gu
> > BTW, does it make sense to take metadata json file into consideration as > well? Currently it is just a large json string containing all snapshots. > Since it is also on the critical path of a commit, I'm not sure if we can > explore incremental semantics on it together with manifest list files

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Prashant Singh
I am also super excited about the idea! I would love to contribute. On Thu, May 29, 2025 at 6:54 PM Yufei Gu wrote: > BTW, does it make sense to take metadata json file into consideration as >> well? Currently it is just a large json string containing all snapshots. >> Since it is also on the c

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Ajantha Bhat
I am interested in these problems too. Looking forward to collaborating on this feature development. - Ajantha On Fri, May 30, 2025 at 7:07 AM Gang Wu wrote: > This is a long-awaited discussion! > > BTW, does it make sense to take metadata json file into consideration as > well? Currently it is

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Gang Wu
This is a long-awaited discussion! BTW, does it make sense to take metadata json file into consideration as well? Currently it is just a large json string containing all snapshots. Since it is also on the critical path of a commit, I'm not sure if we can explore incremental semantics on it togethe

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Steven Wu
This will be great for users. Metadata can self-adapt: start with one compacted file. As the table grows in size, the metadata can adapt to a tree or linked structure. On Thu, May 29, 2025 at 3:44 PM Russell Spitzer wrote: > I’m also super excited about this idea > > On Thu, May 29, 2025 at 3:

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Russell Spitzer
I’m also super excited about this idea On Thu, May 29, 2025 at 3:37 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > Thanks for kicking this thread off Ryan, I'm interested in helping out > here! I've been working on a proposal in this area and it would be great to > collaborate with different fol

Re: [DISCUSS] v4 - One file commits

2025-05-29 Thread Amogh Jahagirdar
Thanks for kicking this thread off Ryan, I'm interested in helping out here! I've been working on a proposal in this area and it would be great to collaborate with different folks and exchange ideas here, since I think a lot of people are interested in solving this problem. Thanks, Amogh Jahagirda

[DISCUSS] v4 - One file commits

2025-05-29 Thread Ryan Blue
Hi everyone, Like Russell’s recent note, I’m starting a thread to connect those of us that are interested in the idea of changing Iceberg’s metadata in v4 so that in most cases committing a change only requires writing one additional metadata file. *Idea: One-file commits* The current Iceberg me