Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Manu Zhang
Hi Amogh, Is it defined in the table spec that the "replace" operation should carry over existing lineage info instead of assigning new IDs? If not, we'd better first define it in the spec, because all engines and implementations need to follow it. On Tue, Jul 8, 2025 at 11:44 AM Amogh Jahagirdar <2a

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Amogh Jahagirdar
One other area I think we need to make sure works with row lineage before release is data file compaction. At the moment, it looks like compaction wil

Re: [DISCUSS] v4 - Improved column statistics

2025-07-07 Thread Jacky Lee
+1 for the wonderful feature. Please count me in if you need any help. Gábor Kaszab wrote on Mon, Jul 7, 2025 at 21:22: > > +1 Seems a great improvement! Let me know if I can help out with > implementation, measurements, etc.! > > Regards, > Gabor Kaszab > > John Zhuge wrote (on Thu, Jun 5, 2025, 2

Re: [DISCUSS] Proposal for Relative Path Support In Table Spec

2025-07-07 Thread ally heev
Thanks. I can see it now. On Tue, Jul 8, 2025 at 12:37 AM Kevin Liu wrote: > > I can see the new event on the dev calendar. > [image: Screenshot 2025-07-07 at 12.04.08 PM.png] > > Subscribe to the "Iceberg Dev Events" calendar here: > https://iceberg.apache.org/community/#iceberg-community-events

Re: [announce] iceberg dev syncs are centralized in the "Iceberg Dev Events" calendar

2025-07-07 Thread Renjie Liu
Thanks Kevin for driving this, rust dev sync has been transferred to dev events. On Thu, Jul 3, 2025 at 1:16 AM Kevin Liu wrote: > Hey everyone, > > As discussed on today's sync, Dan and I helped move all the relevant > Iceberg dev syncs to the "*Iceberg Dev Events*" calendar, which is > located

Re: [DISCUSS] Replace table transaction in REST Catalog

2025-07-07 Thread Ryan Blue
I agree that there should not be constraints on the new state of the table. What I disagree with is that the semantics are TRUNCATE and INSERT. The semantics should replace the current schema, partitioning, and data without losing the table history or other catalog information like access controls.

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Amogh Jahagirdar
Sorry, to clarify my position on the timestamp nano fix: I think that as long as we're failing and not in a position to silently produce incorrect values, I'm OK to not block on a fix for that. On Mon, Jul 7, 2025 at 5:19 PM Amogh Jahagirdar <2am...@gmail.com> wrote: > Thanks Steven, > > I agree that we

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Amogh Jahagirdar
Thanks Steven, I agree that we can defer the timestamp nano fix. I had to write some tests to prove it to myself, but every long microseconds or string timestamp value that's within range would always fail with an exception when converting to nanos, so I think we're OK as is, though that is a long s

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-07-07 Thread Prashant Singh
Hey Ryan, Yes, Iceberg users are hitting 504 and hence table corruption here: Iceberg slack - https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1747992294134219 Here is an Apache Polaris thread for 504 corruption (different user, Fivetran) - https://apache-polaris.slack.com/archives/C084QSKD6S

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Ryan Blue
I think it's reasonable to expose the options through the stored procedure. I just don't think we want to make it the default behavior. On Mon, Jul 7, 2025 at 8:37 AM Manu Zhang wrote: > I’m not seeing how the Spark procedure contradicts the catalog solution. > Catalogs can make de

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-07-07 Thread Ryan Blue
If the 1.9.2 release doesn't fix the issue that users were hitting, which if I understand correctly was a 503, do we need to do a patch release? Are users hitting 502 and 504 and need to stop the retry? On Mon, Jul 7, 2025 at 3:11 PM Prashant Singh wrote: > Thanks for the background, Dennis ! Th

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-07-07 Thread Prashant Singh
Thanks for the background, Dennis! Thanks Ryan, it seems we are OK treating all 5xx errors as CommitStateUnknown and hence not retrying, considering what the spec says about 502 and 504
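The rule discussed in this message (treat every 5xx response to a commit as "state unknown" and do not retry) can be sketched as follows. This is only an illustration, not the Iceberg REST client code: the function name `classify_commit_status` and the three outcome labels are hypothetical, and handling of 4xx responses is deliberately left coarse.

```python
def classify_commit_status(status: int) -> str:
    """Map an HTTP status from a commit request to a coarse outcome.

    Hypothetical helper for illustration only; it is not part of any
    Iceberg client library.
    """
    if 200 <= status < 300:
        # The server confirmed the commit.
        return "success"
    if 500 <= status < 600:
        # Any 5xx (including 502 and 504): the commit may or may not have
        # been applied, so the caller should not retry blindly.
        return "state_unknown"
    # Everything else is treated here as a definite failure.
    return "failed"


if __name__ == "__main__":
    for code in (200, 409, 502, 503, 504):
        print(code, classify_commit_status(code))
```

A caller following this sketch would surface "state_unknown" to the user instead of re-sending the commit, since a retry after an already-applied commit is the corruption scenario described in the thread.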

[discuss] pyiceberg returning iterators instead of lists

2025-07-07 Thread Jayce Slesar
Hey all, I've recently been working on supporting pagination in the list methods of the REST catalog in PyIceberg, and I think we have formed an opinion about wanting to do this lazily, so that when a user has a trillion objects in a response, we don't eagerly load a trillion objects into mem
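A minimal sketch of the lazy approach described here, using a generator that requests the next page only when the caller consumes past the current one. The names `iter_paginated`, `fetch_page`, and `fake_fetch_page` are hypothetical and do not reflect the PyIceberg API; the page-token protocol (a token of `None` meaning "no more pages") is an assumption made for the example.

```python
from typing import Callable, Iterator, List, Optional, Tuple

# A page is (items, next_page_token); a token of None means no more pages.
Page = Tuple[List[str], Optional[str]]


def iter_paginated(fetch_page: Callable[[Optional[str]], Page]) -> Iterator[str]:
    """Yield items one at a time, fetching each page only when needed."""
    token: Optional[str] = None
    while True:
        items, token = fetch_page(token)
        yield from items
        if token is None:
            return


def fake_fetch_page(token: Optional[str]) -> Page:
    """Stand-in for a REST call; returns two small hard-coded pages."""
    pages = {None: (["ns.a", "ns.b"], "t1"), "t1": (["ns.c"], None)}
    return pages[token]


if __name__ == "__main__":
    for name in iter_paginated(fake_fetch_page):
        print(name)
```

Callers that still want a list can wrap the iterator in `list(...)`, which keeps the eager behavior opt-in rather than the default.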

[VOTE] Release Apache Iceberg 1.9.2 RC0

2025-07-07 Thread Prashant Singh
Hi everyone, I propose the following RC to be released as the official Apache Iceberg 1.9.2 release. The commit id is 071d5606bc6199a0be9b3f274ec7fbf111d88821 * This corresponds to the tag: apache-iceberg-1.9.2-rc0 * https://github.com/apache/iceberg/commits/apache-iceberg-1.9.2-rc0 * https://git

Re: Iceberg 1.10.0 release update - July 1, 2025

2025-07-07 Thread Steven Wu
Updates for the remaining 3 open PRs for 1.10.0 milestone https://github.com/apache/iceberg/milestone/54 Core: Keep track of data files to be removed for orphaned DV detection * got 2 approvals. should be merged in a day or two Spark 4.0: Migrate Ice

Re: [DISCUSS] Proposal for Relative Path Support In Table Spec

2025-07-07 Thread Kevin Liu
I can see the new event on the dev calendar. [image: Screenshot 2025-07-07 at 12.04.08 PM.png] Subscribe to the "Iceberg Dev Events" calendar here: https://iceberg.apache.org/community/#iceberg-community-events Best, Kevin Liu On Mon, Jul 7, 2025 at 11:38 AM Daniel Weeks wrote: > Hey Ally (a

Re: [DISCUSS] Proposal for Relative Path Support In Table Spec

2025-07-07 Thread Daniel Weeks
Hey Ally (and everyone else). We hadn't scheduled the discussion for relative paths, but I just added an event to the dev calendar for Thursday at 9am (PT). Let me know if you still don't see it on the calendar. -Dan On Sat, Jul 5, 2025 at 9:37 PM Jean-Baptiste Onofré wrote: > Hi Talat > > Th

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Manu Zhang
I’m not seeing how the Spark procedure contradicts the catalog solution. Catalogs can make decisions based on policies and pass down parameters to Spark procedures to execute. In addition, it can be used by all catalogs and table maintenance systems. Regards, Manu Gábor Kaszab wrote on Mon, Jul 7, 2025 at 21:31

Re: cleanExpiredMetadata in RemoveSnapshots

2025-07-07 Thread Gábor Kaszab
Thanks for the response, JB! This could be a responsibility of the catalog and in turn a TMS, I agree. However, that seems more of a mid/long-term solution, while the Spark expire_snapshots procedure is already there, the Java core implementation to clean expired specs and schemas is already there wi

Re: [DISCUSS] v4 - Improved column statistics

2025-07-07 Thread Gábor Kaszab
+1 Seems a great improvement! Let me know if I can help out with implementation, measurements, etc.! Regards, Gabor Kaszab John Zhuge wrote (on Thu, Jun 5, 2025, 23:41): > +1 Looking forward to this feature > > John Zhuge > > > On Thu, Jun 5, 2025 at 2:22 PM Ryan Blue wrote: > >> > I