Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-07-07 Thread Prashant Singh
Hey Ryan, yes, Iceberg users are hitting 504s that lead to table corruption. See this Iceberg Slack thread: https://apache-iceberg.slack.com/archives/C025PH0G1D4/p1747992294134219 Here is an Apache Polaris thread about 504 corruption (different user, Fivetran): https://apache-polaris.slack.com/archives/C084QSKD6S

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-07-07 Thread Ryan Blue
If the 1.9.2 release doesn't fix the issue that users were hitting, which if I understand correctly was a 503, do we need to do a patch release? Are users hitting 502 and 504 and need to stop the retry? On Mon, Jul 7, 2025 at 3:11 PM Prashant Singh wrote: > Thanks for the background, Dennis ! Th

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-07-07 Thread Prashant Singh
Thanks for the background, Dennis! Thanks Ryan. It seems we are OK with treating all 5xx errors as CommitStateUnknown, and hence not retrying, considering what the spec says about 502 and 504
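
A minimal sketch of the behavior discussed here (treat every 5xx response to a commit as CommitStateUnknown and do not retry), written in Python for illustration only. The names `CommitStateUnknown`, `commit_once`, and `send` are hypothetical and are not the Iceberg REST client's actual API; `send` stands in for whatever performs the HTTP commit request and returns its status code.

```python
from typing import Callable


class CommitStateUnknown(Exception):
    """Hypothetical exception: the commit may or may not have been applied."""


def commit_once(send: Callable[[], int]) -> int:
    """Send a commit request exactly once and return its HTTP status.

    Assumption for this sketch: any 5xx status (500, 502, 503, 504, ...)
    means the client cannot know whether the server applied the commit,
    so it raises CommitStateUnknown instead of retrying.
    """
    status = send()
    if 500 <= status <= 599:
        raise CommitStateUnknown(
            f"HTTP {status}: commit outcome unknown, not retrying"
        )
    return status
```

The point of the design is that the request is never re-sent after a 5xx, so a commit that actually succeeded behind a proxy or load balancer cannot be followed by a second attempt that misreports it as failed.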

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-26 Thread Ryan Blue
Thanks for all the background, Dennis! I think I agree that we should treat all 5xx errors as CommitStateUnknown because we don't necessarily know what was passed to the service. We would need to update the spec, right? On Thu, Jun 26, 2025 at 3:03 PM Dennis Huo wrote: > Discussed offline with v

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-26 Thread Dennis Huo
Discussed offline with various folks who were interested in more details about what kinds of standard infra components use 503 for things that should be interpreted as "CommitStateUnknown", so I dug into some common open-source components to track down common sources of such 503s. Two in particula

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-18 Thread Dennis Huo
Sounds like the confusion is indeed stemming from anchoring on behaviors of 502 and 504 as established prior art while not applying the same initial decision factors to 503. If we don't trust 5xx error codes, then wouldn't it make the most sense to > throw CommitStateUnknown for any 5xx response?

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-18 Thread Prashant Singh
EDIT: corrected some typos: Hey Ryan, my reasoning was the following: we retry on 502 / 504 as well here, and we have a spec definition stating here

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-18 Thread Prashant Singh
Hey Ryan, my reasoning was the following: we retry on 502 / 504 as well here, and we have a spec definition stating here

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-18 Thread Ryan Blue
Are we confident that the REST client fix is the right path? If I understand correctly, the issue is that there is a load balancer that is converting a 500 response to a 503 that the client will automatically retry, even though the 500 response is not retry-able. Then the retry results in a 409 be

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-17 Thread Russell Spitzer
Sorry I didn't get to reply here, I think the fix Ajantha is contributing is extremely important but probably out of scope for a patch release. Because it will take a bit of manual intervention to fix after jumping to the next version I think we should save this for 1.10.0 which also should come ou

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-17 Thread Prashant Singh
Thank you for confirming over Slack, Ajantha. I also double-checked with Russell offline: this seems to be a behaviour change, which can't go into a patch release, so maybe 1.10 then. I think we should be good for now. That being said, I will start working on getting an RC out with just this commit

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-17 Thread Prashant Singh
Hey Ajantha, I was going to wait for 2-3 days before cutting an RC anyway :), to see if anyone has an objection or some more *critical* changes to get in. Thank you for the heads up! Best, Prashant Singh On Mon, Jun 16, 2025 at 11:02 AM Ajantha Bhat wrote: > I have a PR that needs to be i

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-16 Thread Ajantha Bhat
I have a PR that needs to be included for 1.9.2 as well! I was about to start a release discussion for 1.9.2 tomorrow. It is from an oversight during partition stats refactoring (in 1.9.0). We missed that field ids are tracked in spec during refactoring! Because of it, schema field ids are not as

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-16 Thread Steven Wu
+1 for a 1.9.2 release On Mon, Jun 16, 2025 at 10:53 AM Prashant Singh wrote: > Hey Kevin, > This goes well before 1.8, if you will see the issue that my PR refers to > is reported from iceberg 1.7, It has been there since the beginning of the > IRC client. > We were having similar debates on if

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-16 Thread Prashant Singh
Hey Kevin, This goes back well before 1.8; if you look at the issue that my PR refers to, it was reported against Iceberg 1.7. It has been there since the beginning of the IRC client. We were having similar debates on whether we should patch all the releases. I think this requires wider discussion, but 1.

Re: [DISCUSS] Proposal for Iceberg 1.9.2 Release to Fix Critical REST Client Issue

2025-06-16 Thread Kevin Liu
Hi Prashant, This sounds like a good reason to do a patch release. I'm +1. Do you know if this is also affecting other minor versions? Do we also need to patch 1.8.x? Best, Kevin Liu On Mon, Jun 16, 2025 at 10:30 AM Prashant Singh wrote: > Hey all, > > A couple of users have recently reported