PR LGTM, thanks for following up on this!

Reiterating my comments in the PR, IMO especially now that Russell's
https://github.com/apache/iceberg/pull/13449 is merged so that all the
read-only methods can still perform low-level retries, this change makes
sense as a safer default for commit behaviors.

I could imagine a few potential follow-up features if people want to
achieve better availability in situations where more out-of-band knowledge
of specific server behaviors can be used on a case-by-case basis:

1. If anyone feels very strongly that in a given service environment some
particular 5xx code really is safely retriable without reconciliation, we
could consider extracting a runtime config that lists the non-retriable
HTTP codes instead of only having the hard-coded switch statement coming at
the cost of some code complexity
2. We could make it easier to inject a
dynamically-classloaded HttpRequestRetryStrategy in the HttpClient (
https://github.com/apache/iceberg/blob/a8fdb23682a7c083ea6ff9873f1531dd9d465aa7/core/src/main/java/org/apache/iceberg/rest/HTTPClient.java#L127)
- it seems there's already some precedent for injectable helpers here such
as rest.client.tls.configurer-impl -- custom impls could either be as
simple as using a different set of 5xx non-retriable codes, or as complex
as cooperating with other customized client-side logic to use an end-to-end
idempotency request-uuid to distinguish safe retries
3. We could consider letting other response headers such as "Retry-After" (
https://developer.mozilla.org/en-US/docs/Web/HTTP/Reference/Headers/Retry-After)
be a heuristic to interpret the *intent* of the server to communicate a
"graceful unavailability" that is safe to retry, vs an unexpected failure
where state is unknown

In any case though, it does seem prudent to start with the safe default.

On Wed, Jul 23, 2025 at 1:22 PM Prashant Singh <prashant010...@gmail.com>
wrote:

> Hey All,
> Presently we treat 503 status code in spec here <http:///> as if
> the request was not tried at all. While doing 1.9.2 it was brought up that
> this exception might not be true and some of the very common services like
> Istio and Envoy were brought up as examples.
>
> As a result I wanted to formally bring up disabling retries on 503 on
> updateTable and treating that as CommitStateUnknown similar to what we do
> for 500, 502, 504. The rationale for this being 503 doesn't necessarily
> mean the request was not tried at all and systems like ENVOY can return 503
> in the middle of processing.
>
> Adding detailed analysis of my colleague Dennis here :
> https://lists.apache.org/thread/h8xowcntomh7r5woz24cv27v3o8r9wgd
>
> It seemed like most people participating in the thread above agreed,
> starting this thread to see if there are objections to it in a wider forum.
>
> Meanwhile, I have filed a PR for the same
> [1] https://github.com/apache/iceberg/pull/13619
>
> Please let us know your thoughts and provide your feedback !
>
> Best,
> Prashant Singh
>
>
>
>

Reply via email to