The overall approach looks solid to me. My only concern is with the use of
the Retry-After header, it doesn't guarantee whether the server actually
processed the commit. It's possible the server completed the commit and
then became unavailable, still returning the Retry-After header. If the
client
I would agree with making 503 non-retryable by default. Though I don't
necessarily agree with the spec change in the PR. I feel like the only
thing that needs to be updated is the component definition in the spec to
say that clients may only retry if a `Retry-After` header is present.
Otherwise,
I'm in favor of marking 503 as non-retryable for updateTable to avoid
possible data corruption. Even though 503 can sometimes mean throttling, I
think treating it as CommitStateUnknown is the safer choice for commit
operations. We can look into better handling for throttling cases, like
using 429,
503 is throttling on many (but not all) of the AWS services (s3, lambda,
redshift..) - though not dynamodb or others that return 400 + some message.
If aws were to deploy an endpoint, they'd have to decide what to do. or
even better, the spec could say "there is a specific error code for
throttl
PR LGTM, thanks for following up on this!
Reiterating my comments in the PR, IMO especially now that Russell's
https://github.com/apache/iceberg/pull/13449 is merged so that all the
read-only methods can still perform low-level retries, this change makes
sense as a safer default for commit behavio
Hey All,
Presently we treat 503 status code in spec here as if
the request was not tried at all. While doing 1.9.2 it was brought up that
this exception might not be true and some of the very common services like
Istio and Envoy were brought up as examples.
As a result I wanted to formally bring