+1 to this mostly targeting "low-level" retry semantics. Expanding on that, though, I'd say even "client-side retries" really come in two distinct flavors:
A. Business-logic-agnostic retries, e.g. in a common low-level HTTP client library. Behaviorally, these should be largely the same as "network infra retries". The key distinction is that in this case any content hashing would be *post*-serialization and agnostic even to the request-body content type (i.e. not JSON-specific).
B. Application-specific retries, such as when the Iceberg client will potentially rebase on a new snapshot.

I think this aligns with what Peter and others mentioned earlier: trying to canonicalize the *semantic* content of a request is probably brittle/risky. And as Yufei mentions, case 2.B (real client-side application-layer retries) should use a new idempotency key if it is ever retrying at the layer that requires re-serializing JSON.

Overall though I agree making the content-hash checking optional is a good idea.

On Fri, Sep 19, 2025 at 4:33 PM huaxin gao <huaxin.ga...@gmail.com> wrote:

> Thanks, Peter and Yufei. I agree the main use case is network‑infrastructure retries. To keep the specification simple and move the proposal forward, let’s make the baseline key‑only idempotency. If there’s demand, we can add an optional payload‑binding mode (canonical JSON + SHA‑256), advertised via /v1/config.
>
> Thanks,
>
> Huaxin
>
> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu <flyrain...@gmail.com> wrote:
>
>> "*Network infrastructure retries*" would be the dominant use case. I'd NOT recommend that clients retry with the same idempotency key if they regenerated the request; instead, clients should reload before retrying in that case.
>>
>> Yufei
>>
>>
>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>
>>> Hi Huaxin,
>>>
>>> Could you clarify the specific use cases we intend to support regarding retry checking? Here are a couple of possibilities I had in mind:
>>>
>>> - *Network infrastructure retries* – where the exact same request is retried.
>>> - *Client-side retries* – where the client regenerates the request using the same program logic, resulting in identical content.
>>>
>>> If there are no security or other concerns, I’d suggest keeping the specification simple and avoiding mechanisms that surface client-side implementation errors. The cleanest approach might be to ignore the request content and rely solely on a user-provided key.
>>>
>>> Alternatively, we could include an optional error code in the response, which implementations may use to signal conflicts. The actual conflict detection logic can be left to the implementations; we don’t need to define it in the specification. If we go this route, we should also offer a way to disable these checks, since there will inevitably be cases where semantically identical requests are incorrectly flagged as conflicting.
>>>
>>> Thanks,
>>> Peter
>>>
>>> huaxin gao <huaxin.ga...@gmail.com> wrote (on Fri, Sep 19, 2025, 1:38):
>>>
>>>> Thanks Steven for the +1 and for raising the fingerprint question! Great points!
>>>>
>>>> What we need to protect against:
>>>>
>>>> - Same logical request, different bytes across retries (pretty vs compact JSON, map key order, ...).
>>>> - Accidental key reuse with a changed payload.
>>>>
>>>> Options and tradeoffs:
>>>>
>>>> - Exact byte checksum (e.g., SHA‑256 over raw body)
>>>>    - Pro: trivial, fast
>>>>    - Con: too strict; benign diffs cause false mismatches
>>>> - Canonical JSON over full request, then hash (proposed)
>>>>    - Pro: stable across whitespace/key order; simple to implement for typed payloads
>>>>    - Con: slightly more work than raw checksum
>>>> - Checksum of selected fields / field-by-field match
>>>>    - Pro: can be faster for huge payloads; can ignore noisy fields
>>>>    - Con: could miss legitimate differences
>>>> - Request digest/signature
>>>>    - Pro: very strong
>>>>    - Con: heavyweight
>>>>
>>>> Maybe we could make this configurable:
>>>>
>>>> - canonical-json-sha256 (default)
>>>> - raw-bytes-sha256 (strict)
>>>> - trust-client-key (no fingerprint check)
>>>>
>>>> On the IETF draft status:
>>>>
>>>> I have also noted the draft’s expiry. We will align with its semantics for now and can adjust if a new version lands.
>>>>
>>>> Thanks,
>>>>
>>>> Huaxin
>>>>
>>>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>
>>>>> +1 for the feature, which can make retries safe for 500s and improve client fault tolerance for transient server failures.
>>>>>
>>>>> Peter and Dmitri raised a good question on the fingerprint. The IETF draft doesn't actually define the fingerprint algorithm. We can also go with a simple checksum of the entire request payload, which would be cheap to compute. Do we anticipate any scenarios where clients may rewrite the payload into different serialized bytes during retries?
>>>>>
>>>>> * Checksum of the entire request payload.
>>>>> * Checksum of selected element(s) in the request payload.
>>>>> * Field value match for each field in the request payload.
>>>>> * Field value match for selected element(s) in the request payload.
>>>>> * Request digest/signature
>>>>>
>>>>>
>>>>> BTW, the IETF draft seems to have expired without approval
>>>>>
>>>>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>>>>
>>>>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>
>>>>>> Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate you taking a close look at my proposal. I agree that "semantic equality" is tricky; that's why the scope here is intentionally narrow.
>>>>>>
>>>>>> Just to clarify scope: I’m not trying to solve general semantic equivalence. For these specific, typed request payloads, I serialize to a deterministic JSON and hash it. That normalizes benign diffs (map order, whitespace) without trying to infer meaning. The goal is a stable fingerprint so that if a key is accidentally reused with a changed payload, we surface that instead of silently diverging.
>>>>>>
>>>>>> To make this feel less brittle, I’ll add tests for the practical cases (ordering/whitespace, nested maps, a clear null‑vs‑missing rule, numeric formatting), plus end‑to‑end tests in the in‑memory REST fixture with failure injection (in‑flight dup, finalize failure -> reconcile, etc.). Happy to walk through these if helpful.
>>>>>>
>>>>>> I’m also open to adding a config switch for “trust‑client‑key only” if that’s preferred in some environments. My intent is to stay aligned with the IETF Idempotency‑Key guidance (first request wins; conflicting reuse is rejected, and reusing a key with a different request payload is rejected via an idempotency fingerprint) while keeping things as simple as possible and protecting us from accidental key misuse. Would love to align on the lightest approach that meets those goals.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Huaxin
>>>>>>
>>>>>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I agree that checking request contents is almost redundant in this case.
>>>>>>>
>>>>>>> If the randomness quality of the Idempotency-Key value is good, collisions are very unlikely on the server side. Given that, any content checks the server performs are essentially validating that clients correctly reuse the generated Idempotency-Key value. (This is mostly the same as my comment on the related Polaris discussion.)
>>>>>>>
>>>>>>> I'd like to propose making the content check optional, so that servers may or may not implement it according to their design principles and constraints, and emphasizing that clients should use unique keys (e.g. UUIDs)... basically going with option 2 from Peter's email.
>>>>>>>
>>>>>>> I believe this is in line with the SHOULD word used for this case in the IETF draft [1] (section 2.7).
>>>>>>>
>>>>>>> [1] https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>>>>
>>>>>>>> Thanks Huaxin for the proposal, and sorry for the late review - I had a bit of a busy week.
>>>>>>>> I have one main question, which I have also added as a comment to the doc:
>>>>>>>> - Why do we try to compare the request contents when the Idempotency-Key is the same for the requests? The comparison algorithm is a bit complicated, and seems brittle to me. Differences in field ordering or map ordering, and maybe even in upper-case vs lower-case letters, can still represent technically the same request.
>>>>>>>>
>>>>>>>> In my previous roles (admittedly more than 10 years ago) I worked extensively on APIs like this, and we never really succeeded in creating a good enough "are these 2 requests really the same semantically" check.
>>>>>>>>
>>>>>>>> I would simplify these requirements, unless there are serious arguments for the existence of these checks:
>>>>>>>>
>>>>>>>> 1. Either check for exact matches - without any magic - which could be used for detecting issues where the duplication happens on the network side, or
>>>>>>>> 2. Rely entirely on the clients to provide the correct Idempotency-Key.
>>>>>>>>
>>>>>>>> I would prefer the 2nd.
>>>>>>>> Otherwise I agree with the contents of the proposal. It is nicely done!
>>>>>>>>
>>>>>>>> Yufei Gu <flyrain...@gmail.com> wrote (on Thu, Sep 18, 2025, 2:54):
>>>>>>>>
>>>>>>>>> Thanks for the proposal. It's a nice feature to make retry more reliable and efficient. Left some comments.
>>>>>>>>>
>>>>>>>>> Yufei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks for writing up the proposal! Makes sense to add idempotency to mutation requests.
>>>>>>>>>>
>>>>>>>>>> It would be helpful to add this feature to both the catalog test framework and the iceberg-rest-fixture <https://github.com/apache/iceberg/blob/754679ddccdf81a97dc65d40f1a2a6fb9f6ee9b0/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java#L112>. The latter is used by the subprojects for testing and would come in handy when we want to test out the client implementation.
>>>>>>>>>>
>>>>>>>>>> For other reviewers, the Stripe documentation on idempotency was a helpful read, https://docs.stripe.com/api/idempotent_requests.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Sounds like fairly standard practice and makes sense to me on the first read.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Szehon
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <russellspit...@apache.org> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think based on the feedback on the proposal and in recent syncs we should probably move forward with the actual Spec Change PR so we can see what this looks like and move on to a discussion of how the Catalog test framework should test this.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>>>>>>>>>>> > Hi all,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I’d like to propose a change to Iceberg’s REST API to make mutation requests safely retryable.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *The Problem*
>>>>>>>>>>>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the client doesn’t receive the response (timeout, connection closed, etc.), a second attempt can hit 409 Conflict. The client interprets the 409 as a failed commit and deletes the associated metadata files, causing catalog/storage inconsistency.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *The Proposed Solution*
>>>>>>>>>>>> > Introduce an optional Idempotency-Key HTTP header on REST mutation endpoints and have the Iceberg client pass it through.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Semantics* (first processed request wins):
>>>>>>>>>>>> >
>>>>>>>>>>>> > - Same key + same canonical payload -> return the original result (no re-execution).
>>>>>>>>>>>> > - Same key + different payload -> 422 (Unprocessable Content).
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Capability discovery:* catalogs can advertise support and retention so clients know when a retry is safe, e.g.
>>>>>>>>>>>> >
>>>>>>>>>>>> > {
>>>>>>>>>>>> >   "idempotency-tokens-respected": true,
>>>>>>>>>>>> >   "idempotency-token-lifetime": "30m"
>>>>>>>>>>>> > }
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Scope in Iceberg:* update the OpenAPI to include the header, and add client pass-through + honoring capability discovery. No server implementation is mandated; catalogs (e.g., Polaris) can implement storage/TTL/replay as they choose.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Standards alignment:* uses the industry-standard header name and matches the IETF HTTPAPI Idempotency-Key draft <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header> semantics.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Compatibility:* fully backward compatible. Servers that don’t support it can ignore the header; clients can detect support via capability discovery.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here is the proposal <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Huaxin
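
For readers skimming the thread, the three fingerprint modes named above (trust-client-key, raw-bytes-sha256, canonical-json-sha256) and the "first processed request wins" rule can be sketched roughly as below. This is only an illustrative Python sketch, not the proposal's or any catalog's implementation. `IdempotencyStore`, `handle`, and the in-memory dict are hypothetical names. The canonical form used here (`json.dumps` with sorted keys and compact separators) is an assumption, since the thread does not define the canonicalization. Key retention/TTL and in-flight duplicates are ignored.

```python
import hashlib
import json
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, str]  # (HTTP status, body); a simplification for this sketch


def raw_bytes_fingerprint(body: bytes) -> str:
    """Strict mode: hash the serialized bytes exactly as received."""
    return hashlib.sha256(body).hexdigest()


def canonical_json_fingerprint(body: bytes) -> str:
    """Hash a re-serialized form that ignores whitespace and map key order.

    Assumption: sorted keys + compact separators stand in for "canonical JSON".
    Numeric formatting (1 vs 1.0) and null-vs-missing are NOT normalized here.
    """
    parsed = json.loads(body)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Mode names are taken from the thread; None means no fingerprint check.
FINGERPRINTS: Dict[str, Optional[Callable[[bytes], str]]] = {
    "trust-client-key": None,
    "raw-bytes-sha256": raw_bytes_fingerprint,
    "canonical-json-sha256": canonical_json_fingerprint,
}


class IdempotencyStore:
    """Hypothetical in-memory store: the first processed request for a key wins."""

    def __init__(self, mode: str = "trust-client-key") -> None:
        self._fingerprint = FINGERPRINTS[mode]
        self._seen: Dict[str, Tuple[Optional[str], Response]] = {}

    def handle(self, key: str, body: bytes,
               execute: Callable[[bytes], Response]) -> Response:
        fp = self._fingerprint(body) if self._fingerprint else None
        if key in self._seen:
            stored_fp, stored_response = self._seen[key]
            if fp is not None and fp != stored_fp:
                # Same key, different payload: rejected with 422 as in the proposal.
                return (422, "idempotency key reused with a different payload")
            # Same key (and matching payload, if checked): replay, no re-execution.
            return stored_response
        response = execute(body)
        self._seen[key] = (fp, response)
        return response
```

With `trust-client-key`, a retry that reuses a key always gets the original response back, whatever the body. With `raw-bytes-sha256`, a retry whose JSON was re-serialized with a different key order is rejected. With `canonical-json-sha256`, that same retry is replayed, while a genuinely changed payload is still rejected.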