Thank you all for taking the time to review and discuss! I’ve responded to all questions and updated the proposal. If there are no additional concerns, I’ll proceed to start a VOTE thread.
Thanks,
Huaxin

On Mon, Sep 22, 2025 at 1:30 AM Maninder Parmar <parmar.maninder...@gmail.com> wrote:
> +1 for the low-level retry, which ensures that the idempotency key is never committed twice. I also agree that canonicalizing the request body would be hard to get right when the client can change it during conflict resolution and retry.
>
> On Sat, Sep 20, 2025 at 5:58 AM Dennis Huo <huoi...@gmail.com> wrote:
>> +1 to this mostly targeting a "low-level" retry semantic. Expanding on that, though, I'd say even "client-side retries" really come in two distinct flavors:
>>
>> A. Business-logic-agnostic retries, e.g. in a common low-level HTTP client library. Behaviorally, these should be largely the same as "network infrastructure retries". The key distinction is that in this case any content hashing would happen *post*-serialization and would be agnostic even to the request-body content type (i.e. not JSON-specific).
>> B. Application-specific retries, such as when the Iceberg client may rebase on a new snapshot.
>>
>> I think this aligns with what Peter and others mentioned earlier: trying to canonicalize the *semantic* content of a request is probably brittle/risky. And as Yufei mentions, case 2.B (real application-layer client-side retries) should use a new idempotency key whenever it retries at the layer that requires re-serializing JSON.
>>
>> Overall, though, I agree that making the content-hash check optional is a good idea.
>>
>> On Fri, Sep 19, 2025 at 4:33 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>> Thanks, Peter and Yufei. I agree the main use case is network-infrastructure retries. To keep the specification simple and move the proposal forward, let's make key-only idempotency the baseline. If there's demand, we can add an optional payload-binding mode (canonical JSON + SHA-256), advertised via /v1/config.
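To make the two fingerprinting styles in this part of the thread concrete, here is a minimal Python sketch contrasting a raw-bytes SHA-256 fingerprint (hashing after serialization, as in Dennis's flavor A) with a "canonical JSON + SHA-256" fingerprint. The canonicalization shown (sorted keys and compact separators via `json.dumps`) is an assumption for illustration only; it is not RFC 8785 (JCS) and not what the proposal specifies, and the payload is a simplified, hypothetical commit body.

```python
import hashlib
import json


def raw_bytes_fingerprint(body: bytes) -> str:
    """Hash the exact bytes on the wire (post-serialization, content-type agnostic)."""
    return hashlib.sha256(body).hexdigest()


def canonical_json_fingerprint(body: bytes) -> str:
    """Parse JSON, re-serialize deterministically, then hash.

    Assumption: sorted keys + compact separators is "canonical enough" here.
    This is NOT RFC 8785; number formatting and Unicode normalization are
    not handled.
    """
    parsed = json.loads(body)
    canonical = json.dumps(parsed, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two serializations of the same logical request (hypothetical payload).
compact = b'{"requirements":[],"updates":[{"action":"set-properties","updates":{"a":"1","b":"2"}}]}'
pretty = json.dumps(
    {"updates": [{"updates": {"b": "2", "a": "1"}, "action": "set-properties"}], "requirements": []},
    indent=2,
).encode("utf-8")

print(raw_bytes_fingerprint(compact) == raw_bytes_fingerprint(pretty))            # False
print(canonical_json_fingerprint(compact) == canonical_json_fingerprint(pretty))  # True
```

Even this canonical form still distinguishes `1` from `1.0`, and `null` from a missing field, which is the kind of brittleness Peter and Dennis point out.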
>>>
>>> Thanks,
>>> Huaxin
>>>
>>> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu <flyrain...@gmail.com> wrote:
>>>> "*Network infrastructure retries*" would be the dominant use case. I'd NOT recommend that clients retry with the same idempotency key if they regenerated the request; instead, clients should reload before retrying in that case.
>>>>
>>>> Yufei
>>>>
>>>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>> Hi Huaxin,
>>>>>
>>>>> Could you clarify the specific use cases we intend to support regarding retry checking? Here are a couple of possibilities I had in mind:
>>>>>
>>>>> - *Network infrastructure retries*: the exact same request is retried.
>>>>> - *Client-side retries*: the client regenerates the request using the same program logic, resulting in identical content.
>>>>>
>>>>> If there are no security or other concerns, I'd suggest keeping the specification simple and avoiding mechanisms that surface client-side implementation errors. The cleanest approach might be to ignore the request content and rely solely on a user-provided key.
>>>>>
>>>>> Alternatively, we could include an optional error code in the response, which implementations may use to signal conflicts. The actual conflict-detection logic can be left to the implementations; we don't need to define it in the specification. If we go this route, we should also offer a way to disable these checks, since there will inevitably be cases where semantically identical requests are incorrectly flagged as conflicting.
>>>>>
>>>>> Thanks,
>>>>> Peter
>>>>>
>>>>> On Fri, Sep 19, 2025 at 1:38, huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>> Thanks Steven for the +1 and for raising the fingerprint question! Great points!
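Yufei's recommendation and Dennis's A/B distinction can be sketched as a client retry loop: transport-level failures resend the identical bytes under the same Idempotency-Key, while a commit conflict triggers a reload/rebase that builds a new request under a fresh key. This is a hypothetical illustration only; `commit`, `send`, `build_body`, `TransportError`, and `CommitConflict` are invented names, not the Iceberg client API.

```python
import uuid
from typing import Callable, Dict


class TransportError(Exception):
    """Hypothetical: the request may or may not have reached the server."""


class CommitConflict(Exception):
    """Hypothetical: the server rejected the commit, e.g. a stale base snapshot."""


def commit(
    build_body: Callable[[], bytes],
    send: Callable[[bytes, str], Dict],
    transport_retries: int = 3,
    rebase_attempts: int = 2,
) -> Dict:
    for _ in range(rebase_attempts):
        # Application-level attempt (flavor B): reload/rebase, re-serialize,
        # and use a fresh Idempotency-Key for the newly built request.
        body = build_body()
        key = str(uuid.uuid4())
        for attempt in range(transport_retries):
            try:
                # Transport-level retry (flavor A): identical bytes, identical key.
                return send(body, key)
            except TransportError:
                if attempt == transport_retries - 1:
                    raise
            except CommitConflict:
                break  # fall through to the next rebase attempt
    raise CommitConflict("still conflicting after rebasing")


# Usage: the first send fails in transit; the retry reuses the same key and body.
calls = []


def flaky_send(body: bytes, key: str) -> Dict:
    calls.append((body, key))
    if len(calls) == 1:
        raise TransportError("connection reset")
    return {"status": 200}


print(commit(lambda: b'{"updates":[]}', flaky_send))  # {'status': 200}
print(calls[0] == calls[1])                          # True
```

Reusing the key only for byte-identical resends is what lets a server with a key-only check replay the original result safely.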
>>>>>>
>>>>>> What we need to protect against:
>>>>>>
>>>>>> - Same logical request, different bytes across retries (pretty vs. compact JSON, map key order, ...).
>>>>>> - Accidental key reuse with a changed payload.
>>>>>>
>>>>>> Options and tradeoffs:
>>>>>>
>>>>>> - Exact byte checksum (e.g., SHA-256 over the raw body)
>>>>>>   - Pro: trivial, fast
>>>>>>   - Con: too strict; benign differences cause false mismatches
>>>>>> - Canonical JSON over the full request, then hash (proposed)
>>>>>>   - Pro: stable across whitespace/key order; simple to implement for typed payloads
>>>>>>   - Con: slightly more work than a raw checksum
>>>>>> - Checksum of selected fields / field-by-field match
>>>>>>   - Pro: can be faster for huge payloads; can ignore noisy fields
>>>>>>   - Con: could miss legitimate differences
>>>>>> - Request digest/signature
>>>>>>   - Pro: very strong
>>>>>>   - Con: heavyweight
>>>>>>
>>>>>> Maybe we could make this configurable:
>>>>>>
>>>>>> - canonical-json-sha256 (default)
>>>>>> - raw-bytes-sha256 (strict)
>>>>>> - trust-client-key (no fingerprint check)
>>>>>>
>>>>>> On the IETF draft status:
>>>>>>
>>>>>> I have also noted the draft's expiry. We will align with its semantics for now and can adjust if a new version lands.
>>>>>>
>>>>>> Thanks,
>>>>>> Huaxin
>>>>>>
>>>>>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu <stevenz...@gmail.com> wrote:
>>>>>>> +1 for the feature that can make retries safe for 500s and improve the client's fault tolerance of transient server failures.
>>>>>>>
>>>>>>> Peter and Dmitri raised a good question on the fingerprint. The IETF draft doesn't actually define the fingerprint algorithm. We can also go with a simple checksum of the entire request payload, which would be cheap to compute.
>>>>>>> Do we anticipate any scenarios where clients may rewrite the payload into different serialized bytes during retries?
>>>>>>>
>>>>>>> * Checksum of the entire request payload.
>>>>>>> * Checksum of selected element(s) in the request payload.
>>>>>>> * Field value match for each field in the request payload.
>>>>>>> * Field value match for selected element(s) in the request payload.
>>>>>>> * Request digest/signature
>>>>>>>
>>>>>>> BTW, the IETF draft seems to have expired without approval:
>>>>>>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>>>>>>
>>>>>>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao <huaxin.ga...@gmail.com> wrote:
>>>>>>>> Thanks Peter and Dmitri for the thoughtful feedback! I really appreciate you taking a close look at my proposal. I agree that "semantic equality" is tricky; that's why the scope here is intentionally narrow.
>>>>>>>>
>>>>>>>> Just to clarify scope: I'm not trying to solve general semantic equivalence. For these specific, typed request payloads, I serialize to deterministic JSON and hash it. That normalizes benign differences (map order, whitespace) without trying to infer meaning. The goal is a stable fingerprint, so that if a key is accidentally reused with a changed payload, we surface that instead of silently diverging.
>>>>>>>>
>>>>>>>> To make this feel less brittle, I'll add tests for the practical cases (ordering/whitespace, nested maps, a clear null-vs-missing rule, numeric formatting), plus end-to-end tests in the in-memory REST fixture with failure injection (in-flight duplicate, finalize failure -> reconcile, etc.). Happy to walk through these if helpful.
>>>>>>>>
>>>>>>>> I'm also open to adding a config switch for "trust-client-key only" if that's preferred in some environments.
>>>>>>>> My intent is to stay aligned with the IETF Idempotency-Key guidance (first request wins; conflicting reuse is rejected, and reusing a key with a different request payload is rejected via an idempotency fingerprint) while keeping things as simple as possible and protecting us from accidental key misuse. I'd love to align on the lightest approach that meets those goals.
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Huaxin
>>>>>>>>
>>>>>>>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov <di...@apache.org> wrote:
>>>>>>>>> Hi All,
>>>>>>>>>
>>>>>>>>> I agree that checking request contents is almost redundant in this case.
>>>>>>>>>
>>>>>>>>> If the randomness quality of Idempotency-Key values is good, collisions are very unlikely on the server side. Given that, any content checks the server performs are essentially validating that clients correctly reuse the generated Idempotency-Key value. (This is mostly the same as my comment on the related Polaris discussion.)
>>>>>>>>>
>>>>>>>>> I'd like to propose making the content check optional, so that servers may or may not implement it according to their design principles and constraints, and emphasizing that clients should use unique keys (e.g. UUIDs). Basically, going with option 2 from Peter's email.
>>>>>>>>>
>>>>>>>>> I believe this is in line with the SHOULD wording used for this case in the IETF draft [1] (section 2.7).
>>>>>>>>>
>>>>>>>>> [1] https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Dmitri.
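Dmitri's point about key randomness can be quantified with the birthday bound. A version-4 UUID carries 122 random bits, so for n keys the probability of any collision is roughly 1 - exp(-n(n-1)/2^123). The sketch below assumes keys come from a good random source (Python's `uuid.uuid4()` draws from `os.urandom`); `collision_probability` is a hypothetical helper, not part of any proposal.

```python
import math
import uuid


def collision_probability(num_keys: int, random_bits: int = 122) -> float:
    """Birthday-bound approximation: P ~= 1 - exp(-n(n-1) / 2^(bits+1))."""
    exponent = num_keys * (num_keys - 1) / 2 ** (random_bits + 1)
    # expm1 keeps precision when the probability is astronomically small.
    return -math.expm1(-exponent)


key = uuid.uuid4()
print(key.version)                             # 4
print(f"{collision_probability(10**9):.1e}")   # 9.4e-20
```

Even a billion keys live at once gives a collision probability on the order of 1e-19, which is why a key-only check is defensible when clients generate keys this way.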
>>>>>>>>>
>>>>>>>>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry <peter.vary.apa...@gmail.com> wrote:
>>>>>>>>>> Thanks Huaxin for the proposal, and sorry for the late review; I had a bit of a busy week.
>>>>>>>>>> I have one main question, which I have also added as a comment on the doc:
>>>>>>>>>> - Why do we try to compare the request contents when the Idempotency-Key is the same for both requests? The comparison algorithm is a bit complicated and seems brittle to me. Requests that differ in field ordering, map ordering, or maybe even upper/lower case letters might still technically be the same request.
>>>>>>>>>>
>>>>>>>>>> In my previous roles (admittedly more than 10 years ago) I worked extensively on APIs like this, and we never really succeeded in creating a good enough "are these 2 requests really the same semantically" check.
>>>>>>>>>>
>>>>>>>>>> I would simplify these requirements, unless there are serious arguments for the existence of these checks:
>>>>>>>>>>
>>>>>>>>>> 1. Either check for exact matches (without any magic), which could be used to detect issues where the duplication happens on the network side, or
>>>>>>>>>> 2. Rely entirely on the clients to provide the correct Idempotency-Key.
>>>>>>>>>>
>>>>>>>>>> I would prefer the 2nd.
>>>>>>>>>> Otherwise I agree with the contents of the proposal. It is nicely done!
>>>>>>>>>>
>>>>>>>>>> On Thu, Sep 18, 2025 at 2:54, Yufei Gu <flyrain...@gmail.com> wrote:
>>>>>>>>>>> Thanks for the proposal. It's a nice feature to make retries more reliable and efficient. I left some comments.
>>>>>>>>>>>
>>>>>>>>>>> Yufei
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu <kevinjq...@apache.org> wrote:
>>>>>>>>>>>> Thanks for writing up the proposal! It makes sense to add idempotency to mutation requests.
>>>>>>>>>>>>
>>>>>>>>>>>> It would be helpful to add this feature to both the catalog test framework and the iceberg-rest-fixture <https://github.com/apache/iceberg/blob/754679ddccdf81a97dc65d40f1a2a6fb9f6ee9b0/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java#L112>. The latter is used by the subprojects for testing and would come in handy when we want to test the client implementation.
>>>>>>>>>>>>
>>>>>>>>>>>> For other reviewers, the Stripe documentation on idempotency was a helpful read: https://docs.stripe.com/api/idempotent_requests
>>>>>>>>>>>>
>>>>>>>>>>>> Best,
>>>>>>>>>>>> Kevin Liu
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho <szehon.apa...@gmail.com> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sounds like fairly standard practice, and it makes sense to me on first read.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>> Szehon
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <russellspit...@apache.org> wrote:
>>>>>>>>>>>>>> I think, based on the feedback on the proposal and in recent syncs, we should move forward with the actual spec change PR so we can see what this looks like, and then move on to a discussion of how the catalog test framework should test this.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>>>>>>>>>>>>> > Hi all,
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > I'd like to propose a change to Iceberg's REST API to make mutation requests safely retryable.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *The Problem*
>>>>>>>>>>>>>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the client doesn't receive the response (timeout, connection closed, etc.), a second attempt can hit 409 Conflict. The client interprets the 409 as a failed commit and deletes the associated metadata files, causing catalog/storage inconsistency.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *The Proposed Solution*
>>>>>>>>>>>>>> > Introduce an optional Idempotency-Key HTTP header on REST mutation endpoints and have the Iceberg client pass it through.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *Semantics* (first processed request wins):
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > - Same key + same canonical payload -> return the original result (no re-execution).
>>>>>>>>>>>>>> > - Same key + different payload -> 422 (Unprocessable Content).
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *Capability discovery:* catalogs can advertise support and retention so clients know when a retry is safe, e.g.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > {
>>>>>>>>>>>>>> >   "idempotency-tokens-respected": true,
>>>>>>>>>>>>>> >   "idempotency-token-lifetime": "30m"
>>>>>>>>>>>>>> > }
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *Scope in Iceberg:* update the OpenAPI spec to include the header, and add client pass-through plus honoring of capability discovery. No server implementation is mandated; catalogs (e.g., Polaris) can implement storage/TTL/replay as they choose.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *Standards alignment:* uses the industry-standard header name and matches the semantics of the IETF HTTPAPI Idempotency-Key draft <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > *Compatibility:* fully backward compatible. Servers that don't support it can ignore the header; clients can detect support via capability discovery.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Here is the proposal <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>. Looking forward to your thoughts.
>>>>>>>>>>>>>> >
>>>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>>>> > Huaxin