Re: [DISCUSS] Iceberg REST Catalog Idempotency

Dennis Huo Fri, 19 Sep 2025 17:35:14 -0700

+1 to this being mostly targeting a "low-level" retry semantic. Expanding
on that though I'd say even "client-side retries" really have two distinct
flavors:


A. Business-logic-agnostic retries, e.g. in a common low-level HTTP client
library - behaviorally, these should behave largely the same as "network
infra retries". The key distinction is that in this case any content
hashing would be *post* serialization and even agnostic to request-body
content-type (i.e. not JSON-specific).
B. Application-specific retries, such as when the Iceberg client may
rebase onto a new snapshot.
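
To illustrate what "post-serialization" hashing means for flavor A, here is a
minimal Python sketch. The function name and the sample payload are made up
for illustration and are not part of the proposal or of any Iceberg client.

```python
import hashlib
import json


def body_fingerprint(body: bytes) -> str:
    """Hash the already-serialized request body; no JSON parsing involved."""
    return hashlib.sha256(body).hexdigest()


# Hypothetical payload, serialized two different ways.
payload = {"requirements": [], "updates": [{"action": "set-properties"}]}
compact = json.dumps(payload, separators=(",", ":")).encode("utf-8")
pretty = json.dumps(payload, indent=2).encode("utf-8")

# A flavor-A retry resends the same bytes, so the fingerprint is stable.
assert body_fingerprint(compact) == body_fingerprint(compact)
# Re-serializing the same logical payload changes the bytes and the hash.
assert body_fingerprint(compact) != body_fingerprint(pretty)
```

Because the hash only sees bytes, the same check would work for a non-JSON
request body as well.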

I think this aligns with what Peter and others mentioned earlier where
trying to canonicalize the *semantic* content of a request is probably
brittle/risky. And as Yufei mentions, case 2.B (client-side real
application-layer retries) should be using a new idempotency-key if it's
ever doing the retry at the layer that requires re-serializing JSON.
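
A rough sketch of how the two flavors might treat the key, in Python. All
names here are hypothetical; the `send` callable stands in for whatever HTTP
client is used, and the choice to retry on 5xx and rebase on 409 is an
assumption based on the cases discussed in this thread.

```python
import uuid
from typing import Callable, Tuple

# Hypothetical transport: (idempotency_key, body) -> HTTP status code.
Send = Callable[[str, bytes], int]


def send_with_transport_retries(
    send: Send, body: bytes, attempts: int = 3
) -> Tuple[str, int]:
    """Flavor A: one key, one byte string, resent verbatim on 5xx."""
    key = str(uuid.uuid4())
    status = 0
    for _ in range(attempts):
        status = send(key, body)
        if status < 500:
            break
    return key, status


def send_with_rebase(
    send: Send, build_body: Callable[[], bytes], attempts: int = 3
) -> int:
    """Flavor B: each attempt rebuilds the request, so each gets a new key."""
    status = 0
    for _ in range(attempts):
        body = build_body()  # hypothetical: reload state, then re-serialize
        _, status = send_with_transport_retries(send, body)
        if status != 409:
            break
    return status
```

The point of the split is that the key never outlives the serialized bytes it
was generated for.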

Overall though I agree making the content-hash checking optional is a good
idea.
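
For what it's worth, an optional check could look roughly like the Python
sketch below. This is only an in-memory illustration under my own
assumptions: the class and parameter names are invented, the hash is over raw
bytes, and the 422 status is taken from the original proposal. A real catalog
would also need key expiry and handling for concurrent in-flight duplicates.

```python
import hashlib
from typing import Callable, Dict, Tuple


class IdempotencyStore:
    """In-memory sketch of key-only idempotency with an optional hash check."""

    def __init__(self, check_content_hash: bool = False) -> None:
        self.check_content_hash = check_content_hash
        # key -> (SHA-256 of the first request body, status of first result)
        self._seen: Dict[str, Tuple[str, int]] = {}

    def handle(self, key: str, body: bytes, execute: Callable[[bytes], int]) -> int:
        digest = hashlib.sha256(body).hexdigest()
        previous = self._seen.get(key)
        if previous is not None:
            prev_digest, prev_status = previous
            if self.check_content_hash and prev_digest != digest:
                return 422  # same key, different bytes
            return prev_status  # replay the original result, no re-execution
        status = execute(body)
        self._seen[key] = (digest, status)
        return status
```

With `check_content_hash=False` this is the key-only baseline; turning it on
adds the stricter behavior without changing the replay path.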

On Fri, Sep 19, 2025 at 4:33 PM huaxin gao <[email protected]> wrote:

> Thanks, Peter and Yufei. I agree the main use case is
> network‑infrastructure retries. To keep the specification simple and move
> the proposal forward, let’s make the baseline key‑only idempotency. If
> there’s demand, we can add an optional payload‑binding mode (canonical JSON
> + SHA‑256), advertised via /v1/config.
>
> Thanks,
>
> Huaxin
>
> On Fri, Sep 19, 2025 at 1:31 PM Yufei Gu <[email protected]> wrote:
>
>> "*Network infrastructure retries*" would be the dominant use case. I'd
>> NOT recommend that clients retry with the same idempotency key if they
>> regenerated the request; instead, clients should reload before retrying
>> in that case.
>>
>> Yufei
>>
>>
>> On Fri, Sep 19, 2025 at 2:05 AM Péter Váry <[email protected]>
>> wrote:
>>
>>> Hi Huaxin,
>>>
>>> Could you clarify the specific use cases we intend to support regarding
>>> retry checking? Here are a couple of possibilities I had in mind:
>>>
>>>    - *Network infrastructure retries* – where the exact same request is
>>>    retried.
>>>    - *Client-side retries* – where the client regenerates the request
>>>    using the same program logic, resulting in identical content.
>>>
>>> If there are no security or other concerns, I’d suggest keeping the
>>> specification simple and avoiding mechanisms that surface client-side
>>> implementation errors. The cleanest approach might be to ignore the request
>>> content and rely solely on a user-provided key.
>>>
>>> Alternatively, we could include an optional error code in the response,
>>> which implementations may use to signal conflicts. The actual conflict
>>> detection logic can be left to the implementations—we don’t need to define
>>> it in the specification. If we go this route, we should also offer a way to
>>> disable these checks, since there will inevitably be cases where
>>> semantically identical requests are incorrectly flagged as conflicting.
>>>
>>> Thanks,
>>> Peter
>>>
>>> huaxin gao <[email protected]> wrote (on Fri, Sep 19, 2025, 1:38):
>>>
>>>> Thanks Steven for the +1 and for raising the fingerprint question!
>>>> Great points!
>>>>
>>>> What we need to protect against:
>>>>
>>>>
>>>>    - Same logical request, different bytes across retries (pretty vs
>>>>    compact JSON, map key order, ...).
>>>>    - Accidental key reuse with a changed payload.
>>>>
>>>> Options and tradeoffs:
>>>>
>>>>
>>>>    - Exact byte checksum (e.g., SHA‑256 over raw body)
>>>>       - Pro: trivial, fast
>>>>       - Con: too strict; benign diffs cause false mismatches
>>>>
>>>>
>>>>    - Canonical JSON over full request, then hash (proposed)
>>>>       - Pro: stable across whitespace/key order; simple to implement
>>>>       for typed payloads
>>>>       - Con: slightly more work than raw checksum
>>>>
>>>>
>>>>    - Checksum of selected fields / field-by-field match
>>>>       - Pro: can be faster for huge payloads; can ignore noisy fields
>>>>       - Con: could miss legitimate differences
>>>>
>>>>
>>>>    - Request digest/signature
>>>>       - Pro: very strong
>>>>       - Con: heavyweight
>>>>
>>>> Maybe we could make this configurable:
>>>>
>>>>
>>>>    - canonical-json-sha256 (default)
>>>>    - raw-bytes-sha256 (strict)
>>>>    - trust-client-key (no fingerprint check)
>>>>
>>>> On the IETF draft status:
>>>>
>>>> I have also noted the draft’s expiry. We will align with its semantics
>>>> for now and can adjust if a new version lands.
>>>>
>>>> Thanks,
>>>>
>>>> Huaxin
>>>>
>>>> On Thu, Sep 18, 2025 at 4:01 PM Steven Wu <[email protected]> wrote:
>>>>
>>>>> +1 for the feature that can make retry safe for 500s and improve
>>>>> client fault tolerance against transient server failures.
>>>>>
>>>>> Peter and Dimitri raised a good question on the fingerprint. The IETF
>>>>> draft doesn't actually define the fingerprint algo. We can also go with
>>>>> a simple checksum of the entire request payload, which would be cheap to
>>>>> compute. Do we anticipate any scenarios where clients may rewrite the
>>>>> payload in different forms of serialized bytes during retries?
>>>>>
>>>>>    *  Checksum of the entire request payload.
>>>>>    *  Checksum of selected element(s) in the request payload.
>>>>>    *  Field value match for each field in the request payload.
>>>>>    *  Field value match for selected element(s) in the request payload.
>>>>>    *  Request digest/signature
>>>>>
>>>>>
>>>>> BTW, the IETF draft seems to have expired without approval
>>>>>
>>>>> https://datatracker.ietf.org/doc/draft-ietf-httpapi-idempotency-key-header/
>>>>>
>>>>> On Thu, Sep 18, 2025 at 3:46 PM huaxin gao <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> Thanks Peter and Dmitri for the thoughtful feedback! I really
>>>>>> appreciate you taking a close look at my proposal. I agree that "semantic
>>>>>> equality" is tricky, that's why the scope here is intentionally narrow.
>>>>>>
>>>>>> Just to clarify scope: I’m not trying to solve general semantic
>>>>>> equivalence. For these specific, typed request payloads, I serialize to a
>>>>>> deterministic JSON and hash it. That normalizes benign diffs (map order,
>>>>>> whitespace) without trying to infer meaning. The goal is a stable
>>>>>> fingerprint so that if a key is accidentally reused with a changed
>>>>>> payload, we surface that instead of silently diverging.
>>>>>>
>>>>>> To make this feel less brittle, I’ll add tests for the practical
>>>>>> cases (ordering/whitespace, nested maps, a clear null‑vs‑missing rule,
>>>>>> numeric formatting), plus end‑to‑end tests in the in‑memory REST fixture
>>>>>> with failure injection (in‑flight dup, finalize failure -> reconcile,
>>>>>> etc.). Happy to walk through these if helpful.
>>>>>>
>>>>>> I’m also open to adding a config switch for “trust‑client‑key only”
>>>>>> if that’s preferred in some environments. My intent is to stay aligned
>>>>>> with the IETF Idempotency‑Key guidance (first request wins; conflicting
>>>>>> reuse is rejected, and reusing a key with a different request payload is
>>>>>> rejected via an idempotency fingerprint) while keeping things as simple
>>>>>> as possible and protecting us from accidental key misuse. Would love to
>>>>>> align on the lightest approach that meets those goals.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Huaxin
>>>>>>
>>>>>> On Thu, Sep 18, 2025 at 6:17 AM Dmitri Bourlatchkov <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> Hi All,
>>>>>>>
>>>>>>> I agree that checking request contents is almost redundant in this
>>>>>>> case.
>>>>>>>
>>>>>>> If the randomness quality of Idempotency-Key value is good,
>>>>>>> collisions are very unlikely on the server side. Given that, any content
>>>>>>> checks the server performs are essentially validating that clients
>>>>>>> correctly reuse the generated Idempotency-Key value. (this is mostly the
>>>>>>> same as my comment on the related Polaris discussion).
>>>>>>>
>>>>>>> I'd like to propose making the content check optional so that
>>>>>>> servers may or may not implement it according to their design principles
>>>>>>> and constraints and emphasizing that clients should use unique keys
>>>>>>> (e.g. UUIDs)... basically going with option 2 from Peter's email.
>>>>>>>
>>>>>>> I believe this is in line with the SHOULD word used for this case in
>>>>>>> the IETF draft [1] (section 2.7).
>>>>>>>
>>>>>>> [1]
>>>>>>> https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header-06
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Dmitri.
>>>>>>>
>>>>>>> On Thu, Sep 18, 2025 at 7:56 AM Péter Váry <
>>>>>>> [email protected]> wrote:
>>>>>>>
>>>>>>>> Thanks Huaxin for the proposal, and sorry for the late review - I
>>>>>>>> had a bit of a busy week.
>>>>>>>> I have one main question, which I have also added as a comment to
>>>>>>>> the doc:
>>>>>>>> - Why do we try to compare the request contents when the
>>>>>>>> Idempotency-Key is the same for the requests? The comparison algorithm
>>>>>>>> is a bit complicated, and seems brittle to me. Requests that differ in
>>>>>>>> field ordering, map ordering, and maybe even upper case/lower case
>>>>>>>> letters might technically be the same request.
>>>>>>>>
>>>>>>>> In my previous roles (admittedly more than 10 years ago) I worked
>>>>>>>> extensively on APIs like this, and we never really succeeded in
>>>>>>>> creating a good enough "are these 2 requests really the same
>>>>>>>> semantically" check.
>>>>>>>>
>>>>>>>> I would simplify these requirements, unless there are serious
>>>>>>>> arguments for the existence of these checks:
>>>>>>>>
>>>>>>>>    1. Either check for exact matches - without any magic - this
>>>>>>>>    could be used for detecting issues where the duplication happens on
>>>>>>>>    the network side, or
>>>>>>>>    2. Rely entirely on the clients to provide the correct
>>>>>>>>    Idempotency-Key.
>>>>>>>>
>>>>>>>> I would prefer the 2nd.
>>>>>>>> Otherwise I agree with the contents of the proposal. It is nicely
>>>>>>>> done! (edited)
>>>>>>>>
>>>>>>>> Yufei Gu <[email protected]> wrote (on Thu, Sep 18, 2025, 2:54):
>>>>>>>>
>>>>>>>>> Thanks for the proposal. It's a nice feature to make retry more
>>>>>>>>> reliable and efficient. Left some comments.
>>>>>>>>>
>>>>>>>>> Yufei
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On Mon, Sep 15, 2025 at 3:53 PM Kevin Liu <[email protected]>
>>>>>>>>> wrote:
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Thanks for writing up the proposal! Makes sense to add
>>>>>>>>>> idempotency to mutation requests.
>>>>>>>>>>
>>>>>>>>>> It would be helpful to add this feature to both the catalog test
>>>>>>>>>> framework and the iceberg-rest-fixture
>>>>>>>>>> <https://github.com/apache/iceberg/blob/754679ddccdf81a97dc65d40f1a2a6fb9f6ee9b0/open-api/src/testFixtures/java/org/apache/iceberg/rest/RESTCatalogServer.java#L112>.
>>>>>>>>>> The latter is used by the subprojects for testing and would come in
>>>>>>>>>> handy when we want to test out the client implementation.
>>>>>>>>>>
>>>>>>>>>> For other reviewers, the Stripe documentation on idempotency was
>>>>>>>>>> a helpful read, https://docs.stripe.com/api/idempotent_requests.
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Best,
>>>>>>>>>> Kevin Liu
>>>>>>>>>>
>>>>>>>>>> On Mon, Sep 15, 2025 at 11:38 AM Szehon Ho <
>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> Sounds like fairly standard practice and makes sense to me on
>>>>>>>>>>> the first read.
>>>>>>>>>>>
>>>>>>>>>>> Thanks,
>>>>>>>>>>> Szehon
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Sep 15, 2025 at 10:09 AM Russell Spitzer <
>>>>>>>>>>> [email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>>> I think based on the feedback on the proposal and in recent
>>>>>>>>>>>> syncs we should probably move forward with the actual Spec Change
>>>>>>>>>>>> PR so we can see what this looks like and move on to a discussion
>>>>>>>>>>>> of how the Catalog test framework should test this.
>>>>>>>>>>>>
>>>>>>>>>>>> On 2025/08/22 18:26:23 huaxin gao wrote:
>>>>>>>>>>>> > Hi all,
>>>>>>>>>>>> >
>>>>>>>>>>>> > I’d like to propose a change to Iceberg’s REST API to make mutation
>>>>>>>>>>>> > requests safely retryable.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *The Problem*
>>>>>>>>>>>> > If a POST mutation (e.g., updateTable) succeeds in the catalog but the
>>>>>>>>>>>> > client doesn’t receive the response (timeout, connection closed, etc.), a
>>>>>>>>>>>> > second attempt can hit 409 Conflict. The client interprets the 409 as a
>>>>>>>>>>>> > failed commit and deletes the associated metadata files, causing
>>>>>>>>>>>> > catalog/storage inconsistency.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *The Proposed Solution*
>>>>>>>>>>>> > Introduce an optional Idempotency-Key HTTP header on REST mutation
>>>>>>>>>>>> > endpoints and have the Iceberg client pass it through.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Semantics* (first processed request wins):
>>>>>>>>>>>> >
>>>>>>>>>>>> >    - Same key + same canonical payload -> return the original result (no
>>>>>>>>>>>> >    re-execution).
>>>>>>>>>>>> >    - Same key + different payload -> 422 (Unprocessable Content).
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Capability discovery:* catalogs can advertise support and retention so
>>>>>>>>>>>> > clients know when a retry is safe, e.g.
>>>>>>>>>>>> >
>>>>>>>>>>>> > {
>>>>>>>>>>>> >   "idempotency-tokens-respected": true,
>>>>>>>>>>>> >   "idempotency-token-lifetime": "30m"
>>>>>>>>>>>> > }
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Scope in Iceberg:* update the OpenAPI to include the header, and add
>>>>>>>>>>>> > client pass-through + honoring capability discovery. No server
>>>>>>>>>>>> > implementation is mandated; catalogs (e.g., Polaris) can implement
>>>>>>>>>>>> > storage/TTL/replay as they choose.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Standards alignment:* uses the industry-standard header name and matches
>>>>>>>>>>>> > the IETF HTTPAPI Idempotency-Key draft
>>>>>>>>>>>> > <https://datatracker.ietf.org/doc/html/draft-ietf-httpapi-idempotency-key-header>
>>>>>>>>>>>> > semantics.
>>>>>>>>>>>> >
>>>>>>>>>>>> > *Compatibility:* fully backward compatible. Servers that don’t support it
>>>>>>>>>>>> > can ignore the header; clients can detect support via capability discovery.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Here is the proposal
>>>>>>>>>>>> > <https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0>.
>>>>>>>>>>>> > Looking forward to your thoughts.
>>>>>>>>>>>> >
>>>>>>>>>>>> > Thanks,
>>>>>>>>>>>> >
>>>>>>>>>>>> > Huaxin
>>>>>>>>>>>> >
>>>>>>>>>>>>
>>>>>>>>>>>
