Hi all, supporting idempotency is a great enhancement for Polaris!
Being able to see the whole architecture and code design would help a lot. The best way to achieve this and iterate on the overall approach is to have a (draft) PR containing at least enough to let everybody perform tests against it and inspect the solution top-down - or even have the whole thing in one PR and later split it into "reviewable" smaller PRs. Robert On Mon, Dec 15, 2025 at 8:04 PM huaxin gao <[email protected]> wrote: > > Hi all, > > Thanks again for the detailed feedback on the idempotency design. I’ve gone > through all the comments and updated the design doc accordingly. If you see > any remaining gaps or have further questions, please let me know. If there > are no further comments, I’ll resume the implementation work based on the > revised document. > > Best, > Huaxin > > On Tue, Dec 9, 2025 at 11:15 PM huaxin gao <[email protected]> wrote: > > > Hi Robert, > > > > > > Quick follow‑up to my mail from yesterday: I’ve just updated the proposal > > text to incorporate your comments. In particular: > > > > > > I clarified the finalization rules so the server MUST NOT return 2xx if > > commit/update preconditions (expected base snapshot, requested schema > > changes, etc.) are not satisfied; in those cases the handler returns an > > appropriate 4xx, and that 4xx is what the idempotency layer finalizes and > > replays. > > > > > > I added an explicit replay‑failure path: if a previously finalized result > > can no longer be reproduced (e.g., table dropped and metadata purged), the > > server returns a 5xx with subtype idempotency_replay_failed and does not > > try to re‑run the old mutation. > > > > > > Under Multi‑node Coordination, I wrote down the stale‑lease/reconciliation > > behavior more concretely and noted that pod restarts/crashes show up as > > missing heartbeats; duplicates then see a stale lease, run reconciliation > > once, and either return the original result or a 503 rather than waiting > > indefinitely. > > > > > > I added a short Failure modes note after the IdempotencyStore SPI > > describing how we handle a pluggable backend that is down/slow: > > coordination‑critical paths fail fast with a defined 5xx, while > > heartbeat/finalize are best‑effort so Polaris itself doesn’t get stuck. > > > > > > I also tightened the Non‑Goals section to state explicitly that this > > iteration only targets the built‑in Polaris Iceberg REST catalog, not > > federated or non‑IRC APIs. > > > > > > Happy to tweak the wording further if you think any of these areas still > > need more precision. > > > > > > Best, > > > > Huaxin > > > > > > On Mon, Dec 8, 2025 at 8:50 PM huaxin gao <[email protected]> wrote: > > > >> Hi Robert, Thanks a lot for taking the time to write such a detailed note > >> — I really appreciate the careful review and the references back to the > >> Iceberg spec. I agree this cuts across API, persistence, and > >> distributed‑systems concerns, so we need to get the design right before we > >> treat it as “done”. On a few of your main points: > >> > >> - Scope & Iceberg semantics > >> > >> The “key‑only semantics” phrase is my shorthand, not Iceberg wording. > >> What I meant is: in the earlier Iceberg mailing‑list discussion we > >> converged on baseline key‑only idempotency aimed at low‑level/network > >> retries, with no payload fingerprinting in the protocol. 
Servers treat > >> Idempotency-Key as an opaque token bound to a single > >> operation/resource/realm; if we ever explore payload‑binding, that would be > >> a separate follow‑up discussion, not part of the current design. > >> > >> > >> - Key vs fingerprinting > >> > >> I agree that a bare Idempotency-Key + entity identifier is not enough to > >> protect against buggy or malicious clients; fingerprinting the full request > >> would be stronger. For v1 I was trying to stay aligned with the Iceberg > >> REST spec (client‑supplied key, no payload fingerprint in the contract) and > >> keep the server implementation simple, but I’ll add a section that: > >> > >> > >> - calls out the risk you describe (two different logical requests > >> reusing the same key), > >> - spells out the binding we do enforce (key + operation type + > >> resource + realm), and > >> - treats request‑fingerprinting as a possible follow‑on > >> enhancement rather than something we’re silently ignoring. > >> - Multi‑pod, liveness and “followers waiting forever” > >> > >> I’ve expanded the design doc to describe the heartbeat/lease mechanism > >> and the behavior when the primary pod dies. In short: we don’t let > >> followers wait unboundedly. Each owner periodically calls updateHeartbeat, > >> and duplicates only wait while now − heartbeat_at is within a short > >> lease window; once that lease expires, we hand control to a reconciliation > >> step rather than continuing to block. I’ll make sure this algorithm and its > >> failure modes (including timeouts/back‑pressure limits) are written down > >> more rigorously, not just implied in the text. > >> > >> > >> - Backend pluggability and failure > >> > >> I agree that if the idempotency backend is pluggable, the design has to > >> cover the backend down / nuked explicitly so Polaris doesn’t just hang. I > >> will add a note in the design doc that if the idempotency backend is unavailable > >> we must fail requests in a bounded way (not hang), and treat > >> heartbeat/finalize as best-effort so Polaris doesn't get stuck. > >> > >> > >> - Quarkus collaboration and scope > >> > >> I like the idea of collaborating with the Quarkus community on a more > >> generic JAX‑RS idempotency layer, and I agree there’s nothing inherently > >> “Polaris‑only” about many of these concerns. For the moment I’d still like > >> to keep this proposal scoped to the Polaris REST catalog (IRC) so we can > >> converge on concrete semantics there first, but I’ll add a short “future > >> work” section that talks about factoring out the generic pieces and > >> exploring Quarkus integration once we have agreement on the core behavior. > >> > >> Best, > >> Huaxin > >> > >> > >> > >> > >> On Mon, Dec 8, 2025 at 2:23 AM Robert Stupp <[email protected]> wrote: > >> > >>> Hi, > >>> > >>> > Spec alignment: Iceberg chose key‑only semantics (no payload > >>> fingerprinting) > >>> > >>> I do not see this "key-only semantics" mentioned anywhere in the > >>> Iceberg spec [1]. > >>> The Iceberg spec requirement "The idempotency key must be globally > >>> unique" [1] OTOH is impossible for a client to guarantee. > >>> > >>> It is "relatively easy" for clients to implement some retry mechanism > >>> and add some HTTP header (it is probably also not easy for clients to > >>> implement properly, see all the issues that happened in the past wrt > >>> when to (not) throw a CommitFailedException). > >>> It is definitely a complex task for servers. 
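[To make the heartbeat/lease wait described a few paragraphs above more concrete, here is a minimal Java sketch of how a duplicate request could bound its wait on the owner's heartbeat. It only illustrates the behavior discussed in this thread; the Store/KeyRecord types, the lease and poll values, and the reconcile() step are hypothetical names, not APIs from the design doc or Polaris.]

import java.time.Duration;
import java.time.Instant;

// Illustrative sketch only: a duplicate request waits while the owning node's
// heartbeat is fresh, and hands over to a single reconciliation step once the
// lease window has expired, instead of blocking indefinitely.
final class DuplicateWaiter {

  private static final Duration LEASE_WINDOW = Duration.ofSeconds(30);   // assumed value
  private static final Duration POLL_INTERVAL = Duration.ofMillis(500);  // assumed value

  static Response awaitResult(Store store, String realm, String key) throws InterruptedException {
    while (true) {
      KeyRecord rec = store.load(realm, key);
      if (rec.isFinalized()) {
        // Replay the original outcome (2xx or terminal 4xx) to the duplicate.
        return rec.storedResponse();
      }
      Duration sinceHeartbeat = Duration.between(rec.heartbeatAt(), Instant.now());
      if (sinceHeartbeat.compareTo(LEASE_WINDOW) > 0) {
        // Stale lease: the owner is presumed dead (missed heartbeats). Run
        // reconciliation once; it yields the original result or a 503, never an unbounded wait.
        return reconcile(store, realm, key);
      }
      Thread.sleep(POLL_INTERVAL.toMillis());
    }
  }

  // Hypothetical collaborators, elided for brevity:
  interface Store { KeyRecord load(String realm, String key); }
  interface KeyRecord { boolean isFinalized(); Instant heartbeatAt(); Response storedResponse(); }
  interface Response {}

  private static Response reconcile(Store store, String realm, String key) {
    throw new UnsupportedOperationException("reconciliation is out of scope for this sketch");
  }
}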
> >>> > >>> This feature touches many application, HTTP, security and distributed > >>> systems aspects and we should be very careful. > >>> I'd like to repeat my proposal to collaborate with the Quarkus > >>> community, because they have extensive knowledge about all these > >>> things and are very open to collaboration. I do not see any > >>> "specialties" that are unique to Polaris and force us to come up with > >>> our very own implementation. > >>> > >>> In any case, we should first design this functionality very carefully. > >>> Consider all use cases, the potential logical and technical states, > >>> exceptions, race conditions and failure scenarios. > >>> After we have consensus on all that, we can move on to the code. > >>> > >>> Some comments around the design and the feature itself: > >>> * The multi-table-commit endpoint doesn't seem to fit into the design > >>> (many "resources" not one)? > >>> * A resource has been deleted before the current state can be served > >>> ("delete" succeeds before the "follower of an update" finishes). The > >>> idempotent-request code would yield "serve this table-metadata" - but > >>> it cannot, as the metadata has been purged. > >>> * Two "idempotent requests" yielding different results, racing with one > >>> another. While that's mentioned in the Iceberg spec as "response body > >>> *may* reflect a newer state", I am not convinced this is what all > >>> clients can cope with. For example, a client that adds a column to the > >>> schema expects that column to be present upon successful request > >>> completion. But the "response body may reflect a newer state" > >>> exception means that an "add column" operation can legitimately yield a > >>> schema without the added column. Similar for column type changes, > >>> column removals, extending to sort-orders and partition-specs and lots > >>> more. This can lead to subtle issues in all clients and query engines. > >>> Isn't it a server-bug yielding a success-response to an Iceberg > >>> update-table request with non-fulfillable update-request-requirements? > >>> * "Sole idempotency-key + entity-identifier" is not enough. Two > >>> identical idempotency keys, either intentional or due to a bug, would > >>> lead to wrong responses. This would _not_ be a problem with > >>> fingerprinting _all_ inputs for the operation. All existing > >>> implementations and experience reports/posts that I could find do > >>> request fingerprinting, including the request body. > >>> * It is unclear whether this feature is intended for the "built in" > >>> Polaris Iceberg catalog or whether it includes federated catalogs. Is it > >>> useful for non-IRC APIs? > >>> > >>> Some comments around the technical things. All these could be > >>> "offloaded" to Quarkus, leveraging async event loop processing and > >>> circuit breaking: > >>> * If a pod executing the "primary" request dies, do "followers" wait > >>> "forever"? What is the actual distributed algorithm to resolve this > >>> without having a lot of threads spinning for a long time? This can > >>> happen for buggy clients, bad client configurations or intentionally > >>> bad clients. > >>> * Technical failure and rolling-restart/upgrade scenarios should be > >>> considered. > >>> * As the "idempotent request coordination backend" is pluggable, the > >>> design should also consider the case that the backend state is nuked, > >>> becomes unresponsive or fails in the meantime. 
We should avoid failing > >>> Polaris if this subsystem fails, or letting this subsystem be a reason > >>> for its existence (aka retry due to timeouts because the > >>> idempotent-request subsystem hangs). > >>> > >>> Robert > >>> > >>> [1] > >>> https://github.com/apache/iceberg/blob/19b4bd024486d9d516d0e547e273419c1bc7074e/open-api/rest-catalog-open-api.yaml#L1933-L1961 > >>> > >>> On Wed, Nov 26, 2025 at 7:08 PM huaxin gao <[email protected]> > >>> wrote: > >>> > > >>> > Thanks Robert for the thoughtful note! > >>> > > >>> > A generic JAX‑RS/Quarkus idempotency layer would be useful broadly, and > >>> > Quarkus’s distributed cache is a good building block. For Polaris, > >>> though, > >>> > we need a few things that go beyond caching or generic locking: > >>> > > >>> > > >>> > - No external lock service: we use an atomic “first‑writer‑wins” > >>> reserve > >>> > via a unique key in durable storage (single upsert), so exactly one > >>> node > >>> > owns a key; others see the existing row. > >>> > - Spec alignment: Iceberg chose key‑only semantics (no payload > >>> > fingerprinting). Safety comes from first‑acceptance plus binding > >>> > {operationType, resourceId, realm}; mismatched reuse -> 422; > >>> duplicates do > >>> > not re‑execute. > >>> > - Liveness and failover: heartbeat/lease while IN_PROGRESS and > >>> > reconciliation on stale owners (finalize‑gap/takeover) so > >>> duplicates don’t > >>> > block indefinitely and we avoid double execution. > >>> > - Durable replay: persist a minimal, equivalent response (not > >>> > in‑memory/TTL cache) and a clear status policy (finalize only > >>> 2xx/terminal > >>> > 4xx; never 5xx). > >>> > > >>> > Phase I: I’ll focus on implementing this in Polaris behind a small > >>> > storage‑agnostic SPI and wiring the flows. > >>> > > >>> > Phase II: we can revisit extracting the core into a reusable > >>> JAX‑RS/Quarkus > >>> > module, but for now I’d like to keep the scope on shipping Polaris v1. > >>> > Thanks, > >>> > Huaxin > >>> > > >>> > On Tue, Nov 25, 2025 at 11:18 PM Robert Stupp <[email protected]> wrote: > >>> > > >>> > > Hi all, > >>> > > > >>> > > To build an idempotent service, it seems necessary to consider some > >>> > > things, naming a few: > >>> > > * distributed locking, resilient to failure scenarios > >>> > > * distributed caching > >>> > > * request fingerprinting > >>> > > * request failure scenarios > >>> > > > >>> > > I think a generic JAX-RS idempotency functionality would be > >>> beneficial > >>> > > not just for Polaris. > >>> > > I can imagine that the Quarkus project would be very interested in > >>> > > such a thing. For example, Quarkus already has functionality for > >>> > > distributed caching in place, which is a building block for > >>> idempotent > >>> > > responses. > >>> > > Have we considered joining forces with them and leveraging synergies? > >>> > > > >>> > > Robert > >>> > > > >>> > > On Wed, Nov 26, 2025 at 4:57 AM huaxin gao <[email protected]> > >>> wrote: > >>> > > > > >>> > > > Hi Dmitri, > >>> > > > > >>> > > > Thanks for the reply and the detailed comments in the proposal. > >>> You’re > >>> > > > right: the goal is to implement the recently approved Iceberg > >>> > > > Idempotency-Key spec, and we don’t plan any additional REST > >>> Catalog API > >>> > > > changes in Polaris. I’ve refocused the proposal on the server-side > >>> > > > implementation and agree we should land the REST Catalog work > >>> first, then > >>> > > > extend to the Management API. 
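[As a rough illustration of the atomic "first‑writer‑wins" reserve via a unique key in durable storage (single upsert) described earlier in this thread, here is what that single statement could look like through JDBC, assuming a Postgres-style ON CONFLICT clause and a unique constraint on (realm, idem_key). The table and column names are made up for this example and are not from the proposal; the point is only that exactly one node's insert succeeds, while every duplicate sees the existing row and then compares the stored {operationType, resourceId, realm} binding (mismatch -> 422) instead of re-executing.]

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Illustrative sketch only: one atomic upsert decides ownership of an idempotency key.
final class IdempotencyReserve {

  /** Returns true if this node won the reservation, false if the key already existed. */
  static boolean reserve(Connection conn, String realm, String idempotencyKey,
                         String operationType, String resourceId, String ownerNodeId)
      throws SQLException {
    String sql =
        "INSERT INTO idempotency_keys "
            + "(realm, idem_key, operation_type, resource_id, owner_node, status, heartbeat_at) "
            + "VALUES (?, ?, ?, ?, ?, 'IN_PROGRESS', now()) "
            + "ON CONFLICT (realm, idem_key) DO NOTHING";
    try (PreparedStatement ps = conn.prepareStatement(sql)) {
      ps.setString(1, realm);
      ps.setString(2, idempotencyKey);
      ps.setString(3, operationType);
      ps.setString(4, resourceId);
      ps.setString(5, ownerNodeId);
      // Exactly one caller gets an update count of 1; all duplicates get 0 and
      // must load the existing row rather than re-run the mutation.
      return ps.executeUpdate() == 1;
    }
  }
}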
> >>> > > > > >>> > > > I addressed your inline comments and added a small, > >>> backend-agnostic > >>> > > > Idempotency Persistence API > >>> (reserve/load/heartbeat/finalize/purge) so it > >>> > > > works across all storage backends (Postgres first). > >>> > > > > >>> > > > On the async tasks framework: agreed — there are synergies. I’ll > >>> keep > >>> > > this > >>> > > > in mind and align the idempotency store semantics with the async > >>> tasks > >>> > > > model. > >>> > > > Best, > >>> > > > Huaxin > >>> > > > > >>> > > > On Tue, Nov 25, 2025 at 12:21 PM Dmitri Bourlatchkov < > >>> [email protected]> > >>> > > > wrote: > >>> > > > > >>> > > > > Hi Huaxin, > >>> > > > > > >>> > > > > Thanks for resuming this proposal! > >>> > > > > > >>> > > > > In general, I suppose the intention is to implement the recently > >>> > > approved > >>> > > > > Iceberg REST Catalog spec change for Idempotency Keys. With that > >>> in > >>> > > mind, I > >>> > > > > believe the Polaris proposal probably needs to be more focused > >>> on the > >>> > > > > server side implementation now that the API spec has been > >>> finalized. I > >>> > > do > >>> > > > > not think Polaris needs any other API changes in the REST > >>> Catalog on > >>> > > top of > >>> > > > > the Iceberg spec. > >>> > > > > > >>> > > > > I'd propose to deal with the REST Catalog API first and then > >>> extend to > >>> > > the > >>> > > > > Management API (for the sake of simplicity). > >>> > > > > > >>> > > > > I added some more specific comments in the doc, but overall, I > >>> believe > >>> > > we > >>> > > > > need to consider what needs to be changed in the java > >>> Persistence API > >>> > > in > >>> > > > > Polaris because the idempotency feature probably applies to all > >>> > > backends. > >>> > > > > > >>> > > > > Also, as I commented [1] in earlier emails about this proposal, I > >>> > > believe > >>> > > > > some synergies can be found with the async tasks framework [2]. > >>> The > >>> > > main > >>> > > > > point here is orchestrating request execution among a set of > >>> > > distributed > >>> > > > > server nodes. > >>> > > > > > >>> > > > > [1] > >>> https://lists.apache.org/thread/28hx9kl4qmm5sho8jxmjlt6t0cd0hn6d > >>> > > > > > >>> > > > > [2] > >>> https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l > >>> > > > > > >>> > > > > Cheers, > >>> > > > > Dmitri. > >>> > > > > > >>> > > > > > >>> > > > > On Sat, Nov 22, 2025 at 7:50 PM huaxin gao < > >>> [email protected]> > >>> > > wrote: > >>> > > > > > >>> > > > > > Hi all, > >>> > > > > > I would like to restart the discussion on Idempotency-Key > >>> support in > >>> > > > > > Polaris. This proposal focuses on Polaris server-side behavior > >>> and > >>> > > > > > implementation details, with the Iceberg spec as the baseline > >>> API > >>> > > > > contract. > >>> > > > > > Thanks for your review and feedback. > >>> > > > > > > >>> > > > > > Polaris Idempotency Key Proposal > >>> > > > > > < > >>> > > > > > > >>> > > > > > >>> > > > >>> https://docs.google.com/document/d/1ToMMziFIa7DNJ6CxR5RSEg1dgJSS1zFzZfbngDz-EeU/edit?tab=t.0#heading=h.ecn4cggb6uy7 > >>> > > > > > > > >>> > > > > > > >>> > > > > > Iceberg Idempotency Key Proposal > >>> > > > > > < > >>> > > > > > > >>> > > > > > >>> > > > >>> https://docs.google.com/document/d/1WyiIk08JRe8AjWh63txIP4i2xcIUHYQWFrF_1CCS3uw/edit?tab=t.0#heading=h.jfecktgonj1i > >>> > > > > > > > >>> > > > > > > >>> > > > > > Best, > >>> > > > > > Huaxin > >>> > > > > > > >>> > > > > > >>> > > > >>> > >>
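[Finally, a minimal sketch of what the small, backend-agnostic Idempotency Persistence API mentioned above (reserve/load/heartbeat/finalize/purge) could look like as a Java SPI. All type and method names here are illustrative assumptions for discussion, not the interface from the design doc or the eventual Polaris code.]

import java.time.Duration;
import java.time.Instant;
import java.util.Optional;

// Illustrative sketch only: a storage-agnostic SPI exposing the five operations
// discussed in this thread (reserve/load/heartbeat/finalize/purge).
public interface IdempotencyStore {

  /** Atomically reserve the key for this node; exactly one caller wins. */
  ReserveResult reserve(String realm, String idempotencyKey,
                        String operationType, String resourceId,
                        String ownerNodeId, Instant now);

  /** Load the current record so duplicates can check the binding and the owner's status. */
  Optional<IdempotencyRecord> load(String realm, String idempotencyKey);

  /** Best-effort liveness signal from the owning node while the request is IN_PROGRESS. */
  void updateHeartbeat(String realm, String idempotencyKey, String ownerNodeId, Instant now);

  /** Persist the terminal outcome (2xx or terminal 4xx, never 5xx) so duplicates can replay it. */
  void finalizeResponse(String realm, String idempotencyKey, StoredResponse response);

  /** Remove records older than the retention window. */
  void purgeOlderThan(Duration retention, Instant now);

  enum ReserveResult { RESERVED, ALREADY_EXISTS }

  record IdempotencyRecord(String ownerNodeId, String operationType, String resourceId,
                           String status, Instant heartbeatAt, StoredResponse response) {}

  record StoredResponse(int httpStatus, String body) {}
}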
