Hi Huaxin, yeah, the handler-level approach is the right call, I'm glad you went with it.
To be explicit about what your proposal aims to resolve: - Authorization bypass is fixed. The check now runs after resource-level authorization. - No credentials in the database. Fresh credentials re-vended on replay via the normal vending path. - Phase 2 architecture problem gone. The handler has the principal, storage config, and IAM access that the filter never could have. - Caller identity is in the binding. Three things still need adressing: First, and this is a hard prerequisite before I look at any implementation code: Q3 from April 5 is still open. The store is being reused as-is, so the overhead can be measured right now without new handler code. 3+ DB round-trips per mutation on the catalog connection pool, write-hot table, polling loops under duplicate traffic - those numbers need to come from a realistic setup: networked database under concurrent load, not a local Postgres. The latency concerns are specifically about Aurora and CockroachDB, and the polling storm only shows up under concurrent duplicate traffic. Please share the methodology so the results are reproducible. Second: please open a GitHub issue for the createTableStaged deferral. Third: since the filter is going away entirely and the critical sections move to the handlers, does it make sense to close #3803 and open a fresh PR? The old diff is mostly code that's being deleted, and it carries a lot of review history. A clean PR would be easier for everyone to review - just reference the old one for context. Happy to do a fresh review once the new implementation is up. Best, Robert On Tue, Apr 14, 2026 at 1:44 AM huaxin gao <[email protected]> wrote: > Hi Robert, > > Here is a revised design proposal for the idempotency replay mechanism. I'd > appreciate your feedback. Thanks! > > *Moving idempotency to handler level* > > The idempotency check moves from the HTTP filter into the handler methods, > after authorization. The filter is removed. The caller's principal hash is > included in the idempotency binding and checked on replay. > > *Credential-bearing mutations (createTable)* > > On replay, instead of returning a stored response, the handler calls > buildLoadTableResponseWithDelegationCredentials with the existing table's > metadata. This vends fresh credentials for the current caller through the > normal credential vending path. No credentials are stored in the database — > the idempotency record only stores the key, principal hash, table > identifier, and status code. > > *Non-credential mutations (dropTable, renameTable, namespace/view > operations)* > > These do not vend credentials, so there is no leakage concern. On replay, > the handler returns the stored status code and minimal response body. > > *createTableStaged* > > Staged tables are not persisted to the catalog, so the method can safely > re-run on retry. Idempotency support for this endpoint can be deferred. > > *What is reused from the current implementation* > > The IdempotencyStore interface, database schema, configuration, and > background purge logic are reused. The main change is moving the > check/finalize logic from the filter into the handler. > > Best, > > Huaxin > > On Sun, Apr 5, 2026 at 11:53 AM huaxin gao <[email protected]> wrote: > > > Hi Robert, > > > > Thanks for the follow-up. > > > > I appreciate the clarification. I read the listed items as fix options > and > > picked option 2, and the Phase 1/Phase 2 split was purely to keep the PR > at > > a reviewable size, not a design choice. > > > > I agree that the caller's identity should be part of the binding. That is > > straightforward to add. > > > > On the architecture question, whether re-vending can work within the > > filter or needs a handler-level approach, I want to give this proper > > thought rather than rush an answer. I may not be able to get back to you > on > > this until next week (the week of 4/13). > > > > Thanks, > > > > Huaxin > > > > On Sat, Apr 4, 2026 at 11:42 PM Robert Stupp <[email protected]> wrote: > > > >> Hi Huaxin, > >> > >> Thanks for responding. A few corrections and clarifications. > >> > >> My PR review described three consequences of the missing caller > >> identity binding and the credential leakage attack vector. Those > >> were not suggestions for a Phase 1/Phase 2 split. The offline > >> decision to split into phases, and what each phase contains, was > >> yours [1]. > >> > >> On the phasing: the concern is not PR size. The concern is that > >> Phase 1 alone ships a full authorization bypass. The filter replays > >> cached responses before Polaris's resource-level authorization > >> runs. Stripping credentials does not fix that. Any authenticated > >> user with a valid idempotency key can receive a cached response > >> for a resource they are not authorized to access. That is the case > >> with or without credentials in the response. > >> > >> "We will not release it until entire idempotency coding is > >> finished" is reassuring, but the architecture question remains: > >> can the generic filter approach support re-vending at all? My > >> email explained why I believe it cannot, since it would require > >> the filter to authorize the caller, resolve storage locations, > >> call cloud IAM, and manage credential caching. That is the catalog > >> handler's job, not a filter's. > >> > >> The questions from my original email that still need answers: > >> > >> How does the binding prevent cross-principal cache hits without > >> including the caller's identity? > >> > >> How does Phase 2 re-vend credentials from a generic HTTP filter > >> without reimplementing the catalog handler's authorization and > >> credential vending logic? > >> > >> Has the operational overhead (3 DB round-trips per mutation on > >> the catalog's connection pool) been benchmarked on any backend? > >> > >> I'd like to resolve these on this thread before more code is > >> written. > >> > >> Best, > >> Robert > >> > >> [1] https://github.com/apache/polaris/pull/3803#issuecomment-4180969838 > >> > >> On Sun, Apr 5, 2026 at 7:55 AM huaxin gao <[email protected]> > wrote: > >> > >> > Hi Robert, > >> > > >> > Thanks for raising this on the mailing list. I acknowledge the design > >> > concerns you listed. > >> > > >> > To clarify: I picked your option 2 (strip credentials and re-vend on > >> > replay) you suggested in the PR review. The offline discussion with > >> Yufei > >> > and Prashant was simply to verify that I picked the best option from > all > >> > the three suggestions you listed in the review. > >> > > >> > The reason I split it into two phases is that the current PR is > already > >> > around 3000 lines. Adding re-vending logic on top would make it very > >> hard > >> > to review. This is a big feature and we need to do it piece by piece. > We > >> > will not release it until entire idempotency coding is finished. > >> > > >> > I'm happy to discuss the architecture further on this thread, whether > >> the > >> > filter approach can support re-vending, or whether we need a > >> handler-level > >> > design. I may be slow to respond next week but I'll follow up after > >> that. > >> > > >> > Thanks, > >> > > >> > Huaxin > >> > > >> > On Sat, Apr 4, 2026 at 9:34 AM Robert Stupp <[email protected]> wrote: > >> > > >> > > Hi all, > >> > > > >> > > I need to raise some fundamental design concerns with the current > >> > > Idempotency-Key implementation in PR #3803 [1]. Some of these were > >> > > raised in November 2025 [2] and were acknowledged [3] but never > >> > > addressed in the implementation. > >> > > > >> > > I raised the caller identity concern on the > >> > > mailing list in November 2025 [2]. In December 2025 [4], I > >> > > reiterated that all existing implementations I could find do > >> > > request fingerprinting including the request body, and warned that > >> > > the idempotency subsystem itself must not become a source of > >> > > failures. The PR was opened in February 2026 without addressing > >> > > either point. > >> > > > >> > > The credential leakage consequences surfaced during code review > >> > > [5]. The response was an offline decision between three > contributors, > >> > > without mailing list discussion [6]. The decision was to split > >> > > into Phase 1 (strip credentials, which produces broken responses > >> > > but does not fix the authorization bypass) and Phase 2 (re-vend > >> > > on replay, which would require a complete redesign and > >> > > reimplementation). Details below. > >> > > > >> > > I think we need to resolve these at the design level before more > >> > > code is written. > >> > > > >> > > I will not do further code reviews on this PR until these design > >> > > concerns are resolved. > >> > > > >> > > > >> > > 1. Caller identity is not part of the idempotency binding > >> > > > >> > > The caller's identity is not part of the binding. The > >> > > filter replays cached responses before Polaris's resource-level > >> > > authorization runs. Any authenticated user with a valid idempotency > >> > > key can receive a cached response for a resource they are not > >> > > authorized to access, including vended storage credentials. This > >> > > is a full authorization bypass. > >> > > > >> > > Idempotency keys are HTTP headers that end up in access logs, > >> > > observability tooling, debug output. They cannot be a security > >> > > boundary. I described the attack vector in my PR review [5]. > >> > > > >> > > The proposed Phase 2 [6] says it will "re-authorize the user and > >> > > re-vend fresh credentials on replay." That addresses credentials > >> > > but not the binding itself. > >> > > > >> > > > >> > > 2. Credential stripping breaks Polaris's deployment model > >> > > > >> > > Without mailing list discussion, the decision was made offline > >> > > to strip credentials before storing ("Phase 1") [6]. > >> > > > >> > > Polaris's purpose is to vend down-scoped credentials. Clients have > >> > > no ambient storage access. > >> > > > >> > > A stripped response returns a successful response ("table created") > >> > > to the client, which then fails hard because it cannot access the > >> > > object storage. Query engines (Spark, Trino, etc.) will fail on > >> > > the first storage operation against that table. Query engine users > >> > > have no way to resolve this, they don't control the REST catalog > >> > > interaction. For stage-create, the client gets a successful > >> > > response telling it to write data but has no credentials to write > >> > > with. No fallback exists. > >> > > > >> > > The stripped response is spec-compliant, but spec-compliant and > >> > > operationally correct are not the same thing. > >> > > > >> > > > >> > > 3. Phase 2 is architecturally incompatible with Phase 1 > >> > > > >> > > The filter is a generic HTTP filter below the application layer. It > >> > > treats responses as opaque JSON. Credential vending happens deep > >> > > inside IcebergCatalogHandler, scoped to specific table locations > >> > > and principals, through cloud IAM calls (STS AssumeRole, GCS token > >> > > exchange). > >> > > > >> > > Re-vending on replay would require the filter to resolve the > >> > > table's storage location, authorize the caller, call cloud IAM to > >> > > mint scoped credentials, and manage credential caching. That is > >> > > not a filter, that is reimplementing the catalog handler's > >> > > authorization and credential vending logic. > >> > > > >> > > Phase 1 and Phase 2 need to be designed and shipped together. > >> > > > >> > > > >> > > 4. Operational overhead > >> > > > >> > > Every idempotent mutation adds 3 database round-trips (INSERT, > UPDATE, > >> > > heartbeat UPDATEs), each borrowing its own connection > >> > > (autoCommit=true, no reuse), sharing the catalog's connection pool > >> > > and RDBMS. This added latency increases the probability of client > >> > > timeouts, which are exactly the transient failures that idempotency > >> > > is supposed to prevent. > >> > > > >> > > The idempotency table is high-churn and write-hot, competing with > >> > > catalog operations for connection pool capacity. The overhead gets > >> > > worse on managed and distributed database backends. Duplicate > >> > > requests trigger polling loops with repeated database reads. > >> > > > >> > > No benchmark data has been presented for any backend. > >> > > > >> > > > >> > > What needs to change > >> > > > >> > > The current generic-filter approach cannot support these > >> > > requirements. This needs a different design. > >> > > > >> > > A new design must address the following: > >> > > > >> > > a) The idempotency binding MUST include the caller's principal > >> > > identity. > >> > > b) Credential-bearing responses must not be stored as-is. On > >> > > replay, the handler must build a fresh response with fresh > >> > > credentials for the current caller. > >> > > c) The operational overhead needs to be benchmarked. > >> > > > >> > > I am happy to review code once the design addresses these points. > >> > > > >> > > Best, > >> > > Robert > >> > > > >> > > [1] https://github.com/apache/polaris/pull/3803 > >> > > [2] > https://lists.apache.org/thread/qqrgr2bzsp69629jdj8kf39m10pzwy6l > >> > > (Nov 24, 2025: "Building a service-internal key from the client > >> > > provided idempotency key plus more data from the request gives > >> > > better uniqueness. That more data can come from: operation id, > >> > > all operation parameters, including the whole payload, quite a > >> > > few HTTP request headers like user-agent, authorization, host, > >> > > accept-* and also the client's address.") > >> > > [3] > https://lists.apache.org/thread/7vy44mxsq2cgtp8gsj0r8pk8blp37h5b > >> > > (Dec 8, 2025: "I agree that a bare Idempotency-Key + entity > >> > > identifier is not enough to protect against buggy or malicious > >> > > clients; fingerprinting the full request would be stronger.") > >> > > [4] > https://lists.apache.org/thread/jt3gmpnjh6z490dvxorjs6ms00kvo264 > >> > > (Dec 8, 2025: "We should avoid failing Polaris if this subsystem > >> > > fails, or letting this subsystem be a reason for its existence > >> > > (aka retry due to timeouts because the idempotent-request > >> > > subsystem hangs).") > >> > > [5] > >> > > > >> > https://github.com/apache/polaris/pull/3803#pullrequestreview-4051690055 > >> > > [6] > >> https://github.com/apache/polaris/pull/3803#issuecomment-4180969838 > >> > > > >> > > >> > > >
