Hi all,

Thanks for creating the doc and for splitting the discussion into pull and push mode. I think that terminology is useful and helps to separate two very different cases.

I agree that pull and push are useful options to discuss. I also think this is the right time to clarify whether push mode should be release documentation already, and what contract would be behind it. I am not objecting to the direction. I am objecting to publishing push mode as release documentation before we have defined its contract.

Pull mode mostly looks like a normal REST/OAuth client pattern. I am not sure that needs a separate Delegation Service specification. I think pull mode is a good fit when the external service owns the workflow.

When Polaris exposes the operation as Polaris behavior, for example DROP TABLE PURGE or server-side scan planning, Polaris owns the contract. For purge, that means durable state and eventual completion. For scan planning, that means bounded request behavior: timeouts, cancellation, resource limits, result-size limits, fallback behavior, and cache ownership. After that, pull vs push is mostly about where execution runs.

Remote push mode is still different operationally: Polaris needs to coordinate with another separately deployed service that can fail independently, but users will still hold Polaris responsible for the correct result. That means the contract must define retry, failure handling, credentials, status, and operator controls.

It also crosses security and service boundaries. The contract needs to define who the worker acts as, which credentials it gets, and how those credentials are scoped. It also needs to define how Polaris and the worker safely talk to each other across Kubernetes service, network, and proxy boundaries.

Once documented as release behavior, users will expect Polaris to define what happens when Polaris, the worker, the object store, or the network fails. I do not think that contract exists yet.
So I think this should either stay a design/proposal note for now, or the release documentation should clearly say that the push-mode contract is still TBD.

I think the good news is that the "Asynchronous & Reliable Tasks" proposal already gives us a simpler foundation: Polaris should own the durable task state, meaning the persistent record of what work exists, whether it finished, and what needs retry. With that, the default deployment can stay simple, and remote execution can still be added later as an optional executor backend.

I also think we should separate the advanced deployment option from the common user path. A remote push-mode Delegation Service can be useful for deployments that already have the operational machinery for separate worker services. But for many self-hosted users it also means another service to deploy, secure, monitor, scale, upgrade, and debug. So I would prefer that the common path stays simple first: Polaris owns the durable task state, and operators can run the worker in the same deployment or same image. Remote execution can then be added as an optional executor backend without making it the baseline model for everyone.

The failure cases below are why I think this matters. They are not a request to solve every detail in this PR. For example:

* What happens if the user-visible drop succeeds, but the purge task is not recorded yet? This matters when entities and tasks are served by different SPIs or backends. Atomicity across those writes cannot then be assumed.
* What happens if a worker deletes some files and then crashes? Who owns retry? Where is progress recorded? Can another node safely resume a crashed node's work?
* What happens if the worker needs to call Polaris after the table is already hidden or dropped from the normal API surface? This creates a cyclic dependency unless the task contains the information needed to continue without rediscovering the table through loadTable.
* Server-side scan planning is also not a simple service call. It either needs a query engine, or the relevant planning parts of one. At minimum, the contract needs request budgets: timeouts, cancellation, backpressure, result-size limits, fallback behavior, and cache ownership.

The existing proposals already contain most of the useful building blocks. For me, the safer order is to define the guarantees first, then document the deployment modes on top. One possible path could roughly look like this:

1. Define how destructive operations persist the intent for DROP TABLE PURGE. The important part is that the user-visible drop and the purge intent are recorded atomically.
2. Building on the "Asynchronous & Reliable Tasks" work for the durable Polaris task control plane gives us deterministic task IDs, task state, retry/lost-task recovery, and admin-visible status.
3. Using the "Object store functionality" work as the execution library for purge/file cleanup gives us streaming file discovery, bulk deletes, rate limiting, stats, and lower heap pressure.
4. Wire DROP TABLE PURGE to a reliable task behavior using those object store operations. Once Polaris returns success, the table is hidden from normal catalog APIs and the purge intent is durable. File deletion can continue asynchronously and survive process restarts.
5. Then consider deployment variants. A same-image task runner gives self-hosted operators isolation and separate scaling without a second protocol or persistence model. A remote Delegation Service can still be added later as an optional executor backend if SaaS deployments need that shape.

This is not meant to block pull/push terminology. It is also not meant to rule out remote execution. I am mostly trying to avoid publishing push mode as supported release behavior before the task, security, request-budget, and operational contracts are defined.
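As a rough illustration of steps 1 and 4 above, and not a proposal for the actual Polaris schema or SPI, the sketch below records the user-visible drop and the purge intent in one transaction. Everything in it is an assumption made for illustration: the `entities` and `tasks` tables, their columns, and the helpers `open_store`, `purge_task_id`, and `drop_table_purge` are hypothetical, and SQLite stands in for a single transactional backend. The task row carries the metadata location so that a worker would not have to call loadTable on a table that is already hidden.

```python
import hashlib
import sqlite3


def purge_task_id(table_uuid: str) -> str:
    """Deterministic task ID: the same dropped table always maps to the same task."""
    return hashlib.sha256(f"purge:{table_uuid}".encode()).hexdigest()[:16]


def open_store() -> sqlite3.Connection:
    """Hypothetical schema: entities and tasks live in ONE transactional store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE entities ("
        " uuid TEXT PRIMARY KEY, name TEXT, metadata_location TEXT,"
        " dropped INTEGER NOT NULL DEFAULT 0)"
    )
    conn.execute(
        "CREATE TABLE tasks ("
        " id TEXT PRIMARY KEY, kind TEXT, metadata_location TEXT, state TEXT)"
    )
    return conn


def drop_table_purge(conn: sqlite3.Connection, table_uuid: str) -> str:
    """Hide the table and persist the purge intent atomically.

    Safe to retry: the deterministic task ID plus INSERT OR IGNORE means a
    repeated call does not create a second purge task.
    """
    task_id = purge_task_id(table_uuid)
    with conn:  # commits both writes on success, rolls both back on any error
        row = conn.execute(
            "SELECT metadata_location FROM entities WHERE uuid = ?",
            (table_uuid,),
        ).fetchone()
        if row is None:
            raise KeyError(table_uuid)
        # User-visible effect: the table disappears from normal catalog reads.
        conn.execute("UPDATE entities SET dropped = 1 WHERE uuid = ?", (table_uuid,))
        # Durable intent: the task holds what a worker needs to continue
        # without rediscovering the table through the catalog API.
        conn.execute(
            "INSERT OR IGNORE INTO tasks (id, kind, metadata_location, state)"
            " VALUES (?, 'PURGE', ?, 'PENDING')",
            (task_id, row[0]),
        )
    return task_id
```

This only works as written when both writes share one transactional backend. If entities and tasks are served by different SPIs or backends, as in the first failure case above, a single local transaction is not available and the contract would need to say what replaces it.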
So I would prefer to keep this PR as a design/proposal note for now, or make the released documentation explicit that push mode is still TBD. My worry is that otherwise we ship a simple-looking doc that commits the project to a surprisingly complex distributed-systems design.

Robert

On Wed, May 13, 2026 at 11:50 PM Yufei Gu <[email protected]> wrote:

> Hi folks,
>
> Sharing a few updates regarding the delegation service design doc. JB and I
> will be co-authoring the document, and the PR has been updated accordingly.
>
> Please take a look at the latest changes here:
> https://github.com/apache/polaris/pull/3990
>
> Yufei
>
>
> On Tue, Apr 14, 2026 at 1:56 PM Yufei Gu <[email protected]> wrote:
>
> > Hi everyone,
> >
> > We had a productive discussion on the delegation service during the
> > Polaris Sprint on April 7, thanks all for the great input.
> >
> > As a quick summary, the current direction is to condense the design doc[1]
> > and focus on the two options the community seems to prefer moving forward
> > with: pull mode and push mode. The goal is to keep the doc concise and
> > briefly describe these two modes.
> >
> > Please let me know if I missed anything. And Looking forward your feedback.
> >
> > 1. https://github.com/apache/polaris/pull/3990
> >
> > Thanks,
> > Yufei
