Hey Dmitri,

There's a section in William's email and the linked doc that covers the
proposal you linked. See 'Relationship to the "Asynchronous & Reliable
Tasks" Proposal'.

As for pulling away from a REST API in favor of driving things directly
from persistence, there's a lot to discuss here. Bear in mind that the
design goes into detail about one proposed "TaskExecutor" implementation;
maybe another TaskExecutor could work exactly like you describe. But the
reason that this implementation proposes to be driven by a REST API is that
there's a lot of interesting future work that can be added on to the REST
API, in particular table maintenance actions like compaction. See the
"Future Work" section of the doc for some examples.

--EM

On Mon, Jun 23, 2025 at 2:31 PM Dmitri Bourlatchkov <di...@apache.org>
wrote:

> Hi All,
>
> A previous proposal by Robert [1] from May 9 appears to be related. I think
> we should consider both at the same time, possibly as alternatives, but
> perhaps also sharing / reusing their respective ideas.
>
> A few notes after a quick review:
>
> * Separate scaling for task executors seems reasonable at first glance, but
> it adds deployment complexity. If we go with this approach, I believe it
> would be worth making this deployment strategy optional. In other words, let
> admin users decide whether they want to have extra nodes dedicated to
> specific tasks or whether they are ok with having uniform nodes.
>
> * I'm not sure a separate rich REST API for submitting tasks is really
> necessary. Proper synchronization among multiple nodes will
> probably require roundtrips to Persistence anyway, so task submission could
> probably be done via Persistence.
>
> [1] https://lists.apache.org/thread/gg0kn89vmblmjgllxn7jkn8ky2k28f5l
>
> Thanks,
> Dmitri.
>
>
> On Mon, Jun 23, 2025 at 3:12 PM William Hyun <will...@apache.org> wrote:
>
> > Hello Polaris Community,
> >
> > I would like to share my proposal for a new service, the Polaris
> > Delegation Service, and to share the design document for discussion
> > and feedback. The Delegation Service is intended to optionally be
> > deployed alongside Polaris to handle the execution of certain
> > long-running tasks.
> >
> > 1. Motivation
> > The Polaris Catalog is optimized for low-latency metadata operations.
> > However, certain tasks such as purging data files for dropped tables
> > are resource-intensive and can impact its core performance. The
> > motivation for this new service is to decouple these I/O-heavy
> > background tasks from the main catalog, ensuring it remains highly
> > responsive while allowing the task execution workload to be managed
> > and scaled independently.
> >
> > 2. Proposal
> > We propose an optional, independent Delegation Service responsible for
> > executing these offloaded operations.
> > The MVP will focus on synchronously handling the data file deletion
> > process for DROP TABLE WITH PURGE commands.
> >
> > 3. Relationship to the "Asynchronous & Reliable Tasks" Proposal
> > This proposal is designed to be highly synergistic with the existing
> > "Asynchronous & Reliable Tasks" proposal.
> >
> > The Asynchronous Task proposal describes a general internal framework
> > for reliably scheduling and managing the lifecycle of any task within
> > Polaris. On the other hand, this proposal defines a specific, external
> > worker service optimized for executing a particular class of I/O-heavy
> > tasks.
> >
> > The Delegation Service does not alter the core Polaris task schema.
> > This allows it to seamlessly act as a specialized "backend" worker
> > that can execute tasks scheduled and managed by the more advanced
> > Asynchronous Task Framework, which would serve as the reliable
> > "frontend." This relationship is explored further in section 10.2 of
> > the document.
> >
> > Please find the detailed design document here for review:
> > - https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?usp=sharing
> >
> > Best Regards,
> > William
> >
>
