Right, the idea is to have a "common abstraction" first. I'm actively looking into exactly that at the moment and will come up with a couple of PRs to enable this. Some of it is implicitly covered by the work that Christopher is contributing, although that work is rather orthogonal.
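
To make the "common abstraction" a bit more concrete, here is a rough sketch of the shape it could take, combined with the opt-in flag suggested further down the thread. It is only an illustration: `TaskExecutor`, `LegacyTaskExecutor`, `ReliableTaskExecutor`, `TaskExecutors` and the `useNewTasks` flag are hypothetical names, not existing Polaris types, and the in-memory list merely stands in for a persistent task store.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.CopyOnWriteArrayList;

// All names below are hypothetical; none of them are Polaris types.

/** The only thing business logic would depend on. */
interface TaskExecutor {
  /** Submits a task; callers may ignore the returned stage ("fire and forget"). */
  CompletionStage<Void> submit(String behaviorId, String parameter);
}

/** Stand-in for the current implementation: runs in-process, persists nothing. */
final class LegacyTaskExecutor implements TaskExecutor {
  @Override
  public CompletionStage<Void> submit(String behaviorId, String parameter) {
    return CompletableFuture.runAsync(() -> { /* task body would run here */ });
  }
}

/** Stand-in for a new implementation: records the submission before running it. */
final class ReliableTaskExecutor implements TaskExecutor {
  // In-memory list standing in for a persistent task store.
  private final List<String> recorded = new CopyOnWriteArrayList<>();

  @Override
  public CompletionStage<Void> submit(String behaviorId, String parameter) {
    recorded.add(behaviorId + ":" + parameter);
    return CompletableFuture.runAsync(() -> { /* task body would run here */ });
  }

  /** Returns an immutable snapshot of the recorded submissions. */
  List<String> recorded() {
    return List.copyOf(recorded);
  }
}

/** Chooses the implementation from an opt-in flag. */
final class TaskExecutors {
  private TaskExecutors() {}

  static TaskExecutor create(boolean useNewTasks) {
    return useNewTasks ? new ReliableTaskExecutor() : new LegacyTaskExecutor();
  }
}
```

A call site would then only ever see the interface, e.g. `TaskExecutors.create(true).submit("purge-table", "ns.tbl")`, so swapping the implementation behind it (or adding a remote one) would not touch business logic.
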

On Fri, Aug 1, 2025 at 6:54 PM Eric Maynard <eric.w.mayn...@gmail.com> wrote:
>
> I agree with Robert that the current implementation is not good and should
> be ripped out ASAP. However, I see this effort as complementary to Will's
> refactor, not as a dependency. We should first add a layer of abstraction
> between the business logic in Polaris and the task execution -- once that's
> in place, we can replace the existing task implementation behind that
> abstraction. At the same time, adding this abstraction will unlock the
> ability for us to implement remote task execution as well.
>
> --EM
>
> On Fri, Aug 1, 2025 at 6:31 AM Yufei Gu <flyrain...@gmail.com> wrote:
>
> > Thanks for the async task proposal. I think it's the right direction
> > for async light tasks. Meanwhile, we will still need other models:
> > 1. A scalable way to execute synchronous tasks
> > 2. A scalable way to execute heavy async tasks, e.g., table maintenance
> > tasks.
> >
> > The delegation service[1] is a good candidate for that.
> >
> > 1. https://docs.google.com/document/d/1AhR-cZ6WW6M-z8v53txOfcWvkDXvS-0xcMe3zjLMLj8/edit?tab=t.0#heading=h.xjibr7sfbv6a
> >
> > Yufei
> >
> > On Thu, Jul 31, 2025 at 11:37 AM Russell Spitzer <russellspit...@apache.org> wrote:
> >
> > > I'm fine with the plan although I think we should probably change step 4
> > > to allow both the current implementation and the new implementation to
> > > exist at the same time with a flag for switching over to the new task
> > > implementation. While the new implementation may be much better, it is a
> > > pretty significant behavior change that I think should be opt in until it's
> > > been in Polaris for a release or two. After that we could force all users
> > > to switch once it's been out in the wild for a bit.
> > >
> > > On 2025/07/30 01:30:43 William Hyun wrote:
> > > > >
> > > > > Considering the current issues, I don't think it's worth the effort to
> > > > > keep the current implementation.
> > > >
> > > > It seems risky to me to not support the current implementation at least for
> > > > the period where the new tasks implementation is unstable.
> > > >
> > > > Bests,
> > > > William
> > > >
> > > > On Tue, Jul 29, 2025 at 3:49 AM Robert Stupp <sn...@snazy.de> wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > (starting w/ a recap for everybody watching this thread)
> > > > > The goal of this is to have a mechanism to guarantee the _eventual_
> > > > > execution of a task. That may happen immediately on the same node or
> > > > > at a later time on another node.
> > > > > This particular "async reliable tasks" is to ensure that tasks run
> > > > > eventually in any Polaris node. The related "Delegation Service"
> > > > > proposal is to let tasks run in a separate, different remote service.
> > > > > But it requires a "local fallback" in case the remote service is not
> > > > > available, which would be provided by this proposal.
> > > > >
> > > > > Currently, all scheduled and running tasks are "lost", if Polaris is
> > > > > stopped, killed or crashed. So I'd prefer to get this proposal in
> > > > > first to address the current issues and have a reliable fallback for
> > > > > the Delegation Service.
> > > > >
> > > > > Considering the current issues, I don't think it's worth the effort to
> > > > > keep the current implementation.
> > > > >
> > > > > Both, this proposal and the Delegation Service, shouldn't rely on
> > > > > Polaris entities but rather have targeted definitions for the tasks to
> > > > > execute, which contain exactly (and not more) what the tasks need to
> > > > > be executed.
> > > > >
> > > > > So I think the following steps (approx 1 PR for each) would be:
> > > > > 1. Add the tasks API (the draft PR [1])
> > > > > 2. Add the tasks implementation, w/o any persistence integration but
> > > > > with mock testing
> > > > > 3. Add persistence integration
> > > > > 4. Replace current task implementation with the new one
> > > > >
> > > > > I'll probably have more details soon-ish.
> > > > >
> > > > > Robert
> > > > >
> > > > > [1] https://github.com/apache/polaris/pull/2180
> > > > >
> > > > > On Mon, Jul 28, 2025 at 6:22 AM William Hyun <will...@apache.org> wrote:
> > > > > >
> > > > > > Hey Robert!
> > > > > >
> > > > > > Thank you for the draft PR.
> > > > > > I have taken a look and the general approach seems good to me.
> > > > > > However, one of my concerns would be the timeline to deliver this new
> > > > > > task framework refactoring as this could be intrusive due to the scope
> > > > > > of the change.
> > > > > > What do you plan as the ETA for delivering this change?
> > > > > >
> > > > > > It seems we need to support both the pre-existing (v1) and new task
> > > > > > framework (v2) until we are sure that v2 is stabilized so that we can
> > > > > > delete v1.
> > > > > > With the Delegation Service proposal being a new feature for users, I
> > > > > > am proposing to include it within the 1.1 release as a small, optional
> > > > > > extension and also support it in v2 by reusing via implementing v2's
> > > > > > SPI module as we previously discussed.
> > > > > > I also have opened a PR demonstrating what the Delegation Service
> > > > > > looks like here:
> > > > > >
> > > > > > - https://github.com/apache/polaris/pull/2193
> > > > > >
> > > > > > WDYT?
> > > > > >
> > > > > > Bests,
> > > > > > William
> > > > > >
> > > > > > On Thu, Jul 24, 2025 at 11:18 AM Robert Stupp <sn...@snazy.de> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > As discussed on the Polaris Community Sync today, we're aligned that
> > > > > > > the current tasks handling needs some refactoring.
> > > > > > >
> > > > > > > This proposal focuses on the "eventual execution" of a task.
> > > > > > > Implementations for would follow.
> > > > > > > The "Delegation Service" [1] proposal focuses on the execution of
> > > > > > > tasks "outside" of Polaris.
> > > > > > >
> > > > > > > I've pushed a draft PR [2] with the Java interfaces and value types
> > > > > > > for the API, the SPI (behavior implementation) and store (used by
> > > > > > > tasks implementations).
> > > > > > >
> > > > > > > The only entry point is the `org.apache.polaris.tasks.api.Tasks`
> > > > > > > interface with a function defining the behavior and providing a
> > > > > > > parameter object (if necessary), returning a `TaskSubmission`. Call
> > > > > > > sites _may_ subscribe to a `CompletionStage`, but the idea is that
> > > > > > > it's rather "fire and forget" and the task behavior does "everything
> > > > > > > that's needed". This allows the task to be executed on any node.
> > > > > > > There's no guarantee in any form that a task will run "locally" or any
> > > > > > > other specific node. Every Polaris node can handle task execution and
> > > > > > > perform failure/retry handling. Polaris nodes may use a "server"
> > > > > > > implementation or a "client" implementation or a "remote"
> > > > > > > implementation - that's defined upon deployment or by configuration
> > > > > > > (TBD).
> > > > > > >
> > > > > > > I think that we can get to a Polaris internal API/SPI that can be
> > > > > > > leveraged by both proposals.
> > > > > > > This proposal is implementation and persistence backend agnostic.
> > > > > > > There could be a "server" implementation that can run tasks, a
> > > > > > > "client" implementation that can only submit tasks (think: from the
> > > > > > > polaris-admin tool), and an implementation for the delegation service
> > > > > > > to execute tasks remotely.
> > > > > > >
> > > > > > > I do have a working implementation sitting around locally that's
> > > > > > > passing tests exercising concurrency, multi-node and failure
> > > > > > > scenarios. Since there's only a store-implementation for NoSQL, I
> > > > > > > haven't pushed that yet. Adding a store-implementation that solely
> > > > > > > uses `BasePersistence` (JDBC) is not such a big deal.
> > > > > > >
> > > > > > > If we're okay with the approach in general, I can follow up with a
> > > > > > > more concrete implementation including the "purge table" use case and
> > > > > > > maybe another example use case.
> > > > > > >
> > > > > > > Robert
> > > > > > >
> > > > > > > [1] https://lists.apache.org/thread/ph10th4ocjczpf5gz17mqys4fkp5qrzw
> > > > > > > [2] https://github.com/apache/polaris/pull/2180
> > > > > > >
> > > > > > > On Mon, May 19, 2025 at 12:05 PM Robert Stupp <sn...@snazy.de> wrote:
> > > > > > > >
> > > > > > > > Yes, each "task behavior" has an ID. I've chosen the term "task
> > > > > > > > behavior" over "type", because it doesn't only define "what's done" but
> > > > > > > > also "when" it's done (delay) and "how it behaves" (retries on failures).
> > > > > > > >
> > > > > > > > On 14.05.25 04:25, Adnan Hemani wrote:
> > > > > > > > > Hi Robert,
> > > > > > > > >
> > > > > > > > > Firstly, thanks for this document. One quick question: is the
> > > > > > > > > `behavior ID` basically the task type? This part was slightly unclear to me.
> > > > > > > > >
> > > > > > > > > Best,
> > > > > > > > > Adnan Hemani
> > > > > > > > >
> > > > > > > > >> On May 9, 2025, at 6:07 AM, Robert Stupp <sn...@snazy.de> wrote:
> > > > > > > > >>
> > > > > > > > >> Hi,
> > > > > > > > >>
> > > > > > > > >> Polaris is a service, which has to eventually perform operations
> > > > > > > > >> asynchronously. Polaris is also meant to be backed by multiple server
> > > > > > > > >> instances (think: high-availability & load-balancing setups).
> > > > > > > > >>
> > > > > > > > >> During runtime, things can go sideways in many ways. Server
> > > > > > > > >> instances may crash, be killed or whatever... Task executions may fail,
> > > > > > > > >> because some other remote service fails, configuration values (and
> > > > > > > > >> credentials) may be wrong or other error situations.
> > > > > > > > >>
> > > > > > > > >> Task execution should be resilient to both kinds of scenarios:
> > > > > > > > >> being able to eventually recover from a "dead/lost node" scenario and to
> > > > > > > > >> retry failed tasks.
> > > > > > > > >>
> > > > > > > > >> Each individual task should also be executed only once.
> > > > > > > > >>
> > > > > > > > >> There are also different kinds of tasks with different behaviors:
> > > > > > > > >> the "function" being executed and the retry behavior.
> > > > > > > > >>
> > > > > > > > >> Proposal doc for this:
> > > > > > > > >> https://www.google.com/url?q=https://docs.google.com/document/d/17D28E2ne5dzOHWc9DJ91Yz3lnQOtgmWaA_TBNdXv0sY/edit?tab%3Dt.0&source=gmail-imap&ust=1747400861000000&usg=AOvVaw3x56ChuB1ga0MelG6URxxi
> > > > > > > > >>
> > > > > > > > >> Robert
> > > > > > > > >>
> > > > > > > > >> --
> > > > > > > > >> Robert Stupp
> > > > > > > > >> @snazy
> > > > > > > > >>
> > > > > > > > --
> > > > > > > > Robert Stupp
> > > > > > > > @snazy