Hi,

Polaris is a service that eventually has to perform some operations asynchronously. It is also meant to be backed by multiple server instances (think: high-availability and load-balancing setups).

At runtime, things can go sideways in many ways. Server instances may crash or be killed. Task executions may fail because a remote service fails, because configuration values (including credentials) are wrong, or for other reasons.

Task execution should be resilient to both kinds of scenarios: it should eventually recover from a "dead/lost node" and it should retry failed tasks.

Each individual task should also be executed only once.
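
To make the "dead/lost node" recovery and the "only once" requirement a bit more concrete, here is a rough sketch of a lease-based claim. It is only an illustration, not what the proposal doc specifies: all names (TaskLeaseSketch, tryClaim, complete) are made up, and the in-memory map merely stands in for a shared store with compare-and-set semantics that all server instances would talk to.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical sketch: nodes claim a task via a time-limited lease. */
final class TaskLeaseSketch {
  enum Status { PENDING, RUNNING, DONE }

  /** Persisted state of one task; owner and leaseExpiry are null unless RUNNING. */
  record TaskState(Status status, String owner, Instant leaseExpiry) {}

  // Assumption: stands in for a shared, persistent store offering compare-and-set.
  private final ConcurrentMap<String, TaskState> store = new ConcurrentHashMap<>();
  private final Duration leaseDuration;

  TaskLeaseSketch(Duration leaseDuration) {
    this.leaseDuration = leaseDuration;
  }

  /** Registers a task; submitting the same ID twice has no effect. */
  void submit(String taskId) {
    store.putIfAbsent(taskId, new TaskState(Status.PENDING, null, null));
  }

  /**
   * Tries to claim a task for a node. Succeeds only if the task is pending or
   * the previous owner's lease has expired (the "dead/lost node" case).
   */
  boolean tryClaim(String taskId, String node, Instant now) {
    TaskState current = store.get(taskId);
    if (current == null || current.status() == Status.DONE) {
      return false;
    }
    boolean leaseHeld =
        current.status() == Status.RUNNING && now.isBefore(current.leaseExpiry());
    if (leaseHeld) {
      return false;
    }
    TaskState claimed = new TaskState(Status.RUNNING, node, now.plus(leaseDuration));
    // Compare-and-set: only one of several concurrent claimers wins.
    return store.replace(taskId, current, claimed);
  }

  /** Marks the task done, but only if the calling node still owns it. */
  boolean complete(String taskId, String node) {
    TaskState current = store.get(taskId);
    if (current == null
        || current.status() != Status.RUNNING
        || !node.equals(current.owner())) {
      return false;
    }
    return store.replace(taskId, current, new TaskState(Status.DONE, node, null));
  }
}
```

Note that a lease alone only approximates "executed only once": a slow (but not dead) node could still be running the task when its lease expires, so a real implementation would also need lease renewal or idempotent task functions.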

There are also different kinds of tasks, which differ in the "function" being executed and in the retry behavior.
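
As an illustration of what "function plus retry behavior" could look like, here is a minimal sketch. The types (TaskKind, RetryPolicy) and the exponential backoff are assumptions made for this example only; the proposal doc may define task kinds and retries quite differently.

```java
import java.time.Duration;
import java.util.concurrent.Callable;

/** Hypothetical sketch: a task kind pairs the function to run with its retry behavior. */
final class TaskKindSketch {

  /** Retry behavior: maximum number of attempts plus exponential backoff. */
  record RetryPolicy(int maxAttempts, Duration initialDelay, double multiplier) {
    RetryPolicy {
      if (maxAttempts < 1) {
        throw new IllegalArgumentException("maxAttempts must be >= 1");
      }
    }

    /** Delay to wait after the given failed attempt (1-based). */
    Duration delayAfter(int attempt) {
      double factor = Math.pow(multiplier, attempt - 1);
      return Duration.ofMillis((long) (initialDelay.toMillis() * factor));
    }
  }

  /** A kind of task: a name, the function to execute, and how to retry it. */
  record TaskKind<T>(String name, Callable<T> function, RetryPolicy retry) {}

  /** Runs the function, retrying on any exception until the policy is exhausted. */
  static <T> T run(TaskKind<T> kind) {
    int max = kind.retry().maxAttempts();
    Exception last = null;
    for (int attempt = 1; attempt <= max; attempt++) {
      try {
        return kind.function().call();
      } catch (Exception e) {
        last = e;
        if (attempt < max) {
          sleep(kind.retry().delayAfter(attempt));
        }
      }
    }
    throw new IllegalStateException(
        "task '" + kind.name() + "' failed after " + max + " attempts", last);
  }

  private static void sleep(Duration delay) {
    try {
      Thread.sleep(delay.toMillis());
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for callers
      throw new IllegalStateException("interrupted while backing off", e);
    }
  }
}
```

In a multi-node setup the attempt counter and the next retry time would have to be persisted with the task rather than kept in a local loop, so that another node can continue the retries after a crash.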

Proposal doc for this: https://docs.google.com/document/d/17D28E2ne5dzOHWc9DJ91Yz3lnQOtgmWaA_TBNdXv0sY/edit?tab=t.0

Robert


--
Robert Stupp
@snazy
