Hi,
Polaris is a service that will eventually have to perform some operations
asynchronously. Polaris is also meant to be backed by multiple server
instances (think: high-availability and load-balancing setups).
At runtime, things can go sideways in many ways. Server instances
may crash, be killed or otherwise disappear. Task executions may fail
because some other remote service fails, because configuration values
(or credentials) are wrong, or for other reasons.
Task execution should be resilient to both kinds of scenarios: it should
eventually recover from a "dead/lost node" scenario, and it should retry
failed tasks.
Each individual task should also be executed only once.
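To make the "recover from a dead node" and "only once" requirements a bit
more concrete, here is a minimal sketch of one common approach, lease-based
claiming. This is an assumption for illustration, not necessarily what the
proposal does: LeasedTaskStore, tryClaim and complete are hypothetical
names, and the in-memory map stands in for a shared, persistent store with
an atomic compare-and-set. A node must hold an unexpired lease to claim a
task; if the node dies, its lease expires and another node can pick the
task up.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Polaris code. In a real multi-node deployment the
// task table would live in a shared database; ConcurrentHashMap's atomic
// computeIfPresent() plays the role of a compare-and-set on a row here.
final class LeasedTaskStore {
  enum State { PENDING, RUNNING, DONE }

  record TaskRow(State state, String owner, Instant leaseExpiry, int attempts) {}

  private final Map<String, TaskRow> rows = new ConcurrentHashMap<>();
  private final Duration leaseDuration;

  LeasedTaskStore(Duration leaseDuration) {
    this.leaseDuration = leaseDuration;
  }

  /** Registers a new task; submitting the same id twice is a no-op. */
  void submit(String taskId) {
    rows.putIfAbsent(taskId, new TaskRow(State.PENDING, null, null, 0));
  }

  /**
   * Atomically claims a task if it is pending, or if it is running but the
   * previous owner's lease has expired (e.g. that node crashed).
   */
  boolean tryClaim(String taskId, String nodeId, Instant now) {
    boolean[] claimed = {false};
    rows.computeIfPresent(taskId, (id, row) -> {
      boolean claimable = row.state() == State.PENDING
          || (row.state() == State.RUNNING && now.isAfter(row.leaseExpiry()));
      if (!claimable) {
        return row; // someone else holds a valid lease, or the task is done
      }
      claimed[0] = true;
      return new TaskRow(State.RUNNING, nodeId, now.plus(leaseDuration), row.attempts() + 1);
    });
    return claimed[0];
  }

  /** Marks a task as done, but only if the caller is still the lease owner. */
  boolean complete(String taskId, String nodeId) {
    boolean[] done = {false};
    rows.computeIfPresent(taskId, (id, row) -> {
      if (row.state() != State.RUNNING || !nodeId.equals(row.owner())) {
        return row; // a stale owner whose task was reclaimed cannot complete it
      }
      done[0] = true;
      return new TaskRow(State.DONE, nodeId, row.leaseExpiry(), row.attempts());
    });
    return done[0];
  }

  Optional<TaskRow> get(String taskId) {
    return Optional.ofNullable(rows.get(taskId));
  }
}
```

Note that this only guarantees that at most one node holds a task's lease
at any time. A node that stalls past its lease could still finish its work
late, so task functions would ideally be idempotent.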
There are also different kinds of tasks, which differ in two respects:
the "function" being executed and the retry behavior.
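As a rough illustration of "a kind of task = a function plus a retry
behavior", the sketch below models a task kind as a record that pairs the
function with a retry policy. All names here (TaskKind, RetryPolicy,
exponentialBackoff, never) are hypothetical, and exponential backoff is just
one example policy, not something taken from the proposal.

```java
import java.time.Duration;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical sketch, not Polaris code.
interface RetryPolicy {
  /**
   * Decides what happens after a failed attempt. attempt is the number of
   * attempts made so far (1 after the first failure). Returns the delay
   * before the next attempt, or empty to give up.
   */
  Optional<Duration> nextDelay(int attempt, Throwable failure);

  /** Doubles the delay after each failure, capped at max, giving up after maxAttempts. */
  static RetryPolicy exponentialBackoff(int maxAttempts, Duration initial, Duration max) {
    return (attempt, failure) -> {
      if (attempt >= maxAttempts) {
        return Optional.empty();
      }
      long factor = 1L << Math.min(attempt - 1, 30); // 1, 2, 4, ... (shift capped to avoid overflow)
      Duration delay = initial.multipliedBy(factor);
      return Optional.of(delay.compareTo(max) > 0 ? max : delay);
    };
  }

  /** For tasks that must not be retried automatically. */
  static RetryPolicy never() {
    return (attempt, failure) -> Optional.empty();
  }
}

/** A task kind couples the function to execute with its retry behavior. */
record TaskKind<P, R>(String name, Function<P, R> function, RetryPolicy retryPolicy) {}
```

Keeping the retry policy next to the function means a scheduler can treat
all task kinds uniformly while each kind still decides whether, and how
soon, a failure is retried.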
Proposal doc for this:
https://docs.google.com/document/d/17D28E2ne5dzOHWc9DJ91Yz3lnQOtgmWaA_TBNdXv0sY/edit?tab=t.0
Robert
--
Robert Stupp
@snazy