Hi everyone! It's been a while. I have a feature I'd love to get some
feedback on and, if it's acceptable, contribute to the community. I need to
brush up on the proper process (I did read CONTRIBUTING.md). I should have
discussed this *before* implementing it, for sure, but since it's already
written I figured I'd use it to go through the process and refresh myself
on the latest.

https://github.com/apache/cloudstack/pull/5552

For those familiar with the KVM agent: I needed a way for long-running jobs
to react and clean up their work when the agent or management server is
stopped, or when the two simply lose connectivity with each other.
Currently, if the management server is restarted while the agent is working
on something, the agent-side work can continue and complete, but the
management server fails the job. This is OK in many circumstances, but
sometimes it leaves cruft behind, such as copied files that are never used.

I'm not entirely happy with this, as there's potential for a race,
particularly in de-registration of the hook, but it seems like a reasonable
start. It just requires whoever writes a hook to understand that a rollback
could be attempted even if the bulk of their task has already completed, or
has not started yet, and to account for that, whether by passing in a lock,
doing a "try", or something else.
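
To make that last point concrete, here is a rough sketch of what a
defensively written hook might look like. It is not code from the PR:
the class name CopyJobSketch and the methods run() and rollback() are
made up for illustration, and the real hook interface may look quite
different. The idea is only that the job and its rollback share a lock,
and that the cleanup itself is a "try" that tolerates the file not
being there.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical example; none of these names come from the PR.
class CopyJobSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Path target;
    private boolean cancelled = false;

    CopyJobSketch(Path target) {
        this.target = target;
    }

    // Stand-in for the long-running work (here it just writes a small file).
    // Returns false without doing anything if a rollback already happened.
    boolean run() throws IOException {
        lock.lock();
        try {
            if (cancelled) {
                return false;
            }
            Files.write(target, new byte[] {1, 2, 3});
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Stand-in for the disconnect hook. Safe to call before the work has
    // started, after it has completed, or more than once.
    // Returns true only if there was actually something to remove.
    boolean rollback() throws IOException {
        lock.lock();
        try {
            cancelled = true;
            return Files.deleteIfExists(target);
        } finally {
            lock.unlock();
        }
    }
}
```

Holding the lock for the whole of run() keeps the sketch short, but it
also means a rollback has to wait for the work to finish. A real hook
would more likely interrupt the work or check a flag between steps.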
