Thanks, Daan, I appreciate the feedback and I hope it is useful.  I wish I
could comment more on the other hypervisors and how something similar might
work for them.
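
For anyone following along, here is a rough sketch of what I mean in the
quoted text below by a hook author having to account for a rollback that
fires before the task has started or after it has finished. It is not code
from the PR: RollbackSketch, CopyTask, and their methods are hypothetical
names, and the "copied file" is just a boolean. It takes the "pass a lock"
route by making run() and rollback() synchronized on the task.

```java
public class RollbackSketch {
    enum State { NOT_STARTED, COMPLETED, ROLLED_BACK }

    /** Hypothetical long-running task; not a class from the CloudStack agent. */
    static class CopyTask {
        private State state = State.NOT_STARTED;
        private boolean fileOnDisk = false; // stands in for a copied file

        /**
         * Does the work. Returns false if a rollback already claimed the task.
         * The lock is held for the whole run, so a rollback waits for it.
         */
        synchronized boolean run() {
            if (state != State.NOT_STARTED) {
                return false; // rollback got here first, do not start
            }
            fileOnDisk = true; // pretend the copy happened
            state = State.COMPLETED;
            return true;
        }

        /**
         * Safe to call before, after, or instead of run(), and more than once.
         * Returns true only if there was something to clean up.
         */
        synchronized boolean rollback() {
            if (state == State.ROLLED_BACK) {
                return false; // already rolled back, nothing to do
            }
            boolean cleaned = fileOnDisk;
            fileOnDisk = false; // remove the cruft if the copy had finished
            state = State.ROLLED_BACK;
            return cleaned;
        }

        synchronized boolean hasFile() {
            return fileOnDisk;
        }
    }

    public static void main(String[] args) {
        CopyTask task = new CopyTask();
        System.out.println("ran: " + task.run());
        System.out.println("cleaned: " + task.rollback());
        System.out.println("file left behind: " + task.hasFile());
    }
}
```

Holding the lock for the whole run is the simplest option but means the
rollback cannot interrupt work in progress; a real hook would probably want
something finer-grained.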

On Wed, Oct 6, 2021 at 2:49 AM Daan Hoogland <daan.hoogl...@gmail.com>
wrote:

> Thanks, Marcus,
>
> On Tue, Oct 5, 2021 at 7:32 PM Marcus <shadow...@gmail.com> wrote:
>
> > Hi everyone! It's been a while.  I've got a feature I'd love to get some
> > feedback on and contribute to the community, if it's acceptable.  I need
> > to brush up on the proper process (did read CONTRIBUTING.md).
>
> Not a lot has changed. The technical discussions have tended to move to
> GitHub issues, but there is of course pushback on that, as it doesn't
> comply with Apache bylaws.
>
> > I should have
> > discussed this *before* implementation, for sure, but since this is
> > something I've already got I figured I'd use it to go through the process
> > and refresh myself on the latest.
> >
> > https://github.com/apache/cloudstack/pull/5552
>
> I like the concept you describe.
>
>
> >
> >
> > If you're familiar with the KVM agent, I needed to provide a way for
> > long-running jobs to be able to react and clean up their work when the
> > agent or management server is stopped, or they just lose connectivity
> > with each other.  Currently, if the management server is restarted while
> > the agent is working on something, the agent-side work could continue on
> > and complete, but the management server would fail the job.  This is ok
> > in many circumstances, but sometimes this can lead to cruft like copied
> > files that are never used.
> >
> As stated on the PR, this is not only a problem for KVM (but one thing at
> a time).
>
>
> >
> > I'm not entirely happy with this as there's potential for a race,
> > particularly in de-registration of the hook, but it seems like a
> > reasonable start. It just requires the coder of the hook to understand
> > that a rollback could be attempted even if the bulk of their task has
> > completed, or not started yet, and account for that, whether they pass a
> > lock, or do a "try", or something else.
> >
> Reconnect could include a 'status of work' dialog; I imagine this is what
> you mean by "requires the coder .. to understand". If the MS forgot about a
> job, the agent probably will not have, and can send the information back.
>
> Let's see your first work first, though (as in +1).
>
>
> --
> Daan
>
