On Tue, Sep 03, 2013 at 10:53:14PM +0000, Kelven Yang wrote:
> This is a design issue that we need to improve in general. However, a
> simple roll back logic does not solve the problem, since abnormal
> terminate can happen at any time, which means it can happen in the middle
> of job cancellation process as well.
>
> Under current architecture, the cleanup work is handled in VM sync
> process, we allow jobs to cancel or fail at anytime, this design decision
> may leave temporary failures to operations that are currently carried in
> the stopping/crashed management server, VM sync process will do
> self-healing and carry back of the consistency of system data. This design
> choice itself is still acceptable to a certain level, unfortunately, this
> process is buggy in current CloudStack releases. The example Marcus gave
> falls in the category of having bug in re-sync VM in migrating state
> (basically to fail it and allow user to re-issue the command).
>
> I've refactored the modeling used by VM sync process but wasn't able to
> merge into the main branch for 4.2 release due to concerns from community
> about its late readiness time for architecture changes. Will reiterate the
> merge effort after 4.2 release.

Now would be a good time to consider merging into master...