On Tue, 4 Oct 2011 14:04:45 +0200 Paolo Bonzini <pbonz...@redhat.com> wrote:
> Trying to migrate a paused machine fails.  The reason is that
> the RSTATE_PRE_MIGRATE is reached with vm_stop, and this
> transition is eaten when the vm is already paused.  This patch
> fixes the problem by always going through runstate_set and
> always notifying the new state.

Let's start over, this time CC'ing Jan, Anthony and Avi.

Basically, what Paolo is describing above is this:

 1. The user issues the stop command. vm_stop() will set the state to
    RSTATE_PAUSED

 2. The user starts a migration. migrate_fd_put_ready() will call
    vm_stop(RSTATE_PRE_MIGRATE). However, the VM is already stopped so
    vm_stop() just returns (IOW, the state is still RSTATE_PAUSED)

 3. The migration process completes. migrate_fd_put_ready() will now
    call runstate_set(RSTATE_POST_MIGRATE), which in turn causes the
    transition RSTATE_PAUSED -> RSTATE_POST_MIGRATE, which is invalid
    and the world of qemu ends

Now, we have three options to fix this but I don't know which one to
choose:

 1. We could just add the transition RSTATE_PAUSED ->
    RSTATE_POST_MIGRATE as valid. Not sure this is a good thing to do
    though, as it seems a silly workaround for the fact that the
    transition to RSTATE_PRE_MIGRATE has never occurred

 2. This patch makes vm_stop() do the state transition even if the VM
    is already stopped. Seems good enough, except that I fear two
    things. First, today we know that vm_stop() is a no-op if the VM is
    already stopped, so there's a semantic change that could turn out
    to be a trap. Second, I also fear people using vm_stop() as a way
    to change the VM status, just like runstate_set() (which can also
    become a horrible trap)

 3. Avi suggested we should keep a reference count, so that states are
    not discarded:

      http://lists.gnu.org/archive/html/qemu-devel/2011-08/msg00595.html

    That solution seemed to be the perfect one, except for one
    important detail: how should we implement vm_start() (and thus
    'cont')?
    In order to maintain how we behave with the external world, the
    only option is that vm_start() will set the stop count to 0. I.e.,
    it doesn't matter if we have stopped the VM 500 times at some
    point, a vm_start() call will discard all stored states.

    Not sure if that's what you expected, but the first time I read
    Avi's idea I had the impression that it would be a good idea for
    vm_start() to decrement the ref count only once, i.e. vm_stop()
    and vm_start() calls would have to match.

> 
> Signed-off-by: Paolo Bonzini <pbonz...@redhat.com>
> ---
>  cpus.c |    2 ++
>  1 files changed, 2 insertions(+), 0 deletions(-)
> 
> diff --git a/cpus.c b/cpus.c
> index 8978779..eab8ff6 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -128,6 +128,8 @@ static void do_vm_stop(RunState state)
>          qemu_aio_flush();
>          bdrv_flush_all();
>          monitor_protocol_event(QEVENT_STOP, NULL);
> +    } else {
> +        runstate_set(state);
>      }
>  }
> 
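
For illustration only, the stop/migrate sequence described above can be
replayed against a toy state machine. This is a sketch and not QEMU
code: the transition table, the ToyVM struct, the toy_* helpers and the
'patched' flag are all assumptions made up for this example, and the
toy runstate_set returns false where the real one would abort.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy subset of the run states named in this thread. */
typedef enum {
    RSTATE_RUNNING,
    RSTATE_PAUSED,
    RSTATE_PRE_MIGRATE,
    RSTATE_POST_MIGRATE,
    RSTATE_MAX
} RunState;

/* Assumed transition table, valid[from][to]. Not copied from QEMU. */
static const bool valid[RSTATE_MAX][RSTATE_MAX] = {
    [RSTATE_RUNNING][RSTATE_PAUSED]            = true,
    [RSTATE_RUNNING][RSTATE_PRE_MIGRATE]       = true,
    [RSTATE_PAUSED][RSTATE_RUNNING]            = true,
    [RSTATE_PAUSED][RSTATE_PRE_MIGRATE]        = true,
    [RSTATE_PRE_MIGRATE][RSTATE_POST_MIGRATE]  = true,
    [RSTATE_POST_MIGRATE][RSTATE_RUNNING]      = true,
};

/* Hypothetical stand-in for the global VM state. */
typedef struct {
    RunState state;
    bool running;
} ToyVM;

/* Returns false on an invalid transition instead of aborting. */
static bool toy_runstate_set(ToyVM *vm, RunState to)
{
    assert(to < RSTATE_MAX);
    if (!valid[vm->state][to]) {
        return false;
    }
    vm->state = to;
    return true;
}

/*
 * patched == false: stopping an already stopped VM is a no-op.
 * patched == true:  the new state is recorded anyway, which is the
 *                   behaviour the quoted patch adds.
 */
static bool toy_vm_stop(ToyVM *vm, RunState to, bool patched)
{
    if (vm->running) {
        vm->running = false;
        return toy_runstate_set(vm, to);
    }
    return patched ? toy_runstate_set(vm, to) : true;
}

/* Replays the three steps; true if the final transition is accepted. */
static bool migrate_paused_vm(bool patched)
{
    ToyVM vm = { RSTATE_RUNNING, true };

    toy_vm_stop(&vm, RSTATE_PAUSED, patched);           /* step 1: stop */
    toy_vm_stop(&vm, RSTATE_PRE_MIGRATE, patched);      /* step 2: migrate */
    return toy_runstate_set(&vm, RSTATE_POST_MIGRATE);  /* step 3: done */
}
```

In this model migrate_paused_vm(false) is rejected at step 3 because
the state is still RSTATE_PAUSED, while migrate_paused_vm(true) goes
through RSTATE_PRE_MIGRATE and is accepted. It says nothing about the
two concerns raised under option 2, which are about callers and not
about the table.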