On 6/21/2023 4:28 PM, Peter Xu wrote:
> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>> Migration of a guest in the suspended state is broken.  The incoming
>>>> migration code automatically tries to wake the guest, which IMO is
>>>> wrong -- the guest should end migration in the same state it started.
>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(),
>>>> which bypasses vm_start().  The guest appears to be in the running
>>>> state, but it is not.
>>>>
>>>> To fix, leave the guest in the suspended state, but call
>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>> later, when the client sends a system_wakeup command.
>>>>
>>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>>> ---
>>>>  migration/migration.c | 11 ++++-------
>>>>  softmmu/runstate.c    |  1 +
>>>>  2 files changed, 5 insertions(+), 7 deletions(-)
>>>>
>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>> index 17b4b47..851fe6d 100644
>>>> --- a/migration/migration.c
>>>> +++ b/migration/migration.c
>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>          vm_start();
>>>>      } else {
>>>>          runstate_set(global_state_get_runstate());
>>>> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>> +            /* Force vm_start to be called later. */
>>>> +            qemu_system_start_on_wakeup_request();
>>>> +        }
>>>
>>> Is this really needed, along with patch 1?
>>>
>>> I have very limited knowledge on suspension, so I'm prone to making
>>> mistakes..
>>>
>>> But from what I read of this, qemu_system_wakeup_request() (the existing
>>> one, not after patch 1 is applied) will set up wakeup_reason and kick the
>>> main thread using qemu_notify_event().  Then IIUC e.g. the vcpu wakeups
>>> will be done in the main thread later on, after qemu_wakeup_requested()
>>> returns true.
>>
>> Correct, here:
>>
>>     if (qemu_wakeup_requested()) {
>>         pause_all_vcpus();
>>         qemu_system_wakeup();
>>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>         resume_all_vcpus();
>>         qapi_event_send_wakeup();
>>     }
>>
>> However, that is not sufficient, because vm_start() was never called on
>> the incoming side.  vm_start calls the vm state notifiers for
>> RUN_STATE_RUNNING, among other things.
>>
>> Without my fixes, it "works" because the outgoing migration automatically
>> wakes a suspended guest, which sets the state to running, which is saved
>> in global state:
>>
>>     void migration_completion(MigrationState *s)
>>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>         global_state_store()
>>
>> Then the incoming migration calls vm_start here:
>>
>>     migration/migration.c
>>         if (!global_state_received() ||
>>             global_state_get_runstate() == RUN_STATE_RUNNING) {
>>             if (autostart) {
>>                 vm_start();
>>
>> vm_start must be called for correctness.
>
> I see.  Though I had a feeling that this is still not the right way to do
> it, or at least not as clean.
>
> One question is, would the above work for postcopy when the VM is
> suspended during the switchover?
Good catch, that is broken.  I added qemu_system_start_on_wakeup_request
to loadvm_postcopy_handle_run_bh and now it works:

    if (global_state_get_runstate() == RUN_STATE_RUNNING) {
        if (autostart) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    } else {
        runstate_set(global_state_get_runstate());
        if (runstate_check(RUN_STATE_SUSPENDED)) {
            qemu_system_start_on_wakeup_request();
        }
    }

> I think I see your point that vm_start() (mostly vm_prepare_start())
> contains a bunch of operations that we may need to have before starting
> the VM, but then.. should we just make that vm_start() unconditional when
> loading the VM completes?  I just don't see anything that won't need it
> (besides -S), even COLO.
>
> So I'm wondering about something like this:
>
> ===8<===
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>  
>      dirty_bitmap_mig_before_vm_start();
>  
> -    if (!global_state_received() ||
> -        global_state_get_runstate() == RUN_STATE_RUNNING) {
> -        if (autostart) {
> -            vm_start();
> -        } else {
> -            runstate_set(RUN_STATE_PAUSED);
> -        }
> -    } else if (migration_incoming_colo_enabled()) {
> +    if (migration_incoming_colo_enabled()) {
>          migration_incoming_disable_colo();
> +        /* COLO should always have autostart=1 or we can enforce it here */
> +    }
> +
> +    if (autostart) {
> +        RunState run_state = global_state_get_runstate();
>          vm_start();

This will resume the guest for a brief time, against the user's wishes.
Not OK IMO.

> +        switch (run_state) {
> +        case RUN_STATE_RUNNING:
> +            break;
> +        case RUN_STATE_SUSPENDED:
> +            qemu_system_suspend();

qemu_system_suspend will not cause the guest to suspend.  It is qemu's
response after the guest initiates a suspend.

> +            break;
> +        default:
> +            runstate_set(run_state);
> +            break;
> +        }
>      } else {
> -        runstate_set(global_state_get_runstate());
> +        runstate_set(RUN_STATE_PAUSED);
>      }
> ===8<===
>
> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
> other global var.  Would something like it work for us?