On 6/23/2023 2:25 PM, Steven Sistare wrote:
> On 6/21/2023 4:28 PM, Peter Xu wrote:
>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>> Migration of a guest in the suspended state is broken.  The incoming
>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>> wrong -- the guest should end migration in the same state it started.
>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(),
>>>>> which bypasses vm_start().  The guest appears to be in the running
>>>>> state, but it is not.
>>>>>
>>>>> To fix, leave the guest in the suspended state, but call
>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>> later, when the client sends a system_wakeup command.
>>>>>
>>>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>>>> ---
>>>>>  migration/migration.c | 11 ++++-------
>>>>>  softmmu/runstate.c    |  1 +
>>>>>  2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>> index 17b4b47..851fe6d 100644
>>>>> --- a/migration/migration.c
>>>>> +++ b/migration/migration.c
>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>          vm_start();
>>>>>      } else {
>>>>>          runstate_set(global_state_get_runstate());
>>>>> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>> +            /* Force vm_start to be called later. */
>>>>> +            qemu_system_start_on_wakeup_request();
>>>>> +        }
>>>>
>>>> Is this really needed, along with patch 1?
>>>>
>>>> I have very limited knowledge of suspension, so I'm prone to making
>>>> mistakes...
>>>>
>>>> But from what I read of this, qemu_system_wakeup_request() (the existing
>>>> one, not the one after patch 1 is applied) will set up wakeup_reason and
>>>> kick the main thread using qemu_notify_event().  Then IIUC the vcpu
>>>> wakeups, for example, will be done in the main thread later on, after
>>>> qemu_wakeup_requested() returns true.
>>>
>>> Correct, here:
>>>
>>>     if (qemu_wakeup_requested()) {
>>>         pause_all_vcpus();
>>>         qemu_system_wakeup();
>>>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>>         resume_all_vcpus();
>>>         qapi_event_send_wakeup();
>>>     }
>>>
>>> However, that is not sufficient, because vm_start() was never called on
>>> the incoming side.  vm_start calls the vm state notifiers for
>>> RUN_STATE_RUNNING, among other things.
>>>
>>> Without my fixes, it "works" because the outgoing migration automatically
>>> wakes a suspended guest, which sets the state to running, which is saved
>>> in global state:
>>>
>>>     void migration_completion(MigrationState *s)
>>>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>>         global_state_store()
>>>
>>> Then the incoming migration calls vm_start here:
>>>
>>>     migration/migration.c
>>>         if (!global_state_received() ||
>>>             global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>             if (autostart) {
>>>                 vm_start();
>>>
>>> vm_start must be called for correctness.
>>
>> I see.  Though I have a feeling that this is still not the right way to
>> do it, or at least not the cleanest.
>>
>> One question is, would the above work for postcopy when the VM is
>> suspended during the switchover?
>
> Good catch, that is broken.
> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> and now it works.
>
>     if (global_state_get_runstate() == RUN_STATE_RUNNING) {
>         if (autostart) {
>             vm_start();
>         } else {
>             runstate_set(RUN_STATE_PAUSED);
>         }
>     } else {
>         runstate_set(global_state_get_runstate());
>         if (runstate_check(RUN_STATE_SUSPENDED)) {
>             qemu_system_start_on_wakeup_request();
>         }
>     }
>
>> I think I see your point that vm_start() (mostly vm_prepare_start())
>> contains a bunch of operations that we may need before starting the VM,
>> but then... should we just make that vm_start() unconditional when
>> loading the VM completes?  I just don't see anything that won't need it
>> (besides -S), even COLO.
>>
>> So I'm wondering about something like this:
>>
>> ===8<===
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>
>>      dirty_bitmap_mig_before_vm_start();
>>
>> -    if (!global_state_received() ||
>> -        global_state_get_runstate() == RUN_STATE_RUNNING) {
>> -        if (autostart) {
>> -            vm_start();
>> -        } else {
>> -            runstate_set(RUN_STATE_PAUSED);
>> -        }
>> -    } else if (migration_incoming_colo_enabled()) {
>> +    if (migration_incoming_colo_enabled()) {
>>          migration_incoming_disable_colo();
>> +        /* COLO should always have autostart=1 or we can enforce it here */
>> +    }
>> +
>> +    if (autostart) {
>> +        RunState run_state = global_state_get_runstate();
>>          vm_start();
>
> This will resume the guest for a brief time, against the user's wishes.
> Not OK IMO.
>
>> +        switch (run_state) {
>> +        case RUN_STATE_RUNNING:
>> +            break;
>> +        case RUN_STATE_SUSPENDED:
>> +            qemu_system_suspend();
>
> qemu_system_suspend will not cause the guest to suspend.  It is qemu's
> response after the guest initiates a suspend.
>
>> +            break;
>> +        default:
>> +            runstate_set(run_state);
>> +            break;
>> +        }
>>      } else {
>> -        runstate_set(global_state_get_runstate());
>> +        runstate_set(RUN_STATE_PAUSED);
>>      }
>> ===8<===
>>
>> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
>> other global var.  Would something like this work for us?
>
> Afraid not.  start_on_wake is the only correct solution I can think of.
>
> - Steve
Actually, the start_on_wake concept is indeed required, but I can hide the
details in softmmu/cpus.c:

    static bool vm_never_started = true;

    void vm_start(void)
    {
        if (!vm_prepare_start(false)) {
            vm_never_started = false;
            resume_all_vcpus();
        }
    }

    void vm_wakeup(void)
    {
        if (vm_never_started) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_RUNNING);
        }
    }

... and qemu_system_wakeup_request() calls vm_wakeup().  Now the migration
code simply sets the run state (e.g. SUSPENDED), with no more calls to
qemu_system_start_on_wakeup_request().

- Steve
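
For concreteness, here is a minimal sketch of that wiring in
softmmu/runstate.c, assuming the existing wakeup_reason bookkeeping and
guard conditions stay as they are today.  Only the vm_wakeup() call is new,
and its ordering relative to the main loop's qemu_wakeup_requested()
handling (pause_all_vcpus/resume_all_vcpus) would still need checking:

    void qemu_system_wakeup_request(WakeupReason reason, Error **errp)
    {
        if (!runstate_check(RUN_STATE_SUSPENDED)) {
            error_setg(errp,
                       "Unable to wake up: guest is not in suspended state");
            return;
        }
        if (!(wakeup_reason_mask & (1 << reason))) {
            return;
        }
        /* Was runstate_set(RUN_STATE_RUNNING): the first wakeup after an
         * incoming migration now performs the deferred vm_start(); later
         * wakeups just flip the run state as before. */
        vm_wakeup();
        wakeup_reason = reason;
        qemu_notify_event();
    }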
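
With that in place, the incoming-migration tail of
process_incoming_migration_bh() could presumably return to its pre-patch
shape, minus the wakeup hook.  A sketch assembled from the snippets quoted
above, not a tested patch:

    if (!global_state_received() ||
        global_state_get_runstate() == RUN_STATE_RUNNING) {
        if (autostart) {
            vm_start();
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    } else {
        /* e.g. RUN_STATE_SUSPENDED; a later system_wakeup command reaches
         * vm_wakeup(), which performs the deferred vm_start(). */
        runstate_set(global_state_get_runstate());
    }

Presumably the postcopy path in loadvm_postcopy_handle_run_bh would get the
same single runstate_set() treatment.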