On 6/26/2023 2:27 PM, Peter Xu wrote:
> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
>> On 6/21/2023 4:28 PM, Peter Xu wrote:
>>> On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
>>>> On 6/20/2023 5:46 PM, Peter Xu wrote:
>>>>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
>>>>>> Migration of a guest in the suspended state is broken.  The incoming
>>>>>> migration code automatically tries to wake the guest, which IMO is
>>>>>> wrong -- the guest should end migration in the same state it started.
>>>>>> Further, the wakeup is done by calling qemu_system_wakeup_request(),
>>>>>> which bypasses vm_start().  The guest appears to be in the running
>>>>>> state, but it is not.
>>>>>>
>>>>>> To fix, leave the guest in the suspended state, but call
>>>>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
>>>>>> later, when the client sends a system_wakeup command.
>>>>>>
>>>>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
>>>>>> ---
>>>>>>  migration/migration.c | 11 ++++-------
>>>>>>  softmmu/runstate.c    |  1 +
>>>>>>  2 files changed, 5 insertions(+), 7 deletions(-)
>>>>>>
>>>>>> diff --git a/migration/migration.c b/migration/migration.c
>>>>>> index 17b4b47..851fe6d 100644
>>>>>> --- a/migration/migration.c
>>>>>> +++ b/migration/migration.c
>>>>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
>>>>>>          vm_start();
>>>>>>      } else {
>>>>>>          runstate_set(global_state_get_runstate());
>>>>>> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
>>>>>> +            /* Force vm_start to be called later. */
>>>>>> +            qemu_system_start_on_wakeup_request();
>>>>>> +        }
>>>>>
>>>>> Is this really needed, along with patch 1?
>>>>>
>>>>> I have a very limited knowledge on suspension, so I'm prone to making
>>>>> mistakes..
>>>>>
>>>>> But from what I read this, qemu_system_wakeup_request() (existing one, not
>>>>> after patch 1 applied) will setup wakeup_reason and kick the main thread
>>>>> using qemu_notify_event().  Then IIUC the e.g. vcpu wakeups will be done
>>>>> in the main thread later on after qemu_wakeup_requested() returns true.
>>>>
>>>> Correct, here:
>>>>
>>>>     if (qemu_wakeup_requested()) {
>>>>         pause_all_vcpus();
>>>>         qemu_system_wakeup();
>>>>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>>>>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>>>>         resume_all_vcpus();
>>>>         qapi_event_send_wakeup();
>>>>     }
>>>>
>>>> However, that is not sufficient, because vm_start() was never called on
>>>> the incoming side.  vm_start calls the vm state notifiers for
>>>> RUN_STATE_RUNNING, among other things.
>>>>
>>>> Without my fixes, it "works" because the outgoing migration automatically
>>>> wakes a suspended guest, which sets the state to running, which is saved
>>>> in global state:
>>>>
>>>>     void migration_completion(MigrationState *s)
>>>>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>>>>         global_state_store()
>>>>
>>>> Then the incoming migration calls vm_start here:
>>>>
>>>>     migration/migration.c
>>>>         if (!global_state_received() ||
>>>>             global_state_get_runstate() == RUN_STATE_RUNNING) {
>>>>             if (autostart) {
>>>>                 vm_start();
>>>>
>>>> vm_start must be called for correctness.
>>>
>>> I see.  Though I had a feeling that this is still not the right way to do,
>>> at least not as clean.
>>>
>>> One question is, would above work for postcopy when VM is suspended during
>>> the switchover?
>>
>> Good catch, that is broken.
>> I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
>> and now it works.
>>
>>     if (global_state_get_runstate() == RUN_STATE_RUNNING) {
>>         if (autostart) {
>>             vm_start();
>>         } else {
>>             runstate_set(RUN_STATE_PAUSED);
>>         }
>>     } else {
>>         runstate_set(global_state_get_runstate());
>>         if (runstate_check(RUN_STATE_SUSPENDED)) {
>>             qemu_system_start_on_wakeup_request();
>>         }
>>     }
>>
>>> I think I see your point that vm_start() (mostly vm_prepare_start())
>>> contains a bunch of operations that maybe we must have before starting the
>>> VM, but then.. should we just make that vm_start() unconditional when
>>> loading VM completes?  I just don't see anything won't need it (besides
>>> -S), even COLO.
>>>
>>> So I'm wondering about something like this:
>>>
>>> ===8<===
>>> --- a/migration/migration.c
>>> +++ b/migration/migration.c
>>> @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
>>>
>>>      dirty_bitmap_mig_before_vm_start();
>>>
>>> -    if (!global_state_received() ||
>>> -        global_state_get_runstate() == RUN_STATE_RUNNING) {
>>> -        if (autostart) {
>>> -            vm_start();
>>> -        } else {
>>> -            runstate_set(RUN_STATE_PAUSED);
>>> -        }
>>> -    } else if (migration_incoming_colo_enabled()) {
>>> +    if (migration_incoming_colo_enabled()) {
>>>          migration_incoming_disable_colo();
>>> +        /* COLO should always have autostart=1 or we can enforce it here */
>>> +    }
>>> +
>>> +    if (autostart) {
>>> +        RunState run_state = global_state_get_runstate();
>>>          vm_start();
>>
>> This will resume the guest for a brief time, against the user's wishes.  Not
>> OK IMO.
>
> Ah okay..
>
> Can we do whatever we need in vm_prepare_start(), then?  I assume these
> chunks are needed:
>
>     /*
>      * WHPX accelerator needs to know whether we are going to step
>      * any CPUs, before starting the first one.
>      */
>     if (cpus_accel->synchronize_pre_resume) {
>         cpus_accel->synchronize_pre_resume(step_pending);
>     }
>
>     /* We are sending this now, but the CPUs will be resumed shortly later */
>     qapi_event_send_resume();
>
>     cpu_enable_ticks();
>
> While these may not be needed, but instead only needed if RUN_STATE_RUNNING
> below (runstate_set() will be needed regardless):
>
>     runstate_set(RUN_STATE_RUNNING);
>     vm_state_notify(1, RUN_STATE_RUNNING);
>
> So here's another of my attempt (this time also taking
> !global_state_received() into account)...
>
>     RunState run_state;
>
>     if (migration_incoming_colo_enabled()) {
>         migration_incoming_disable_colo();
>     }
>
>     if (!global_state_received()) {
>         run_state = RUN_STATE_RUNNING;
>     } else {
>         run_state = global_state_get_runstate();
>     }
>
>     if (autostart) {
>         /* Part of vm_prepare_start(), may isolate into a helper? */
>         if (cpus_accel->synchronize_pre_resume) {
>             cpus_accel->synchronize_pre_resume(step_pending);
>         }
>         qapi_event_send_resume();
>         cpu_enable_ticks();
>         /* Setup the runstate on src */
>         runstate_set(run_state);
>         if (run_state == RUN_STATE_RUNNING) {
>             vm_state_notify(1, RUN_STATE_RUNNING);
>         }
>     } else {
>         runstate_set(RUN_STATE_PAUSED);
>     }
>
> The whole idea is still do whatever needed here besides resuming the vcpus,
> rather than leaving the whole vm_start() into system wakeup.  It then can
> avoid touching the system wakeup code which seems cleaner.
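
To picture Peter's "may isolate into a helper?" aside, factoring the quoted
chunks out of vm_prepare_start() could look roughly like the sketch below.
The helper name and signature are invented here for illustration, and it
assumes the code lives in softmmu/cpus.c where cpus_accel is visible; this is
not code from any posted series.

    /*
     * Hypothetical helper, not from the posted series: the subset of
     * vm_prepare_start() quoted above, parameterized by the runstate
     * the destination should end up in.
     */
    static void vm_prepare_start_common(RunState run_state, bool step_pending)
    {
        /*
         * WHPX accelerator needs to know whether we are going to step
         * any CPUs, before starting the first one.
         */
        if (cpus_accel->synchronize_pre_resume) {
            cpus_accel->synchronize_pre_resume(step_pending);
        }

        /* Sent now; whether the vcpus actually resume is up to the caller. */
        qapi_event_send_resume();

        cpu_enable_ticks();

        runstate_set(run_state);
        if (run_state == RUN_STATE_RUNNING) {
            vm_state_notify(1, RUN_STATE_RUNNING);
        }
    }

process_incoming_migration_bh() would then call vm_prepare_start_common() in
the autostart case and runstate_set(RUN_STATE_PAUSED) otherwise, which is
exactly the shape of Peter's second attempt.  Steve's reply below explains
why this still falls short for a suspended guest.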
The problem is that some actions cannot be performed at migration finish time,
such as the vm_state_notify(1, RUN_STATE_RUNNING) call.  The wakeup code
called later still needs to know that vm_state_notify has not been called,
and must call it then.  I just posted a new series with a cleaner wakeup, but
it still uses a global.

- Steve

>>> IIUC this can drop qemu_system_start_on_wakeup_request() along with the
>>> other global var.  Would something like it work for us?
>>
>> Afraid not.  start_on_wake is the only correct solution I can think of.
>
> Please check again above, I just hope we can avoid yet another global to
> QEMU if possible.
>
> Thanks,
>
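
For readers without the new series at hand, the "start_on_wake" approach
defers the full vm_start() until the wakeup actually arrives.  A minimal
sketch follows, layered on the main-loop wakeup code quoted earlier in the
thread; the flag name is invented, and the posted series may structure this
differently.

    /* softmmu/runstate.c -- illustrative sketch, not the posted patch */
    static bool start_on_wakeup;

    void qemu_system_start_on_wakeup_request(void)
    {
        start_on_wakeup = true;
    }

    /* In the main-loop wakeup handling quoted earlier: */
    if (qemu_wakeup_requested()) {
        pause_all_vcpus();
        qemu_system_wakeup();
        notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
        wakeup_reason = QEMU_WAKEUP_REASON_NONE;
        if (start_on_wakeup) {
            start_on_wakeup = false;
            /* Run the full start path, including vm_state_notify(). */
            vm_start();
        } else {
            resume_all_vcpus();
        }
        qapi_event_send_wakeup();
    }

This preserves the invariant Steve argues for: a guest that was suspended
when migration began stays suspended on the destination, and the deferred
vm_start() runs only when the client issues system_wakeup.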