On Mon, Jun 26, 2023 at 02:27:32PM -0400, Peter Xu wrote:
> On Fri, Jun 23, 2023 at 02:25:05PM -0400, Steven Sistare wrote:
> > On 6/21/2023 4:28 PM, Peter Xu wrote:
> > > On Wed, Jun 21, 2023 at 03:15:42PM -0400, Steven Sistare wrote:
> > >> On 6/20/2023 5:46 PM, Peter Xu wrote:
> > >>> On Thu, Jun 15, 2023 at 01:26:39PM -0700, Steve Sistare wrote:
> > >>>> Migration of a guest in the suspended state is broken.  The incoming
> > >>>> migration code automatically tries to wake the guest, which IMO is
> > >>>> wrong -- the guest should end migration in the same state it started.
> > >>>> Further, the wakeup is done by calling qemu_system_wakeup_request(), which
> > >>>> bypasses vm_start().  The guest appears to be in the running state, but
> > >>>> it is not.
> > >>>>
> > >>>> To fix, leave the guest in the suspended state, but call
> > >>>> qemu_system_start_on_wakeup_request() so the guest is properly resumed
> > >>>> later, when the client sends a system_wakeup command.
> > >>>>
> > >>>> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
> > >>>> ---
> > >>>>  migration/migration.c | 11 ++++-------
> > >>>>  softmmu/runstate.c    |  1 +
> > >>>>  2 files changed, 5 insertions(+), 7 deletions(-)
> > >>>>
> > >>>> diff --git a/migration/migration.c b/migration/migration.c
> > >>>> index 17b4b47..851fe6d 100644
> > >>>> --- a/migration/migration.c
> > >>>> +++ b/migration/migration.c
> > >>>> @@ -496,6 +496,10 @@ static void process_incoming_migration_bh(void *opaque)
> > >>>>          vm_start();
> > >>>>      } else {
> > >>>>          runstate_set(global_state_get_runstate());
> > >>>> +        if (runstate_check(RUN_STATE_SUSPENDED)) {
> > >>>> +            /* Force vm_start to be called later. */
> > >>>> +            qemu_system_start_on_wakeup_request();
> > >>>> +        }
> > >>>
> > >>> Is this really needed, along with patch 1?
> > >>>
> > >>> I have very limited knowledge of suspension, so I'm prone to making
> > >>> mistakes..
> > >>>
> > >>> But from what I read, qemu_system_wakeup_request() (the existing one, not
> > >>> the one after patch 1 is applied) will set up wakeup_reason and kick the
> > >>> main thread using qemu_notify_event().  Then IIUC the vcpu wakeups, for
> > >>> example, will be done in the main thread later on, after
> > >>> qemu_wakeup_requested() returns true.
> > >>
> > >> Correct, here:
> > >>
> > >>     if (qemu_wakeup_requested()) {
> > >>         pause_all_vcpus();
> > >>         qemu_system_wakeup();
> > >>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
> > >>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
> > >>         resume_all_vcpus();
> > >>         qapi_event_send_wakeup();
> > >>     }
> > >>
> > >> However, that is not sufficient, because vm_start() was never called on the
> > >> incoming side.  vm_start calls the vm state notifiers for RUN_STATE_RUNNING,
> > >> among other things.
> > >>
> > >> Without my fixes, it "works" because the outgoing migration automatically
> > >> wakes a suspended guest, which sets the state to running, which is saved
> > >> in global state:
> > >>
> > >>     void migration_completion(MigrationState *s)
> > >>         qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
> > >>         global_state_store()
> > >>
> > >> Then the incoming migration calls vm_start here:
> > >>
> > >>     migration/migration.c
> > >>         if (!global_state_received() ||
> > >>             global_state_get_runstate() == RUN_STATE_RUNNING) {
> > >>             if (autostart) {
> > >>                 vm_start();
> > >>
> > >> vm_start must be called for correctness.
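
To make that point concrete, here is a standalone toy model of the two paths
being discussed.  It is a sketch only: every function below is a stub named
after the QEMU calls quoted above, and the stub bodies are invented for
illustration, not taken from QEMU.

    /* Toy model, NOT QEMU code: contrast the wakeup path with vm_start(). */
    #include <stdbool.h>
    #include <stdio.h>

    static bool ticks_enabled, notifiers_ran;   /* state vm_start() sets up */

    static void cpu_enable_ticks(void)  { ticks_enabled = true; }
    static void vm_state_notify(int on) { notifiers_ran = on; }
    static void pause_all_vcpus(void)   { puts("vcpus paused"); }
    static void resume_all_vcpus(void)  { puts("vcpus resumed"); }

    /* The full start path: notifiers and ticks first, then vcpus. */
    static void vm_start(void)
    {
        cpu_enable_ticks();
        vm_state_notify(1);
        resume_all_vcpus();
    }

    /* The main-loop wakeup path quoted above: it cycles the vcpus
     * but never performs vm_start()'s setup. */
    static void handle_wakeup_request(void)
    {
        pause_all_vcpus();
        /* qemu_system_wakeup(), wakeup notifiers, WAKEUP event ... */
        resume_all_vcpus();
    }

    int main(void)
    {
        /* Incoming side of a suspended-guest migration: vm_start() never
         * ran, so a bare wakeup leaves ticks and notifiers untouched. */
        handle_wakeup_request();
        printf("ticks=%d notifiers=%d  <-- looks running, is not\n",
               ticks_enabled, notifiers_ran);

        /* What correctness requires instead: a real vm_start(). */
        vm_start();
        printf("ticks=%d notifiers=%d\n", ticks_enabled, notifiers_ran);
        return 0;
    }
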
> > >
> > > I see.  Though I had a feeling that this is still not the right way to
> > > do it, at least not as clean.
> > >
> > > One question is, would the above work for postcopy when the VM is
> > > suspended during the switchover?
> >
> > Good catch, that is broken.
> > I added qemu_system_start_on_wakeup_request to loadvm_postcopy_handle_run_bh
> > and now it works.
> >
> >     if (global_state_get_runstate() == RUN_STATE_RUNNING) {
> >         if (autostart) {
> >             vm_start();
> >         } else {
> >             runstate_set(RUN_STATE_PAUSED);
> >         }
> >     } else {
> >         runstate_set(global_state_get_runstate());
> >         if (runstate_check(RUN_STATE_SUSPENDED)) {
> >             qemu_system_start_on_wakeup_request();
> >         }
> >     }
> > >
> > > I think I see your point that vm_start() (mostly vm_prepare_start())
> > > contains a bunch of operations that we may need before starting the
> > > VM, but then.. should we just make that vm_start() unconditional when
> > > loading the VM completes?  I just don't see anything that won't need it
> > > (besides -S), even COLO.
> > >
> > > So I'm wondering about something like this:
> > >
> > > ===8<===
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -481,19 +481,28 @@ static void process_incoming_migration_bh(void *opaque)
> > >
> > >      dirty_bitmap_mig_before_vm_start();
> > >
> > > -    if (!global_state_received() ||
> > > -        global_state_get_runstate() == RUN_STATE_RUNNING) {
> > > -        if (autostart) {
> > > -            vm_start();
> > > -        } else {
> > > -            runstate_set(RUN_STATE_PAUSED);
> > > -        }
> > > -    } else if (migration_incoming_colo_enabled()) {
> > > +    if (migration_incoming_colo_enabled()) {
> > >          migration_incoming_disable_colo();
> > > +        /* COLO should always have autostart=1 or we can enforce it here */
> > > +    }
> > > +
> > > +    if (autostart) {
> > > +        RunState run_state = global_state_get_runstate();
> > >          vm_start();
> >
> > This will resume the guest for a brief time, against the user's wishes.
> > Not OK IMO.
>
> Ah okay..
>
> Can we do whatever we need in vm_prepare_start(), then?  I assume these
> chunks are needed:
>
>     /*
>      * WHPX accelerator needs to know whether we are going to step
>      * any CPUs, before starting the first one.
>      */
>     if (cpus_accel->synchronize_pre_resume) {
>         cpus_accel->synchronize_pre_resume(step_pending);
>     }
>
>     /* We are sending this now, but the CPUs will be resumed shortly later */
>     qapi_event_send_resume();
>
>     cpu_enable_ticks();
>
> These may only be needed in the RUN_STATE_RUNNING case below (runstate_set()
> will be needed regardless):
>
>     runstate_set(RUN_STATE_RUNNING);
>     vm_state_notify(1, RUN_STATE_RUNNING);
>
> So here's another of my attempts (this time also taking
> !global_state_received() into account)...
>
>     RunState run_state;
>
>     if (migration_incoming_colo_enabled()) {
>         migration_incoming_disable_colo();
>     }
>
>     if (!global_state_received()) {
>         run_state = RUN_STATE_RUNNING;
>     } else {
>         run_state = global_state_get_runstate();
>     }
>
>     if (autostart) {
>         /* Part of vm_prepare_start(); maybe isolate into a helper? */
>         if (cpus_accel->synchronize_pre_resume) {
>             cpus_accel->synchronize_pre_resume(step_pending);
>         }
>         qapi_event_send_resume();
>         cpu_enable_ticks();
>         /* Set up the runstate as it was on the src */
>         runstate_set(run_state);
>         if (run_state == RUN_STATE_RUNNING) {
>             vm_state_notify(1, RUN_STATE_RUNNING);

And here I at least forgot:

            resume_all_vcpus();

The pseudo code is just there to illustrate the whole idea.  Definitely not
tested in any form, so sorry if it won't even compile..

>         }
>     } else {
>         runstate_set(RUN_STATE_PAUSED);
>     }
>
> The whole idea is still to do whatever is needed here besides resuming the
> vcpus, rather than leaving the whole vm_start() to the system wakeup.  It
> then avoids touching the system wakeup code, which seems cleaner.
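
Filled out as a compilable standalone model, including the resume_all_vcpus()
correction above, the sketch behaves as intended.  This is only an
illustration: every QEMU call is a stub that prints what it would do, the
cpus_accel->synchronize_pre_resume hook is omitted, and only the control flow
is the point, not the stub bodies.

    /* Standalone model of the sketch above; all QEMU calls are stubs. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef enum { RUN_STATE_RUNNING, RUN_STATE_PAUSED,
                   RUN_STATE_SUSPENDED } RunState;

    static bool autostart = true;
    static bool state_received = true;                  /* global state arrived */
    static RunState saved_state = RUN_STATE_SUSPENDED;  /* runstate on the src */

    static bool global_state_received(void)         { return state_received; }
    static RunState global_state_get_runstate(void) { return saved_state; }
    static void runstate_set(RunState s)      { printf("runstate=%d\n", s); }
    static void qapi_event_send_resume(void)  { puts("RESUME event"); }
    static void cpu_enable_ticks(void)        { puts("ticks enabled"); }
    static void vm_state_notify(int r, RunState s)
    {
        printf("notify running=%d state=%d\n", r, (int)s);
    }
    static void resume_all_vcpus(void)        { puts("vcpus resumed"); }

    static void process_incoming_migration_bh(void)
    {
        RunState run_state;

        if (!global_state_received()) {
            run_state = RUN_STATE_RUNNING;
        } else {
            run_state = global_state_get_runstate();
        }

        if (autostart) {
            /* The vm_prepare_start() pieces run unconditionally ... */
            qapi_event_send_resume();
            cpu_enable_ticks();
            runstate_set(run_state);
            /* ... but only a RUNNING guest is notified and resumed. */
            if (run_state == RUN_STATE_RUNNING) {
                vm_state_notify(1, RUN_STATE_RUNNING);
                resume_all_vcpus();
            }
        } else {
            runstate_set(RUN_STATE_PAUSED);
        }
    }

    int main(void)
    {
        /* A suspended guest keeps its runstate; vcpus are not resumed. */
        process_incoming_migration_bh();
        return 0;
    }
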
> > > IIUC this can drop qemu_system_start_on_wakeup_request() along with the
> > > other global var.  Would something like it work for us?
> >
> > Afraid not.  start_on_wake is the only correct solution I can think of.
>
> Please check again above, I just hope we can avoid yet another global to
> QEMU if possible.
>
> Thanks,
>
> --
> Peter Xu

--
Peter Xu
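
For contrast with the exchange above, the start_on_wake approach that Steve
is defending can be modeled the same way.  Again a sketch with invented stub
bodies and stand-in names, not QEMU code: migration records a flag instead of
starting the VM, and the first wakeup performs the deferred vm_start().

    /* Standalone model of the start_on_wake approach; names are stand-ins. */
    #include <stdbool.h>
    #include <stdio.h>

    static bool vm_started;
    static bool start_on_wakeup;   /* the extra global under debate */

    static void vm_start(void)
    {
        vm_started = true;
        puts("vm_start: notifiers, ticks, RESUME event, vcpus");
    }

    /* Incoming migration of a suspended guest: stay suspended, defer start. */
    static void incoming_suspended_guest(void)
    {
        /* models qemu_system_start_on_wakeup_request() */
        start_on_wakeup = true;
    }

    /* A later system_wakeup command: do the full start exactly once. */
    static void system_wakeup(void)
    {
        if (start_on_wakeup) {
            start_on_wakeup = false;
            vm_start();                        /* first start of this VM */
        } else {
            puts("wakeup: plain vcpu resume"); /* VM was started before */
        }
    }

    int main(void)
    {
        incoming_suspended_guest();
        printf("after migration: started=%d (guest still suspended)\n",
               vm_started);
        system_wakeup();   /* the client wakes the guest; vm_start() runs */
        return 0;
    }
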