On Mon, Mar 05, 2018 at 05:04:52PM +0100, Max Reitz wrote:
> On 2018-03-05 16:59, Stefan Hajnoczi wrote:
> > There is a race between the test's 'query-migrate' QMP command after the
> > QMP 'STOP' event and completing the migration:
> >
> > The test case invokes 'query-migrate' upon receiving 'STOP'.  At this
> > point the migration thread may still be in the process of completing.
> > Therefore 'query-migrate' can return 'status': 'active' for a brief
> > window of time instead of 'status': 'completed'.  This results in
> > qemu-iotests 203 hanging.
> >
> > Solve the race by enabling the 'events' migration capability, which
> > causes QEMU to emit migration-specific QMP events that do not suffer
> > from this race condition.  Wait for the QMP 'MIGRATION' event with
> > 'status': 'completed'.
> >
> > Reported-by: Max Reitz <mre...@redhat.com>
> > Signed-off-by: Stefan Hajnoczi <stefa...@redhat.com>
> > ---
> >  tests/qemu-iotests/203     | 15 +++++++++++----
> >  tests/qemu-iotests/203.out |  5 +++++
> >  2 files changed, 16 insertions(+), 4 deletions(-)
>
> So much for "the ppoll() dungeon"...

It was still a pain to debug :).  I put a ring buffer into the QMP
monitor input/output code.  Then it was possible to figure out the issue
via GDB on a hung QEMU:

  (gdb) p current_run_state
  RUN_STATE_POSTMIGRATE
  (gdb) p current_migration->status
  MIGRATION_STATUS_COMPLETED
  (gdb) p monitor_out_ring
  ...'STOP' event...
  (gdb) p monitor_in_ring
  ...query-migrate... <-- okay, the test checked if migration finished

Then looking at the code:

  static void migration_completion(MigrationState *s)
  {
      ...
      if (s->state == MIGRATION_STATUS_ACTIVE) {
          qemu_mutex_lock_iothread();
          s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
          qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER);
          s->vm_was_running = runstate_is_running();
          ret = global_state_store();

          if (!ret) {
              bool inactivate = !migrate_colo_enabled();
                    v---- The stop event comes from here
              ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
              ...
          }
          qemu_mutex_unlock_iothread();  <--- oh, no!
      ...
      if (!migrate_colo_enabled()) {
          migrate_set_state(&s->state, current_active_state,
                            MIGRATION_STATUS_COMPLETED);  <-- too late!
      }

      return;
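
For illustration, here is a minimal Python sketch of the approach the patch
description takes: ignore 'STOP' and keep consuming events until a
'MIGRATION' event carries 'status': 'completed'.  This is not the code from
the patch.  The helper name wait_for_migration_completed() and the
hand-written event list are hypothetical, and the sketch assumes events
arrive as already-decoded QMP dictionaries with the status under 'data'.

```python
def wait_for_migration_completed(events):
    """Consume QMP event dicts until MIGRATION reports 'completed'.

    `events` is any iterable of decoded QMP events (a hypothetical
    stand-in for the test's real event source).  Returns the number
    of events consumed, including the final 'completed' one.
    """
    for count, ev in enumerate(events, start=1):
        if ev.get('event') != 'MIGRATION':
            # e.g. 'STOP': emitted before the status is updated, so it
            # says nothing reliable about the migration status.
            continue
        status = ev.get('data', {}).get('status')
        if status == 'completed':
            return count
        if status == 'failed':
            raise RuntimeError('migration failed')
    raise RuntimeError('event stream ended before migration completed')


# Simulated stream (an assumption for the demo): 'STOP' arrives while the
# migration status is still 'active', mirroring the race described above.
simulated = [
    {'event': 'MIGRATION', 'data': {'status': 'setup'}},
    {'event': 'MIGRATION', 'data': {'status': 'active'}},
    {'event': 'STOP'},
    {'event': 'MIGRATION', 'data': {'status': 'completed'}},
]
consumed = wait_for_migration_completed(simulated)
print(consumed)
```

A test that reacts to 'STOP' here would query the status one event too
early.  Waiting on the 'MIGRATION' event avoids depending on the ordering
between 'STOP' and the status update.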