* Amit Shah (amit.s...@redhat.com) wrote:
> From: "Denis V. Lunev" <d...@openvz.org>
> 
> There is a possibility to hit an assert in qcow2_get_specific_info because
> s->qcow_version is undefined.  This happens when a VM is starting from a
> suspended state, i.e. it is processing an incoming migration, and at the
> same time 'info block' is called.
> 
> The problem is that qcow2_invalidate_cache() closes the image and
> memset()s BDRVQcowState in the middle.
> 
> The patch moves the processing of bdrv_invalidate_cache_all out of
> coroutine context for postcopy migration to avoid that.  This function
> is called with the following stack:
>     process_incoming_migration_co
>         qemu_loadvm_state
>             qemu_loadvm_state_main
>                 loadvm_process_command
>                     loadvm_postcopy_handle_run
> 
> Signed-off-by: Denis V. Lunev <d...@openvz.org>
> Tested-by: Dr. David Alan Gilbert <dgilb...@redhat.com>
hmm; actually - this segs in a variety of different ways; there are two problems:

a) +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
   That's the easy one; that NULL should be 'mis', because the bh is expecting
   to use it as a MigrationIncomingState, so it segs fairly reliably in the
   qemu_bh_delete(mis->bh).  (There's a sketch of that change after the quoted
   patch.)

b) The harder problem is that there's a race where qemu_bh_delete segs, and
   I'm not 100% sure why yet - it only does it sometimes (i.e. run virt-test
   and leave it and it occasionally does it).  From the core it looks like
   mis->bh is corrupt (0x10101010...), so maybe mis has been freed at that
   point?  I'm suspecting this is the postcopy_ram_listen_thread freeing mis
   at the end of it, but I don't know yet.

Dave

> CC: Paolo Bonzini <pbonz...@redhat.com>
> CC: Juan Quintela <quint...@redhat.com>
> CC: Amit Shah <amit.s...@redhat.com>
> Message-Id: <1455259174-3384-3-git-send-email-...@openvz.org>
> Signed-off-by: Amit Shah <amit.s...@redhat.com>
> ---
>  migration/savevm.c | 27 +++++++++++++++++----------
>  1 file changed, 17 insertions(+), 10 deletions(-)
> 
> diff --git a/migration/savevm.c b/migration/savevm.c
> index 94f2894..8415fd9 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1496,18 +1496,10 @@ static int loadvm_postcopy_handle_listen(MigrationIncomingState *mis)
>      return 0;
>  }
>  
> -/* After all discards we can start running and asking for pages */
> -static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> +static void loadvm_postcopy_handle_run_bh(void *opaque)
>  {
> -    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
>      Error *local_err = NULL;
>  
> -    trace_loadvm_postcopy_handle_run();
> -    if (ps != POSTCOPY_INCOMING_LISTENING) {
> -        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
> -        return -1;
> -    }
> -
>      /* TODO we should move all of this lot into postcopy_ram.c or a shared code
>       * in migration.c
>       */
> @@ -1519,7 +1511,6 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>      bdrv_invalidate_cache_all(&local_err);
>      if (local_err) {
>          error_report_err(local_err);
> -        return -1;
>      }
>  
>      trace_loadvm_postcopy_handle_run_cpu_sync();
> @@ -1534,6 +1525,22 @@ static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
>          /* leave it paused and let management decide when to start the CPU */
>          runstate_set(RUN_STATE_PAUSED);
>      }
> +}
> +
> +/* After all discards we can start running and asking for pages */
> +static int loadvm_postcopy_handle_run(MigrationIncomingState *mis)
> +{
> +    PostcopyState ps = postcopy_state_set(POSTCOPY_INCOMING_RUNNING);
> +    QEMUBH *bh;
> +
> +    trace_loadvm_postcopy_handle_run();
> +    if (ps != POSTCOPY_INCOMING_LISTENING) {
> +        error_report("CMD_POSTCOPY_RUN in wrong postcopy state (%d)", ps);
> +        return -1;
> +    }
> +
> +    bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, NULL);
> +    qemu_bh_schedule(bh);
>  
>      /* We need to finish reading the stream from the package
>       * and also stop reading anything more from the stream that loaded the
> -- 
> 2.5.0
> 
-- 
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
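
For (a), a minimal sketch of the change being suggested - pass 'mis' through as
the bh opaque and keep the QEMUBH somewhere the callback can delete it (a 'bh'
field on MigrationIncomingState is assumed here, going by the
qemu_bh_delete(mis->bh) mentioned above); untested, and it doesn't touch (b):

    static void loadvm_postcopy_handle_run_bh(void *opaque)
    {
        MigrationIncomingState *mis = opaque;  /* NULL with the patch as posted */

        /* ... bdrv_invalidate_cache_all(), the cpu sync and the vm_start()/
         *     runstate_set() path, as in the patch above ...
         */

        qemu_bh_delete(mis->bh);               /* valid once opaque really is mis */
    }

    /* and in loadvm_postcopy_handle_run(), instead of a local QEMUBH: */
        mis->bh = qemu_bh_new(loadvm_postcopy_handle_run_bh, mis);
        qemu_bh_schedule(mis->bh);

If (b) really is postcopy_ram_listen_thread freeing mis before the bh has run,
hanging the bh off mis won't be enough on its own; the bh's opaque would need a
lifetime that outlives that thread.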