* Paolo Bonzini (pbonz...@redhat.com) wrote:
>
>
> On 03/03/2017 13:00, Dr. David Alan Gilbert wrote:
> > Ouch that's pretty nasty; I remember Paolo explaining to me a while ago that
> > there were times when run_on_cpu would have to drop the BQL and I worried
> > about it, but this is the first time I've seen an error due to it.
> >
> > Do you know what the migration state was at that point? Was it
> > MIGRATION_STATUS_CANCELLING?
> > I'm thinking perhaps we should stop 'cont' from continuing while migration
> > is in MIGRATION_STATUS_CANCELLING. Do we send an event when we hit
> > CANCELLED - so that perhaps libvirt could avoid sending the 'cont' until
> > then?
>
> No, there's no event, though I thought libvirt would poll until
> "query-migrate" returns the cancelled state. Of course that is a small
> consolation, because a segfault is unacceptable.

I think you might get an event if you set the new migrate capability called
'events' on!

    void migrate_set_state(int *state, int old_state, int new_state)
    {
        if (atomic_cmpxchg(state, old_state, new_state) == old_state) {
            trace_migrate_set_state(new_state);
            migrate_generate_event(new_state);
        }
    }

    static void migrate_generate_event(int new_state)
    {
        if (migrate_use_events()) {
            qapi_event_send_migration(new_state, &error_abort);
        }
    }

That event feature went in sometime after 2.3.0.

> One possibility is to suspend the monitor in qmp_migrate_cancel and
> resume it (with add_migration_state_change_notifier) when we hit the
> CANCELLED state. I'm not sure what the latency would be between the end
> of migrate_fd_cancel and finally reaching CANCELLED.

I don't like suspending monitors; it can potentially take quite a significant
time to do a cancel.
How about making 'cont' fail if we're in CANCELLING?

I'd really love to see the 'run_on_cpu' being more careful about the BQL;
we really need all of the rest of the devices to stay quiesced at times.

Dave

> Paolo
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK
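
To make the "make 'cont' fail if we're in CANCELLING" idea concrete, here is a
rough standalone sketch. It is not QEMU code: the enum, mig_set_state() and
try_cont() are hypothetical stand-ins invented for illustration, and C11
atomics are used in place of QEMU's atomic_cmpxchg(). It only shows the shape
of the check, assuming 'cont' can read the migration state before it resumes
anything.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the migration states named in the thread;
 * the names and values here are illustrative, not QEMU's. */
enum mig_status {
    MIG_STATUS_NONE,
    MIG_STATUS_ACTIVE,
    MIG_STATUS_CANCELLING,
    MIG_STATUS_CANCELLED,
};

static _Atomic int mig_state = MIG_STATUS_NONE;

/* Same compare-and-swap shape as migrate_set_state(): the transition
 * only takes effect if the state is still old_state. Returns true if
 * the state was changed. */
static bool mig_set_state(int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(&mig_state, &expected, new_state);
}

/* Hypothetical 'cont' handler: returns 0 on success, -1 if refused.
 * On refusal *errp points at a static message, otherwise it is NULL. */
static int try_cont(const char **errp)
{
    if (atomic_load(&mig_state) == MIG_STATUS_CANCELLING) {
        *errp = "migration is being cancelled; retry once it is cancelled";
        return -1;
    }
    *errp = NULL;
    /* A real handler would resume the vCPUs at this point. */
    return 0;
}
```

With this shape a management layer that sends 'cont' too early gets an error
back instead of racing with the cancel, and can retry once it sees the
cancelled state, either by polling or via the event if that capability is on.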