* Peter Xu (pet...@redhat.com) wrote: > On Fri, Aug 04, 2017 at 03:04:19PM +0800, Peter Xu wrote: > > On Thu, Aug 03, 2017 at 12:05:41PM +0100, Dr. David Alan Gilbert wrote: > > > > [...] > > > > > > +static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > > > > +{ > > > > + /* > > > > + * This means source VM is ready to resume the postcopy migration. > > > > + * It's time to switch state and release the fault thread to > > > > + * continue service page faults. > > > > + */ > > > > + migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_RECOVER, > > > > + MIGRATION_STATUS_POSTCOPY_ACTIVE); > > > > + qemu_sem_post(&mis->postcopy_pause_sem_fault); > > > > > > Is it worth sanity checking that you were in RECOVER at this point? > > > > Yeah, it never hurts. Will do. > > Not sure whether this would be good (note: I returned 0 in the if): > > diff --git a/migration/savevm.c b/migration/savevm.c > index b7843c2..b34f59b 100644 > --- a/migration/savevm.c > +++ b/migration/savevm.c > @@ -1709,6 +1709,12 @@ static int > loadvm_postcopy_handle_run(MigrationIncomingState *mis) > > static int loadvm_postcopy_handle_resume(MigrationIncomingState *mis) > { > + if (mis->state != MIGRATION_STATUS_POSTCOPY_RECOVER) { > + error_report("%s: illegal resume received", __func__); > + /* Don't fail the load, only for this. */ > + return 0; > + } > + > /* > * This means source VM is ready to resume the postcopy migration. > * It's time to switch state and release the fault thread to > > Basically I just don't want to crash the dest VM (it holds hot dirty > pages) even if it receives a faulty RESUME command.
Yes, so now that's a fun problem; effectively you then have 3 valid failure modes: a) An IO failure so we need to go into POSTCOPY_PAUSE b) A fatal migration stream problem to quit c) A non-fatal migration stream problem to go .. back into PAUSE? Dave > -- > Peter Xu -- Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK