On the destination side, we cannot wake up all the threads as soon as we get reconnected: the page-fault thread must keep waiting until the source is ready to reply to page requests. The first thing to do is to wake up the main load thread, so that we can continue to receive valid messages from the source again and reply when needed.
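The staged wake-up can be illustrated with two threads parked on separate POSIX semaphores, where only one of them is released on reconnect. This is a minimal sketch, assuming Linux-style unnamed semaphores (`sem_init`) and pthreads. `load_pause_sem`, `fault_pause_sem`, `staged_wakeup_demo()` and the other names are hypothetical stand-ins for QEMU's own `qemu_sem_*` wrappers and per-thread pause semaphores (the patch posts `mis->postcopy_pause_sem_dst` for the load thread).

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>

/*
 * Hypothetical stand-ins for the destination's two paused threads. Each
 * thread parks on its own semaphore so the two can be resumed separately.
 */
static sem_t load_pause_sem;   /* main load thread waits here */
static sem_t fault_pause_sem;  /* page-fault thread waits here */

/* Order in which the threads resumed: 1 = load thread, 2 = fault thread. */
static int resume_order[2];
static int resume_count;
static pthread_mutex_t order_lock = PTHREAD_MUTEX_INITIALIZER;

static void record_resume(int who)
{
    pthread_mutex_lock(&order_lock);
    resume_order[resume_count++] = who;
    pthread_mutex_unlock(&order_lock);
}

static void *load_thread(void *arg)
{
    (void)arg;
    sem_wait(&load_pause_sem);   /* parked until the channel is back */
    record_resume(1);            /* would now read commands from source */
    return NULL;
}

static void *fault_thread(void *arg)
{
    (void)arg;
    sem_wait(&fault_pause_sem);  /* parked until source can serve pages */
    record_resume(2);            /* would now forward page requests */
    return NULL;
}

/* Runs one pause/resume cycle; returns 0 on success, -1 on setup failure. */
static int staged_wakeup_demo(void)
{
    pthread_t load, fault;

    resume_count = 0;
    if (sem_init(&load_pause_sem, 0, 0) != 0) {
        return -1;
    }
    if (sem_init(&fault_pause_sem, 0, 0) != 0) {
        sem_destroy(&load_pause_sem);
        return -1;
    }
    if (pthread_create(&load, NULL, load_thread, NULL) != 0) {
        goto fail;
    }
    if (pthread_create(&fault, NULL, fault_thread, NULL) != 0) {
        sem_post(&load_pause_sem);   /* release the parked load thread */
        pthread_join(load, NULL);
        goto fail;
    }

    /* Step 1, on reconnect: wake only the load thread. */
    sem_post(&load_pause_sem);
    pthread_join(load, NULL);

    /* Step 2, once the source is known to answer page requests. */
    sem_post(&fault_pause_sem);
    pthread_join(fault, NULL);

    sem_destroy(&load_pause_sem);
    sem_destroy(&fault_pause_sem);
    return 0;

fail:
    sem_destroy(&load_pause_sem);
    sem_destroy(&fault_pause_sem);
    return -1;
}
```

Calling `staged_wakeup_demo()` from a driver (built with `-pthread`) leaves `resume_order` as `{1, 2}`: the load thread always resumes first. The demo joins the load thread before posting the fault semaphore only to make that order observable; on a real destination the load thread keeps running while the fault thread waits.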
When the connection comes back, we also switch the destination VM state from postcopy-paused to postcopy-recover. Now we are finally ready to do the resume logic.

Signed-off-by: Peter Xu <pet...@redhat.com>

```diff
---
 migration/migration.c | 34 +++++++++++++++++++++++++++++++---
 1 file changed, 31 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 3aabe11..e498fa4 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -389,10 +389,38 @@ static void process_incoming_migration_co(void *opaque)
 
 void migration_fd_process_incoming(QEMUFile *f)
 {
-    Coroutine *co = qemu_coroutine_create(process_incoming_migration_co, f);
+    MigrationIncomingState *mis = migration_incoming_get_current();
+    Coroutine *co;
+
+    mis->from_src_file = f;
+
+    if (mis->state == MIGRATION_STATUS_POSTCOPY_PAUSED) {
+        /* Resumed migration to postcopy state */
+
+        /* Postcopy has standalone thread to do vm load */
+        qemu_file_set_blocking(f, true);
+
+        /* Re-configure the return path */
+        mis->to_src_file = qemu_file_get_return_path(f);
 
-    qemu_file_set_blocking(f, false);
-    qemu_coroutine_enter(co);
+        /* Switch state: postcopy-paused -> postcopy-recover */
+        migrate_set_state(&mis->state, MIGRATION_STATUS_POSTCOPY_PAUSED,
+                          MIGRATION_STATUS_POSTCOPY_RECOVER);
+
+        /*
+         * Here, we only wake up the main loading thread (while the
+         * fault thread will still be waiting), so that we can receive
+         * commands from source now, and answer them if needed. The
+         * fault thread will be woken up only later, once we are sure
+         * that source is ready to reply to page requests.
+         */
+        qemu_sem_post(&mis->postcopy_pause_sem_dst);
+    } else {
+        /* New incoming migration */
+        qemu_file_set_blocking(f, false);
+        co = qemu_coroutine_create(process_incoming_migration_co, f);
+        qemu_coroutine_enter(co);
+    }
 }
 
 /*
-- 
2.7.4
```
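The branch in `migration_fd_process_incoming()` hinges on a guarded state transition. To our understanding, QEMU's `migrate_set_state()` only changes the state if it still holds the expected old value (a compare-and-swap). The C11 sketch below models the dispatch under that assumption; the `DEMO_*` enums, `demo_state`, `set_state()` and `handle_incoming()` are hypothetical and not QEMU APIs, and all I/O and coroutine work is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical subset of migration states (not QEMU's real enum). */
enum demo_mig_state {
    DEMO_NONE,
    DEMO_POSTCOPY_ACTIVE,
    DEMO_POSTCOPY_PAUSED,
    DEMO_POSTCOPY_RECOVER,
};

/* Which branch an incoming connection took. */
enum demo_incoming_path { DEMO_PATH_NEW, DEMO_PATH_RESUME };

/* Hypothetical stand-in for the incoming state field (mis->state). */
static _Atomic int demo_state = DEMO_NONE;

/*
 * Compare-and-swap transition: moves *state to new_state only if it still
 * holds old_state. Returns true when the transition happened.
 */
static bool set_state(_Atomic int *state, int old_state, int new_state)
{
    int expected = old_state;
    return atomic_compare_exchange_strong(state, &expected, new_state);
}

/* Mirrors the if/else in migration_fd_process_incoming(), minus the I/O. */
static enum demo_incoming_path handle_incoming(_Atomic int *state)
{
    if (atomic_load(state) == DEMO_POSTCOPY_PAUSED) {
        /* Reconnect of a paused postcopy: enter the recover state. */
        set_state(state, DEMO_POSTCOPY_PAUSED, DEMO_POSTCOPY_RECOVER);
        /* The real code would now post the load thread's semaphore. */
        return DEMO_PATH_RESUME;
    }
    /* Otherwise this is a brand-new incoming migration. */
    return DEMO_PATH_NEW;
}
```

A failed `set_state()` leaves the state untouched, so a connection arriving in any state other than postcopy-paused cannot be pushed into postcopy-recover by mistake.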