From: Peter Xu <pet...@redhat.com> Below process could crash qemu with postcopy recovery:
1. (hmp) migrate -d .. 2. (hmp) migrate_start_postcopy 3. [network down, postcopy paused] 4. (hmp) migrate -r $WRONG_PORT when try the recover on an invalid $WRONG_PORT, cleanup_bh will be cleared 5. (hmp) migrate -r $RIGHT_PORT [qemu crash on assert(cleanup_bh)] The thing is we shouldn't cleanup if it's postcopy resume; the error is set mostly because the channel is wrong, so we return directly waiting for the user to retry. migrate_fd_cleanup() should only be called when migration is cancelled or completed. Signed-off-by: Peter Xu <pet...@redhat.com> Message-Id: <20210708190653.252961-3-pet...@redhat.com> Reviewed-by: Dr. David Alan Gilbert <dgilb...@redhat.com> Signed-off-by: Dr. David Alan Gilbert <dgilb...@redhat.com> --- migration/migration.c | 13 ++++++++++++- 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/migration/migration.c b/migration/migration.c index 38ebc6c1ab..20c48cfff1 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -3979,7 +3979,18 @@ void migrate_fd_connect(MigrationState *s, Error *error_in) } if (error_in) { migrate_fd_error(s, error_in); - migrate_fd_cleanup(s); + if (resume) { + /* + * Don't do cleanup for resume if channel is invalid, but only dump + * the error. We wait for another channel connect from the user. + * The error_report still gives HMP user a hint on what failed. + * It's normally done in migrate_fd_cleanup(), but call it here + * explicitly. + */ + error_report_err(error_copy(s->error)); + } else { + migrate_fd_cleanup(s); + } return; } -- 2.31.1