In migration_maybe_pause() QEMU may yield BQL before waiting for a semaphore. However it yields the BQL too early, which logically gives it chance for the main thread to quickly take the BQL and modify the state to CANCELLING.
To avoid such race condition from happening at all, always update the migration states within the BQL. It'll make sure no concurrent cancellation can ever happen. With that, IIUC there's chance we can remove the extra parameter in migration_maybe_pause() to update active state, but that'll be done separately later. Signed-off-by: Peter Xu <pet...@redhat.com> --- migration/migration.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/migration/migration.c b/migration/migration.c index 13b7df0d5b..5c688059de 100644 --- a/migration/migration.c +++ b/migration/migration.c @@ -2701,14 +2701,14 @@ static int migration_maybe_pause(MigrationState *s, * wait for the 'pause_sem' semaphore. */ if (s->state != MIGRATION_STATUS_CANCELLING) { - bql_unlock(); migrate_set_state(&s->state, *current_active_state, MIGRATION_STATUS_PRE_SWITCHOVER); + bql_unlock(); qemu_sem_wait(&s->pause_sem); + bql_lock(); migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER, new_state); *current_active_state = new_state; - bql_lock(); } return s->state == new_state ? 0 : -EINVAL; -- 2.47.0