At this point everyone is probably familiar with the assertion: bdrv_inactivate_recurse: Assertion `!(bs->open_flags & BDRV_O_INACTIVE)' failed.
The issue is that at the end of migration, the block layer is deactivated to release the locks on the disks so that in a shared storage scenario the migration destination can take over. Trying to deactivate twice leads to the above assertion. The reason deactivation is happening twice is due to the migration capability "late-block-activate", which delays activation until the destination VM receives a qmp_cont command. The reasoning for that cap is: Activating the block devices causes the locks to be taken on the backing file. If we're running with -S and the destination libvirt hasn't started the destination with 'cont', it's expecting the locks are still untaken. Don't activate the block devices if we're not going to autostart the VM; 'cont' already will do that anyway. This change is tied to the new migration capability 'late-block-activate' that defaults to off, keeping the old behaviour by default. bz: https://bugzilla.redhat.com/show_bug.cgi?id=1560854 However, the expected qmp_cont command never happens, because the user may never decide to continue the VM and instead attempt another migration while the VM is paused. Fix this by assuming that the qmp_migrate command having been issued is enough indication that it is now ok to take the locks again (similarly to qmp_cont) and call bdrv_activate_all() at that moment. This is the least invasive fix that won't require us to add new qmp commands and go change libvirt, etc. I'm also copying the block list (for obvious reasons, but also) because Vladimir pointed out that asserts of this kind might not only happen during migration. I know of at least bdrv_co_write_req_prepare() which will assert if reached when a guest is paused after a migration that used late-block-activate. Patches 1-3 are just shuffling code; Patch 4 is the fix; Patch 5 is the test. To reproduce the issue revert patch 4 and run: QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p \ /x86_64/migration/precopy/ping-pong/late-block-activate References: https://gitlab.com/qemu-project/qemu/-/issues/686 https://gitlab.com/qemu-project/qemu/-/issues/2395 https://lore.kernel.org/r/20240924125611.664315-1-andrey.drobys...@virtuozzo.com CI run: https://gitlab.com/farosas/qemu/-/pipelines/1556902330 Fabiano Rosas (5): tests/qtest/migration: Move more code under only_target tests/qtest/migration: Don't use hardcoded strings for -serial tests/qtest/migration: Support cleaning up only one side of migration migration: Activate block devices if VM is paused when migrating tests/qtest/migration: Test successive migrations migration/migration.c | 19 +++ tests/qtest/migration-helpers.c | 8 + tests/qtest/migration-helpers.h | 2 + tests/qtest/migration-test.c | 252 +++++++++++++++++++++++++------- 4 files changed, 226 insertions(+), 55 deletions(-) base-commit: 134b443512825bed401b6e141447b8cdc22d2efe -- 2.35.3