Eric Blake <ebl...@redhat.com> wrote:
> Consider what happens when performing a migration between two host
> machines connected to an NFS server serving multiple block devices to
> the guest, when the NFS server becomes unavailable. The migration
> attempts to inactivate all block devices on the source (a necessary
> step before the destination can take over); but if the NFS server is
> non-responsive, the attempt to inactivate can itself fail. When that
> happens, the destination fails to get the migrated guest (good,
> because the source wasn't able to flush everything properly):
>
> (qemu) qemu-kvm: load of migration failed: Input/output error
>
> at which point, our only hope for the guest is for the source to take
> back control. With the current code base, the host outputs a message, but
> then appears to resume:
>
> (qemu) qemu-kvm: qemu_savevm_state_complete_precopy_non_iterable:
> bdrv_inactivate_all() failed (-1)
>
> (src qemu)info status
> VM status: running
>
> but a second migration attempt now asserts:
>
> (src qemu) qemu-kvm: ../block.c:6738: int
> bdrv_inactivate_recurse(BlockDriverState *): Assertion `!(bs->open_flags &
> BDRV_O_INACTIVE)' failed.
>
> Whether the guest is recoverable on the source after the first failure
> is debatable, but what we do not want is to have qemu itself fail due
> to an assertion. It looks like the problem is as follows:
>
> In migration.c:migration_completion(), the source sets 'inactivate' to
> true (since COLO is not enabled), then tries
> savevm.c:qemu_savevm_state_complete_precopy() with a request to
> inactivate block devices. In turn, this calls
> block.c:bdrv_inactivate_all(), which fails when flushing runs up
> against the non-responsive NFS server. With savevm failing, we are
> now left in a state where some, but not all, of the block devices have
> been inactivated; the 'fail_invalidate' label of
> migration_completion() then wants to reclaim those disks by calling
> bdrv_activate_all() - but this too can fail, yet nothing takes note of
> that failure.
>
> Thus, we have reached a state where the migration engine has forgotten
> all state about whether a block device is inactive, because we did not
> set s->block_inactive; so migration allows the source to reach
> vm_start() and resume execution, violating the block layer invariant
> that the guest CPUs should not be restarted while a device is
> inactive. Note that the code in migration.c:migrate_fd_cancel() will
> also try to reactivate all block devices if s->block_inactive was set,
> but because we failed to set that flag after the first failure, the
> source assumes it has reclaimed all devices, even though it still has
> remaining inactivated devices and does not try again. Normally,
> qmp_cont() will also try to reactivate all disks (or correctly fail if
> the disks are not reclaimable because NFS is not yet back up), but the
> auto-resumption of the source after a migration failure does not go
> through qmp_cont(). And because we have left the block layer in an
> inconsistent state with devices still inactivated, the later migration
> attempt is hitting the assertion failure.
>
> Since it is important to not resume the source with inactive disks,
> this patch tries harder at tracking whether migration attempted to
> inactivate any devices, in order to prevent any vm_start() until it
> has successfully reactivated all devices.
>
> See also https://bugzilla.redhat.com/show_bug.cgi?id=2058982
>
> Signed-off-by: Eric Blake <ebl...@redhat.com>
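
For readers without the tree open, the path described above boils down
to roughly the following.  This is a condensed sketch, not the literal
upstream code: locking, the migration state transitions and most error
handling are omitted, signatures are paraphrased, and the wrapper name
migration_completion_sketch() is made up for illustration.

    static void migration_completion_sketch(MigrationState *s)
    {
        Error *local_err = NULL;
        int ret;
        /* COLO is not enabled, so ask savevm to inactivate the disks */
        bool inactivate = !migrate_colo_enabled();

        /* ends up in block.c:bdrv_inactivate_all() when 'inactivate' is set */
        ret = qemu_savevm_state_complete_precopy(s->to_dst_file, false,
                                                 inactivate);
        if (ret < 0) {
            /* some, but perhaps not all, disks are already inactive */
            goto fail_invalidate;
        }
        s->block_inactive = inactivate;
        return;

    fail_invalidate:
        /* try to reclaim the disks so the source can keep running */
        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
            /* Without the patch nothing is recorded here, so the source
             * is later allowed to vm_start() and to re-migrate with
             * inactive disks; the patch records the attempt instead: */
            s->block_inactive = inactivate;
        } else {
            s->block_inactive = false;
        }
    }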
Wow, such a big comment for such a small patch.  And then people think
that there is no "nuance" anymore :-)

> @@ -3518,6 +3519,7 @@ fail_invalidate:
>          bdrv_activate_all(&local_err);
>          if (local_err) {
>              error_report_err(local_err);
> +            s->block_inactive = inactivate;

Why not just put here:

    s->block_inactive = true;

And call it a day?  If bdrv_activate_all() fails, we can't continue
anyway.  Or am I missing something?

Later, Juan.

>          } else {
>              s->block_inactive = false;
>          }
>
> base-commit: f1426881a827a6d3f31b65616c4a8db1e9e7c45e
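
If I read the suggestion right, the hunk would then simply become the
following (same context as quoted above, only with the flag hard-coded
instead of copied from 'inactivate'; a sketch, not a tested patch):

        bdrv_activate_all(&local_err);
        if (local_err) {
            error_report_err(local_err);
            /* reactivation failed, so remember that disks may still be
             * inactive and retry before the guest is ever restarted */
            s->block_inactive = true;
        } else {
            s->block_inactive = false;
        }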