Peter Xu <pet...@redhat.com> writes:

> Both dump-guest-memory and live migration have vm state cached internally.
> Allowing them to happen together means the vm state can be messed up.  Simply
> block live migration for dump-guest-memory.
>
> One trivial thing to mention is we should still allow dump-guest-memory even if
> -only-migratable is specified, because that flag should majorly be used to
> guarantee not adding devices that will block migration by accident.  Dump guest
> memory is not like that - it'll only block for the seconds when it's dumping.
I recently ran into a similarly unusual use of migration blockers:

    Subject: -only-migrate and the two different uses of migration blockers (was: spapr_events: Sure we may ignore migrate_add_blocker() failure?)
    Date: Mon, 19 Jul 2021 13:00:20 +0200
    Message-ID: <87sg0amuuz.fsf...@dusky.pond.sub.org>

    We appear to use migration blockers in two ways:

    (1) Prevent migration for an indefinite time, typically due to use of
        some feature that isn't compatible with migration.

    (2) Delay migration for a short time.

    Option -only-migratable is designed for (1).  It interferes with (2).

    Example for (1): device "x-pci-proxy-dev" doesn't support migration.
    It adds a migration blocker on realize, and deletes it on unrealize.
    With -only-migratable, device realize fails.  Works as designed.

    Example for (2): spapr_mce_req_event() makes an effort to prevent
    migration from degrading the reporting of FWNMIs.  It adds a migration
    blocker when it receives one, and deletes it when it's done handling
    it.  This is a best effort; if migration is already in progress by the
    time an FWNMI is received, we simply carry on, and that's okay.
    However, option -only-migratable sabotages the best effort entirely.

    While this isn't exactly terrible, it may be a weakness in our thinking
    and our infrastructure.  I'm bringing it up so the people in charge are
    aware :)

https://lists.nongnu.org/archive/html/qemu-devel/2021-07/msg04723.html

Downthread there, Dave Gilbert opined

    It almost feels like they need a way to temporarily hold off
    'completion' of migration - i.e. the phase where we stop the CPU and
    write the device data; mind you, you'd also probably want it to stop
    cold-migrates/snapshots?
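To make the distinction between uses (1) and (2) concrete, here is a toy model in C.  It is a sketch under stated assumptions, not QEMU code: every name in it (BlockerModel, model_add_blocker(), model_may_complete(), ...) is hypothetical and unrelated to the real migration API.  It assumes the split Dave hints at: -only-migratable refuses only indefinite blockers, while a transient blocker does not prevent migration from starting and merely holds off its completion phase.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of the two uses of migration blockers.  All names are
 * hypothetical; this is NOT QEMU's migration API.
 */
typedef enum {
    BLOCKER_INDEFINITE, /* use (1): feature incompatible with migration */
    BLOCKER_TRANSIENT,  /* use (2): delay migration for a short time */
} BlockerKind;

typedef struct {
    bool only_migratable;   /* models the -only-migratable option */
    int indefinite;         /* outstanding use-(1) blockers */
    int transient;          /* outstanding use-(2) hold-offs */
} BlockerModel;

/* Returns 0 on success, -1 if the blocker is refused. */
static int model_add_blocker(BlockerModel *m, BlockerKind kind)
{
    if (kind == BLOCKER_INDEFINITE) {
        if (m->only_migratable) {
            return -1;  /* caller fails, e.g. device realize, as designed */
        }
        m->indefinite++;
    } else {
        /* Assumption: short hold-offs are accepted even with -only-migratable. */
        m->transient++;
    }
    return 0;
}

static void model_del_blocker(BlockerModel *m, BlockerKind kind)
{
    if (kind == BLOCKER_INDEFINITE) {
        assert(m->indefinite > 0);
        m->indefinite--;
    } else {
        assert(m->transient > 0);
        m->transient--;
    }
}

/* Only indefinite blockers prevent migration from starting. */
static bool model_may_start(const BlockerModel *m)
{
    return m->indefinite == 0;
}

/*
 * Assumed per Dave's suggestion: transient hold-offs only delay
 * completion, i.e. stopping the CPUs and writing the device state.
 */
static bool model_may_complete(const BlockerModel *m)
{
    return m->indefinite == 0 && m->transient == 0;
}
```

With only_migratable set, an indefinite blocker is refused while a transient one is accepted and only delays completion until it is deleted.  Whether cold migration or snapshots should also wait on a transient hold-off, as Dave asks, is left open by this model.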