Hi,

I found that it's possible to remove a RAMBlock during migration, e.g. via
device hot-unplugging initiated by the guest (how to reproduce is below).
I want to clarify whether removing (or even adding) a RAMBlock during
migration is a valid operation or a bug.
Currently, it may cause race conditions with the migration thread, and
migration may fail because of them. For instance, the vmstate_unregister_ram
function, which is called during PCIe device removal, does the following
(a paraphrased sketch of the relevant code is appended after the diffstat):
- Memsets idstr -> the target may receive an unknown/zeroed idstr -> migration fails
- Marks the RAMBlock as non-migratable -> migration fails

Removing the RAMBlock itself seems safe for the migration thread because of
RCU. But it seems to me there are other possible race conditions (not tested):
- qemu_put_buffer_async saves a pointer into the RAMBlock's memory -> the
  block is freed outside the RCU critical section (between RAM save
  iterations) -> qemu_fflush -> access to freed memory.

So, I have the following questions:
1. Is RAMBlock removing/adding OK during migration?

2. If yes, then what should we do with vmstate_unregister_ram?
   - Just remove vmstate_unregister_ram (my RFC patch)
   - Refcount RAMBlock's migratable/non-migratable state
   - Something else?

3. If it mustn't be possible, maybe add assert(migration_is_idle()) in
   qemu_ram_free?

P.S.
I'm working on a fix for the problem below and am trying to choose the better
way: either allow device removal and fix all problems like this, or fix only
the particular device.

--------

How to reproduce device removal during migration:
1. Source QEMU command line (target is similar)
   $ x86_64-softmmu/qemu-system-x86_64 \
       -nodefaults -no-user-config -m 1024 -M q35 \
       -qmp unix:./src.sock,server,nowait \
       -drive file=./image,format=raw,if=virtio \
       -device ioh3420,id=pcie.1 \
       -device virtio-net,bus=pcie.1
2. Start migration with a slow speed limit (to simplify reproducing)
3. In the guest, power off the device on the hotplug pcie.1 bus:
   $ echo 0 > /sys/bus/pci/slots/0/power
4. Increase the migration speed and wait until migration fails

Most likely you will get something like this:
qemu-system-x86_64: get_pci_config_device: Bad config data: i=0xaa read: 0 device: 40 cmask: ff wmask: 0 w1cmask:19
qemu-system-x86_64: Failed to load PCIDevice:config
qemu-system-x86_64: Failed to load ioh-3240-express-root-port:parent_obj.parent_obj.parent_obj
qemu-system-x86_64: error while loading state for instance 0x0 of device '0000:00:03.0/ioh-3240-express-root-port'
qemu-system-x86_64: load of migration failed: Invalid argument

This error just illustrates that device removal during migration is possible;
it does not actually illustrate the race conditions around removing a
RAMBlock.

Regards,
Yury

Yury Kotov (1):
  migration: Remove vmstate_unregister_ram

 hw/block/pflash_cfi01.c     | 1 -
 hw/block/pflash_cfi02.c     | 1 -
 hw/mem/pc-dimm.c            | 5 -----
 hw/misc/ivshmem.c           | 2 --
 hw/pci/pci.c                | 1 -
 include/migration/vmstate.h | 1 -
 migration/savevm.c          | 6 ------
 7 files changed, 17 deletions(-)

-- 
2.24.0
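Appendix (a paraphrased sketch, not verbatim code): the first function below is
roughly what vmstate_unregister_ram does today in migration/savevm.c; both
calls are what can race with the migration thread. The second is the
hypothetical assert from question 3, shown for illustration only and not a
tested patch.

  /* Roughly the current behaviour of vmstate_unregister_ram(), reached from
   * the device removal path (e.g. PCIe hot-unplug during migration):
   */
  void vmstate_unregister_ram(MemoryRegion *mr, DeviceState *dev)
  {
      qemu_ram_unset_idstr(mr->ram_block);       /* memsets block->idstr to zero */
      qemu_ram_unset_migratable(mr->ram_block);  /* clears the RAM_MIGRATABLE flag */
  }

  /* Hypothetical option from question 3: forbid freeing a RAMBlock while a
   * migration is in progress, instead of trying to make removal race-free:
   */
  void qemu_ram_free(RAMBlock *block)
  {
      assert(migration_is_idle());
      /* ... existing free path unchanged ... */
  }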