On Fri, Feb 14, 2025 at 03:12:22PM -0500, Steven Sistare wrote:
> On 1/30/2025 12:01 PM, Peter Xu wrote:
> > On Wed, Jan 29, 2025 at 01:20:13PM -0500, Steven Sistare wrote:
> > > On 1/17/2025 6:57 PM, Peter Xu wrote:
> > > > On Fri, Jan 17, 2025 at 02:10:14PM -0500, Steven Sistare wrote:
> > > > > On 1/17/2025 1:16 PM, Peter Xu wrote:
> > > > > > On Fri, Jan 17, 2025 at 09:46:11AM -0800, Steve Sistare wrote:
> > > > > > > +/*
> > > > > > > + * Return true if ram contents would be lost during CPR.
> > > > > > > + * Return false for ram_device because it is remapped in new QEMU.  Do not
> > > > > > > + * exclude rom, even though it is readonly, because the rom file could change
> > > > > > > + * in new QEMU.  Return false for non-migratable blocks.  They are either
> > > > > > > + * re-created in new QEMU, or are handled specially, or are covered by a
> > > > > > > + * device-level CPR blocker.  Return false for an fd, because it is visible
> > > > > > > + * and can be remapped in new QEMU.
> > > > > > > + */
> > > > > > > +static bool ram_is_volatile(RAMBlock *rb)
> > > > > > > +{
> > > > > > > +    MemoryRegion *mr = rb->mr;
> > > > > > > +
> > > > > > > +    return mr &&
> > > > > > > +        memory_region_is_ram(mr) &&
> > > > > > > +        !memory_region_is_ram_device(mr) &&
> > > > > > > +        (!qemu_ram_is_shared(rb) || !qemu_ram_is_named_file(rb)) &&
> > > > > > > +        qemu_ram_is_migratable(rb) &&
> > > > > > > +        rb->fd < 0;
> > > > > > > +}
> > > > > >
> > > > > > Blocking guest_memfd looks ok, but compared to adding one more block
> > > > > > notifier, can we check all ramblocks once in migrate_prepare(), and
> > > > > > fail that command directly if it fails the check?
> > > > >
> > > > > In an upcoming patch, I will be adding an option analogous to
> > > > > only-migratable which prevents QEMU from starting if anything would
> > > > > block cpr-transfer.  That option will be checked when blockers are
> > > > > added, like for only-migratable.  migrate_prepare is too late.
> > > > >
> > > > > > OTOH, is there any way to simplify the check conditions?  It'll be
> > > > > > at least nice to break these checks into smaller if conditions for
> > > > > > readability..
> > > > >
> > > > > I thought the function header comments made it clear, but I could move
> > > > > each comment next to each condition:
> > > > >
> > > > >   ...
> > > > >     /*
> > > > >      * Return false for an fd, because it is visible and can be
> > > > >      * remapped in new QEMU.
> > > > >      */
> > > > >     if (rb->fd >= 0) {
> > > > >         return false;
> > > > >     }
> > > > >   ...
> > > > >
> > > > > > I wonder if we could stick with looping over all ramblocks, then
> > > > > > make sure each of them is on the cpr saved fd list.  It may need to
> > > > > > make cpr_save_fd() always register with the name of the ramblock to
> > > > > > do such a lookup, or maybe we could also cache the ramblock pointer
> > > > > > in CprFd, then the lookup will be a pointer match check.
> > > > >
> > > > > Some ramblocks are not on the list, such as named files.  Plus looping
> > > > > in migrate_prepare is too late as noted above.
> > > > >
> > > > > IMO what I have already implemented using blockers is clean and
> > > > > elegant.
> > > >
> > > > OK if we need to fail it early at boot, then yes blockers are probably
> > > > better.
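Going back to the smaller-if-conditions idea above, a fully broken-out
version might read like the sketch below.  This is only an illustration,
not the actual patch: it redistributes the existing header comment across
the same conditions, and the comment on the shared named-file case is my
own reading of why that combination is preserved.

    static bool ram_is_volatile(RAMBlock *rb)
    {
        MemoryRegion *mr = rb->mr;

        if (!mr || !memory_region_is_ram(mr)) {
            return false;
        }
        /* ram_device is remapped in new QEMU, so its contents survive. */
        if (memory_region_is_ram_device(mr)) {
            return false;
        }
        /*
         * Assumption: a shared, named backing file remains visible to new
         * QEMU, so its contents are not lost.
         */
        if (qemu_ram_is_shared(rb) && qemu_ram_is_named_file(rb)) {
            return false;
        }
        /*
         * Non-migratable blocks are either re-created in new QEMU, or are
         * handled specially, or are covered by a device-level CPR blocker.
         */
        if (!qemu_ram_is_migratable(rb)) {
            return false;
        }
        /* An fd is visible and can be remapped in new QEMU. */
        if (rb->fd >= 0) {
            return false;
        }
        return true;
    }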
> > > > We'll need one more cmdline parameter.  I've no objection, but I
> > > > don't know how to judge when it's ok to add and when it's better
> > > > not..  I'll leave others to comment on this.
> > > >
> > > > But still, could we check it when ramblocks are created?  That way,
> > > > whatever is forbidden is clear in its own path, which I feel could be
> > > > clearer (like what you did with gmemfd).
> > >
> > > When the ramblock is created, we don't yet know if it is migratable.  A
> > > ramblock that is not migratable does not block cpr.  Migratable is not
> > > known until vmstate_register_ram calls qemu_ram_set_migratable.  Hence
> > > that is where I evaluate conditions and install a blocker.
> > >
> > > Because that is the only place where ram_block_add_cpr_blocker is
> > > called, the test qemu_ram_is_migratable() inside ram_block_add_cpr_blocker
> > > is redundant, and I should delete it.
> >
> > Hmm.. sounds reasonable.
> >
> > > > For example, if I start to convert some of your requirements above,
> > > > then memory_region_is_ram_device() implies RAM_PREALLOC.  Actually,
> > > > ram_device is not the only RAM_PREALLOC user..  Say, would it also not
> > > > work with all memory_region_init_ram_ptr() users (even if they're not
> > > > ram_device)?  An example is, it looks like virtio-gpu can create
> > > > random ramblocks on the fly with prealloced buffers.  I am not sure
> > > > whether they can be pinned by VFIO too.  You may know better.
> > >
> > > That memory is not visible to the guest.  It is not part of
> > > system_memory, and is not marked migratable.
> >
> > I _think_ that can still be visible at least for the virtio-gpu use case,
> > which hangs under VirtIOGPUBase.hostmem.  Relevant code for reference:
> >
> > virtio_gpu_virgl_map_resource_blob:
> >     memory_region_init_ram_ptr(mr, OBJECT(mr), "blob", size, data);
> >     memory_region_add_subregion(&b->hostmem, offset, mr);
> >     memory_region_set_enabled(mr, true);                            [1]
> >
> > virtio_gpu_pci_base_realize:
> >     memory_region_init(&g->hostmem, OBJECT(g), "virtio-gpu-hostmem",
> >                        g->conf.hostmem);
> >     pci_register_bar(&vpci_dev->pci_dev, 4,
> >                      PCI_BASE_ADDRESS_SPACE_MEMORY |
> >                      PCI_BASE_ADDRESS_MEM_PREFETCH |
> >                      PCI_BASE_ADDRESS_MEM_TYPE_64,
> >                      &g->hostmem);                                   [2]
> >
> > pci_update_mappings:
> >     memory_region_add_subregion_overlap(r->address_space,
> >                                         r->addr, r->memory, 1);      [3]
> >
> > but indeed I'm not sure how it works with migration so far, because it
> > doesn't have RAM_MIGRATABLE set.  So at least cpr didn't make it more
> > special.  I assume this isn't something we must figure out as of now
> > then..  but if you do, please kindly share.
>
> AFAICT this memory cannot be pinned, because it is not attached to the
> "system_memory" mr and the "address_space_memory" address space.  The
> listener that maps/pins is attached to address_space_memory.

I still think it's part of system_memory - every PCI BAR that gets mapped
as an MMIO region can be part of guest system memory and, IIUC, can be
pinned.  Normally these MMIOs are IO regions, so "pinning" may not make
much difference IIUC, but here it looks like real RAM backs them when
emulated, even though they're "MMIO regions"..

The code clips above are the paths by which I think it gets attached to
system_memory.  It isn't any more special than a generic PCI device's BARs
being mapped into system_memory, though:

- Firstly, at [1], the blob is part of VirtIOGPUBase.hostmem as a
  subregion.

- Then, at [2], VirtIOGPUBase.hostmem is registered as a PCI BAR, in this
  case BAR4.

- Then, it's the same as other PCI device BARs, which can logically be
  mapped into guest physical address space and become part of
  system_memory.  [3] above is where the BAR memory gets added into
  PCIIORegion.address_space.

Then, to fill in the last missing piece: taking i440fx as an example, the
PCI bus memory address space is ultimately attached to system_memory here:

i440fx_pcihost_realize:
    /* setup pci memory mapping */
    pc_pci_as_mapping_init(s->system_memory, s->pci_address_space);

Again, not requesting to resolve this immediately: I have no idea how
migration works at all with random new ramblocks being added..  but I'd
still like to check that we're on the same page on this specific case.
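As a way to double check the guest-visibility claim on a live VM, one could
resolve a guest physical address inside BAR4 through system_memory's flat
view and see which memory region it lands on.  A debugging sketch only:
check_blob_mapping is a hypothetical helper, and bar4_gpa stands for
whatever base address the guest programmed into BAR4.

    #include "qemu/osdep.h"
    #include "exec/memory.h"

    static void check_blob_mapping(hwaddr bar4_gpa)
    {
        /* Resolve the GPA through the flat view rooted at system_memory */
        MemoryRegionSection sec = memory_region_find(get_system_memory(),
                                                     bar4_gpa, 4096);

        if (sec.mr) {
            /* If this prints "blob", the ram_ptr region is guest-visible */
            printf("GPA 0x%" HWADDR_PRIx " -> MR '%s' (ram=%d)\n",
                   bar4_gpa, memory_region_name(sec.mr),
                   memory_region_is_ram(sec.mr));
            /* memory_region_find takes a reference on the region; drop it */
            memory_region_unref(sec.mr);
        }
    }

If it resolves to the "blob" region created by memory_region_init_ram_ptr(),
the region is reachable from address_space_memory and could in principle be
pinned.

--
Peter Xu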