On Thu, Jan 02, 2025 at 02:21:13PM -0500, Steven Sistare wrote:
> On 12/24/2024 2:24 PM, Peter Xu wrote:
> > On Tue, Dec 24, 2024 at 08:17:00AM -0800, Steve Sistare wrote:
> > > Add the cpr-transfer migration mode, which allows the user to transfer
> > > a guest to a new QEMU instance on the same host with minimal guest pause
> > > time, by preserving guest RAM in place, albeit with new virtual addresses
> > > in new QEMU, and by preserving device file descriptors.  Pages that were
> > > locked in memory for DMA in old QEMU remain locked in new QEMU, because
> > > the descriptor of the device that locked them remains open.
> > >
> > > cpr-transfer preserves memory and device descriptors by sending them to
> > > new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state
> > > cannot be sent over the normal migration channel, because devices and
> > > backends are created prior to reading the channel, so this mode sends
> > > CPR state over a second "cpr" migration channel.  New QEMU reads the cpr
> > > channel prior to creating devices or backends.  The user specifies the
> > > cpr channel in the channel arguments on the outgoing side, and in a
> > > second -incoming command-line parameter on the incoming side.
> > >
> > > The user must start old QEMU with the '-machine aux-ram-share=on'
> > > option, which allows anonymous memory to be transferred in place to the
> > > new process by transferring a memory descriptor for each ram block.
> > > Memory-backend objects must have the share=on attribute, but
> > > memory-backend-epc is not supported.
> > >
> > > The user starts new QEMU on the same host as old QEMU, with command-line
> > > arguments to create the same machine, plus the -incoming option for the
> > > main migration channel, like normal live migration.  In addition, the
> > > user adds a second -incoming option with channel type "cpr".  The CPR
> > > channel address must be a type, such as unix socket, that supports
> > > SCM_RIGHTS.
> > >
> > > To initiate CPR, the user issues a migrate command to old QEMU, adding
> > > a second migration channel of type "cpr" in the channels argument.
> > > Old QEMU stops the VM, saves state to the migration channels, and enters
> > > the postmigrate state.  New QEMU mmaps memory descriptors, and execution
> > > resumes.
> > >
> > > The implementation splits qmp_migrate into start and finish functions.
> > > Start sends CPR state to new QEMU, which responds by closing the CPR
> > > channel.  Old QEMU detects the HUP then calls finish, which connects the
> > > main migration channel.
> > >
> > > In summary, the usage is:
> > >
> > >   qemu-system-$arch -machine aux-ram-share=on ...
> > >
> > >   start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
> > >
> > >   Issue commands to old QEMU:
> > >     migrate_set_parameter mode cpr-transfer
> > >
> > >     {"execute": "migrate", ...
> > >         {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
> > >
> > > Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
> >
> > Feel free to take:
> >
> > Reviewed-by: Peter Xu <pet...@redhat.com>
> >
> > I still have a few trivial comments.
> >
> > [...]
> >
> > > diff --git a/migration/cpr.c b/migration/cpr.c
> > > index 87bcfdb..584b0b9 100644
> > > --- a/migration/cpr.c
> > > +++ b/migration/cpr.c
> > > @@ -45,7 +45,7 @@ static const VMStateDescription vmstate_cpr_fd = {
> > >          VMSTATE_UINT32(namelen, CprFd),
> > >          VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
> > >          VMSTATE_INT32(id, CprFd),
> >
> > Could you remind me again on when id!=0 will start to be used?
>
> Each of vfio, iommufd, chardev, and tap will use id != 0.
I don't remember the details of the planned future series, but I want to
point out that using integer IDs can be error-prone with device hot
plug/unplug.

QEMU has a known bug even now with some devices (e.g. slirp network
backends): if the source QEMU originally has two devices (e.g. id=1,2)
and device id=1 is unplugged (leaving id=2), a subsequent migration can
fail, because the destination QEMU, started with only one device, knows
only id=1 and sees a mismatched ID.

I recall that PCIe frontend devices are not prone to this issue.  That
depends on whether the device's ->get_id() (qdev_get_dev_path()?) is
properly implemented to generate a globally unique ID that is not
affected by the order in which devices are realized / created.

It could boil down to how the IDs are allocated: anything allocated on
the fly may not work well if there's no solid topology information to
derive it from.  I wonder if CPR could be prone to this too when using
IDs, just FYI.  It might be a good idea to avoid integer IDs if
possible.  But you'll definitely have the best picture of the whole
thing, so this may or may not apply.

Thanks,

-- 
Peter Xu
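A toy C model of the hot-unplug scenario above, under simplifying assumptions: it is not QEMU's vmstate matching code, and struct dev, assign_order_ids(), all_match(), hotunplug_scenario() and the PCI-style path strings are all hypothetical. It shows why IDs handed out in creation order stop lining up once the source has unplugged a device, while keys derived from a stable topology path (in the spirit of qdev_get_dev_path()) still match.

```c
/* Toy model only: destination looks up each saved source record by key. */
#include <assert.h>
#include <string.h>

struct dev {
    const char *path;   /* stable topology path (illustrative values) */
    int order_id;       /* 1-based position in creation order */
};

/* Hand out IDs in creation order, like an on-the-fly allocator would. */
static void assign_order_ids(struct dev *devs, int n)
{
    for (int i = 0; i < n; i++)
        devs[i].order_id = i + 1;
}

/* Returns 1 if every source record finds a destination device with the
 * same key: the path when by_path is nonzero, else the order-based ID. */
static int all_match(const struct dev *src, int ns,
                     const struct dev *dst, int nd, int by_path)
{
    for (int i = 0; i < ns; i++) {
        int found = 0;
        for (int j = 0; j < nd && !found; j++)
            found = by_path ? strcmp(src[i].path, dst[j].path) == 0
                            : src[i].order_id == dst[j].order_id;
        if (!found)
            return 0;
    }
    return 1;
}

/* Source creates two devices (ids 1, 2) and unplugs id=1; destination is
 * started with only the surviving device, which therefore gets id=1. */
static void hotunplug_scenario(int *order_ok, int *path_ok)
{
    struct dev src_all[] = { { "0000:00:03.0", 0 }, { "0000:00:04.0", 0 } };
    assign_order_ids(src_all, 2);
    struct dev src[] = { src_all[1] };         /* id=2 survives the unplug */
    struct dev dst[] = { { "0000:00:04.0", 0 } };
    assign_order_ids(dst, 1);                  /* same device, now id=1 */
    *order_ok = all_match(src, 1, dst, 1, 0);  /* 2 != 1: mismatch */
    *path_ok = all_match(src, 1, dst, 1, 1);   /* same path: match */
}
```

hotunplug_scenario() reports order_ok == 0 and path_ok == 1: the surviving device keeps its path across the unplug, but its creation-order ID differs between source (2) and destination (1).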