On Tue, Dec 24, 2024 at 08:17:00AM -0800, Steve Sistare wrote:
> Add the cpr-transfer migration mode, which allows the user to transfer
> a guest to a new QEMU instance on the same host with minimal guest pause
> time, by preserving guest RAM in place, albeit with new virtual addresses
> in new QEMU, and by preserving device file descriptors. Pages that were
> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> descriptor of the device that locked them remains open.
>
> cpr-transfer preserves memory and device descriptors by sending them to
> new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot
> be sent over the normal migration channel, because devices and backends
> are created prior to reading the channel, so this mode sends CPR state
> over a second "cpr" migration channel. New QEMU reads the cpr channel
> prior to creating devices or backends. The user specifies the cpr channel
> in the channel arguments on the outgoing side, and in a second -incoming
> command-line parameter on the incoming side.
>
> The user must start old QEMU with the '-machine aux-ram-share=on' option,
> which allows anonymous memory to be transferred in place to the new process
> by transferring a memory descriptor for each ram block. Memory-backend
> objects must have the share=on attribute, but memory-backend-epc is not
> supported.
>
> The user starts new QEMU on the same host as old QEMU, with command-line
> arguments to create the same machine, plus the -incoming option for the
> main migration channel, like normal live migration. In addition, the user
> adds a second -incoming option with channel type "cpr". The CPR channel
> address must be a type, such as unix socket, that supports SCM_RIGHTS.
>
> To initiate CPR, the user issues a migrate command to old QEMU, adding
> a second migration channel of type "cpr" in the channels argument.
> Old QEMU stops the VM, saves state to the migration channels, and enters
> the postmigrate state. New QEMU mmap's memory descriptors, and execution
> resumes.
>
> The implementation splits qmp_migrate into start and finish functions.
> Start sends CPR state to new QEMU, which responds by closing the CPR
> channel. Old QEMU detects the HUP then calls finish, which connects the
> main migration channel.
>
> In summary, the usage is:
>
>   qemu-system-$arch -machine aux-ram-share=on ...
>
>   start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>
>   Issue commands to old QEMU:
>     migrate_set_parameter mode cpr-transfer
>
>     {"execute": "migrate", ...
>         {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>
> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
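
For readers less familiar with SCM_RIGHTS, the descriptor passing the commit
message relies on can be sketched roughly as below. This is only an
illustration of the kernel mechanism, not code from the patch: send_fd() and
recv_fd() are hypothetical helper names, and the sketch assumes a connected
AF_UNIX stream socket and exactly one descriptor per message.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Control buffer sized and aligned for one descriptor. */
union fd_cmsg {
    struct cmsghdr align;
    char buf[CMSG_SPACE(sizeof(int))];
};

/*
 * Send one open descriptor over a connected unix domain socket.
 * Hypothetical helper for illustration, not a QEMU function.
 * Returns 0 on success, -1 on error.
 */
int send_fd(int sock, int fd)
{
    /* Stream sockets need at least one byte of ordinary data
     * to carry the ancillary data. */
    char byte = 0;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;

    assert(fd >= 0);
    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/*
 * Receive one descriptor sent by send_fd().
 * Returns the new descriptor number in this process, or -1 on error.
 */
int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union fd_cmsg u;
    struct msghdr msg;
    struct cmsghdr *cmsg;
    int fd;

    memset(&u, 0, sizeof(u));
    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = u.buf;
    msg.msg_controllen = sizeof(u.buf);

    if (recvmsg(sock, &msg, 0) != 1) {
        return -1;
    }

    cmsg = CMSG_FIRSTHDR(&msg);
    if (!cmsg || cmsg->cmsg_level != SOL_SOCKET ||
        cmsg->cmsg_type != SCM_RIGHTS ||
        cmsg->cmsg_len != CMSG_LEN(sizeof(int))) {
        return -1;
    }

    /* The kernel has installed a new descriptor in the receiver that
     * refers to the same open file description as the sender's. */
    memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

The receiver gets a new descriptor number that refers to the same open file
as the sender's, which is why a device descriptor sent this way stays open
across the handover to the new process. The same call pattern applies whether
the descriptor refers to a device or to memory.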

Feel free to take:

Reviewed-by: Peter Xu <pet...@redhat.com>

I still have a few trivial comments.

[...]

> diff --git a/migration/cpr.c b/migration/cpr.c
> index 87bcfdb..584b0b9 100644
> --- a/migration/cpr.c
> +++ b/migration/cpr.c
> @@ -45,7 +45,7 @@ static const VMStateDescription vmstate_cpr_fd = {
>          VMSTATE_UINT32(namelen, CprFd),
>          VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
>          VMSTATE_INT32(id, CprFd),

Could you remind me again when id != 0 will start to be used?

> -        VMSTATE_INT32(fd, CprFd),
> +        VMSTATE_FD(fd, CprFd),
>          VMSTATE_END_OF_LIST()
>      }
>  };

[...]

> @@ -320,6 +328,7 @@ void migration_cancel(const Error *error)
>          qmp_cancel_vcpu_dirty_limit(false, -1, NULL);
>      }
>      migrate_fd_cancel(current_migration);
> +    migrate_hup_delete(current_migration);

migrate_fd_cancel() already has one of these, so I'm not sure whether it's
needed here.

>  }

-- 
Peter Xu