On Tue, Feb 04, 2025 at 01:40:34PM +0000, Peter Maydell wrote: > On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <faro...@suse.de> wrote: > > > > From: Steve Sistare <steven.sist...@oracle.com> > > > > Add the cpr-transfer migration mode, which allows the user to transfer > > a guest to a new QEMU instance on the same host with minimal guest pause > > time, by preserving guest RAM in place, albeit with new virtual addresses > > in new QEMU, and by preserving device file descriptors. Pages that were > > locked in memory for DMA in old QEMU remain locked in new QEMU, because the > > descriptor of the device that locked them remains open. > > > > cpr-transfer preserves memory and devices descriptors by sending them to > > new QEMU over a unix domain socket using SCM_RIGHTS. Such CPR state cannot > > be sent over the normal migration channel, because devices and backends > > are created prior to reading the channel, so this mode sends CPR state > > over a second "cpr" migration channel. New QEMU reads the cpr channel > > prior to creating devices or backends. The user specifies the cpr channel > > in the channel arguments on the outgoing side, and in a second -incoming > > command-line parameter on the incoming side. > > > > The user must start old QEMU with the the '-machine aux-ram-share=on' > > option, > > which allows anonymous memory to be transferred in place to the new process > > by transferring a memory descriptor for each ram block. Memory-backend > > objects must have the share=on attribute, but memory-backend-epc is not > > supported. > > > > The user starts new QEMU on the same host as old QEMU, with command-line > > arguments to create the same machine, plus the -incoming option for the > > main migration channel, like normal live migration. In addition, the user > > adds a second -incoming option with channel type "cpr". This CPR channel > > must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a > > UNIX domain socket. > > > > To initiate CPR, the user issues a migrate command to old QEMU, adding > > a second migration channel of type "cpr" in the channels argument. > > Old QEMU stops the VM, saves state to the migration channels, and enters > > the postmigrate state. New QEMU mmap's memory descriptors, and execution > > resumes. > > > > The implementation splits qmp_migrate into start and finish functions. > > Start sends CPR state to new QEMU, which responds by closing the CPR > > channel. Old QEMU detects the HUP then calls finish, which connects the > > main migration channel. > > > > In summary, the usage is: > > > > qemu-system-$arch -machine aux-ram-share=on ... > > > > start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>" > > > > Issue commands to old QEMU: > > migrate_set_parameter mode cpr-transfer > > > > {"execute": "migrate", ... > > {"channel-type": "main"...}, {"channel-type": "cpr"...} ... } > > > > Signed-off-by: Steve Sistare <steven.sist...@oracle.com> > > Reviewed-by: Peter Xu <pet...@redhat.com> > > Acked-by: Markus Armbruster <arm...@redhat.com> > > Link: > > https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sist...@oracle.com > > Signed-off-by: Fabiano Rosas <faro...@suse.de> > > Hi; this commit includes some code that has confused > Coverity (CID 1590980) and it also confused me, so maybe > it could be usefully made clearer? > > > > void qmp_migrate(const char *uri, bool has_channels, > > MigrationChannelList *channels, bool has_detach, bool > > detach, > > bool has_resume, bool resume, Error **errp) > > @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels, > > g_autoptr(MigrationChannel) channel = NULL; > > MigrationAddress *addr = NULL; > > MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL }; > > + MigrationChannel *cpr_channel = NULL; > > > > /* > > * Having preliminary checks for uri and channel > > @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels, > > } > > channelv[type] = channels->value; > > } > > + cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR]; > > addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr; > > if (!addr) { > > error_setg(errp, "Channel list has no main entry"); > > @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels, > > return; > > } > > > > + if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) { > > + error_setg(errp, "missing 'cpr' migration channel"); > > + return; > > + } > > Here in qmp_migrate() we bail out if cpr_channel is NULL, > provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER... > > > + > > resume_requested = has_resume && resume; > > if (!migrate_prepare(s, resume_requested, errp)) { > > /* Error detected, put into errp */ > > return; > > } > > > > + if (cpr_state_save(cpr_channel, &local_err)) { > > ...but in cpr_state_save() when we decide whether to dereference > cpr_channel or not, we aren't checking s->parameters.mode, > we call migrate_mode() and check the result of that. > And migrate_mode() isn't completely trivial: it calls > cpr_get_incoming_mode(), so it's not obvious that it's > necessarily going to be the same value as s->parameters.mode. > So Coverity complains that it sees a code path where we > might dereference cpr_channel even when it's NULL. > > Could this be made a bit clearer somehow, do you think?
That migrate_mode() is indeed tricky, and should only be needed for incoming side QEMU to workaround current limitation that the migration parameter "mode" cannot be set as early as when cpr_state_load() happens.. I think we could check s->parameters.mode here before doing cpr_state_save(), it can also be more readable. Steve, do you want to send a patch? -- Peter Xu