On Tue, Feb 04, 2025 at 01:40:34PM +0000, Peter Maydell wrote:
> On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <faro...@suse.de> wrote:
> >
> > From: Steve Sistare <steven.sist...@oracle.com>
> >
> > Add the cpr-transfer migration mode, which allows the user to transfer
> > a guest to a new QEMU instance on the same host with minimal guest pause
> > time, by preserving guest RAM in place, albeit with new virtual addresses
> > in new QEMU, and by preserving device file descriptors.  Pages that were
> > locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> > descriptor of the device that locked them remains open.
> >
> > cpr-transfer preserves memory and devices descriptors by sending them to
> > new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
> > be sent over the normal migration channel, because devices and backends
> > are created prior to reading the channel, so this mode sends CPR state
> > over a second "cpr" migration channel.  New QEMU reads the cpr channel
> > prior to creating devices or backends.  The user specifies the cpr channel
> > in the channel arguments on the outgoing side, and in a second -incoming
> > command-line parameter on the incoming side.
> >
> > The user must start old QEMU with the the '-machine aux-ram-share=on' 
> > option,
> > which allows anonymous memory to be transferred in place to the new process
> > by transferring a memory descriptor for each ram block.  Memory-backend
> > objects must have the share=on attribute, but memory-backend-epc is not
> > supported.
> >
> > The user starts new QEMU on the same host as old QEMU, with command-line
> > arguments to create the same machine, plus the -incoming option for the
> > main migration channel, like normal live migration.  In addition, the user
> > adds a second -incoming option with channel type "cpr".  This CPR channel
> > must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
> > UNIX domain socket.
> >
> > To initiate CPR, the user issues a migrate command to old QEMU, adding
> > a second migration channel of type "cpr" in the channels argument.
> > Old QEMU stops the VM, saves state to the migration channels, and enters
> > the postmigrate state.  New QEMU mmap's memory descriptors, and execution
> > resumes.
> >
> > The implementation splits qmp_migrate into start and finish functions.
> > Start sends CPR state to new QEMU, which responds by closing the CPR
> > channel.  Old QEMU detects the HUP then calls finish, which connects the
> > main migration channel.
> >
> > In summary, the usage is:
> >
> >   qemu-system-$arch -machine aux-ram-share=on ...
> >
> >   start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
> >
> >   Issue commands to old QEMU:
> >     migrate_set_parameter mode cpr-transfer
> >
> >     {"execute": "migrate", ...
> >         {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
> >
> > Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
> > Reviewed-by: Peter Xu <pet...@redhat.com>
> > Acked-by: Markus Armbruster <arm...@redhat.com>
> > Link: 
> > https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sist...@oracle.com
> > Signed-off-by: Fabiano Rosas <faro...@suse.de>
> 
> Hi; this commit includes some code that has confused
> Coverity (CID 1590980) and it also confused me, so maybe
> it could be usefully made clearer?
> 
> 
> >  void qmp_migrate(const char *uri, bool has_channels,
> >                   MigrationChannelList *channels, bool has_detach, bool 
> > detach,
> >                   bool has_resume, bool resume, Error **errp)
> > @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> >      g_autoptr(MigrationChannel) channel = NULL;
> >      MigrationAddress *addr = NULL;
> >      MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
> > +    MigrationChannel *cpr_channel = NULL;
> >
> >      /*
> >       * Having preliminary checks for uri and channel
> > @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
> >              }
> >              channelv[type] = channels->value;
> >          }
> > +        cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
> >          addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
> >          if (!addr) {
> >              error_setg(errp, "Channel list has no main entry");
> > @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
> >          return;
> >      }
> >
> > +    if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
> > +        error_setg(errp, "missing 'cpr' migration channel");
> > +        return;
> > +    }
> 
> Here in qmp_migrate() we bail out if cpr_channel is NULL,
> provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER...
> 
> > +
> >      resume_requested = has_resume && resume;
> >      if (!migrate_prepare(s, resume_requested, errp)) {
> >          /* Error detected, put into errp */
> >          return;
> >      }
> >
> > +    if (cpr_state_save(cpr_channel, &local_err)) {
> 
> ...but in cpr_state_save() when we decide whether to dereference
> cpr_channel or not, we aren't checking s->parameters.mode,
> we call migrate_mode() and check the result of that.
> And migrate_mode() isn't completely trivial: it calls
> cpr_get_incoming_mode(), so it's not obvious that it's
> necessarily going to be the same value as s->parameters.mode.
> So Coverity complains that it sees a code path where we
> might dereference cpr_channel even when it's NULL.
> 
> Could this be made a bit clearer somehow, do you think?

That migrate_mode() is indeed tricky, and should only be needed for
incoming side QEMU to workaround current limitation that the migration
parameter "mode" cannot be set as early as when cpr_state_load() happens..

I think we could check s->parameters.mode here before doing
cpr_state_save(), it can also be more readable.

Steve, do you want to send a patch?

-- 
Peter Xu


Reply via email to