On Wed, 29 Jan 2025 at 16:11, Fabiano Rosas <faro...@suse.de> wrote:
>
> From: Steve Sistare <steven.sist...@oracle.com>
>
> Add the cpr-transfer migration mode, which allows the user to transfer
> a guest to a new QEMU instance on the same host with minimal guest pause
> time, by preserving guest RAM in place, albeit with new virtual addresses
> in new QEMU, and by preserving device file descriptors.  Pages that were
> locked in memory for DMA in old QEMU remain locked in new QEMU, because the
> descriptor of the device that locked them remains open.
>
> cpr-transfer preserves memory and devices descriptors by sending them to
> new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
> be sent over the normal migration channel, because devices and backends
> are created prior to reading the channel, so this mode sends CPR state
> over a second "cpr" migration channel.  New QEMU reads the cpr channel
> prior to creating devices or backends.  The user specifies the cpr channel
> in the channel arguments on the outgoing side, and in a second -incoming
> command-line parameter on the incoming side.
>
> The user must start old QEMU with the the '-machine aux-ram-share=on' option,
> which allows anonymous memory to be transferred in place to the new process
> by transferring a memory descriptor for each ram block.  Memory-backend
> objects must have the share=on attribute, but memory-backend-epc is not
> supported.
>
> The user starts new QEMU on the same host as old QEMU, with command-line
> arguments to create the same machine, plus the -incoming option for the
> main migration channel, like normal live migration.  In addition, the user
> adds a second -incoming option with channel type "cpr".  This CPR channel
> must support file descriptor transfer with SCM_RIGHTS, i.e. it must be a
> UNIX domain socket.
>
> To initiate CPR, the user issues a migrate command to old QEMU, adding
> a second migration channel of type "cpr" in the channels argument.
> Old QEMU stops the VM, saves state to the migration channels, and enters
> the postmigrate state.  New QEMU mmap's memory descriptors, and execution
> resumes.
>
> The implementation splits qmp_migrate into start and finish functions.
> Start sends CPR state to new QEMU, which responds by closing the CPR
> channel.  Old QEMU detects the HUP then calls finish, which connects the
> main migration channel.
>
> In summary, the usage is:
>
>   qemu-system-$arch -machine aux-ram-share=on ...
>
>   start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"
>
>   Issue commands to old QEMU:
>     migrate_set_parameter mode cpr-transfer
>
>     {"execute": "migrate", ...
>         {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }
>
> Signed-off-by: Steve Sistare <steven.sist...@oracle.com>
> Reviewed-by: Peter Xu <pet...@redhat.com>
> Acked-by: Markus Armbruster <arm...@redhat.com>
> Link: 
> https://lore.kernel.org/r/1736967650-129648-17-git-send-email-steven.sist...@oracle.com
> Signed-off-by: Fabiano Rosas <faro...@suse.de>

Hi; this commit includes some code that has confused
Coverity (CID 1590980) and it also confused me, so maybe
it could be usefully made clearer?


>  void qmp_migrate(const char *uri, bool has_channels,
>                   MigrationChannelList *channels, bool has_detach, bool 
> detach,
>                   bool has_resume, bool resume, Error **errp)
> @@ -2056,6 +2118,7 @@ void qmp_migrate(const char *uri, bool has_channels,
>      g_autoptr(MigrationChannel) channel = NULL;
>      MigrationAddress *addr = NULL;
>      MigrationChannel *channelv[MIGRATION_CHANNEL_TYPE__MAX] = { NULL };
> +    MigrationChannel *cpr_channel = NULL;
>
>      /*
>       * Having preliminary checks for uri and channel
> @@ -2076,6 +2139,7 @@ void qmp_migrate(const char *uri, bool has_channels,
>              }
>              channelv[type] = channels->value;
>          }
> +        cpr_channel = channelv[MIGRATION_CHANNEL_TYPE_CPR];
>          addr = channelv[MIGRATION_CHANNEL_TYPE_MAIN]->addr;
>          if (!addr) {
>              error_setg(errp, "Channel list has no main entry");
> @@ -2096,12 +2160,52 @@ void qmp_migrate(const char *uri, bool has_channels,
>          return;
>      }
>
> +    if (s->parameters.mode == MIG_MODE_CPR_TRANSFER && !cpr_channel) {
> +        error_setg(errp, "missing 'cpr' migration channel");
> +        return;
> +    }

Here in qmp_migrate() we bail out if cpr_channel is NULL,
provided that s->parameters.mode is MIG_MODE_CPR_TRANSFER...

> +
>      resume_requested = has_resume && resume;
>      if (!migrate_prepare(s, resume_requested, errp)) {
>          /* Error detected, put into errp */
>          return;
>      }
>
> +    if (cpr_state_save(cpr_channel, &local_err)) {

...but in cpr_state_save() when we decide whether to dereference
cpr_channel or not, we aren't checking s->parameters.mode,
we call migrate_mode() and check the result of that.
And migrate_mode() isn't completely trivial: it calls
cpr_get_incoming_mode(), so it's not obvious that it's
necessarily going to be the same value as s->parameters.mode.
So Coverity complains that it sees a code path where we
might dereference cpr_channel even when it's NULL.

Could this be made a bit clearer somehow, do you think?

thanks
-- PMM

Reply via email to