Re: [PATCH V4 14/19] migration: cpr-transfer mode

Steven Sistare Wed, 11 Dec 2024 14:06:47 -0800

On 12/10/2024 7:26 AM, Markus Armbruster wrote:

Steve Sistare <steven.sist...@oracle.com> writes:

Add the cpr-transfer migration mode.  Usage:

   qemu-system-$arch -machine aux-ram-share=on ...

   start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"

   Issue commands to old QEMU:
     migrate_set_parameter mode cpr-transfer

     {"execute": "migrate", ...
         {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }


Much technical detail here that won't make sense to the reader until
further down, but next to nothing on what the thing actually
accomplishes.  Makes the commit message unnecessarily hard to
understand.  But please read on.

The migrate command stops the VM, saves CPR state to cpr-channel, saves
normal migration state to main-uri, and old QEMU enters the postmigrate
state.  The user starts new QEMU on the same host as old QEMU, with the
same arguments as old QEMU,


Any additional requirements over traditional migration?

There, "same arguments" is sufficient, but not necessary.  For instance,
changing certain backends is quite possible.


No additional requirements over traditional migration.
AFAIK there is no user documentation on what arguments must be specified
to new QEMU during a migration.  No words about "same arguments", or even
"same VM".  I am trying to give some guidance where none currently exists,
in this commit message and in QAPI for CPR.

Perhaps this is better:
  The user starts new QEMU on the same host as old QEMU, with command-line
  arguments to create the same machine, plus the -incoming option for the
  main migration channel, like normal live migration.  In addition, the
  user adds a second -incoming option with channel type "cpr", which matches
  the cpr channel of the migrate command issued to old QEMU.
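
To make the pairing of channels concrete, here is a minimal sketch that builds the "migrate" command for old QEMU and the matching pair of -incoming options for new QEMU. The host, port, and socket path are hypothetical, and the layout of the "addr" member is an assumption based on the QAPI MigrationChannel and MigrationAddress types, so check it against qapi/migration.json before relying on it.

```python
import json
import shlex

# Hypothetical endpoints, chosen only for illustration.
MAIN_URI = "tcp:0:44444"
MAIN_ADDR = {"transport": "socket", "type": "inet", "host": "0", "port": "44444"}
CPR_ADDR = {"transport": "socket", "type": "unix", "path": "cpr.sock"}


def channel(channel_type, addr):
    """Return one channel description (assumed MigrationChannel layout)."""
    return {"channel-type": channel_type, "addr": addr}


def migrate_command():
    """QMP 'migrate' command for old QEMU, naming both channels."""
    return {
        "execute": "migrate",
        "arguments": {
            "channels": [channel("main", MAIN_ADDR), channel("cpr", CPR_ADDR)]
        },
    }


def incoming_args():
    """Two -incoming options for new QEMU: main as a URI, cpr as JSON."""
    return [
        "-incoming", MAIN_URI,
        "-incoming", json.dumps(channel("cpr", CPR_ADDR)),
    ]


if __name__ == "__main__":
    print(json.dumps(migrate_command(), indent=2))
    print(shlex.join(incoming_args()))
```

The same CPR_ADDR value feeds both sides, which is the "matches the cpr channel of the migrate command" requirement in the paragraph above.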

                             plus two -incoming options.


Two -incoming options to define two migration channels, the traditional
one of MigrationChannelType "main", and another one of
MigrationChannelType "cpr"?


Yes.  I will elaborate.

                                                          Guest RAM is
preserved in place, albeit with new virtual addresses in new QEMU.

This mode requires a second migration channel of type "cpr", in the
channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.

Memory-backend objects must have the share=on attribute, but
memory-backend-epc is not supported.  The VM must be started with
the '-machine aux-ram-share=on' option, which allows anonymous
memory to be transferred in place to the new process.  The memfds
are kept open by sending the descriptors to new QEMU via the CPR
channel, which must support SCM_RIGHTS, and they are mmap'd in new QEMU.
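
The descriptor hand-off described above can be sketched outside QEMU. The following is only an illustration of SCM_RIGHTS over a UNIX domain socket, not QEMU's implementation: both "sides" run in one process, the name "guest-ram" and the tag "ram0" are made up, and socket.send_fds/recv_fds require Python 3.9 or later.

```python
import mmap
import os
import socket
import tempfile

PAGE = mmap.PAGESIZE


def make_shared_fd():
    """Create an fd backing shareable memory: memfd on Linux, else a temp file."""
    if hasattr(os, "memfd_create"):
        fd = os.memfd_create("guest-ram")  # hypothetical name
    else:
        tmp = tempfile.TemporaryFile()
        fd = os.dup(tmp.fileno())
        tmp.close()
    os.ftruncate(fd, PAGE)
    return fd


def demo():
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    # "Old" side: map the memory shared, write to it, then pass the fd.
    fd = make_shared_fd()
    old_map = mmap.mmap(fd, PAGE, flags=mmap.MAP_SHARED)
    old_map[:5] = b"guest"
    socket.send_fds(old_end, [b"ram0"], [fd])  # fd travels as SCM_RIGHTS data

    # "New" side: receive the fd and map the same pages at a new address.
    msg, fds, _flags, _addr = socket.recv_fds(new_end, 1024, 1)
    new_map = mmap.mmap(fds[0], PAGE, flags=mmap.MAP_SHARED)
    data = bytes(new_map[:5])

    # Clean up mappings, descriptors, and sockets.
    new_map.close()
    old_map.close()
    os.close(fds[0])
    os.close(fd)
    old_end.close()
    new_end.close()
    return msg, data


if __name__ == "__main__":
    print(demo())
```

The receiver sees the bytes written by the sender without any copy through the socket, which is the sense in which RAM is "preserved in place" while the mapping address differs.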

The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel.  Old QEMU detects the HUP then calls finish, which connects
the main migration channel.
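
The start/finish handshake can be modeled with a socket pair. This is a sketch of the ordering only, assuming a newline-terminated blob stands in for CPR state; the function names are hypothetical and do not correspond to QEMU's internal functions.

```python
import select
import socket
import threading


def migrate_start(cpr_sock, state):
    """Old side, step 1: send CPR state over the CPR channel."""
    cpr_sock.sendall(state + b"\n")


def wait_for_hangup(cpr_sock, timeout=5.0):
    """Old side, step 2: wait until the peer closes the CPR channel."""
    readable, _, _ = select.select([cpr_sock], [], [], timeout)
    # A readable socket that returns no bytes means the peer hung up.
    return bool(readable) and cpr_sock.recv(1) == b""


def migrate_finish(log):
    """Old side, step 3: only now connect the main migration channel."""
    log.append("connect main channel")


def new_side(cpr_sock, received):
    """Stand-in for new QEMU: read CPR state, then close the channel."""
    buf = b""
    while not buf.endswith(b"\n"):
        chunk = cpr_sock.recv(4096)
        if not chunk:
            break
        buf += chunk
    received.append(buf.rstrip(b"\n"))
    cpr_sock.close()  # this close is what old QEMU observes as HUP


def demo():
    old_end, new_end = socket.socketpair()
    received, log = [], []
    worker = threading.Thread(target=new_side, args=(new_end, received))
    worker.start()
    migrate_start(old_end, b"cpr-state")
    if wait_for_hangup(old_end):
        migrate_finish(log)
    worker.join()
    old_end.close()
    return received, log


if __name__ == "__main__":
    print(demo())
```

The point of the split is that finish never runs until the receiver has consumed the CPR state and signaled so by closing its end.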

Signed-off-by: Steve Sistare <steven.sist...@oracle.com>


I'd lead with a brief explanation of the feature and its benefits.
Could steal from the cover letter like this:

   New migration mode cpr-transfer enables transferring a guest to a
   new QEMU instance on the same host with minimal guest pause time, by
   preserving guest RAM in place, albeit with new virtual addresses in
   new QEMU, and by preserving device file descriptors.

Then talk about required special setup.  I see aux-ram-share=on.
Anything else?  Any differences between source and destination QEMU
there?

Then talk about the two channels.  First what they do, second how to
create their destination end with -incoming, third how to create their
source end with "migrate".

Finally mention whatever technical detail you believe needs mentioning
here.


I'll work on it.

[...]

diff --git a/qapi/migration.json b/qapi/migration.json
index a26960b..1bc963f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -614,9 +614,44 @@
  #     or COLO.
  #
  #     (since 8.2)
+#
+# @cpr-transfer: This mode allows the user to transfer a guest to a
+#     new QEMU instance on the same host with minimal guest pause
+#     time, by preserving guest RAM in place, albeit with new virtual
+#     addresses in new QEMU.
+#
+#     The user starts new QEMU on the same host as old QEMU, with the
+#     same arguments as old QEMU, plus the -incoming option.


Two of them?


Yes, I will say more.

+#                                                                 The
+#     user issues the migrate command to old QEMU, which stops the VM,
+#     saves state to the migration channels, and enters the
+#     postmigrate state.  Execution resumes in new QEMU.


The commit message also mentions file descriptors are migrated over.
Worth mentioning here, too?


IMO no.  The user cannot observe that aspect, so they don't need to know.
It's an implementation detail.

+#
+#     This mode requires a second migration channel type "cpr" in the
+#     channel arguments on the outgoing side.  The channel must be a
+#     type, such as unix socket, that supports SCM_RIGHTS.  However,


This is vague.  Would anything but a UNIX domain socket work?


I debated what to say here. One could specify an "exec" type, in which the
executed command creates a unix domain socket.  However, that is only likely to
occur to a small fraction of clever users.  I could simplify the description,
and let the clever ones realize they can fudge it using exec.

Applies to both source and destination end?


Yes.  It is generally understood that the same specification for a migration
channel applies to both ends.  But not documented anywhere AFAIK.  And again a
clever user could specify a socket URI on one side and an exec URI on the
other whose command connects to the socket.  All true for normal migration.

+#     the cpr channel cannot be added to the list of channels for a
+#     migrate-incoming command, because it must be read before new
+#     QEMU opens a monitor.


Ugh!  Remind me, why is that the case?


The cpr channel (containing preserved file descriptors) must be read before
objects are initialized, which occurs before the monitor is opened.

+#                            Instead, the user passes the channel as a
+#     second -incoming command-line argument to new QEMU using JSON
+#     syntax.
+#
+#     Memory-backend objects must have the share=on attribute, but
+#     memory-backend-epc is not supported.  The VM must be started
+#     with the '-machine aux-ram-share=on' option.


What happens when the conditions aren't met?  migrate command fails
with a useful error message?


Yes, via a migration blocker.

+#
+#     The incoming migration channel cannot be a file type, and for
+#     the tcp type, the port cannot be 0 (meaning dynamically choose
+#     a port).


Which of the two channels are you discussing?


main.  I will clarify.
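
For readers unfamiliar with the port 0 convention mentioned in the doc comment: binding to port 0 asks the kernel to pick a free port, so the real port is known only to the listening process after bind. A minimal sketch of that behavior follows; it uses the loopback address as an arbitrary choice and says nothing about how QEMU itself handles the case.

```python
import socket


def bind_dynamic_port():
    """Bind to port 0 and report which port the kernel actually chose."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # 0 means "choose a port for me"
        return sock.getsockname()[1]


if __name__ == "__main__":
    print("kernel chose port", bind_dynamic_port())
```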

+#
+#     When using -incoming defer, you must issue the migrate command
+#     to old QEMU before issuing any monitor commands to new QEMU.


I'm confused.  Not even qmp_capabilities?  Why?


Because of the ordering dependency.  Must load CPR state fd's, before device
initialization, which occurs before monitor initialization.  The migrate
command sends CPR fds which releases all the above.

- Steve

+#     However, new QEMU does not open and read the migration stream
+#     until you issue the migrate incoming command.
+#
+#     (since 10.0)
  ##
  { 'enum': 'MigMode',
-  'data': [ 'normal', 'cpr-reboot' ] }
+  'data': [ 'normal', 'cpr-reboot', 'cpr-transfer' ] }

  ##
  # @ZeroPageDetection:


[...]
