On 2/19/25 21:34, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>

Update the VFIO documentation at docs/devel/migration describing the
changes brought by the multifd device state transfer.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
  docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++----
  1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index c49482eab66d..d9b169d29921 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices opt-in to pre-copy
  support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
  VFIO_DEVICE_FEATURE_MIGRATION ioctl.

Please add a new "multifd" documentation subsection at the end of the file
with this part:

+Starting from QEMU version 10.0 it is possible to transfer VFIO device
+_STOP_COPY state via multifd channels. This helps reduce downtime, especially
+with multiple VFIO devices or with devices having a large migration state.
+As an additional benefit, in this migration mode setting the VFIO device to
+_STOP_COPY state and saving its config space are also parallelized (run in a
+separate thread).
+
+The multifd VFIO device state transfer is controlled by the
+"x-migration-multifd-transfer" VFIO device property. This property defaults to
+AUTO, which means that VFIO device state transfer via multifd channels is
+attempted in configurations that otherwise support it.
+
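An explicit opt-in could look roughly like this (illustrative sketch only: the
host device address and the rest of the command line are made up, only the
property name comes from this patch):

```shell
# Hypothetical example: request multifd VFIO device state transfer
# explicitly instead of relying on the AUTO default.
# The PCI address 0000:65:00.0 is invented for illustration.
qemu-system-x86_64 \
    -device vfio-pci,host=0000:65:00.0,x-migration-multifd-transfer=on
```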

I was expecting a much more detailed explanation of the design too:

 * in the cover letter
 * in the hw/vfio/migration-multifd.c
 * in some new file under docs/devel/migration/



This section:

+Since the target QEMU needs to load device state buffers in-order, it needs
+to queue incoming buffers until they can be loaded into the device.
+This means that a malicious QEMU source could theoretically cause the target
+QEMU to allocate unlimited amounts of memory for such buffers-in-flight.
+
+The "x-migration-max-queued-buffers" property allows capping the maximum count
+of these VFIO device state buffers queued at the destination.
+
+Because a malicious QEMU source causing OOM on the target is not expected to
+be a realistic threat in most VFIO live migration use cases, and because the
+right value depends on the particular setup, this queued buffers limit is
+disabled by default (set to UINT64_MAX).

should be in patch 34. It is not obvious it will be merged.
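For what it's worth, setting the cap could then look like this (illustrative
only; the value 100 and the device address are arbitrary):

```shell
# Hypothetical example: allow at most 100 queued device state buffers
# on the destination instead of the UINT64_MAX default.
qemu-system-x86_64 \
    -device vfio-pci,host=0000:65:00.0,x-migration-max-queued-buffers=100
```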


This section:

+Some host platforms (like ARM64) require that the VFIO device config is
+loaded only after all iterables have been loaded.
+Such interlocking is controlled by the "x-migration-load-config-after-iter"
+VFIO device property, which in its default setting (AUTO) does so only on
+platforms that actually require it.

Should be in 35. Same reason.
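Same remark applies; an explicit override might look like (illustrative only,
device address invented):

```shell
# Hypothetical example: force the config-after-iterables interlock on,
# overriding the AUTO platform detection.
qemu-system-x86_64 \
    -device vfio-pci,host=0000:65:00.0,x-migration-load-config-after-iter=on
```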


  When pre-copy is supported, it's possible to further reduce downtime by
  enabling "switchover-ack" migration capability.
  VFIO migration uAPI defines "initial bytes" as part of its pre-copy data stream
@@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach as follows:
  * A ``switchover_ack_needed`` function that checks if the VFIO device uses
    "switchover-ack" migration capability when this capability is enabled.
-* A ``save_state`` function to save the device config space if it is present.
-
-* A ``save_live_complete_precopy`` function that sets the VFIO device in
-  _STOP_COPY state and iteratively copies the data for the VFIO device until
-  the vendor driver indicates that no data remains.
-
-* A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above.
+* A ``switchover_start`` function that, in the multifd mode, starts a thread
+  that reassembles the data received via multifd channels and loads it
+  in-order into the device.
+  In the non-multifd mode this function is a NOP.
+
+* A ``save_state`` function to save the device config space if it is present,
+  in the non-multifd mode.
+  In the multifd mode it just emits either a dummy EOS marker or an
+  "all iterables were loaded" flag for configurations that need to defer
+  loading the device config space until after the iterables.
+
+* A ``save_live_complete_precopy`` function that in the non-multifd mode sets
+  the VFIO device in _STOP_COPY state and iteratively copies the data for the
+  VFIO device until the vendor driver indicates that no data remains.
+  In the multifd mode it just emits a dummy EOS marker.
+
+* A ``save_live_complete_precopy_thread`` function that in the multifd mode
+  provides a thread handler performing the multifd device state transfer.
+  It sets the VFIO device to _STOP_COPY state, iteratively reads the data
+  from the VFIO device and queues it for multifd transmission until the vendor
+  driver indicates that no data remains.
+  After that, it saves the device config space and queues it for multifd
+  transfer too.
+  In the non-multifd mode this thread is a NOP.
+
+* A ``load_state`` function that loads the data sections that are generated
+  by the main migration channel save functions above.
+  In the non-multifd mode it also loads the config section, while in the
+  multifd mode it handles the optional "all iterables were loaded" flag if
+  it is in use.
+
+* A ``load_state_buffer`` function that loads the device state and the device
+  config that arrived via multifd channels.
+  It's used only in the multifd mode.
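The in-order constraint behind ``load_state_buffer`` (and behind the queue
that "x-migration-max-queued-buffers" caps) can be sketched roughly like this.
This is an illustrative toy in C, not QEMU's actual implementation; every name
in it is hypothetical:

```c
/*
 * Illustrative sketch only (not QEMU code, all names hypothetical):
 * the destination queues device state buffers that may arrive out of
 * order over multifd channels and feeds them to the device strictly
 * in order.
 */
#include <stdbool.h>
#include <stddef.h>

#define MAX_BUFS 16

typedef struct {
    bool received;     /* has this buffer arrived yet? */
    int payload;       /* stands in for the device state data */
} StateBuffer;

typedef struct {
    StateBuffer bufs[MAX_BUFS];
    size_t next_load;  /* next in-order index to load into the device */
} LoadQueue;

/* A multifd channel delivered buffer 'idx'; arrival order is arbitrary. */
static void queue_buffer(LoadQueue *q, size_t idx, int payload)
{
    q->bufs[idx].received = true;
    q->bufs[idx].payload = payload;
}

/*
 * Load every buffer that is now contiguous with what was already loaded.
 * Returns the number of buffers consumed; buffers with a gap before them
 * stay queued, which is the memory this sketch's cap would bound.
 */
static size_t load_ready_buffers(LoadQueue *q)
{
    size_t loaded = 0;

    while (q->next_load < MAX_BUFS && q->bufs[q->next_load].received) {
        /* real code would write bufs[next_load] into the VFIO device */
        q->next_load++;
        loaded++;
    }
    return loaded;
}
```

The point of the sketch: a buffer that arrives early sits in memory until all
of its predecessors have been loaded, which is exactly the unbounded-memory
exposure the patch series describes.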

Please move the documentation of the new migration handlers into the
patches introducing them.


Thanks,

C.



* ``cleanup`` functions for both save and load that perform any migration
    related cleanup.
@@ -176,8 +232,11 @@ Live migration save path
                  Then the VFIO device is put in _STOP_COPY state
                       (FINISH_MIGRATE, _ACTIVE, _STOP_COPY)
           .save_live_complete_precopy() is called for each active device
-      For the VFIO device, iterate in .save_live_complete_precopy() until
+              For the VFIO device: in the non-multifd mode iterate in
+                        .save_live_complete_precopy() until
                                 pending data is 0
+                 In the multifd mode this iteration is done in
+                 .save_live_complete_precopy_thread() instead.
                                        |
                       (POSTMIGRATE, _COMPLETED, _STOP_COPY)
              Migraton thread schedules cleanup bottom half and exits
@@ -194,6 +253,9 @@ Live migration resume path
                            (RESTORE_VM, _ACTIVE, _STOP)
                                        |
       For each device, .load_state() is called for that device section data
+                 transmitted via the main migration channel.
+     For data transmitted via multifd channels .load_state_buffer() is called
+                                   instead.
                          (RESTORE_VM, _ACTIVE, _RESUMING)
                                        |
    At the end, .load_cleanup() is called for each device and vCPUs are started


