On 27.02.2025 07:59, Cédric Le Goater wrote:
On 2/19/25 21:34, Maciej S. Szmigiero wrote:
From: "Maciej S. Szmigiero" <maciej.szmigi...@oracle.com>

Update the VFIO documentation at docs/devel/migration describing the
changes brought by the multifd device state transfer.

Signed-off-by: Maciej S. Szmigiero <maciej.szmigi...@oracle.com>
---
  docs/devel/migration/vfio.rst | 80 +++++++++++++++++++++++++++++++----
  1 file changed, 71 insertions(+), 9 deletions(-)

diff --git a/docs/devel/migration/vfio.rst b/docs/devel/migration/vfio.rst
index c49482eab66d..d9b169d29921 100644
--- a/docs/devel/migration/vfio.rst
+++ b/docs/devel/migration/vfio.rst
@@ -16,6 +16,37 @@ helps to reduce the total downtime of the VM. VFIO devices 
opt-in to pre-copy
  support by reporting the VFIO_MIGRATION_PRE_COPY flag in the
  VFIO_DEVICE_FEATURE_MIGRATION ioctl.

Please add a new "multifd" documentation subsection at the end of the file
with this part :

+Starting from QEMU version 10.0 there's a possibility to transfer VFIO device
+_STOP_COPY state via multifd channels. This helps reduce downtime - especially
+with multiple VFIO devices or with devices having a large migration state.
+As an additional benefit, setting the VFIO device to _STOP_COPY state and
+saving its config space is also parallelized (run in a separate thread) in
+such migration mode.
+
+The multifd VFIO device state transfer is controlled by
+"x-migration-multifd-transfer" VFIO device property. This property defaults to
+AUTO, which means that VFIO device state transfer via multifd channels is
+attempted in configurations that otherwise support it.
+

Done - I also moved the parts about x-migration-max-queued-buffers
and x-migration-load-config-after-iter description there since
obviously they wouldn't make sense being left alone in the top section.

I was expecting a much more detailed explanation on the design too  :

  * in the cover letter
  * in the hw/vfio/migration-multifd.c
  * in some new file under docs/devel/migration/


I'm not sure what descriptions you exactly want in these places, but since
that's just documentation (not code) it could be added after the code freeze...


This section :

+Since the target QEMU needs to load device state buffers in-order it needs to
+queue incoming buffers until they can be loaded into the device.
+This means that a malicious QEMU source could theoretically cause the target
+QEMU to allocate unlimited amounts of memory for such buffers-in-flight.
+
+The "x-migration-max-queued-buffers" property allows capping the maximum count
+of these VFIO device state buffers queued at the destination.
+
+Because a malicious QEMU source causing OOM on the target is not expected to be
+a realistic threat in most of VFIO live migration use cases and the right value
+depends on the particular setup by default this queued buffers limit is
+disabled by setting it to UINT64_MAX.

should be in patch 34. It is not obvious it will be merged.


...which brings us to this point.

I think by this point in time (less then 2 weeks to code freeze) we should
finally decide what is going to be included in the patch set.

This way this patch set could be well tested in its final form rather than
having significant parts taken out of it at the eleventh hour.

If the final form is known also the documentation can be adjusted accordingly
and user/admin documentation eventually written once the code is considered
okay.

I though we discussed a few times the rationale behind both
x-migration-max-queued-buffers and x-migration-load-config-after-iter properties
but if you still have some concerns there please let me know before I prepare
the next version of this patch set so I know whether to include these.

This section :

+Some host platforms (like ARM64) require that VFIO device config is loaded only
+after all iterables were loaded.
+Such interlocking is controlled by "x-migration-load-config-after-iter" VFIO
+device property, which in its default setting (AUTO) does so only on platforms
+that actually require it.

Should be in 35. Same reason.


  When pre-copy is supported, it's possible to further reduce downtime by
  enabling "switchover-ack" migration capability.
  VFIO migration uAPI defines "initial bytes" as part of its pre-copy data 
stream
@@ -67,14 +98,39 @@ VFIO implements the device hooks for the iterative approach 
as follows:
  * A ``switchover_ack_needed`` function that checks if the VFIO device uses
    "switchover-ack" migration capability when this capability is enabled.
-* A ``save_state`` function to save the device config space if it is present.
-
-* A ``save_live_complete_precopy`` function that sets the VFIO device in
-  _STOP_COPY state and iteratively copies the data for the VFIO device until
-  the vendor driver indicates that no data remains.
-
-* A ``load_state`` function that loads the config section and the data
-  sections that are generated by the save functions above.
+* A ``switchover_start`` function that in the multifd mode starts a thread that
+  reassembles the multifd received data and loads it in-order into the device.
+  In the non-multifd mode this function is a NOP.
+
+* A ``save_state`` function to save the device config space if it is present
+  in the non-multifd mode.
+  In the multifd mode it just emits either a dummy EOS marker or
+  "all iterables were loaded" flag for configurations that need to defer
+  loading device config space after them.
+
+* A ``save_live_complete_precopy`` function that in the non-multifd mode sets
+  the VFIO device in _STOP_COPY state and iteratively copies the data for the
+  VFIO device until the vendor driver indicates that no data remains.
+  In the multifd mode it just emits a dummy EOS marker.
+
+* A ``save_live_complete_precopy_thread`` function that in the multifd mode
+  provides thread handler performing multifd device state transfer.
+  It sets the VFIO device to _STOP_COPY state, iteratively reads the data
+  from the VFIO device and queues it for multifd transmission until the vendor
+  driver indicates that no data remains.
+  After that, it saves the device config space and queues it for multifd
+  transfer too.
+  In the non-multifd mode this thread is a NOP.
+
+* A ``load_state`` function that loads the data sections that are generated
+  by the main migration channel save functions above.
+  In the non-multifd mode it also loads the config section, while in the
+  multifd mode it handles the optional "all iterables were loaded" flag if
+  it is in use.
+
+* A ``load_state_buffer`` function that loads the device state and the device
+  config that arrived via multifd channels.
+  It's used only in the multifd mode.

Please move the documentation of the new migration handlers in the
patch introducing them.


Thanks,

C.


Thanks,
Maciej


Reply via email to