On 18.07.23 16:25, Stefan Hajnoczi wrote:
On Tue, Jul 11, 2023 at 05:52:23PM +0200, Hanna Czenczek wrote:
When stopping the VM, qemu wants all devices to fully cease any
operation, too. Currently, we can only have vhost-user back-ends stop
processing vrings, but not background operations. Add the SUSPEND and
RESUME commands from vDPA, which allow the front-end (qemu) to tell
back-ends to cease all operations, including those running in the
background.
qemu's current work-around for this is to reset the back-end instead of
suspending it, which will not work for back-ends that have internal
state that must be preserved across e.g. stop/cont.
Note that the given specification requires the back-end to delay
processing kicks (i.e. starting vrings) until the device is resumed,
instead of requiring the front-end not to send kicks while suspended.
qemu starts devices (and would just resume them) only when the VM is in
a running state, so it would be difficult to have qemu delay kicks until
the device is resumed, which is why this patch specifies handling of
kicks as it does.
Signed-off-by: Hanna Czenczek <hre...@redhat.com>
---
docs/interop/vhost-user.rst | 35 +++++++++++++++++++++++++++++++++--
1 file changed, 33 insertions(+), 2 deletions(-)
diff --git a/docs/interop/vhost-user.rst b/docs/interop/vhost-user.rst
index 5a070adbc1..ac6be34c4c 100644
--- a/docs/interop/vhost-user.rst
+++ b/docs/interop/vhost-user.rst
@@ -381,6 +381,10 @@ readable) on the descriptor specified by
``VHOST_USER_SET_VRING_KICK``
or receiving the in-band message ``VHOST_USER_VRING_KICK`` if negotiated,
and stop ring upon receiving ``VHOST_USER_GET_VRING_BASE``.
+While the back-end is suspended (via ``VHOST_USER_SUSPEND``), it must
+never process rings, and thus also delay handling kicks until the
If you respin this series, I suggest replacing "never" with "not" to
emphasize that ring processing is only skipped while the device is
suspended (rather than forever). "Never" feels too strong to use when
describing a temporary state.
Sure.
+back-end is resumed again.
+
Rings can be enabled or disabled by ``VHOST_USER_SET_VRING_ENABLE``.
If ``VHOST_USER_F_PROTOCOL_FEATURES`` has not been negotiated, the
@@ -479,8 +483,9 @@ supplied by ``VhostUserLog``.
ancillary data, it may be used to inform the front-end that the log has
been modified.
-Once the source has finished migration, rings will be stopped by the
-source. No further update must be done before rings are restarted.
+Once the source has finished migration, the device will be suspended and
+its rings will be stopped by the source. No further update must be done
+before the device and its rings are resumed.
This paragraph is abstract and doesn't directly identify the mechanisms
or who does what:
- "the device will be suspended" via VHOST_USER_SUSPEND (or reset when
VHOST_USER_SUSPEND is not supported?) or automatically by the device
itself or some other mechanism?
OK, I’ll add VHOST_USER_SUSPEND.
So far I hadn’t considered making a note of resetting as a fallback
right in the specification. I don’t think I would want it in the
specification, but on the other hand, it is probably important for
back-end authors to know that if they do not implement SUSPEND, their
device is going to be reset by qemu.
Can we make that a ”may”, i.e.:
```
Once the source has finished migration, the device will be suspended via
VHOST_USER_SUSPEND and its rings will be stopped by the source. No
further update must be done before the device and its rings are resumed.
If and only if the back-end does not support VHOST_USER_SUSPEND, the
front-end may reset it instead (via VHOST_USER_SET_STATUS,
VHOST_USER_RESET_DEVICE, or VHOST_USER_RESET_OWNER).
```
I’m unsure about the “If and only if” – older qemu versions will break
this, but I feel like we must promise back-end writers that if they
implement SUSPEND, their back-end is not going to be reset; if it is,
and something breaks because of it, it’s the front-end that must be
updated to match the specification.
- "before the device and its rings are resumed" via VHOST_USER_RESUME?
And is this referring to the source device?
Yes, via VHOST_USER_RESUME, and restarting the rings by starting them
(i.e. a kick).
Whether this is referring to the source device… Well, the text as it
was before begs the exact same question, so honestly, I don’t know for
sure. “Restarting” only makes sense if the rings were stopped before,
so I assume it’s referring to the source, e.g. for the case of a failed
migration. RESUME at least definitely will only happen after a prior
SUSPEND, so this one will definitely only apply on the source side.
Please rephrase the paragraph to identify the vhost-user messages
involved.
In postcopy migration the back-end is started before all the memory has
been received from the source host, and care must be taken to avoid
@@ -885,6 +890,7 @@ Protocol features
#define VHOST_USER_PROTOCOL_F_CONFIGURE_MEM_SLOTS 15
#define VHOST_USER_PROTOCOL_F_STATUS 16
#define VHOST_USER_PROTOCOL_F_XEN_MMAP 17
+ #define VHOST_USER_PROTOCOL_F_SUSPEND 18
Front-end message types
-----------------------
@@ -1440,6 +1446,31 @@ Front-end message types
query the back-end for its device status as defined in the Virtio
specification.
+``VHOST_USER_SUSPEND``
+ :id: 41
+ :equivalent ioctl: VHOST_VDPA_SUSPEND
+ :request payload: N/A
+ :reply payload: N/A
+
+ When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been
+ successfully negotiated, this message is submitted by the front-end to
+ have the back-end cease all operations except for handling vhost-user
+ requests. The back-end must stop processing all virt queues, and it
+ must not perform any background operations. It may not resume until a
"background operations" are not defined. What does it mean:
- Anything that writes to memory slots
- Anything that changes the visible state of the device
- Anything that changes the non-visible internal state of the device
- etc
?
My best answer (honestly) is: You tell me. This series is introducing
SUSPEND/RESUME because qemu wants to reset devices to make them stop
“background operations”, and this would break virtiofsd if any form of
reset were actually implemented. The implementation of SUSPEND/RESUME
in virtiofsd on the other hand is supposed to basically be a no-op
(besides delaying ring processing until a RESUME, but even if we
processed them before, i.e. really make SUSPEND/RESUME no-ops, it would
most likely work out fine), so I have no idea what kind of background
operations we are even talking about, or whether any such actually exist
in practice.
I don’t know what anyone had in mind when introducing the RESET. It
comes from vDPA (c3716f260bf moved it from vdpa into the common code),
and exists since the code was originally added (108a64818e6), so there’s
no comment explaining why it exists. I can’t explain what the back-end
is supposed to stop doing, because so far it isn’t explained anywhere in
qemu, its git log, or in any documentation (there basically is no vdpa
documentation).
I can only say what I just completely naïvely assumed it to mean so far:
That the back-end basically should stop doing anything besides handling
and replying to vhost-user messages. If such a message requires
changing any state, visible or not, then this state change is permissible.
+ subsequent ``VHOST_USER_RESUME`` call.
+
+``VHOST_USER_RESUME``
+ :id: 42
+ :equivalent ioctl: VHOST_VDPA_RESUME
+ :request payload: N/A
+ :reply payload: N/A
+
+ When the ``VHOST_USER_PROTOCOL_F_SUSPEND`` protocol feature has been
+ successfully negotiated, this message is submitted by the front-end to
+ allow the back-end to resume operations after having been suspended
+ before. The back-end must again begin processing rings that are not
This can be made more concrete by referencing the vhost-user message
used to suspend the device:
"suspended before" -> "suspended with VHOST_USER_SUSPEND"
Alright.
Hanna
+ stopped, and it may resume background operations.
+
Back-end message types
----------------------
--
2.41.0