On 01.04.25 20:07, Vladimir Sementsov-Ogievskiy wrote:
For now we only log the vhost device error, when virtqueue is actually
stopped. Let's add a QAPI event, which makes possible:
  - collect statistics of such errors
  - make immediate actions: take core dumps or do some other debugging
  - inform the user through a management API or UI, so that (s)he can
   react somehow, e.g. reset the device driver in the guest or even
   build up some automation to do so

Note that basically every inconsistency discovered during virtqueue
processing results in a silent virtqueue stop.  The guest then just
sees the requests getting stuck somewhere in the device for no visible
reason.  This event provides a means to inform the management layer of
this situation in a timely fashion.

The event could be reused for some other virtqueue problems (not only
for vhost devices) in future. For this it gets a generic name and
structure.

We keep original VHOST_OPS_DEBUG(), to keep original debug output as is
here, it's not the only call to VHOST_OPS_DEBUG in the file.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsement...@yandex-team.ru>
Reviewed-by: Denis Plotnikov <den-plotni...@yandex-team.ru>
---

v5: resend, update version in QAPI to 10.1
     drop a-b by Markus (too much time passed, the context could
     changed. Markus, is the patch still OK?)

  hw/virtio/vhost.c | 12 +++++++++---
  monitor/monitor.c | 10 ++++++++++
  qapi/qdev.json    | 32 ++++++++++++++++++++++++++++++++
  3 files changed, 51 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 6aa72fd434..0b205cef73 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -15,6 +15,7 @@
#include "qemu/osdep.h"
  #include "qapi/error.h"
+#include "qapi/qapi-events-qdev.h"
  #include "hw/virtio/vhost.h"
  #include "qemu/atomic.h"
  #include "qemu/range.h"
@@ -1442,11 +1443,16 @@ static void 
vhost_virtqueue_error_notifier(EventNotifier *n)
      struct vhost_virtqueue *vq = container_of(n, struct vhost_virtqueue,
                                                error_notifier);
      struct vhost_dev *dev = vq->dev;
-    int index = vq - dev->vqs;
if (event_notifier_test_and_clear(n) && dev->vdev) {
-        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d",
-                        dev->vq_index + index);
+        int ind = vq - dev->vqs + dev->vq_index;
+        DeviceState *ds = &dev->vdev->parent_obj;
+
+        VHOST_OPS_DEBUG(-EINVAL,  "vhost vring error in virtqueue %d", ind);
+        qapi_event_send_virtqueue_error(ds->id, ds->canonical_path, ind,
+                                        VIRTQUEUE_ERROR_VHOST_VRING_ERROR,
+                                        "vhost reported failure through vring "
+                                        "error fd");
      }
  }
diff --git a/monitor/monitor.c b/monitor/monitor.c
index c5a5d30877..1296a9207e 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -313,6 +313,7 @@ static MonitorQAPIEventConf 
monitor_qapi_event_conf[QAPI_EVENT__MAX] = {
      [QAPI_EVENT_BALLOON_CHANGE]    = { 1000 * SCALE_MS },
      [QAPI_EVENT_QUORUM_REPORT_BAD] = { 1000 * SCALE_MS },
      [QAPI_EVENT_QUORUM_FAILURE]    = { 1000 * SCALE_MS },
+    [QAPI_EVENT_VIRTQUEUE_ERROR]   = { 1000 * SCALE_MS },
      [QAPI_EVENT_VSERPORT_CHANGE]   = { 1000 * SCALE_MS },
      [QAPI_EVENT_MEMORY_DEVICE_SIZE_CHANGE] = { 1000 * SCALE_MS },
      [QAPI_EVENT_HV_BALLOON_STATUS_REPORT] = { 1000 * SCALE_MS },
@@ -499,6 +500,10 @@ static unsigned int qapi_event_throttle_hash(const void 
*key)
          hash += g_str_hash(qdict_get_str(evstate->data, "qom-path"));
      }
+ if (evstate->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
+        hash += g_str_hash(qdict_get_str(evstate->data, "device"));
Hmm we should use "qom-path" here, right?

+    }
+
      return hash;
  }
@@ -527,6 +532,11 @@ static gboolean qapi_event_throttle_equal(const void *a, const void *b)
                         qdict_get_str(evb->data, "qom-path"));
      }
+ if (eva->event == QAPI_EVENT_VIRTQUEUE_ERROR) {
+        return !strcmp(qdict_get_str(eva->data, "device"),
+                       qdict_get_str(evb->data, "device"));
and here

+    }
+
      return TRUE;
  }
diff --git a/qapi/qdev.json b/qapi/qdev.json
index 25cbcf977b..2d20f4777e 100644
--- a/qapi/qdev.json
+++ b/qapi/qdev.json
@@ -187,3 +187,35 @@
  { 'command': 'device-sync-config',
    'features': [ 'unstable' ],
    'data': {'id': 'str'} }
+
+##
+# @VirtqueueError:
+#
+# @vhost-vring-error: Vhost device reported failure through
+#     through vring error fd.
+#
+# Since: 10.1
+##
+{ 'enum': 'VirtqueueError',
+  'data': [ 'vhost-vring-error' ] }
+
+##
+# @VIRTQUEUE_ERROR:
+#
+# Emitted when a device virtqueue fails in runtime.
+#
+# @device: the device's ID if it has one
+#
+# @path: the device's QOM path
+#
+# @virtqueue: virtqueue index
+#
+# @error: error identifier
+#
+# @description: human readable description
+#
+# Since: 10.1
+##
+{ 'event': 'VIRTQUEUE_ERROR',
+ 'data': { '*device': 'str', 'path': 'str', 'virtqueue': 'int',
+            'error': 'VirtqueueError', 'description': 'str'} }
--
Best regards,
Vladimir


Reply via email to