> On Jul 25, 2023, at 6:19 AM, Li Feng <fen...@smartx.com> wrote: > > Thanks for your comments. > >> 2023年7月25日 上午1:21,Raphael Norwitz <raphael.norw...@nutanix.com> 写道: >> >> Very excited to see this. High level looks good modulo a few small things. >> >> My major concern is around existing vhost-user-scsi backends which don’t >> support VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD. IMO we should hide the >> reconnect behavior behind a VHOST_USER_PROTOCOL_F_INFLIGHT_SHMFD check. We >> may want to do the same for vhost-user-blk. >> >> The question is then what happens if the check is false. IIUC without an >> inflight FD, if a device processes requests out of order, it’s not safe to >> continue execution on reconnect, as there’s no way for the backend to know >> how to replay IO. Should we permanently wedge the device or have QEMU fail >> out? May be nice to have a toggle for this. > > Based on what MST said, is there anything else I need to do?
I don’t think so. >> >>> On Jul 21, 2023, at 6:51 AM, Li Feng <fen...@smartx.com> wrote: >>> >>> If the backend crashes and restarts, the device is broken. >>> This patch adds reconnect for vhost-user-scsi. >>> >>> Tested with spdk backend. >>> >>> Signed-off-by: Li Feng <fen...@smartx.com> >>> --- >>> hw/block/vhost-user-blk.c | 2 - >>> hw/scsi/vhost-scsi-common.c | 27 ++--- >>> hw/scsi/vhost-user-scsi.c | 163 +++++++++++++++++++++++++--- >>> include/hw/virtio/vhost-user-scsi.h | 3 + >>> include/hw/virtio/vhost.h | 2 + >>> 5 files changed, 165 insertions(+), 32 deletions(-) >>> >>> diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c >>> index eecf3f7a81..f250c740b5 100644 >>> --- a/hw/block/vhost-user-blk.c >>> +++ b/hw/block/vhost-user-blk.c >>> @@ -32,8 +32,6 @@ >>> #include "sysemu/sysemu.h" >>> #include "sysemu/runstate.h" >>> >>> -#define REALIZE_CONNECTION_RETRIES 3 >>> - >>> static const int user_feature_bits[] = { >>> VIRTIO_BLK_F_SIZE_MAX, >>> VIRTIO_BLK_F_SEG_MAX, >>> diff --git a/hw/scsi/vhost-scsi-common.c b/hw/scsi/vhost-scsi-common.c >> >> Why can’t all the vhost-scsi-common stuff be moved to a separate change? > > I will move this code to separate patch. >> >> Especially the stuff introduced for vhost-user-blk in >> 1b0063b3048af65dfaae6422a572c87db8575a92 should be moved out. > OK. > >> >>> index a06f01af26..08801886b8 100644 >>> --- a/hw/scsi/vhost-scsi-common.c >>> +++ b/hw/scsi/vhost-scsi-common.c >>> @@ -52,16 +52,22 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) >>> >>> vsc->dev.acked_features = vdev->guest_features; >>> >>> - assert(vsc->inflight == NULL); >>> - vsc->inflight = g_new0(struct vhost_inflight, 1); >>> - ret = vhost_dev_get_inflight(&vsc->dev, >>> - vs->conf.virtqueue_size, >>> - vsc->inflight); >>> + ret = vhost_dev_prepare_inflight(&vsc->dev, vdev); >>> if (ret < 0) { >>> - error_report("Error get inflight: %d", -ret); >>> + error_report("Error setting inflight format: %d", -ret); >>> goto err_guest_notifiers; >>> } >>> >>> + if (!vsc->inflight->addr) { >>> + ret = vhost_dev_get_inflight(&vsc->dev, >>> + vs->conf.virtqueue_size, >>> + vsc->inflight); >>> + if (ret < 0) { >>> + error_report("Error get inflight: %d", -ret); >>> + goto err_guest_notifiers; >>> + } >>> + } >>> + >>> ret = vhost_dev_set_inflight(&vsc->dev, vsc->inflight); >>> if (ret < 0) { >>> error_report("Error set inflight: %d", -ret); >>> @@ -85,9 +91,6 @@ int vhost_scsi_common_start(VHostSCSICommon *vsc) >>> return ret; >>> >>> err_guest_notifiers: >>> - g_free(vsc->inflight); >>> - vsc->inflight = NULL; >>> - >>> k->set_guest_notifiers(qbus->parent, vsc->dev.nvqs, false); >>> err_host_notifiers: >>> vhost_dev_disable_notifiers(&vsc->dev, vdev); >>> @@ -111,12 +114,6 @@ void vhost_scsi_common_stop(VHostSCSICommon *vsc) >>> } >>> assert(ret >= 0); >>> >> >> In the vhost-scsi (kernel backend) path, what will cleanup vsc->inflight now? > OK, we should check the vsc->inflight if it is null, the vhost-scsi doesn’t > allocate the > inflight object memory. Are you saying vhost-scsi never allocates inflight so we don’t need to check for it? > >> >>> - if (vsc->inflight) { >>> - vhost_dev_free_inflight(vsc->inflight); >>> - g_free(vsc->inflight); >>> - vsc->inflight = NULL; >>> - } >>> - >>> vhost_dev_disable_notifiers(&vsc->dev, vdev); >>> } >>> >>> diff --git a/hw/scsi/vhost-user-scsi.c b/hw/scsi/vhost-user-scsi.c >>> index ee99b19e7a..e0e88b0c42 100644 >>> --- a/hw/scsi/vhost-user-scsi.c >>> +++ b/hw/scsi/vhost-user-scsi.c >>> @@ -89,14 +89,126 @@ static void vhost_dummy_handle_output(VirtIODevice >>> *vdev, VirtQueue *vq) >>> { >>> } >>> >>> +static int vhost_user_scsi_connect(DeviceState *dev, Error **errp) >>> +{ >>> + VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>> + VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >>> + VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>> + int ret = 0; >>> + >>> + if (s->connected) { >>> + return 0; >>> + } >>> + s->connected = true; >>> + >>> + vsc->dev.num_queues = vs->conf.num_queues; >>> + vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; >>> + vsc->dev.vqs = s->vhost_vqs; >>> + vsc->dev.vq_index = 0; >>> + vsc->dev.backend_features = 0; >>> + >>> + ret = vhost_dev_init(&vsc->dev, &s->vhost_user, >>> VHOST_BACKEND_TYPE_USER, 0, >>> + errp); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + /* restore vhost state */ >> >> Should this use virtio_device_should_start like vhost_user_blk? > I will change this. >> >>> + if (virtio_device_started(vdev, vdev->status)) { >>> + ret = vhost_scsi_common_start(vsc); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event); >>> + >>> +static void vhost_user_scsi_disconnect(DeviceState *dev) >>> +{ >>> + VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>> + VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >>> + VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>> + >> >> I don’t think we want to execute vhost_scsi_common_stop() if the device >> hasn’t been started. I remember that caused a number of races with the >> vhost_user_blk connecting/disconnecting on startup. >> >> Let’s add a similar started_vu check? > I will add it. >> >>> + if (!s->connected) { >>> + return; >>> + } >>> + s->connected = false; >>> + >>> + vhost_scsi_common_stop(vsc); >>> + >>> + vhost_dev_cleanup(&vsc->dev); >>> + >>> + /* Re-instate the event handler for new connections */ >>> + qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL, >>> + vhost_user_scsi_event, NULL, dev, NULL, true); >>> +} >>> + >>> +static void vhost_user_scsi_event(void *opaque, QEMUChrEvent event) >>> +{ >>> + DeviceState *dev = opaque; >>> + VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>> + VHostUserSCSI *s = VHOST_USER_SCSI(vdev); >>> + VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>> + Error *local_err = NULL; >>> + >>> + switch (event) { >>> + case CHR_EVENT_OPENED: >>> + if (vhost_user_scsi_connect(dev, &local_err) < 0) { >>> + error_report_err(local_err); >>> + qemu_chr_fe_disconnect(&vs->conf.chardev); >>> + return; >>> + } >>> + break; >>> + case CHR_EVENT_CLOSED: >>> + /* defer close until later to avoid circular close */ >>> + vhost_user_async_close(dev, &vs->conf.chardev, &vsc->dev, >>> + vhost_user_scsi_disconnect); >>> + break; >>> + case CHR_EVENT_BREAK: >>> + case CHR_EVENT_MUX_IN: >>> + case CHR_EVENT_MUX_OUT: >>> + /* Ignore */ >>> + break; >>> + } >>> +} >>> + >>> +static int vhost_user_scsi_realize_connect(VHostUserSCSI *s, Error **errp) >>> +{ >>> + DeviceState *dev = &s->parent_obj.parent_obj.parent_obj.parent_obj; >>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>> + int ret; >>> + >>> + s->connected = false; >>> + >>> + ret = qemu_chr_fe_wait_connected(&vs->conf.chardev, errp); >>> + if (ret < 0) { >>> + return ret; >>> + } >>> + >>> + ret = vhost_user_scsi_connect(dev, errp); >>> + if (ret < 0) { >>> + qemu_chr_fe_disconnect(&vs->conf.chardev); >>> + return ret; >>> + } >>> + assert(s->connected); >>> + >>> + return 0; >>> +} >>> + >>> static void vhost_user_scsi_realize(DeviceState *dev, Error **errp) >>> { >>> VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>> VHostUserSCSI *s = VHOST_USER_SCSI(dev); >>> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>> - struct vhost_virtqueue *vqs = NULL; >>> Error *err = NULL; >>> int ret; >>> + int retries = REALIZE_CONNECTION_RETRIES; >>> >>> if (!vs->conf.chardev.chr) { >>> error_setg(errp, "vhost-user-scsi: missing chardev"); >>> @@ -112,21 +224,31 @@ static void vhost_user_scsi_realize(DeviceState *dev, >>> Error **errp) >>> } >>> >>> if (!vhost_user_init(&s->vhost_user, &vs->conf.chardev, errp)) { >> >> Why execute vhost_user_cleanup() if vhost_user_init() fails? > OK, move this line up in v2. > >> >>> - goto free_virtio; >>> + goto free_vhost; >>> } >>> >>> - vsc->dev.nvqs = VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues; >>> - vsc->dev.vqs = g_new0(struct vhost_virtqueue, vsc->dev.nvqs); >>> - vsc->dev.vq_index = 0; >>> - vsc->dev.backend_features = 0; >>> - vqs = vsc->dev.vqs; >>> + vsc->inflight = g_new0(struct vhost_inflight, 1); >>> + s->vhost_vqs = g_new0(struct vhost_virtqueue, >>> + VIRTIO_SCSI_VQ_NUM_FIXED + vs->conf.num_queues); >>> + >>> + assert(!*errp); >>> + do { >>> + if (*errp) { >>> + error_prepend(errp, "Reconnecting after error: "); >>> + error_report_err(*errp); >>> + *errp = NULL; >>> + } >>> + ret = vhost_user_scsi_realize_connect(s, errp); >>> + } while (ret < 0 && retries--); >>> >>> - ret = vhost_dev_init(&vsc->dev, &s->vhost_user, >>> - VHOST_BACKEND_TYPE_USER, 0, errp); >>> if (ret < 0) { >>> - goto free_vhost; >>> + goto free_vqs; >>> } >>> >>> + /* we're fully initialized, now we can operate, so add the handler */ >>> + qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL, >>> + vhost_user_scsi_event, NULL, (void *)dev, >>> + NULL, true); >>> /* Channel and lun both are 0 for bootable vhost-user-scsi disk */ >>> vsc->channel = 0; >>> vsc->lun = 0; >>> @@ -134,10 +256,15 @@ static void vhost_user_scsi_realize(DeviceState *dev, >>> Error **errp) >>> >>> return; >>> >>> +free_vqs: >>> + g_free(s->vhost_vqs); >>> + s->vhost_vqs = NULL; >>> + g_free(vsc->inflight); >>> + vsc->inflight = NULL; >>> + >>> free_vhost: >>> vhost_user_cleanup(&s->vhost_user); >>> - g_free(vqs); >>> -free_virtio: >>> + >>> virtio_scsi_common_unrealize(dev); >>> } >>> >>> @@ -146,16 +273,22 @@ static void vhost_user_scsi_unrealize(DeviceState >>> *dev) >>> VirtIODevice *vdev = VIRTIO_DEVICE(dev); >>> VHostUserSCSI *s = VHOST_USER_SCSI(dev); >>> VHostSCSICommon *vsc = VHOST_SCSI_COMMON(s); >>> - struct vhost_virtqueue *vqs = vsc->dev.vqs; >>> + VirtIOSCSICommon *vs = VIRTIO_SCSI_COMMON(dev); >>> >>> /* This will stop the vhost backend. */ >>> vhost_user_scsi_set_status(vdev, 0); >>> + qemu_chr_fe_set_handlers(&vs->conf.chardev, NULL, NULL, NULL, NULL, >>> NULL, >>> + NULL, false); >>> >>> vhost_dev_cleanup(&vsc->dev); >>> - g_free(vqs); >> >> Nit: Why not put vhost_dev_free_inflight next to the remaining inflight >> cleanup? > OK. >> >>> + vhost_dev_free_inflight(vsc->inflight); >>> + g_free(s->vhost_vqs); >>> + s->vhost_vqs = NULL; >>> + g_free(vsc->inflight); >>> + vsc->inflight = NULL; >>> >> >> Curiosity - why reorder here? Is something in vhost_user_cleanup() dependent >> on state freed in virtio_scsi_common_unrealize()? >> >> If so, should that go as a standalone fix? > > Because in vhost_user_scsi_realize, we initialize in order: > virtio_scsi_common_realize > vhost_user_init > > And in the error handler of vhost_user_scsi_realize, the uninitialize in > order: > vhost_user_cleanup > virtio_scsi_common_unrealize > > I think in vhost_user_scsi_unrealize we should keep it the same order, right? I’m not saying it’s wrong. If there’s no dependency (i.e. this is not fixing a bug, just a stylistic improvement) it can stay in the same change. > >> >>> - virtio_scsi_common_unrealize(dev); >>> vhost_user_cleanup(&s->vhost_user); >>> + virtio_scsi_common_unrealize(dev); >>> } >>> >>> static Property vhost_user_scsi_properties[] = { >>> diff --git a/include/hw/virtio/vhost-user-scsi.h >>> b/include/hw/virtio/vhost-user-scsi.h >>> index 521b08e559..c66acc68b7 100644 >>> --- a/include/hw/virtio/vhost-user-scsi.h >>> +++ b/include/hw/virtio/vhost-user-scsi.h >>> @@ -29,6 +29,9 @@ OBJECT_DECLARE_SIMPLE_TYPE(VHostUserSCSI, VHOST_USER_SCSI) >>> struct VHostUserSCSI { >>> VHostSCSICommon parent_obj; >>> VhostUserState vhost_user; >> >> See above - we should probably have started_vu here/ > I will add it. >> >> Maybe we should have some shared struct with vhost_user_blk for connectivity >> params? > > In the future vhost-user-blk/scsi can be refactored to share the same code. Sure - this can be done at some point in the future. >> >>> + bool connected; >>> + >>> + struct vhost_virtqueue *vhost_vqs; >>> }; >>> >>> #endif /* VHOST_USER_SCSI_H */ >>> diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h >>> index 6a173cb9fa..b904346fe1 100644 >>> --- a/include/hw/virtio/vhost.h >>> +++ b/include/hw/virtio/vhost.h >>> @@ -8,6 +8,8 @@ >>> #define VHOST_F_DEVICE_IOTLB 63 >>> #define VHOST_USER_F_PROTOCOL_FEATURES 30 >>> >> >> Should the macro name indicate that this is for vhost-user? >> >> VU_REALIZE_CONN_RETRIES? > I will rename it in v2. > >> >>> +#define REALIZE_CONNECTION_RETRIES 3 >>> + >>> /* Generic structures common for any vhost based device. */ >>> >>> struct vhost_inflight { >>> -- >>> 2.41.0