On Thu, 14 Mar 2019 at 19:18, Michael S. Tsirkin <m...@redhat.com> wrote:
>
> On Wed, Mar 13, 2019 at 10:47:08AM +0800, Yongji Xie wrote:
> > On Wed, 13 Mar 2019 at 09:16, Michael S. Tsirkin <m...@redhat.com> wrote:
> > >
> > > On Thu, Feb 28, 2019 at 04:53:54PM +0800, elohi...@gmail.com wrote:
> > > > From: Xie Yongji <xieyon...@baidu.com>
> > > >
> > > > Since we now support the messages VHOST_USER_GET_INFLIGHT_FD
> > > > and VHOST_USER_SET_INFLIGHT_FD, the backend is able to restart
> > > > safely because it can track inflight I/O in shared memory.
> > > > This patch allows qemu to reconnect the backend after the
> > > > connection is closed.
> > > >
> > > > Signed-off-by: Xie Yongji <xieyon...@baidu.com>
> > > > Signed-off-by: Ni Xun <ni...@baidu.com>
> > > > Signed-off-by: Zhang Yu <zhangy...@baidu.com>
> > > > ---
> > > >  hw/block/vhost-user-blk.c          | 205 +++++++++++++++++++++++------
> > > >  include/hw/virtio/vhost-user-blk.h |   4 +
> > > >  2 files changed, 167 insertions(+), 42 deletions(-)
> > > >
> > > > diff --git a/hw/block/vhost-user-blk.c b/hw/block/vhost-user-blk.c
> > > > index 9682df1a7b..539ea2e571 100644
> > > > --- a/hw/block/vhost-user-blk.c
> > > > +++ b/hw/block/vhost-user-blk.c
> > > > @@ -103,7 +103,7 @@ const VhostDevConfigOps blk_ops = {
> > > >      .vhost_dev_config_notifier = vhost_user_blk_handle_config_change,
> > > >  };
> > > >
> > > > -static void vhost_user_blk_start(VirtIODevice *vdev)
> > > > +static int vhost_user_blk_start(VirtIODevice *vdev)
> > > >  {
> > > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > >      BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
> > > > @@ -112,13 +112,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
> > > >
> > > >      if (!k->set_guest_notifiers) {
> > > >          error_report("binding does not support guest notifiers");
> > > > -        return;
> > > > +        return -ENOSYS;
> > > >      }
> > > >
> > > >      ret = vhost_dev_enable_notifiers(&s->dev, vdev);
> > > >      if (ret < 0) {
> > > >          error_report("Error enabling host notifiers: %d", -ret);
> > > > -        return;
> > > > +        return ret;
> > > >      }
> > > >
> > > >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, true);
> > > > @@ -157,12 +157,13 @@ static void vhost_user_blk_start(VirtIODevice *vdev)
> > > >          vhost_virtqueue_mask(&s->dev, vdev, i, false);
> > > >      }
> > > >
> > > > -    return;
> > > > +    return ret;
> > > >
> > > >  err_guest_notifiers:
> > > >      k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> > > >  err_host_notifiers:
> > > >      vhost_dev_disable_notifiers(&s->dev, vdev);
> > > > +    return ret;
> > > >  }
> > > >
> > > >  static void vhost_user_blk_stop(VirtIODevice *vdev)
> > > > @@ -181,7 +182,6 @@ static void vhost_user_blk_stop(VirtIODevice *vdev)
> > > >      ret = k->set_guest_notifiers(qbus->parent, s->dev.nvqs, false);
> > > >      if (ret < 0) {
> > > >          error_report("vhost guest notifier cleanup failed: %d", ret);
> > > > -        return;
> > > >      }
> > > >
> > > >      vhost_dev_disable_notifiers(&s->dev, vdev);
> > > > @@ -191,21 +191,43 @@ static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
> > > >  {
> > > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > >      bool should_start = status & VIRTIO_CONFIG_S_DRIVER_OK;
> > > > +    int ret;
> > > >
> > > >      if (!vdev->vm_running) {
> > > >          should_start = false;
> > > >      }
> > > >
> > > > -    if (s->dev.started == should_start) {
> > > > +    if (s->should_start == should_start) {
> > > > +        return;
> > > > +    }
> > > > +
> > > > +    if (!s->connected || s->dev.started == should_start) {
> > > > +        s->should_start = should_start;
> > > >          return;
> > > >      }
> > > >
> > > >      if (should_start) {
> > > > -        vhost_user_blk_start(vdev);
> > > > +        s->should_start = true;
> > > > +        /*
> > > > +         * make sure vhost_user_blk_handle_output() ignores fake
> > > > +         * guest kick by vhost_dev_enable_notifiers()
> > > > +         */
> > > > +        barrier();
> > > > +        ret = vhost_user_blk_start(vdev);
> > > > +        if (ret < 0) {
> > > > +            error_report("vhost-user-blk: vhost start failed: %s",
> > > > +                         strerror(-ret));
> > > > +            qemu_chr_fe_disconnect(&s->chardev);
> > > > +        }
> > > >      } else {
> > > >          vhost_user_blk_stop(vdev);
> > > > +        /*
> > > > +         * make sure vhost_user_blk_handle_output() ignore fake
> > > > +         * guest kick by vhost_dev_disable_notifiers()
> > > > +         */
> > > > +        barrier();
> > > > +        s->should_start = false;
> > > >      }
> > >
> > > I don't understand the comment about ignoring fake guest kicks.
> > > Please add an explanation of what the barrier does here.
> >
> > We may get a stack like this:
> >
> > vhost_user_blk_stop()
> >   vhost_dev_disable_notifiers()
> >     virtio_bus_cleanup_host_notifier()
> >       virtio_queue_host_notifier_read()
> >         virtio_queue_notify_vq()  /* fake guest kick */
> >           vhost_user_blk_handle_output()
> >
> > vhost_user_blk_handle_output() will re-start vhost-user if
> > should_start is false. This is not what we expect. The same applies
> > to vhost_dev_enable_notifiers().
>
> Well what does a compiler barrier have to do with it?
>
> Anyway, it's there to avoid losing kicks e.g. across migration. If you
> have something else that prevents losing kicks (e.g. migrate a "started"
> flag) then add code to skip the notify - don't work around it.
>

OK, I see. Thanks. IIUC, the vhost-user-blk device does not actually need
this kind of kick, because the dataplane is not on the qemu side, right?
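If so, something along these lines could skip the notify explicitly
instead of relying on barrier(). Just an untested sketch on top of this
patch, with a hypothetical in_transition flag in VHostUserBlk that marks
a start/stop in progress:

    static void vhost_user_blk_set_status(VirtIODevice *vdev, uint8_t status)
    {
        ...
        /*
         * Notifies raised while we enable/disable the host notifiers
         * come from draining the notifier fd, not from the guest, so
         * tell the handler to skip them.
         */
        s->in_transition = true;
        if (should_start) {
            ret = vhost_user_blk_start(vdev);
            ...
        } else {
            vhost_user_blk_stop(vdev);
        }
        s->in_transition = false;
        s->should_start = should_start;
    }

    static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
    {
        VHostUserBlk *s = VHOST_USER_BLK(vdev);

        if (s->in_transition) {
            /* fake kick from vhost_dev_{enable,disable}_notifiers() */
            return;
        }
        ...
    }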

> > > And maybe a detailed explanation about what "fake kick"
> > > is in the commit log.
> > >
> >
> > Sure.
> >
> > > > -
> > > >  }
> > > >
> > > >  static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> > > > @@ -237,13 +259,22 @@ static uint64_t vhost_user_blk_get_features(VirtIODevice *vdev,
> > > >  static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > >  {
> > > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > > -    int i;
> > > > +    int i, ret;
> > > >
> > > >      if (!(virtio_host_has_feature(vdev, VIRTIO_F_VERSION_1) &&
> > > >          !virtio_vdev_has_feature(vdev, VIRTIO_F_VERSION_1))) {
> > > >          return;
> > > >      }
> > > >
> > > > +    if (s->should_start) {
> > > > +        return;
> > > > +    }
> > > > +    s->should_start = true;
> > > > +
> > > > +    if (!s->connected) {
> > > > +        return;
> > > > +    }
> > > > +
> > > >      if (s->dev.started) {
> > > >          return;
> > > >      }
> > > > @@ -251,7 +282,13 @@ static void vhost_user_blk_handle_output(VirtIODevice *vdev, VirtQueue *vq)
> > > >      /* Some guests kick before setting VIRTIO_CONFIG_S_DRIVER_OK so start
> > > >       * vhost here instead of waiting for .set_status().
> > > >       */
> > > > -    vhost_user_blk_start(vdev);
> > > > +    ret = vhost_user_blk_start(vdev);
> > > > +    if (ret < 0) {
> > > > +        error_report("vhost-user-blk: vhost start failed: %s",
> > > > +                     strerror(-ret));
> > > > +        qemu_chr_fe_disconnect(&s->chardev);
> > > > +        return;
> > > > +    }
> > > >
> > > >      /* Kick right away to begin processing requests already in vring */
> > > >      for (i = 0; i < s->dev.nvqs; i++) {
> > > > @@ -271,13 +308,106 @@ static void vhost_user_blk_reset(VirtIODevice *vdev)
> > > >      vhost_dev_free_inflight(s->inflight);
> > > >  }
> > > >
> > > > +static int vhost_user_blk_connect(DeviceState *dev)
> > > > +{
> > > > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > > +    int ret = 0;
> > > > +
> > > > +    if (s->connected) {
> > > > +        return 0;
> > > > +    }
> > > > +    s->connected = true;
> > > > +
> > > > +    s->dev.nvqs = s->num_queues;
> > > > +    s->dev.vqs = s->vqs;
> > > > +    s->dev.vq_index = 0;
> > > > +    s->dev.backend_features = 0;
> > > > +
> > > > +    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > > > +
> > > > +    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> > > > +    if (ret < 0) {
> > > > +        error_report("vhost-user-blk: vhost initialization failed: %s",
> > > > +                     strerror(-ret));
> > > > +        return ret;
> > > > +    }
> > > > +
> > > > +    /* restore vhost state */
> > > > +    if (s->should_start) {
> > > > +        ret = vhost_user_blk_start(vdev);
> > > > +        if (ret < 0) {
> > > > +            error_report("vhost-user-blk: vhost start failed: %s",
> > > > +                         strerror(-ret));
> > > > +            return ret;
> > > > +        }
> > > > +    }
> > > > +
> > > > +    return 0;
> > > > +}
> > > > +
> > > > +static void vhost_user_blk_disconnect(DeviceState *dev)
> > > > +{
> > > > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > > +
> > > > +    if (!s->connected) {
> > > > +        return;
> > > > +    }
> > > > +    s->connected = false;
> > > > +
> > > > +    if (s->dev.started) {
> > > > +        vhost_user_blk_stop(vdev);
> > > > +    }
> > > > +
> > > > +    vhost_dev_cleanup(&s->dev);
> > > > +}
> > > > +
> > > > +static gboolean vhost_user_blk_watch(GIOChannel *chan, GIOCondition cond,
> > > > +                                     void *opaque)
> > > > +{
> > > > +    DeviceState *dev = opaque;
> > > > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > > +
> > > > +    qemu_chr_fe_disconnect(&s->chardev);
> > > > +
> > > > +    return true;
> > > > +}
> > > > +
> > > > +static void vhost_user_blk_event(void *opaque, int event)
> > > > +{
> > > > +    DeviceState *dev = opaque;
> > > > +    VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > > +    VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > > +
> > > > +    switch (event) {
> > > > +    case CHR_EVENT_OPENED:
> > > > +        if (vhost_user_blk_connect(dev) < 0) {
> > > > +            qemu_chr_fe_disconnect(&s->chardev);
> > > > +            return;
> > > > +        }
> > > > +        s->watch = qemu_chr_fe_add_watch(&s->chardev, G_IO_HUP,
> > > > +                                         vhost_user_blk_watch, dev);
> > > > +        break;
> > > > +    case CHR_EVENT_CLOSED:
> > > > +        vhost_user_blk_disconnect(dev);
> > > > +        if (s->watch) {
> > > > +            g_source_remove(s->watch);
> > > > +            s->watch = 0;
> > > > +        }
> > > > +        break;
> > > > +    }
> > > > +}
> > > > +
> > > > +
> > > >  static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > > >  {
> > > >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > >      VHostUserBlk *s = VHOST_USER_BLK(vdev);
> > > >      VhostUserState *user;
> > > > -    struct vhost_virtqueue *vqs = NULL;
> > > >      int i, ret;
> > > > +    Error *err = NULL;
> > > >
> > > >      if (!s->chardev.chr) {
> > > >          error_setg(errp, "vhost-user-blk: chardev is mandatory");
> > > > @@ -312,27 +442,28 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > > >      }
> > > >
> > > >      s->inflight = g_new0(struct vhost_inflight, 1);
> > > > -
> > > > -    s->dev.nvqs = s->num_queues;
> > > > -    s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
> > > > -    s->dev.vq_index = 0;
> > > > -    s->dev.backend_features = 0;
> > > > -    vqs = s->dev.vqs;
> > > > -
> > > > -    vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> > > > -
> > > > -    ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> > > > -    if (ret < 0) {
> > > > -        error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
> > > > -                   strerror(-ret));
> > > > -        goto virtio_err;
> > > > -    }
> > > > +    s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> > > > +    s->watch = 0;
> > > > +    s->should_start = false;
> > > > +    s->connected = false;
> > > > +
> > > > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> > > > +                             NULL, (void *)dev, NULL, true);
> > > > +
> > > > +reconnect:
> > > > +    do {
> > > > +        if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> > > > +            error_report_err(err);
> > > > +            err = NULL;
> > > > +            sleep(1);
> > > > +        }
> > > > +    } while (!s->connected);
> > > >
> > > >      ret = vhost_dev_get_config(&s->dev, (uint8_t *)&s->blkcfg,
> > > > -                               sizeof(struct virtio_blk_config));
> > > > +                              sizeof(struct virtio_blk_config));
> > > >      if (ret < 0) {
> > > > -        error_setg(errp, "vhost-user-blk: get block config failed");
> > > > -        goto vhost_err;
> > > > +        error_report("vhost-user-blk: get block config failed");
> > > > +        goto reconnect;
> > > >      }
> > > >
> > > >      if (s->blkcfg.num_queues != s->num_queues) {
> > > > @@ -340,29 +471,19 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> > > >      }
> > > >
> > > >      return;
> > >
> > > With this we end up with a return at the end of the function,
> > > which makes no sense.
> > >
> >
> > Oh, yes. I will remove it.
> >
> > > > -
> > > > -vhost_err:
> > > > -    vhost_dev_cleanup(&s->dev);
> > > > -virtio_err:
> > > > -    g_free(vqs);
> > > > -    g_free(s->inflight);
> > > > -    virtio_cleanup(vdev);
> > > > -
> > > > -    vhost_user_cleanup(user);
> > > > -    g_free(user);
> > > > -    s->vhost_user = NULL;
> > > >  }
> > > >
> > > >  static void vhost_user_blk_device_unrealize(DeviceState *dev, Error **errp)
> > > >  {
> > > >      VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> > > >      VHostUserBlk *s = VHOST_USER_BLK(dev);
> > > > -    struct vhost_virtqueue *vqs = s->dev.vqs;
> > > >
> > > >      vhost_user_blk_set_status(vdev, 0);
> > > > +    qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, NULL,
> > > > +                             NULL, NULL, NULL, false);
> > > >      vhost_dev_cleanup(&s->dev);
> > > >      vhost_dev_free_inflight(s->inflight);
> > > > -    g_free(vqs);
> > > > +    g_free(s->vqs);
> > > >      g_free(s->inflight);
> > > >      virtio_cleanup(vdev);
> > > >
> > > > diff --git a/include/hw/virtio/vhost-user-blk.h b/include/hw/virtio/vhost-user-blk.h
> > > > index 445516604a..4849aa5eb5 100644
> > > > --- a/include/hw/virtio/vhost-user-blk.h
> > > > +++ b/include/hw/virtio/vhost-user-blk.h
> > > > @@ -38,6 +38,10 @@ typedef struct VHostUserBlk {
> > > >      struct vhost_dev dev;
> > > >      struct vhost_inflight *inflight;
> > > >      VhostUserState *vhost_user;
> > > > +    struct vhost_virtqueue *vqs;
> > > > +    guint watch;
> > > > +    bool should_start;
> > >
> > > Please add comments about fields.
> > > Also should_start seems to be set on guest activity.
> > > Does it need to be migrated?
> > > It would be better not to track guest state like this.
> > > Why do we need it?
> > > Pls add comments with an explanation.
> > >
> >
> > We can't rely on vhost_dev.started to track the guest's state in
> > vhost_user_blk_set_status() when we enable reconnecting. For example,
> > qemu will lose the guest's state if the guest stops the device while
> > the backend is disconnected. So in this patch vhost_dev.started is
> > used to track the backend's state and should_start is used to track
> > the guest's state. And this variable is based on VirtIODevice.status,
> > so I think it is not necessary to migrate it. What do you think about
> > it?
>
> If it's based on VirtIODevice.status then how about not keeping
> it around at all and just calculating from VirtIODevice.status
> when necessary?
>

Sorry, I made a mistake before. Normally, should_start is based on
VirtIODevice.status. But that's not the case for virtio 1.0 transitional
devices, which may kick before setting VIRTIO_CONFIG_S_DRIVER_OK. Then I
realized there may be a bug: if migration completes between the guest
kicking the virtqueue and setting VIRTIO_CONFIG_S_DRIVER_OK, the guest
may hang. So I think should_start should be migrated in this case.
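For example, a migration subsection along these lines might work. This
is an untested sketch; the exact .needed condition would need more
thought:

    static bool vhost_user_blk_should_start_needed(void *opaque)
    {
        VHostUserBlk *s = opaque;

        /*
         * Only send the subsection when the flag carries state that
         * cannot be recomputed from VirtIODevice.status on the
         * destination.
         */
        return s->should_start;
    }

    static const VMStateDescription vmstate_vhost_user_blk_should_start = {
        .name = "vhost-user-blk/should_start",
        .version_id = 1,
        .minimum_version_id = 1,
        .needed = vhost_user_blk_should_start_needed,
        .fields = (VMStateField[]) {
            VMSTATE_BOOL(should_start, VHostUserBlk),
            VMSTATE_END_OF_LIST()
        }
    };

Thanks,
Yongji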