15.03.2019, 12:46, "Daniel P. Berrangé" <berra...@redhat.com>:
> On Thu, Mar 14, 2019 at 03:31:47PM +0300, Yury Kotov wrote:
>> Hi,
>>
>> 14.03.2019, 14:44, "Daniel P. Berrangé" <berra...@redhat.com>:
>> > On Thu, Mar 14, 2019 at 07:34:03AM -0400, Michael S. Tsirkin wrote:
>> >> On Thu, Mar 14, 2019 at 11:24:22AM +0000, Daniel P. Berrangé wrote:
>> >> > On Tue, Mar 12, 2019 at 12:49:35PM -0400, Michael S. Tsirkin wrote:
>> >> > > On Thu, Feb 28, 2019 at 04:53:54PM +0800, elohi...@gmail.com wrote:
>> >> > > > From: Xie Yongji <xieyon...@baidu.com>
>> >> > > >
>> >> > > > Since we now support the message VHOST_USER_GET_INFLIGHT_FD
>> >> > > > and VHOST_USER_SET_INFLIGHT_FD. The backend is able to restart
>> >> > > > safely because it can track inflight I/O in shared memory.
>> >> > > > This patch allows qemu to reconnect the backend after
>> >> > > > connection closed.
>> >> > > >
>> >> > > > Signed-off-by: Xie Yongji <xieyon...@baidu.com>
>> >> > > > Signed-off-by: Ni Xun <ni...@baidu.com>
>> >> > > > Signed-off-by: Zhang Yu <zhangy...@baidu.com>
>> >> > > > ---
>> >> > > > hw/block/vhost-user-blk.c | 205 +++++++++++++++++++++++------
>> >> > > > include/hw/virtio/vhost-user-blk.h | 4 +
>> >> > > > 2 files changed, 167 insertions(+), 42 deletions(-)
>> >> >
>> >> >
>> >> > > > static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>> >> > > > {
>> >> > > > VirtIODevice *vdev = VIRTIO_DEVICE(dev);
>> >> > > > VHostUserBlk *s = VHOST_USER_BLK(vdev);
>> >> > > > VhostUserState *user;
>> >> > > > - struct vhost_virtqueue *vqs = NULL;
>> >> > > > int i, ret;
>> >> > > > + Error *err = NULL;
>> >> > > >
>> >> > > > if (!s->chardev.chr) {
>> >> > > > error_setg(errp, "vhost-user-blk: chardev is mandatory");
>> >> > > > @@ -312,27 +442,28 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
>> >> > > > }
>> >> > > >
>> >> > > > s->inflight = g_new0(struct vhost_inflight, 1);
>> >> > > > -
>> >> > > > - s->dev.nvqs = s->num_queues;
>> >> > > > - s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
>> >> > > > - s->dev.vq_index = 0;
>> >> > > > - s->dev.backend_features = 0;
>> >> > > > - vqs = s->dev.vqs;
>> >> > > > -
>> >> > > > - vhost_dev_set_config_notifier(&s->dev, &blk_ops);
>> >> > > > -
>> >> > > > - ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
>> >> > > > - if (ret < 0) {
>> >> > > > - error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
>> >> > > > - strerror(-ret));
>> >> > > > - goto virtio_err;
>> >> > > > - }
>> >> > > > + s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
>> >> > > > + s->watch = 0;
>> >> > > > + s->should_start = false;
>> >> > > > + s->connected = false;
>> >> > > > +
>> >> > > > + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
>> >> > > > + NULL, (void *)dev, NULL, true);
>> >> > > > +
>> >> > > > +reconnect:
>> >> > > > + do {
>> >> > > > + if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
>> >> > > > + error_report_err(err);
>> >> > > > + err = NULL;
>> >> > > > + sleep(1);
>> >> > >
>> >> > > Seems arbitrary. Is this basically waiting until backend will reconnect?
>> >> > > Why not block until event on the fd triggers?
>> >> > >
>> >> > > Also, it looks like this will just block forever with no monitor input
>> >> > > and no way for user to figure out what is going on short of
>> >> > > crashing QEMU.
>> >> >
>> >> > FWIW, the current vhost-user-net device does exactly the same thing
>> >> > with calling qemu_chr_fe_wait_connected during its realize() function.
>> >>
>> >> Hmm yes. It doesn't sleep for an arbitrary 1 sec so less of an eyesore :)
>> >
>> > The sleep(1) in this patch simply needs to be removed. I think that
>> > probably dates from when it was written against the earlier broken
>> > version of qemu_chr_fe_wait_connected(). That would not correctly
>> > deal with the "reconnect" flag, and so needing this loop with a sleep
>> > in it.
>> >
>> > In fact the while loop can be removed as well in this code. It just
>> > needs to call qemu_chr_fe_wait_connected() once. It is guaranteed
>> > to have a connected peer once that returns 0.
>> >
>> > qemu_chr_fe_wait_connected() only returns -1 if operating in
>> > client mode, and it failed to connect and reconnect is *not*
>> > requested. In such case the caller should honour the failure and
>> > quit, not loop to retry.
>> >
>> > The reason vhost-user-net does a loop is because once it has a
>> > connection it tries to do a protocol handshake, and if that
>> > handshake fails it closes the chardev and tries to connect
>> > again. That's not the case in this blk code so the loop is
>> > not needed.
>> >
>>
>> But vhost-user-blk also has a handshake in device realize. What happens if
>> the connection is broken during realization? IIUC we have to retry a
>> handshake in such case just like vhost-user-net.
>
> I'm just commenting on the current code which does not do that
> handshake in the loop afaict. If it needs to do that then the
> patch should be updated...
>

Oh, yes... This loop doesn't do a handshake. The handshake comes after the
loop, but now it jumps back to the reconnect label with a goto. So maybe it
makes sense to rewrite the handshake, since we don't need two nested loops
to get reconnection without gotos.

Regards,
Yury
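
To make the suggestion concrete, here is a minimal sketch of a single retry
loop that covers both the connect step and the handshake, with no goto. It is
not QEMU code: the Backend struct, its callbacks and the fake_* helpers are
all hypothetical stand-ins. The wait_connected callback only models the
behaviour described earlier in the thread for qemu_chr_fe_wait_connected()
(0 means a peer is connected, -1 means a failure that should not be retried).

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the device state. None of these names are
 * QEMU APIs; the callbacks only model "wait for a peer" and "handshake".
 */
typedef struct Backend Backend;
struct Backend {
    int (*wait_connected)(Backend *b); /* 0 = peer connected, -1 = fatal */
    int (*handshake)(Backend *b);      /* 0 = ok, -1 = connection broke */
    void (*disconnect)(Backend *b);    /* drop the broken connection */
    int connects;                      /* bookkeeping for the fakes below */
    int disconnects;
    int handshakes_to_fail;
};

/*
 * One loop, no goto: a failed connect is honoured and returned to the
 * caller, while a failed handshake drops the connection and retries.
 */
static int backend_connect(Backend *b)
{
    assert(b != NULL);
    for (;;) {
        if (b->wait_connected(b) < 0) {
            return -1; /* do not retry, let the caller fail realize */
        }
        if (b->handshake(b) == 0) {
            return 0;  /* connected and handshake completed */
        }
        b->disconnect(b);
    }
}

/* Fake callbacks used only to exercise the loop. */
static int fake_wait_connected(Backend *b)
{
    b->connects++;
    return 0;
}

static int fake_wait_refused(Backend *b)
{
    b->connects++;
    return -1;
}

static int fake_handshake(Backend *b)
{
    if (b->handshakes_to_fail > 0) {
        b->handshakes_to_fail--;
        return -1;
    }
    return 0;
}

static void fake_disconnect(Backend *b)
{
    b->disconnects++;
}
```

With fake_wait_connected and handshakes_to_fail set to 2, backend_connect()
connects three times, disconnects twice and then returns 0. With
fake_wait_refused it returns -1 after a single attempt, without retrying.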