Re: [Qemu-devel] [PATCH v7 6/7] vhost-user-blk: Add support to reconnect backend

Yongji Xie Fri, 15 Mar 2019 06:20:38 -0700

On Fri, 15 Mar 2019 at 18:41, Yury Kotov <yury-ko...@yandex-team.ru> wrote:
>
> 15.03.2019, 12:46, "Daniel P. Berrangé" <berra...@redhat.com>:
> > On Thu, Mar 14, 2019 at 03:31:47PM +0300, Yury Kotov wrote:
> >>  Hi,
> >>
> >>  14.03.2019, 14:44, "Daniel P. Berrangé" <berra...@redhat.com>:
> >>  > On Thu, Mar 14, 2019 at 07:34:03AM -0400, Michael S. Tsirkin wrote:
> >>  >>  On Thu, Mar 14, 2019 at 11:24:22AM +0000, Daniel P. Berrangé wrote:
> >>  >>  > On Tue, Mar 12, 2019 at 12:49:35PM -0400, Michael S. Tsirkin wrote:
> >>  >>  > > On Thu, Feb 28, 2019 at 04:53:54PM +0800, elohi...@gmail.com wrote:
> >>  >>  > > > From: Xie Yongji <xieyon...@baidu.com>
> >>  >>  > > >
> >>  >>  > > > Since we now support the messages VHOST_USER_GET_INFLIGHT_FD
> >>  >>  > > > and VHOST_USER_SET_INFLIGHT_FD, the backend is able to restart
> >>  >>  > > > safely because it can track inflight I/O in shared memory.
> >>  >>  > > > This patch allows qemu to reconnect to the backend after
> >>  >>  > > > the connection is closed.
> >>  >>  > > >
> >>  >>  > > > Signed-off-by: Xie Yongji <xieyon...@baidu.com>
> >>  >>  > > > Signed-off-by: Ni Xun <ni...@baidu.com>
> >>  >>  > > > Signed-off-by: Zhang Yu <zhangy...@baidu.com>
> >>  >>  > > > ---
> >>  >>  > > > hw/block/vhost-user-blk.c | 205 +++++++++++++++++++++++------
> >>  >>  > > > include/hw/virtio/vhost-user-blk.h | 4 +
> >>  >>  > > > 2 files changed, 167 insertions(+), 42 deletions(-)
> >>  >>  >
> >>  >>  >
> >>  >>  > > > static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >>  >>  > > > {
> >>  >>  > > > VirtIODevice *vdev = VIRTIO_DEVICE(dev);
> >>  >>  > > > VHostUserBlk *s = VHOST_USER_BLK(vdev);
> >>  >>  > > > VhostUserState *user;
> >>  >>  > > > - struct vhost_virtqueue *vqs = NULL;
> >>  >>  > > > int i, ret;
> >>  >>  > > > + Error *err = NULL;
> >>  >>  > > >
> >>  >>  > > > if (!s->chardev.chr) {
> >>  >>  > > > error_setg(errp, "vhost-user-blk: chardev is mandatory");
> >>  >>  > > > @@ -312,27 +442,28 @@ static void vhost_user_blk_device_realize(DeviceState *dev, Error **errp)
> >>  >>  > > > }
> >>  >>  > > >
> >>  >>  > > > s->inflight = g_new0(struct vhost_inflight, 1);
> >>  >>  > > > -
> >>  >>  > > > - s->dev.nvqs = s->num_queues;
> >>  >>  > > > - s->dev.vqs = g_new(struct vhost_virtqueue, s->dev.nvqs);
> >>  >>  > > > - s->dev.vq_index = 0;
> >>  >>  > > > - s->dev.backend_features = 0;
> >>  >>  > > > - vqs = s->dev.vqs;
> >>  >>  > > > -
> >>  >>  > > > - vhost_dev_set_config_notifier(&s->dev, &blk_ops);
> >>  >>  > > > -
> >>  >>  > > > - ret = vhost_dev_init(&s->dev, s->vhost_user, VHOST_BACKEND_TYPE_USER, 0);
> >>  >>  > > > - if (ret < 0) {
> >>  >>  > > > - error_setg(errp, "vhost-user-blk: vhost initialization failed: %s",
> >>  >>  > > > - strerror(-ret));
> >>  >>  > > > - goto virtio_err;
> >>  >>  > > > - }
> >>  >>  > > > + s->vqs = g_new(struct vhost_virtqueue, s->num_queues);
> >>  >>  > > > + s->watch = 0;
> >>  >>  > > > + s->should_start = false;
> >>  >>  > > > + s->connected = false;
> >>  >>  > > > +
> >>  >>  > > > + qemu_chr_fe_set_handlers(&s->chardev, NULL, NULL, vhost_user_blk_event,
> >>  >>  > > > + NULL, (void *)dev, NULL, true);
> >>  >>  > > > +
> >>  >>  > > > +reconnect:
> >>  >>  > > > + do {
> >>  >>  > > > + if (qemu_chr_fe_wait_connected(&s->chardev, &err) < 0) {
> >>  >>  > > > + error_report_err(err);
> >>  >>  > > > + err = NULL;
> >>  >>  > > > + sleep(1);
> >>  >>  > >
> >>  >>  > > Seems arbitrary. Is this basically waiting until backend will reconnect?
> >>  >>  > > Why not block until event on the fd triggers?
> >>  >>  > >
> >>  >>  > > Also, it looks like this will just block forever with no monitor input
> >>  >>  > > and no way for user to figure out what is going on short of
> >>  >>  > > crashing QEMU.
> >>  >>  >
> >>  >>  > FWIW, the current vhost-user-net device does exactly the same thing
> >>  >>  > with calling qemu_chr_fe_wait_connected during its realize() function.
> >>  >>
> >>  >>  Hmm yes. It doesn't sleep for an arbitrary 1 sec so less of an eyesore :)
> >>  >
> >>  > The sleep(1) in this patch simply needs to be removed. I think that
> >>  > probably dates from when it was written against the earlier broken
> >>  > version of qemu_chr_fe_wait_connected(). That would not correctly
> >>  > deal with the "reconnect" flag, and so needing this loop with a sleep
> >>  > in it.
> >>  >
> >>  > In fact the while loop can be removed as well in this code. It just
> >>  > needs to call qemu_chr_fe_wait_connected() once. It is guaranteed
> >>  > to have a connected peer once that returns 0.
> >>  >
> >>  > qemu_chr_fe_wait_connected() only returns -1 if operating in
> >>  > client mode, and it failed to connect and reconnect is *not*
> >>  > requested. In such case the caller should honour the failure and
> >>  > quit, not loop to retry.
> >>  >
> >>  > The reason vhost-user-net does a loop is because once it has a
> >>  > connection it tries to do a protocol handshake, and if that
> >>  > handshake fails it closes the chardev and tries to connect
> >>  > again. That's not the case in this blk code so the loop is
> >>  > not needed.
> >>  >
> >>
> >>  But vhost-user-blk also has a handshake in device realize. What happens if the
> >>  connection is broken during realization? IIUC we have to retry a handshake in
> >>  such case just like vhost-user-net.
> >
> > I'm just commenting on the current code which does not do that
> > handshake in the loop afaict. If it needs to do that then the
> > patch should be updated...
> >
>
> Oh, yes... This loop doesn't do a handshake. Handshake is after the loop.
> But now it gotos to reconnect. So may be it makes sense to rewrite a handshake
> since we don't need two nested loops to get reconnection without gotos.
>


Actually, we do perform the handshake inside the loop, through this call chain:

qemu_chr_fe_wait_connected()
  tcp_chr_wait_connected()
    tcp_chr_connect_client_sync()
      tcp_chr_new_client()
        qemu_chr_be_event(chr, CHR_EVENT_OPENED);
          vhost_user_blk_event()
            vhost_user_blk_connect()
              vhost_dev_init()

The loop then checks s->connected to learn whether vhost_dev_init() succeeded.
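
To make that control flow easier to follow, here is a minimal standalone
sketch. It is not the QEMU code: every mock_* name is a hypothetical stand-in
(mock_wait_connected() for qemu_chr_fe_wait_connected(), mock_event_opened()
for vhost_user_blk_event() on CHR_EVENT_OPENED, mock_handshake() for
vhost_dev_init()), and the max_attempts bound is an assumption added only so
the example terminates. The point it illustrates is that the handshake runs
synchronously inside the wait call, so the caller only has to inspect the
connected flag afterwards.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the device state discussed in this thread. */
struct mock_blk {
    bool connected;               /* true only after a successful handshake */
    int handshake_failures_left;  /* test knob: handshakes that fail first */
    int attempts;                 /* number of handshakes tried so far */
};

/* Stand-in for vhost_dev_init(): fails a configurable number of times. */
static int mock_handshake(struct mock_blk *s)
{
    s->attempts++;
    if (s->handshake_failures_left > 0) {
        s->handshake_failures_left--;
        return -1;
    }
    return 0;
}

/* Stand-in for the chardev "opened" event handler: runs the handshake
 * and records the outcome in the connected flag. */
static void mock_event_opened(struct mock_blk *s)
{
    s->connected = (mock_handshake(s) == 0);
}

/* Stand-in for the wait-connected call: here the transport always
 * connects, and the opened event is delivered before returning. */
static int mock_wait_connected(struct mock_blk *s)
{
    mock_event_opened(s);
    return 0;
}

/* Realize-style loop: retry until the handshake has succeeded.
 * max_attempts is an assumption of this sketch, not part of the patch. */
static int mock_realize(struct mock_blk *s, int max_attempts)
{
    assert(s != NULL);
    s->connected = false;
    for (int i = 0; i < max_attempts; i++) {
        if (mock_wait_connected(s) < 0) {
            return -1;  /* the connection itself failed: give up */
        }
        if (s->connected) {
            return 0;   /* handshake succeeded inside the event handler */
        }
        fprintf(stderr, "handshake failed, retrying (%d)\n", i + 1);
    }
    return -1;
}
```

Calling mock_realize() on a state whose first two handshakes fail returns 0
on the third attempt, which mirrors how the retry in realize depends on
s->connected rather than on the return value of the wait call alone.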

Thanks,
Yongji
