Re: [Qemu-devel] [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue support

Michael S. Tsirkin Thu, 27 Aug 2015 06:07:04 -0700

On Tue, Aug 25, 2015 at 03:25:54AM +0000, Ouyang, Changchun wrote:
> Hi Michael,
> 
> > -----Original Message-----
> > From: snabb-de...@googlegroups.com [mailto:snabb-
> > de...@googlegroups.com] On Behalf Of Michael S. Tsirkin
> > Sent: Thursday, August 13, 2015 5:19 PM
> > To: Ouyang, Changchun
> > Cc: qemu-devel@nongnu.org; snabb-de...@googlegroups.com;
> > thibaut.col...@6wind.com; n.nikol...@virtualopensystems.com;
> > l...@snabb.co; Long, Thomas
> > Subject: [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue
> > support
> > 
> > On Wed, Aug 12, 2015 at 02:25:41PM +0800, Ouyang Changchun wrote:
> > > Based on patch by Nikolay Nikolaev:
> > > Vhost-user will implement the multi queue support in a similar way to
> > > what vhost already has - a separate thread for each queue.
> > > To enable the multi queue functionality - a new command line parameter
> > > "queues" is introduced for the vhost-user netdev.
> > >
> > > The RESET_OWNER change is based on commit:
> > >    294ce717e0f212ed0763307f3eab72b4a1bdf4d0
> > > If it is reverted, the patch need update for it accordingly.
> > >
> > > Signed-off-by: Nikolay Nikolaev <n.nikol...@virtualopensystems.com>
> > > Signed-off-by: Changchun Ouyang <changchun.ouy...@intel.com>
> > > ---
> > > Changes since v5:
> > >  - fix the message descption for VHOST_RESET_OWNER in vhost-user txt
> > >
> > > Changes since v4:
> > >  - remove the unnecessary trailing '\n'
> > >
> > > Changes since v3:
> > >  - fix one typo and wrap one long line
> > >
> > > Changes since v2:
> > >  - fix vq index issue for set_vring_call
> > >    When it is the case of VHOST_SET_VRING_CALL, The vq_index is not
> > initialized before it is used,
> > >    thus it could be a random value. The random value leads to crash in 
> > > vhost
> > after passing down
> > >    to vhost, as vhost use this random value to index an array index.
> > >  - fix the typo in the doc and description
> > >  - address vq index for reset_owner
> > >
> > > Changes since v1:
> > >  - use s->nc.info_str when bringing up/down the backend
> > >
> > >  docs/specs/vhost-user.txt |  7 ++++++-
> > >  hw/net/vhost_net.c        |  3 ++-
> > >  hw/virtio/vhost-user.c    | 11 ++++++++++-
> > >  net/vhost-user.c          | 37 ++++++++++++++++++++++++-------------
> > >  qapi-schema.json          |  6 +++++-
> > >  qemu-options.hx           |  5 +++--
> > >  6 files changed, 50 insertions(+), 19 deletions(-)
> > >
> > > diff --git a/docs/specs/vhost-user.txt b/docs/specs/vhost-user.txt
> > > index 70da3b1..9390f89 100644
> > > --- a/docs/specs/vhost-user.txt
> > > +++ b/docs/specs/vhost-user.txt
> > > @@ -135,6 +135,11 @@ As older slaves don't support negotiating
> > > protocol features,  a feature bit was dedicated for this purpose:
> > >  #define VHOST_USER_F_PROTOCOL_FEATURES 30
> > >
> > > +Multi queue support
> > > +-------------------
> > > +The protocol supports multiple queues by setting all index fields in
> > > +the sent messages to a properly calculated value.
> > > +
> > >  Message types
> > >  -------------
> > >
> > > @@ -198,7 +203,7 @@ Message types
> > >
> > >        Id: 4
> > >        Equivalent ioctl: VHOST_RESET_OWNER
> > > -      Master payload: N/A
> > > +      Master payload: vring state description
> > >
> > >        Issued when a new connection is about to be closed. The Master 
> > > will no
> > >        longer own this connection (and will usually close it).
> > 
> > This is an interface change, isn't it?
> > We can't make it unconditionally, need to make it dependent on a protocol
> > flag.



Pls remember to fix this one.

> > 
> > > diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c index
> > > 1f25cb3..9cd6c05 100644
> > > --- a/hw/net/vhost_net.c
> > > +++ b/hw/net/vhost_net.c
> > > @@ -159,6 +159,7 @@ struct vhost_net *vhost_net_init(VhostNetOptions
> > > *options)
> > >
> > >      net->dev.nvqs = 2;
> > >      net->dev.vqs = net->vqs;
> > > +    net->dev.vq_index = net->nc->queue_index;
> > >
> > >      r = vhost_dev_init(&net->dev, options->opaque,
> > >                         options->backend_type, options->force); @@
> > > -269,7 +270,7 @@ static void vhost_net_stop_one(struct vhost_net *net,
> > >          for (file.index = 0; file.index < net->dev.nvqs; ++file.index) {
> > >              const VhostOps *vhost_ops = net->dev.vhost_ops;
> > >              int r = vhost_ops->vhost_call(&net->dev, VHOST_RESET_OWNER,
> > > -                                          NULL);
> > > +                                          &file);
> > >              assert(r >= 0);
> > >          }
> > >      }
> > > diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c index
> > > 27ba035..fb11d4c 100644
> > > --- a/hw/virtio/vhost-user.c
> > > +++ b/hw/virtio/vhost-user.c
> > > @@ -219,7 +219,12 @@ static int vhost_user_call(struct vhost_dev *dev,
> > unsigned long int request,
> > >          break;
> > >
> > >      case VHOST_USER_SET_OWNER:
> > > +        break;
> > > +
> > >      case VHOST_USER_RESET_OWNER:
> > > +        memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > +        msg.state.index += dev->vq_index;
> > > +        msg.size = sizeof(m.state);
> > >          break;
> > >
> > >      case VHOST_USER_SET_MEM_TABLE:
> > > @@ -262,17 +267,20 @@ static int vhost_user_call(struct vhost_dev *dev,
> > unsigned long int request,
> > >      case VHOST_USER_SET_VRING_NUM:
> > >      case VHOST_USER_SET_VRING_BASE:
> > >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > +        msg.state.index += dev->vq_index;
> > >          msg.size = sizeof(m.state);
> > >          break;
> > >
> > >      case VHOST_USER_GET_VRING_BASE:
> > >          memcpy(&msg.state, arg, sizeof(struct vhost_vring_state));
> > > +        msg.state.index += dev->vq_index;
> > >          msg.size = sizeof(m.state);
> > >          need_reply = 1;
> > >          break;
> > >
> > >      case VHOST_USER_SET_VRING_ADDR:
> > >          memcpy(&msg.addr, arg, sizeof(struct vhost_vring_addr));
> > > +        msg.addr.index += dev->vq_index;
> > >          msg.size = sizeof(m.addr);
> > >          break;
> > >
> > > @@ -280,7 +288,7 @@ static int vhost_user_call(struct vhost_dev *dev,
> > unsigned long int request,
> > >      case VHOST_USER_SET_VRING_CALL:
> > >      case VHOST_USER_SET_VRING_ERR:
> > >          file = arg;
> > > -        msg.u64 = file->index & VHOST_USER_VRING_IDX_MASK;
> > > +        msg.u64 = (file->index + dev->vq_index) &
> > > + VHOST_USER_VRING_IDX_MASK;
> > >          msg.size = sizeof(m.u64);
> > >          if (ioeventfd_enabled() && file->fd > 0) {
> > >              fds[fd_num++] = file->fd; @@ -322,6 +330,7 @@ static int
> > > vhost_user_call(struct vhost_dev *dev, unsigned long int request,
> > >                  error_report("Received bad msg size.");
> > >                  return -1;
> > >              }
> > > +            msg.state.index -= dev->vq_index;
> > >              memcpy(arg, &msg.state, sizeof(struct vhost_vring_state));
> > >              break;
> > >          default:
> > > diff --git a/net/vhost-user.c b/net/vhost-user.c index
> > > 1d86a2b..904d8af 100644
> > > --- a/net/vhost-user.c
> > > +++ b/net/vhost-user.c
> > > @@ -121,35 +121,39 @@ static void net_vhost_user_event(void *opaque,
> > int event)
> > >      case CHR_EVENT_OPENED:
> > >          vhost_user_start(s);
> > >          net_vhost_link_down(s, false);
> > > -        error_report("chardev \"%s\" went up", s->chr->label);
> > > +        error_report("chardev \"%s\" went up", s->nc.info_str);
> > >          break;
> > >      case CHR_EVENT_CLOSED:
> > >          net_vhost_link_down(s, true);
> > >          vhost_user_stop(s);
> > > -        error_report("chardev \"%s\" went down", s->chr->label);
> > > +        error_report("chardev \"%s\" went down", s->nc.info_str);
> > >          break;
> > >      }
> > >  }

BTW this seems pretty hacky: you get multiple messages when one
client connects. Why add multiple event listeners to the same
chat device?


> > >
> > >  static int net_vhost_user_init(NetClientState *peer, const char *device,
> > > -                               const char *name, CharDriverState *chr)
> > > +                               const char *name, CharDriverState *chr,
> > > +                               uint32_t queues)
> > >  {
> > >      NetClientState *nc;
> > >      VhostUserState *s;
> > > +    int i;
> > >
> > > -    nc = qemu_new_net_client(&net_vhost_user_info, peer, device,
> > name);
> > > +    for (i = 0; i < queues; i++) {
> > > +        nc = qemu_new_net_client(&net_vhost_user_info, peer, device,
> > > + name);
> > >
> > > -    snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user to %s",
> > > -             chr->label);
> > > +        snprintf(nc->info_str, sizeof(nc->info_str), "vhost-user%d to 
> > > %s",
> > > +                 i, chr->label);
> > >
> > > -    s = DO_UPCAST(VhostUserState, nc, nc);
> > > +        s = DO_UPCAST(VhostUserState, nc, nc);
> > >
> > > -    /* We don't provide a receive callback */
> > > -    s->nc.receive_disabled = 1;
> > > -    s->chr = chr;
> > > -
> > > -    qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event,
> > s);
> > > +        /* We don't provide a receive callback */
> > > +        s->nc.receive_disabled = 1;
> > > +        s->chr = chr;
> > > +        s->nc.queue_index = i;
> > >
> > > +        qemu_chr_add_handlers(s->chr, NULL, NULL, net_vhost_user_event,
> > s);
> > > +    }
> > >      return 0;
> > >  }
> > >
> > > @@ -225,6 +229,7 @@ static int net_vhost_check_net(QemuOpts *opts,
> > > void *opaque)
> > 
> > 
> > There are two problems here:
> > 
> > 1. we don't really know that the backend
> >    is able to support the requested number of queues.
> >    If not, everything will fail, silently.
> >    A new message to query the # of queues could help, though
> >    I'm not sure what can be done on failure. Fail connection?
> > 
> > 2. each message (e.g. set memory table) is sent multiple times,
> >    on the same socket.
> > 
> I think it is tough to resolve these 2 comments, as the current message is 
> either vhost-dev based or virt-queue based,
> The multiple queues(pair) feature use multiple vhost-devs to implement itself.
> For #1
> So the queue number is something should be seen in the upper level of 
> vhost-dev rather than inside the vhost-dev.
> For each vhost-net, there are 2 virt-queues, one is for Rx the other is for 
> Tx.
> introduce the virt-queue pair number into the vhost-dev? But I don't think it 
> is good, as for each vhost-dev, there is only one
> virt-queue pair.
> 
> Where should I put the virt-queue pair number to? I don't get the perfect 
> answer till now. Any suggestion is welcome.
> 
> Could we assume the vhost backend has the ability to create enough virt-queue 
> pair(e.g. 0x8000 is the max) if qemu require
> Vhost backend to do it. If it is correct, we don't need get virt-queue pair 
> number from vhost backend, as vhost backend can
> Create all virt-queue pair required by qemu.
> The virtio frontend(on guest) has the flexibility to enable which virt-queue 
> according to its own capability, qemu can do it by using
> Set_vring_flag message to notify vhost backend.   

I'm reluctant to agree to this. Implementations tend to get this wrong,
e.g. they would only test with 2 queues and assume everything is OK.
With an explicit message, this seems more robust.

Why is it so hard to implement?  User specifies queues=X.
Can't we simply validate that backend supports this # of queues?


> For #2 
> The memory table message is also vhost-dev based, it wouldn't hurt we send it 
> a few times, vhost backend could
> Keep it vhost-dev based too, or keep it once(keep it when first time and 
> ignore in rest messages from the same connected-fd)
> Any other good suggestion is welcome too :-)

Add code in vhost-user to skip sending the useless messages?  Yes they
seem harmless but implementations tend to develop dependencies on such
bugs, then we get to maintain them forever.

> > 
> > 
> > >  int net_init_vhost_user(const NetClientOptions *opts, const char *name,
> > >                          NetClientState *peer)  {
> > > +    uint32_t queues;
> > >      const NetdevVhostUserOptions *vhost_user_opts;
> > >      CharDriverState *chr;
> > >
> > > @@ -243,6 +248,12 @@ int net_init_vhost_user(const NetClientOptions
> > *opts, const char *name,
> > >          return -1;
> > >      }
> > >
> > > +    /* number of queues for multiqueue */
> > > +    if (vhost_user_opts->has_queues) {
> > > +        queues = vhost_user_opts->queues;
> > > +    } else {
> > > +        queues = 1;
> > > +    }
> > >
> > > -    return net_vhost_user_init(peer, "vhost_user", name, chr);
> > > +    return net_vhost_user_init(peer, "vhost_user", name, chr,
> > > + queues);
> > >  }
> > > diff --git a/qapi-schema.json b/qapi-schema.json index
> > > f97ffa1..51e40ce 100644
> > > --- a/qapi-schema.json
> > > +++ b/qapi-schema.json
> > > @@ -2444,12 +2444,16 @@
> > >  #
> > >  # @vhostforce: #optional vhost on for non-MSIX virtio guests (default:
> > false).
> > >  #
> > > +# @queues: #optional number of queues to be created for multiqueue
> > vhost-user
> > > +#          (default: 1) (Since 2.5)
> > > +#
> > >  # Since 2.1
> > >  ##
> > >  { 'struct': 'NetdevVhostUserOptions',
> > >    'data': {
> > >      'chardev':        'str',
> > > -    '*vhostforce':    'bool' } }
> > > +    '*vhostforce':    'bool',
> > > +    '*queues':        'uint32' } }
> > >
> > >  ##
> > >  # @NetClientOptions
> > > diff --git a/qemu-options.hx b/qemu-options.hx index ec356f6..dad035e
> > > 100644
> > > --- a/qemu-options.hx
> > > +++ b/qemu-options.hx
> > > @@ -1942,13 +1942,14 @@ The hubport netdev lets you connect a NIC to a
> > > QEMU "vlan" instead of a single  netdev.  @code{-net} and
> > > @code{-device} with parameter @option{vlan} create the  required hub
> > automatically.
> > >
> > > -@item -netdev vhost-user,chardev=@var{id}[,vhostforce=on|off]
> > > +@item -netdev
> > > +vhost-user,chardev=@var{id}[,vhostforce=on|off][,queues=n]
> > >
> > >  Establish a vhost-user netdev, backed by a chardev @var{id}. The
> > > chardev should  be a unix domain socket backed one. The vhost-user
> > > uses a specifically defined  protocol to pass vhost ioctl replacement
> > > messages to an application on the other  end of the socket. On
> > > non-MSIX guests, the feature can be forced with -@var{vhostforce}.
> > > +@var{vhostforce}. Use 'queues=@var{n}' to specify the number of
> > > +queues to be created for multiqueue vhost-user.
> > >
> > >  Example:
> > >  @example
> > > --
> > > 1.8.4.2
> > >
> > 
> > --
> > You received this message because you are subscribed to the Google Groups
> > "Snabb Switch development" group.
> > To unsubscribe from this group and stop receiving emails from it, send an
> > email to snabb-devel+unsubscr...@googlegroups.com.
> > To post to this group, send an email to snabb-de...@googlegroups.com.
> > Visit this group at http://groups.google.com/group/snabb-devel.

Re: [Qemu-devel] [snabb-devel] Re: [PATCH v6 1/2] vhost-user: add multi queue support

Reply via email to