On Wed, Jan 15, 2014 at 3:49 PM, Michael S. Tsirkin <m...@redhat.com> wrote:

> On Wed, Jan 15, 2014 at 01:50:47PM +0100, Antonios Motakis wrote:
> >
> >
> >
> > On Wed, Jan 15, 2014 at 10:07 AM, Michael S. Tsirkin <m...@redhat.com> wrote:
> >
> >     On Tue, Jan 14, 2014 at 07:13:43PM +0100, Antonios Motakis wrote:
> >     >
> >     >
> >     >
> >     > On Tue, Jan 14, 2014 at 12:33 PM, Michael S. Tsirkin <m...@redhat.com> wrote:
> >     >
> >     >     On Mon, Jan 13, 2014 at 03:25:11PM +0100, Antonios Motakis wrote:
> >     >     > In this patch series we would like to introduce our approach for putting a
> >     >     > virtio-net backend in an external userspace process. Our eventual target is
> >     >     > to run the network backend in the Snabbswitch ethernet switch, while
> >     >     > receiving traffic from a guest inside QEMU/KVM which runs an unmodified
> >     >     > virtio-net implementation.
> >     >     >
> >     >     > For this, we are working on extending vhost to allow equivalent
> >     >     > functionality for userspace. Vhost already passes control of the data
> >     >     > plane of virtio-net to the host kernel; we want to realize a similar
> >     >     > model, but for userspace.
> >     >     >
> >     >     > In this patch series the concept of a vhost-backend is introduced.
> >     >     >
> >     >     > We define two vhost backend types: vhost-kernel and vhost-user. The
> >     >     > former is the interface to the current kernel module implementation.
> >     >     > Its control plane is ioctl-based. Its data plane is the kernel directly
> >     >     > accessing the QEMU-allocated guest memory.
> >     >     >
> >     >     > In the new vhost-user backend, the control plane is based on
> >     >     > communication between QEMU and another userspace process over a Unix
> >     >     > domain socket. This allows the virtio backend for a guest running in
> >     >     > QEMU to be implemented inside that other userspace process.
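A minimal sketch of the QEMU side of that control-plane connection, assuming a plain `SOCK_STREAM` Unix socket at a filesystem path. The function name `vu_connect` is hypothetical (not taken from the patch), and the 1 second timeout mirrors the "Changes from v3" note further down rather than any confirmed implementation:

```c
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: connect to the backend's listening socket.
 * Returns the connected fd, or -1 on error. A 1 second send/receive
 * timeout keeps a stuck backend from blocking the caller forever. */
int vu_connect(const char *path)
{
    struct sockaddr_un addr;
    struct timeval tv = { .tv_sec = 1, .tv_usec = 0 };
    int fd;

    if (strlen(path) >= sizeof(addr.sun_path)) {
        return -1;                      /* path does not fit in sun_path */
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0) {
        return -1;
    }

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);

    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0 ||
        setsockopt(fd, SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) < 0 ||
        connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

The backend process would be the side calling `bind()`/`listen()`/`accept()` on the same path.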
> >     >     >
> >     >     > We convert -mem-path to QemuOpts and add prealloc, share and unlink as
> >     >     > properties to it. The HugeTLBFS requirements of -mem-path are relaxed, so
> >     >     > any valid path can now be used. The new properties allow more
> >     >     > fine-grained control over the guest RAM backing store.
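The share, prealloc and unlink properties can be pictured with a short sketch of a file-backed guest RAM mapping. This is an illustration of the mechanism under stated assumptions, not the patch's `exec.c` code; `map_guest_ram` and its parameters are hypothetical. `MAP_SHARED` is what lets a second process map the same pages, and unlinking the file after mapping leaves both the fd and the mapping valid. On hugetlbfs the size would additionally have to be a multiple of the huge page size.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map 'size' bytes of guest RAM from the file at 'path'.
 * share:   MAP_SHARED, so another process mapping the same file sees guest writes
 * prealloc: touch every page up front
 * unlink:  remove the name once mapped (fd and mapping stay valid)
 * Returns NULL on failure; *fd_out receives the backing file descriptor. */
void *map_guest_ram(const char *path, size_t size, int prealloc,
                    int unlink_after, int *fd_out)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0) {
        return NULL;
    }
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return NULL;
    }
    void *ram = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (ram == MAP_FAILED) {
        close(fd);
        return NULL;
    }
    if (prealloc) {
        volatile unsigned char *bytes = ram;
        long page = sysconf(_SC_PAGESIZE);
        for (size_t off = 0; off < size; off += (size_t)page) {
            bytes[off] = bytes[off];    /* read and write back: faults the page in */
        }
    }
    if (unlink_after) {
        unlink(path);
    }
    *fd_out = fd;
    return ram;
}
```

In a vhost-user setup, it is this backing fd (rather than the path) that would be handed to the backend process so it can `mmap` the same memory.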
> >     >     >
> >     >     > The data path is realized by directly accessing the vrings and the
> >     >     > buffer data in the guest's memory.
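To make "directly accessing the vrings" concrete, here is a sketch of how a userspace backend might walk a descriptor chain. The `vring_desc` layout follows the split-virtqueue format of the virtio spec; `mem_region`, `gpa_to_hva` and `chain_total_len` are hypothetical names, and the sketch assumes the descriptor table has already been translated to a local pointer and that guest endianness matches the host (as with legacy virtio on the same architecture).

```c
#include <stddef.h>
#include <stdint.h>

/* Split-virtqueue descriptor as laid out in guest memory (virtio spec). */
struct vring_desc {
    uint64_t addr;   /* guest-physical address of the buffer */
    uint32_t len;    /* buffer length in bytes */
    uint16_t flags;  /* VRING_DESC_F_* */
    uint16_t next;   /* next descriptor index when F_NEXT is set */
};
#define VRING_DESC_F_NEXT 1

/* Hypothetical guest memory region as the backend would learn it from QEMU:
 * guest-physical base, size, and where it is mapped in this process. */
struct mem_region {
    uint64_t gpa;
    uint64_t size;
    uint8_t *hva;
};

/* Translate a guest-physical range to a local pointer, or NULL if it does
 * not lie entirely inside one region. */
static uint8_t *gpa_to_hva(const struct mem_region *r, size_t n,
                           uint64_t gpa, uint32_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (gpa >= r[i].gpa && gpa - r[i].gpa <= r[i].size &&
            len <= r[i].size - (gpa - r[i].gpa)) {
            return r[i].hva + (gpa - r[i].gpa);
        }
    }
    return NULL;
}

/* Walk the chain starting at 'head'; return the total byte count, or
 * (size_t)-1 if the chain is malformed (bad index, loop, bad address). */
size_t chain_total_len(const struct vring_desc *desc, uint16_t qsize,
                       uint16_t head, const struct mem_region *r, size_t n)
{
    size_t total = 0;
    uint16_t i = head;

    for (unsigned steps = 0; steps < qsize; steps++) {
        if (i >= qsize || gpa_to_hva(r, n, desc[i].addr, desc[i].len) == NULL) {
            return (size_t)-1;
        }
        total += desc[i].len;
        if (!(desc[i].flags & VRING_DESC_F_NEXT)) {
            return total;
        }
        i = desc[i].next;
    }
    return (size_t)-1;   /* more links than descriptors: the chain loops */
}
```

Bounding the walk by the queue size and checking every address against the known regions matters here because the descriptors are written by the guest and cannot be trusted.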
> >     >     >
> >     >     > Currently, the only user of vhost-user is vhost-net. We add a new netdev
> >     >     > backend that is intended to initialize vhost-net with the vhost-user
> >     >     > backend.
> >     >
> >     >     Some meta comments.
> >     >
> >     >     Something that makes this patch harder to review is how it's
> >     >     split up. Generally, IMHO, it's not a good idea to repeatedly
> >     >     edit the same part of a file, adding stuff patch after patch;
> >     >     adding stubs and then filling them in only makes things harder
> >     >     to read. (We do this sometimes when changing existing code,
> >     >     but it is generally not needed when adding new code.)
> >     >
> >     >     Instead, split it like this:
> >     >
> >     >     1. general refactoring: split out the Linux-specific and generic
> >     >        parts and add the ops indirection
> >     >     2. add new files for vhost-user with the complete implementation.
> >     >        Without command-line support there will be no way to use it,
> >     >        but it should build fine.
> >     >     3. tie it all up with option parsing
> >     >
> >     >
> >     >     Generic vhost and vhost net files should be kept separate.
> >     >     Don't let vhost net stuff seep back into generic files,
> >     >     we have vhost-scsi too.
> >     >     I would also prefer that userspace vhost has its own files.
> >     >
> >     >
> >     > OK, we'll take this into account.
> >     >
> >     >
> >     >
> >     >     We need a small test server qemu can talk to, to verify things
> >     >     actually work.
> >     >
> >     >
> >     > We have implemented such a test app: https://github.com/virtualopensystems/vapp
> >     >
> >     > We use it for testing, and also as a reference implementation. A client is
> >     > also included.
> >     >
> >
> >     Sounds good. Can we include this in qemu and tie
> >     it into the qtest framework?
> >     From a brief look, it merely needs to be tweaked for portability, unless
> >
> >     >
> >     >     Already commented on: reuse the chardev syntax, and preferably the code.
> >     >     We already support a bunch of options there for domain sockets
> >     >     that will be useful here; they should work here as well.
> >     >
> >     >
> >     > We adapted the syntax for this to be consistent with chardev. As for the
> >     > options we didn't use, it is not at all obvious to us how they should be
> >     > used; a lot of the chardev options just don't apply to us.
> >     >
> >
> >     Well, the server option should work at least.
> >     Could nowait work too?
> >
> >     Also, if reconnect is useful, it should be for chardevs too, so if we
> >     don't share code, we need to code it in two places to stay consistent.
> >
> >     Overall, sharing some code might be better...
> >
> >
> >
> > What you have in mind is to use the functions chardev uses from
> > qemu-sockets.c, right? Chardev itself doesn't seem to have anything else
> > that can be shared.
>
> Yes.
>
> > The problem with reconnect is that it is implemented at the protocol level;
> > we are not just transparently reconnecting the socket. So the same approach
> > would most likely not apply to chardev.
>
> Chardev could mostly just use transparent reconnect.
> vhost-user could use that and get a callback to reconfigure
> everything after reconnect.
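The split suggested here (transparent reconnect in the transport, plus a callback so the protocol layer can replay its setup) might look roughly like the sketch below. Everything in it is hypothetical: `struct vu_conn`, `vu_conn_attach` and the `reconfigure` hook are illustrative names, not code from the patch or from chardev.

```c
#include <stddef.h>
#include <unistd.h>

/* Hypothetical connection object: the transport only knows how to swap
 * in a new socket; the protocol layer registers a callback that replays
 * its state (features, memory table, vring setup) on the new peer. */
struct vu_conn {
    int fd;                                             /* -1 when disconnected */
    int (*reconfigure)(struct vu_conn *c, void *opaque);
    void *opaque;
};

/* Called by the transport once a new connection is established.
 * Returns 0 on success, -1 if the protocol layer rejected the new peer. */
int vu_conn_attach(struct vu_conn *c, int newfd)
{
    if (c->fd >= 0) {
        close(c->fd);                   /* drop the dead connection */
    }
    c->fd = newfd;
    if (c->reconfigure != NULL && c->reconfigure(c, c->opaque) < 0) {
        close(c->fd);
        c->fd = -1;
        return -1;
    }
    return 0;
}
```

With this shape, the transport code carries no vhost-specific knowledge, which is what would make it shareable with chardev.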
>
> Once you write up the protocol in some text file we can
> discuss this in more detail.
> For example, I wonder how feature negotiation would work
> with reconnect: the new connection could be from another
> application that does not support same features, but
> virtio assumes that device features never change.
>

That's a good point; I think we can handle this by checking if the
supported features are still the same (or at least a superset of the
features used by virtio-net). Otherwise we would have to break virtio-net,
which would be bad.
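The superset check described above reduces to a bitmask test. A minimal sketch, assuming 64-bit feature masks; `vu_missing_features` and `vu_features_compatible` are hypothetical names:

```c
#include <stdbool.h>
#include <stdint.h>

/* Feature bits the guest already acked but the new backend no longer offers. */
uint64_t vu_missing_features(uint64_t offered, uint64_t acked)
{
    return acked & ~offered;
}

/* After a reconnect, the new backend must still offer every acked bit;
 * extra bits are harmless because the guest will not use them. */
bool vu_features_compatible(uint64_t offered, uint64_t acked)
{
    return vu_missing_features(offered, acked) == 0;
}
```

Reporting the missing bits, rather than only a yes/no answer, would give the "fail gracefully" path something concrete to log before the connection is refused.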

Antonios


>
> >
> >
> >
> >     >     In particular you shouldn't require filesystem access by qemu,
> >     >     passing fd for domain socket should work.
> >     >
> >     >
> >     > We can add an option to pass an fd for the domain socket if needed.
> >     > However, as far as we understand, chardev doesn't do that either (at
> >     > least from looking at the man page). Maybe we misunderstand what you mean.
> >
> >     Sorry. I got confused with e.g. tap which has this. This might be
> >     useful but does not have to block this patch.
> >
> >     >
> >     >
> >     >     > Example usage:
> >     >     >
> >     >     > qemu -m 1024 -mem-path /hugetlbfs,prealloc=on,share=on \
> >     >     >      -netdev type=vhost-user,id=net0,path=/path/to/sock,poll_time=2500 \
> >     >     >      -device virtio-net-pci,netdev=net0
> >     >
> >     >     It's not clear which parts of -mem-path are required for vhost-user.
> >     >     This should be documented somewhere and made clear in -help, and
> >     >     QEMU should fail gracefully if it is misconfigured.
> >     >
> >     >
> >     >
> >     > Ok.
> >     >
> >     >
> >     >
> >     >     >
> >     >     > Changes from v5:
> >     >     >  - Split -mem-path unlink option to a separate patch
> >     >     >  - Fds are passed only in the ancillary data
> >     >     >  - Stricter message size checks on receive/send
> >     >     >  - Netdev vhost-user now includes path and poll_time options
> >     >     >  - The connection probing interval is configurable
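"FDs are passed only in the ancillary data" and "stricter message size checks" can be illustrated with a sending helper that puts header and payload in the regular data stream and file descriptors only in an `SCM_RIGHTS` control message. The header layout (three packed `uint32_t` fields), the limits and the name `vu_send` are assumptions for this sketch, not the patch's actual definitions:

```c
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/uio.h>

#define VU_MAX_FDS     8                /* hypothetical per-message fd limit */
#define VU_MAX_PAYLOAD 256              /* hypothetical payload size limit */

/* Assumed header layout; 'size' bytes of payload follow it. */
struct vu_msg_hdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;
} __attribute__((packed));

/* Send header + payload with one sendmsg(); fds travel only as SCM_RIGHTS
 * ancillary data. Returns 0 on success, -1 on error or size violation. */
int vu_send(int sock, const struct vu_msg_hdr *hdr, const void *payload,
            const int *fds, size_t nfds)
{
    union {
        char buf[CMSG_SPACE(VU_MAX_FDS * sizeof(int))];
        struct cmsghdr align;           /* forces correct alignment */
    } ctl;
    struct iovec iov[2] = {
        { .iov_base = (void *)hdr,     .iov_len = sizeof(*hdr) },
        { .iov_base = (void *)payload, .iov_len = hdr->size },
    };
    struct msghdr msg = { .msg_iov = iov, .msg_iovlen = 2 };

    if (hdr->size > VU_MAX_PAYLOAD || nfds > VU_MAX_FDS) {
        return -1;
    }
    if (nfds > 0) {
        memset(&ctl, 0, sizeof(ctl));
        msg.msg_control = ctl.buf;
        msg.msg_controllen = CMSG_SPACE(nfds * sizeof(int));
        struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
        cm->cmsg_level = SOL_SOCKET;
        cm->cmsg_type = SCM_RIGHTS;
        cm->cmsg_len = CMSG_LEN(nfds * sizeof(int));
        memcpy(CMSG_DATA(cm), fds, nfds * sizeof(int));
    }
    ssize_t n = sendmsg(sock, &msg, 0);
    return n == (ssize_t)(sizeof(*hdr) + hdr->size) ? 0 : -1;
}
```

Keeping fds out of the payload means the receiver never has to interpret integers in the message body as descriptors; the kernel installs them on the receiving side.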
> >     >     >
> >     >     > Changes from v4:
> >     >     >  - Use error_report for errors
> >     >     >  - VhostUserMsg has new field `size` indicating the following payload length.
> >     >     >    Field `flags` now has version and reply bits. The structure is packed.
> >     >     >  - Send data is of variable length (`size` field in message)
> >     >     >  - Receive in 2 steps, header and payload
> >     >     >  - Add new message type VHOST_USER_ECHO, to check connection status
> >     >     >
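The v4 message changes (a `size` field, version and reply bits in `flags`, a two-step receive) can be sketched as follows. The field widths, the version mask `0x3`, the version value `0x1` and the reply bit `0x4` are assumptions made for illustration, as are the names `vu_recv` and `vu_is_reply`:

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/socket.h>
#include <sys/types.h>

#define VU_MAX_PAYLOAD  256             /* hypothetical payload size limit */
#define VU_VERSION_MASK 0x3u            /* assumed: low bits carry the version */
#define VU_VERSION      0x1u            /* assumed protocol version */
#define VU_REPLY_FLAG   0x4u            /* assumed: set on replies */

struct vu_msg_hdr {
    uint32_t request;
    uint32_t flags;
    uint32_t size;                      /* length of the payload that follows */
} __attribute__((packed));

/* Read exactly 'len' bytes, retrying on short reads and EINTR. */
static int read_full(int sock, void *buf, size_t len)
{
    size_t done = 0;
    while (done < len) {
        ssize_t n = recv(sock, (char *)buf + done, len - done, 0);
        if (n < 0 && errno == EINTR) {
            continue;
        }
        if (n <= 0) {
            return -1;                  /* error, timeout, or peer closed */
        }
        done += (size_t)n;
    }
    return 0;
}

bool vu_is_reply(const struct vu_msg_hdr *hdr)
{
    return (hdr->flags & VU_REPLY_FLAG) != 0;
}

/* Step 1: the fixed-size header. Step 2: exactly hdr->size payload bytes,
 * but only after validating version and size. On -1 the stream may be out
 * of sync, so the caller should drop the connection. */
int vu_recv(int sock, struct vu_msg_hdr *hdr, void *payload, size_t cap)
{
    if (read_full(sock, hdr, sizeof(*hdr)) < 0) {
        return -1;
    }
    if ((hdr->flags & VU_VERSION_MASK) != VU_VERSION ||
        hdr->size > VU_MAX_PAYLOAD || hdr->size > cap) {
        return -1;
    }
    return hdr->size ? read_full(sock, payload, hdr->size) : 0;
}
```

Reading the header first is what makes a variable-length payload safe: the receiver learns the size before touching the payload and can reject oversized messages without overrunning its buffer.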
> >     >     > Changes from v3:
> >     >     >  - Convert -mem-path to QemuOpts with prealloc, share and unlink properties
> >     >     >  - Set a 1 sec timeout on reads/writes to the Unix domain socket
> >     >     >  - Fix file descriptor leak
> >     >     >
> >     >     > Changes from v2:
> >     >     >  - Reconnect when the backend disappears
> >     >     >
> >     >     > Changes from v1:
> >     >     >  - Implementation of vhost-user netdev backend
> >     >     >  - Code improvements
> >     >     >
> >     >     > Antonios Motakis (8):
> >     >     >   Convert -mem-path to QemuOpts and add prealloc and share properties
> >     >     >   New -mem-path option - unlink.
> >     >     >   Decouple vhost from kernel interface
> >     >     >   Add vhost-user skeleton
> >     >     >   Add domain socket communication for vhost-user backend
> >     >     >   Add vhost-user calls implementation
> >     >     >   Add new vhost-user netdev backend
> >     >     >   Add vhost-user reconnection
> >     >     >
> >     >     >  exec.c                            |  57 +++-
> >     >     >  hmp-commands.hx                   |   4 +-
> >     >     >  hw/net/vhost_net.c                | 144 +++++++---
> >     >     >  hw/net/virtio-net.c               |  42 ++-
> >     >     >  hw/scsi/vhost-scsi.c              |  13 +-
> >     >     >  hw/virtio/Makefile.objs           |   2 +-
> >     >     >  hw/virtio/vhost-backend.c         | 556 ++++++++++++++++++++++++++++++++++++++
> >     >     >  hw/virtio/vhost.c                 |  46 ++--
> >     >     >  include/exec/cpu-all.h            |   3 -
> >     >     >  include/hw/virtio/vhost-backend.h |  40 +++
> >     >     >  include/hw/virtio/vhost.h         |   4 +-
> >     >     >  include/net/vhost-user.h          |  17 ++
> >     >     >  include/net/vhost_net.h           |  15 +-
> >     >     >  net/Makefile.objs                 |   2 +-
> >     >     >  net/clients.h                     |   3 +
> >     >     >  net/hub.c                         |   1 +
> >     >     >  net/net.c                         |   2 +
> >     >     >  net/tap.c                         |  16 +-
> >     >     >  net/vhost-user.c                  | 177 ++++++++++++
> >     >     >  qapi-schema.json                  |  21 +-
> >     >     >  qemu-options.hx                   |  24 +-
> >     >     >  vl.c                              |  41 ++-
> >     >     >  22 files changed, 1106 insertions(+), 124 deletions(-)
> >     >     >  create mode 100644 hw/virtio/vhost-backend.c
> >     >     >  create mode 100644 include/hw/virtio/vhost-backend.h
> >     >     >  create mode 100644 include/net/vhost-user.h
> >     >     >  create mode 100644 net/vhost-user.c
> >     >     >
> >     >     > --
> >     >     > 1.8.3.2
> >     >     >
> >     >
> >     >
> >
> >
>
