> -----Original Message-----
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Thursday, February 12, 2015 2:09 PM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> 
> On Thu, Feb 12, 2015 at 12:59:17PM +0000, Traynor, Kevin wrote:
> > > -----Original Message-----
> > > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > > Sent: Wednesday, January 21, 2015 11:19 AM
> > > To: Traynor, Kevin
> > > Cc: dev@openvswitch.org
> > > Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> > >
> > > On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> > > > This patch adds support for a new port type to userspace datapath
> > > > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > > > of virtio-net devices to its associated dpdkvhost port. Instructions
> > > > for use are in INSTALL.DPDK.
> > > >
> > > > This has been tested on Intel multi-core platforms and with clients
> > > > that have virtio-net interfaces.
> > > >
> > > >  ver 6:
> > > >    - rebased with master
> > > >    - modified to use DPDK v1.8.0 vhost library
> > > >    - reworked for review comments
> > > >  ver 5:
> > > >    - rebased against latest master
> > > >  ver 4:
> > > >    - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> > > >  utilities/automake.mk
> > > >    - rebased with master to work with DPDK 1.7
> > > >  ver 3:
> > > >    - rebased with master
> > > >  ver 2:
> > > >    - rebased with master
> > > >
> > > > Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> > > > Signed-off-by: Kevin Traynor <kevin.tray...@intel.com>
> > > > Signed-off-by: Maryam Tahhan <maryam.tah...@intel.com>
> > > > ---
> > > >  INSTALL.DPDK.md         |  236 +++++++++++++++++
> > > >  Makefile.am             |    4 +
> > > >  lib/automake.mk         |    1 +
> > > >  lib/netdev-dpdk.c       |  649 +++++++++++++++++++++++++++++++++++++++--------
> > > >  lib/netdev.c            |    3 +-
> > > >  utilities/automake.mk   |    3 +-
> > > >  utilities/qemu-wrap.py  |  389 ++++++++++++++++++++++++++++
> > > >  vswitchd/ovs-vswitchd.c |    4 +-
> > > >  8 files changed, 1177 insertions(+), 112 deletions(-)
> > > >  mode change 100644 => 100755 lib/netdev-dpdk.c
> > > >  create mode 100755 utilities/qemu-wrap.py
> > > >
> > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > > index 2cc7636..da8116d 100644
> > > > --- a/INSTALL.DPDK.md
> > > > +++ b/INSTALL.DPDK.md
> > > > @@ -17,6 +17,7 @@ Building and Installing:
> > > >  ------------------------
> > > >
> > > >  Required DPDK 1.7
> > > > +Optional `fuse`, `fuse-devel`
> > > >
> > > >  1. Configure build & install DPDK:
> > > >    1. Set `$DPDK_DIR`
> > > > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> > > >  application should not be assigned the same dpdk core mask "-c" as
> > > >  the vswitchd.
> > > >
> > > > +DPDK vHost:
> > > > +-----------
> > > > +
> > > > +Prerequisites:
> > > > +1.  DPDK 1.8 with vHost support enabled and recompile OVS as above.
> > > > +
> > > > +     Update `config/common_linuxapp` so that DPDK is built with vHost
> > > > +     libraries:
> > > > +
> > > > +     `CONFIG_RTE_LIBRTE_VHOST=y`
> > > > +
> > > > +2.  Insert the Fuse module:
> > > > +
> > > > +      `modprobe fuse`
> > > > +
> > > > +3.  Build and insert the `eventfd_link` module:
> > > > +
> > > > +     `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > > > +     `make`
> > > > +     `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> > > > +
> > > > +4.  Remove /dev/vhost-net character device:
> > > > +
> > > > +      `rm -rf /dev/vhost-net`
> > >
> > > I think it's not a good idea to tell people to do this,
> > > best to drop this section and put "with standard vhost"
> > > here instead.
> >
> > Not clear what you'd like to see dropped?
> > This will be necessary
> > if using the default vhost file, so can change to make that clearer.
> 
> Using a location that is likely to conflict with a kernel device is not
> a good idea.  So what you are promoting here is not a good
> configuration: people will follow your advice, then complain that kernel
> device stopped working for all VMs.  It will also likely conflict with
> distro's rules for the device, if any.
> 
> There are no advantages that I can see.

Reworked to explain this, and moved the instructions for using an
alternative device into the main body of the text.

> 
> 
> > >
> > > > +
> > > > +Following the steps above to create a bridge, you can now add DPDK vHost
> > > > +as a port to the vswitch.
> > > > +
> > > > +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> > > > +
> > > > +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> > > > +
> > > > +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> > > > +
> > > > +However, please note that when attaching userspace devices to QEMU, the
> > > > +name provided during the add-port operation must match the ifname parameter
> > > > +on the QEMU command line.
> > > > +
> > > > +DPDK vHost VM configuration:
> > > > +----------------------------
> > > > +
> > > > +1. Configure virtio-net adaptors:
> > > > +   The guest must be configured with virtio-net adapters and offloads
> > > > +   MUST BE DISABLED.
> > >
> > > Any plans to address this?
> >
> > There's no plans at present
> 
> Why not?
> That's pretty bad I think, devices really should report their
> capabilities to userspace, not rely on users to configure them just so.
> 

DPDK vhost now supports reporting its features. I've tested with and
without offloads explicitly set on the QEMU command line, and it appears
to report its features correctly.

> 
> > >
> > > > +    This means the following parameters should be passed
> > > > +   to the QEMU binary:
> > > > +
> > > > +     ```
> > > > +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > > > +     -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> > > > +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> > > > +     ```
> > > > +
> > > > +     Repeat the above parameters for multiple devices.
> > > > +
> > > > +2. Configure huge pages:
> > > > +   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > > > +   virtio-net device's virtual rings and packet buffers mapping the VM's
> > > > +   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> > > > +   memory into their process address space, pass the following paramters
> > > > +   to QEMU:
> > > > +
> > > > +     `-mem-path /dev/hugepages -mem-prealloc`
> > >
> > > I guess you also need to request MAP_SHARED mappings - otherwise
> > > I think you won't be able to poke at them.
> >
> > Ok, it depends on version of QEMU, so we can call that out. We've tested
> > with QEMU 1.6.2
> 
> Only that? Why?
> 1.6.2 was released in 2013, there has been several releases since then.

We've now tested with QEMU 2.1.0 and updated the docs accordingly.

> 
> > >
> > > > +
> > > > +DPDK vHost with standard vHost:
> > > > +-------------------------------
> > > > +
> > > > +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> > > > +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> > > > +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> > > > +if you wish to use kernel vhost in parallel, you can specify an
> > > > +alternative basename on the vswitchd command line like so:
> > > > +
> > > > +     `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> > > > +
> > > > +Note that the basename arguement and associated string must be the first
> > > > +arguements after `--dpdk` and come before the EAL arguements.
> > > > +
> > > > +DPDK vHost VM configuration with standard vHost:
> > > > +------------------------------------------------
> > > > +
> > > > +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> > > > +the guest must be configured with virtio-net adapters and offloads
> > > > +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> > > > +argument:
> > > > +
> > > > +     ```
> > > > +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > > > +     vhostfd=<open_fd>
> > > > +     -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> > > > +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> > > > +     ```
> > > > +
> > > > +     The open file descriptor must be passed to QEMU running as a child
> > > > +     process.
> > >
> > > You might as well tell people how to do this. E.g. with bash:
> > >
> > > vhostfd=42 42<>/path/to/vhost/chardev
> > >
> > > 42 is, of course, The Answer.
> >
> > True - I think we need to distinguish between default and specified
> > file better.
> 
> Just drop the "default" - it will cause conflicts and pain to users.

Hopefully it is clearer now. We have also documented a couple of lines of
Python to show how to get the fd.
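
For reference, a rough sketch of the idea follows. This is not the exact
text from the docs: it assumes Python 3, and the helper names, the device
path and the QEMU binary are illustrative placeholders only. It opens the
vhost character device, keeps the descriptor open across the exec, and
passes its number to QEMU via `vhostfd=`:

```python
import os
import subprocess


def open_vhost_fd(path):
    """Open the vhost character device read/write and mark the fd inheritable."""
    fd = os.open(path, os.O_RDWR)
    # Python 3.4+ creates non-inheritable descriptors by default.
    os.set_inheritable(fd, True)
    return fd


def qemu_netdev_args(fd, ifname, netdev_id="net1"):
    """Build the -netdev/-device arguments that hand the open fd to QEMU."""
    netdev = ("tap,id=%s,script=no,downscript=no,ifname=%s,vhost=on,vhostfd=%d"
              % (netdev_id, ifname, fd))
    return ["-netdev", netdev, "-device", "virtio-net-pci,netdev=%s" % netdev_id]


def launch_qemu(qemu_binary, extra_args, device_path, ifname):
    """Start QEMU as a child process with the vhost fd kept open (hypothetical helper)."""
    fd = open_vhost_fd(device_path)
    cmd = [qemu_binary] + list(extra_args) + qemu_netdev_args(fd, ifname)
    # pass_fds keeps this descriptor open in the child; others are closed.
    return subprocess.Popen(cmd, pass_fds=(fd,))
```

The fd number in `vhostfd=` has to be the number the child process sees,
which is why the descriptor is passed through unchanged rather than
reopened.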

> 
> > >
> > >
> > > > +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> > > > +
> > > > +     `-mem-path /dev/hugepages -mem-prealloc`
> > > > +
> > > > +3. (Optional) If you are using libvirt, you must enable libvirt to access
> > > > +the userspace device file by adding it to controllers cgroup for libvirtd
> > > > +using the following steps:
> > > > +
> > > > +     1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > > > +
> > > > +        ```
> > > > +        1) cgroup_controllers = [ ... "devices", ... ]
> > > > +        2) clear_emulator_capabilities = 0
> > > > +        3) user = "root"
> > > > +        4) group = "root"
> > > > +        5) cgroup_device_acl = [
> > > > +               "/dev/null", "/dev/full", "/dev/zero",
> > > > +               "/dev/random", "/dev/urandom",
> > > > +               "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > > > +               "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > > > +               "/dev/<devbase-name>-<index>",
> > > > +               "/dev/hugepages"]
> > > > +        ```
> > > > +
> > > > +     2. Disable SELinux or set to permissive mode
> > >
> > >
> > > It's a work-around, but the right thing to do is really
> > > to write up correct selinux policies.
> > > Any plans to do this?
> >
> > No plans for this at present
> 
> That's pretty bad, so one has to give up some security to
> gain some other feature. How does one make a call?
> Why don't you want to fix it?

We haven't been able to get to this yet. I'm also not yet clear whether
this will be needed for vhost-user.

> 
> > >
> > > > +     3. Mount cgroup device controller:
> > > > +
> > > > +        ```
> > > > +        mkdir /dev/cgroup
> > > > +        mount -t cgroup none /dev/cgroup -o devices
> > > > +        ```
> > > > +
> > > > +     4. Restart the libvirtd process
> > > > +        For example, on Fedora:
> > > > +
> > > > +          `systemctl restart libvirtd.service`
> > > > +
> > > > +The easiest way to setup a Guest that isn't using `/dev/vhost-net` is to
> > > > +use the `qemu-wrap.py` script located in utilities. This Python script
> > > > +automates the requirements specified above and can be used in conjunction
> > > > +with libvirt.
> > >
> > > I notice that new libvirt versions should have ability to specify everything
> > > directly in the conf, that would be preferable if available.
> > > Should be documented too?
> >
> > It's not something we've looked at, but will bring it up with the dpdk team
> 
> Please do, wrapper scripts simply can't be supported by libvirt.

The vhostfd could be put into libvirt manually, or the wrapper script could
be used. We didn't see another way to get it into the XML. Is there one?
 
> 
> > >
> > > > +
> > > > +DPDK vHost VM configuration with QEMU wrapper:
> > >
> > > ...
> > >