On Thu, Feb 12, 2015 at 12:59:17PM +0000, Traynor, Kevin wrote:
> > -----Original Message-----
> > From: Michael S. Tsirkin [mailto:m...@redhat.com]
> > Sent: Wednesday, January 21, 2015 11:19 AM
> > To: Traynor, Kevin
> > Cc: dev@openvswitch.org
> > Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
> > 
> > On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> > > This patch adds support for a new port type to userspace datapath
> > > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > > of virtio-net devices to its associated dpdkvhost port. Instructions
> > > for use are in INSTALL.DPDK.
> > >
> > > This has been tested on Intel multi-core platforms and with clients
> > > that have virtio-net interfaces.
> > >
> > >  ver 6:
> > >    - rebased with master
> > >    - modified to use DPDK v1.8.0 vhost library
> > >    - reworked for review comments
> > >  ver 5:
> > >    - rebased against latest master
> > >  ver 4:
> > >    - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> > >  utilities/automake.mk
> > >    - rebased with master to work with DPDK 1.7
> > >  ver 3:
> > >    - rebased with master
> > >  ver 2:
> > >    - rebased with master
> > >
> > > Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> > > Signed-off-by: Kevin Traynor <kevin.tray...@intel.com>
> > > Signed-off-by: Maryam Tahhan <maryam.tah...@intel.com>
> > > ---
> > >  INSTALL.DPDK.md         |  236 +++++++++++++++++
> > >  Makefile.am             |    4 +
> > >  lib/automake.mk         |    1 +
> > >  lib/netdev-dpdk.c       |  649 +++++++++++++++++++++++++++++++++++++++--------
> > >  lib/netdev.c            |    3 +-
> > >  utilities/automake.mk   |    3 +-
> > >  utilities/qemu-wrap.py  |  389 ++++++++++++++++++++++++++++
> > >  vswitchd/ovs-vswitchd.c |    4 +-
> > >  8 files changed, 1177 insertions(+), 112 deletions(-)
> > >  mode change 100644 => 100755 lib/netdev-dpdk.c
> > >  create mode 100755 utilities/qemu-wrap.py
> > >
> > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > > index 2cc7636..da8116d 100644
> > > --- a/INSTALL.DPDK.md
> > > +++ b/INSTALL.DPDK.md
> > > @@ -17,6 +17,7 @@ Building and Installing:
> > >  ------------------------
> > >
> > >  Required DPDK 1.7
> > > +Optional `fuse`, `fuse-devel`
> > >
> > >  1. Configure build & install DPDK:
> > >    1. Set `$DPDK_DIR`
> > > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> > >  application should not be assigned the same dpdk core mask "-c" as
> > >  the vswitchd.
> > >
> > > +DPDK vHost:
> > > +-----------
> > > +
> > > +Prerequisites:
> > > +1.  DPDK 1.8 with vHost support enabled and recompile OVS as above.
> > > +
> > > +     Update `config/common_linuxapp` so that DPDK is built with vHost
> > > +     libraries:
> > > +
> > > +     `CONFIG_RTE_LIBRTE_VHOST=y`
> > > +
> > > +2.  Insert the Fuse module:
> > > +
> > > +      `modprobe fuse`
> > > +
> > > +3.  Build and insert the `eventfd_link` module:
> > > +
> > > +     `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > > +     `make`
> > > +     `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> > > +
> > > +4.  Remove /dev/vhost-net character device:
> > > +
> > > +      `rm -rf /dev/vhost-net`
> > 
> > I think it's not a good idea to tell people to do this,
> > best to drop this section and put "with standard vhost"
> > here instead.
> 
> Not clear what you'd like to see dropped?
> This will be necessary if using the default vhost file, so can change to
> make that clearer.

Using a location that is likely to conflict with a kernel device is not
a good idea.  So what you are promoting here is not a good
configuration: people will follow your advice, then complain that the
kernel device stopped working for all VMs.  It will also likely conflict
with the distro's rules for the device, if any.

There are no advantages that I can see.


> > 
> > > +
> > > +Following the steps above to create a bridge, you can now add DPDK vHost
> > > +as a port to the vswitch.
> > > +
> > > > +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> > > +
> > > +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> > > +
> > > > +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> > > +
> > > +However, please note that when attaching userspace devices to QEMU, the
> > > > +name provided during the add-port operation must match the ifname
> > > > +parameter on the QEMU command line.
> > > +
> > > +DPDK vHost VM configuration:
> > > +----------------------------
> > > +
> > > > +1. Configure virtio-net adapters:
> > > +   The guest must be configured with virtio-net adapters and offloads
> > > +   MUST BE DISABLED.
> > 
> > Any plans to address this?
> 
> There are no plans at present

Why not?
That's pretty bad I think, devices really should report their
capabilities to userspace, not rely on users to configure them just so.
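
To make the point concrete, here is a rough sketch of what "report
capabilities" means in virtio terms: the backend advertises a feature
mask, and the guest only ever sees the intersection. The bit numbers are
the ones from the virtio-net spec; the helper names (`mask`, `negotiate`)
and the example masks are made up for illustration and are not taken from
the patch or from the DPDK vhost library.

```python
# Feature bit numbers from the virtio-net specification.
VIRTIO_NET_F_CSUM = 0
VIRTIO_NET_F_GUEST_CSUM = 1
VIRTIO_NET_F_GUEST_TSO4 = 7
VIRTIO_NET_F_GUEST_TSO6 = 8
VIRTIO_NET_F_GUEST_ECN = 9
VIRTIO_NET_F_MRG_RXBUF = 15


def mask(*bits):
    """Build a feature bitmask from bit numbers (hypothetical helper)."""
    value = 0
    for b in bits:
        value |= 1 << b
    return value


# Some of the offloads the documentation asks users to switch off by hand.
OFFLOADS = mask(VIRTIO_NET_F_CSUM, VIRTIO_NET_F_GUEST_TSO4,
                VIRTIO_NET_F_GUEST_TSO6, VIRTIO_NET_F_GUEST_ECN)


def negotiate(offered_to_guest, backend_supported):
    """Features the guest may use: only what the backend also supports."""
    return offered_to_guest & backend_supported


# Assumed scenario: QEMU would offer everything, while the backend
# reports that it handles mergeable RX buffers but no offloads.
qemu_wants = OFFLOADS | mask(VIRTIO_NET_F_MRG_RXBUF)
backend = mask(VIRTIO_NET_F_MRG_RXBUF)
negotiated = negotiate(qemu_wants, backend)
print(hex(negotiated))
```

With something along these lines the offloads drop out on their own, and
nobody has to remember `csum=off,guest_tso4=off,...` on the command line.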


> > 
> > > +    This means the following parameters should be passed
> > > +   to the QEMU binary:
> > > +
> > > +     ```
> > > +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > > +     -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> > > +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> > > +     ```
> > > +
> > > +     Repeat the above parameters for multiple devices.
> > > +
> > > +2. Configure huge pages:
> > > +   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > > +   virtio-net device's virtual rings and packet buffers mapping the VM's
> > > +   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> > > > +   memory into their process address space, pass the following parameters
> > > +   to QEMU:
> > > +
> > > +     `-mem-path /dev/hugepages -mem-prealloc`
> > 
> > I guess you also need to request MAP_SHARED mappings - otherwise
> > I think you won't be able to poke at them.
> 
> Ok, it depends on the version of QEMU, so we can call that out. We've
> tested with QEMU 1.6.2

Only that? Why?
1.6.2 was released in 2013, there have been several releases since then.
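
On the MAP_SHARED point above, a small sketch of why it matters. It uses
an ordinary temporary file as a stand-in for a hugetlbfs-backed guest
memory file (an assumption made so it runs anywhere on Linux), and the
function name is invented for the example. A write through a private
mapping is copy-on-write and never reaches a second process's view of the
file; a write through a shared mapping does.

```python
import mmap
import tempfile


def write_visible_to_peer(flags):
    """Write through a mapping created with `flags` and report whether an
    independent MAP_SHARED mapping of the same file observes the write."""
    with tempfile.TemporaryFile() as f:
        f.truncate(mmap.PAGESIZE)  # one zero-filled page
        # "guest" plays the role of QEMU's mapping of guest RAM.
        guest = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=flags,
                          prot=mmap.PROT_READ | mmap.PROT_WRITE)
        # "peer" plays the role of the vhost backend mapping the same file.
        peer = mmap.mmap(f.fileno(), mmap.PAGESIZE, flags=mmap.MAP_SHARED,
                         prot=mmap.PROT_READ)
        try:
            guest[0:4] = b"ring"
            return peer[0:4] == b"ring"
        finally:
            guest.close()
            peer.close()


shared_ok = write_visible_to_peer(mmap.MAP_SHARED)
private_ok = write_visible_to_peer(mmap.MAP_PRIVATE)
print("MAP_SHARED visible to peer:", shared_ok)
print("MAP_PRIVATE visible to peer:", private_ok)
```

So whichever QEMU versions end up documented, the instructions need to
result in shared mappings of guest memory, not just hugetlbfs backing.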

> > 
> > > +
> > > +DPDK vHost with standard vHost:
> > > +-------------------------------
> > > +
> > > +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> > > +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> > > +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> > > +if you wish to use kernel vhost in parallel, you can specify an
> > > +alternative basename on the vswitchd command line like so:
> > > +
> > > +     `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> > > +
> > > > +Note that the basename argument and associated string must be the first
> > > > +arguments after `--dpdk` and come before the EAL arguments.
> > > +
> > > +DPDK vHost VM configuration with standard vHost:
> > > +------------------------------------------------
> > > +
> > > +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> > > +the guest must be configured with virtio-net adapters and offloads
> > > +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> > > +argument:
> > > +
> > > +     ```
> > > +     -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > > +     vhostfd=<open_fd>
> > > +     -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> > > +     guest_tso4=off,guest_tso6=off,guest_ecn=off
> > > +     ```
> > > +
> > > +     The open file descriptor must be passed to QEMU running as a child
> > > +     process.
> > 
> > You might as well tell people how to do this. E.g. with bash:
> > 
> > vhostfd=42 42<>/path/to/vhost/chardev
> > 
> > 42 is, of course, The Answer.
> 
> True - I think we need to distinguish between default and specified file
> better.

Just drop the "default" - it will cause conflicts and pain to users.
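
For anyone not launching QEMU from bash, the same fd handoff can be
sketched in Python, which is what a wrapper would have to do anyway.
This is only an illustration: the function names are hypothetical, the
device path is whatever non-default chardev was configured, and the
netdev string simply mirrors the options quoted from the patch.

```python
import os
import subprocess


def qemu_netdev_arg(net_id, ifname, fd):
    """Build the -netdev value from the patch, with vhostfd filled in."""
    return ("tap,id={},script=no,downscript=no,ifname={},vhost=on,vhostfd={}"
            .format(net_id, ifname, fd))


def run_with_inherited_fd(path, make_argv):
    """Open `path` read-write and run a child that inherits the descriptor.

    `make_argv` receives the fd number and returns the child's argv, so the
    number can be placed on the command line (e.g. in vhostfd=<fd>).
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # pass_fds keeps this descriptor open, at the same number, in the
        # child; all other non-standard descriptors are closed as usual.
        return subprocess.run(make_argv(fd), pass_fds=(fd,),
                              check=True, capture_output=True)
    finally:
        os.close(fd)


# Shape of a real invocation (not executed here; the binary name and the
# remaining QEMU options are assumptions and are left out):
#   run_with_inherited_fd(chardev_path, lambda fd: [
#       "qemu-system-x86_64", "-netdev",
#       qemu_netdev_arg("net1", "port123ABC", fd)])
print(qemu_netdev_arg("net1", "port123ABC", 42))
```

The point is only that the descriptor is opened by the parent and
inherited; QEMU never needs to know the path, so there is no reason for
that path to be `/dev/vhost-net`.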

> > 
> > 
> > > +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> > > +
> > > +     `-mem-path /dev/hugepages -mem-prealloc`
> > > +
> > > +3. (Optional) If you are using libvirt, you must enable libvirt to access
> > > +the userspace device file by adding it to controllers cgroup for libvirtd
> > > +using the following steps:
> > > +
> > > +     1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > > +
> > > +        ```
> > > +        1) cgroup_controllers = [ ... "devices", ... ]
> > > +        2) clear_emulator_capabilities = 0
> > > +        3) user = "root"
> > > +        4) group = "root"
> > > +        5) cgroup_device_acl = [
> > > +               "/dev/null", "/dev/full", "/dev/zero",
> > > +               "/dev/random", "/dev/urandom",
> > > +               "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > > +               "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > > +               "/dev/<devbase-name>-<index>",
> > > +               "/dev/hugepages"]
> > > +        ```
> > > +
> > > +     2. Disable SELinux or set to permissive mode
> > 
> > 
> > It's a work-around, but the right thing to do is really
> > to write up correct selinux policies.
> > Any plans to do this?
> 
> No plans for this at present

That's pretty bad, so one has to give up some security to
gain some other feature. How does one make a call?
Why don't you want to fix it?

> > 
> > > +     3. Mount cgroup device controller:
> > > +
> > > +        ```
> > > +        mkdir /dev/cgroup
> > > +        mount -t cgroup none /dev/cgroup -o devices
> > > +        ```
> > > +
> > > +     4. Restart the libvirtd process
> > > +        For example, on Fedora:
> > > +
> > > +          `systemctl restart libvirtd.service`
> > > +
> > > > +The easiest way to set up a guest that isn't using `/dev/vhost-net` is to
> > > +use the `qemu-wrap.py` script located in utilities. This Python script
> > > +automates the requirements specified above and can be used in conjunction
> > > +with libvirt.
> > 
> > I notice that new libvirt versions should have ability to specify everything
> > directly in the conf, that would be preferable if available.
> > Should be documented too?
> 
> It's not something we've looked at, but will bring it up with the dpdk team

Please do, wrapper scripts simply can't be supported by libvirt.

> > 
> > > +
> > > +DPDK vHost VM configuration with QEMU wrapper:
> > 
> > ...
> > 