> -----Original Message-----
> From: Michael S. Tsirkin [mailto:m...@redhat.com]
> Sent: Wednesday, January 21, 2015 11:19 AM
> To: Traynor, Kevin
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH RFC v6 1/1] netdev-dpdk: add dpdk vhost ports
>
> On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> > This patch adds support for a new port type to userspace datapath
> > called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> > of virtio-net devices to its associated dpdkvhost port. Instructions
> > for use are in INSTALL.DPDK.
> >
> > This has been tested on Intel multi-core platforms and with clients
> > that have virtio-net interfaces.
> >
> > ver 6:
> > - rebased with master
> > - modified to use DPDK v1.8.0 vhost library
> > - reworked for review comments
> > ver 5:
> > - rebased against latest master
> > ver 4:
> > - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> >   utilities/automake.mk
> > - rebased with master to work with DPDK 1.7
> > ver 3:
> > - rebased with master
> > ver 2:
> > - rebased with master
> >
> > Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> > Signed-off-by: Kevin Traynor <kevin.tray...@intel.com>
> > Signed-off-by: Maryam Tahhan <maryam.tah...@intel.com>
> > ---
> >  INSTALL.DPDK.md         | 236 +++++++++++++++++
> >  Makefile.am             |   4 +
> >  lib/automake.mk         |   1 +
> >  lib/netdev-dpdk.c       | 649 +++++++++++++++++++++++++++++++++++++++--------
> >  lib/netdev.c            |   3 +-
> >  utilities/automake.mk   |   3 +-
> >  utilities/qemu-wrap.py  | 389 ++++++++++++++++++++++++++++
> >  vswitchd/ovs-vswitchd.c |   4 +-
> >  8 files changed, 1177 insertions(+), 112 deletions(-)
> >  mode change 100644 => 100755 lib/netdev-dpdk.c
> >  create mode 100755 utilities/qemu-wrap.py
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > index 2cc7636..da8116d 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -17,6 +17,7 @@ Building and Installing:
> >  ------------------------
> >
> >  Required DPDK 1.7
> > +Optional `fuse`, `fuse-devel`
> >
> >  1. Configure build & install DPDK:
> >      1. Set `$DPDK_DIR`
> > @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> >  application should not be assigned the same dpdk core mask "-c" as
> >  the vswitchd.
> >
> > +DPDK vHost:
> > +-----------
> > +
> > +Prerequisites:
> > +1. DPDK 1.8 with vHost support enabled and recompile OVS as above.
> > +
> > +    Update `config/common_linuxapp` so that DPDK is built with vHost
> > +    libraries:
> > +
> > +    `CONFIG_RTE_LIBRTE_VHOST=y`
> > +
> > +2. Insert the Fuse module:
> > +
> > +    `modprobe fuse`
> > +
> > +3. Build and insert the `eventfd_link` module:
> > +
> > +    `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> > +    `make`
> > +    `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> > +
> > +4. Remove /dev/vhost-net character device:
> > +
> > +    `rm -rf /dev/vhost-net`
>
> I think it's not a good idea to tell people to do this,
> best to drop this section and put "with standard vhost"
> here instead.

It's not clear what you'd like to see dropped. This step is necessary when
using the default vhost file, so I can reword it to make that clearer.

> > +
> > +Following the steps above to create a bridge, you can now add DPDK vHost
> > +as a port to the vswitch.
> > +
> > +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> > +
> > +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> > +
> > +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> > +
> > +However, please note that when attaching userspace devices to QEMU, the
> > +name provided during the add-port operation must match the ifname parameter
> > +on the QEMU command line.
> > +
> > +DPDK vHost VM configuration:
> > +----------------------------
> > +
> > +1. Configure virtio-net adaptors:
> > +    The guest must be configured with virtio-net adapters and offloads
> > +    MUST BE DISABLED.
>
> Any plans to address this?

There are no plans at present.

> > +    This means the following parameters should be passed
> > +    to the QEMU binary:
> > +
> > +    ```
> > +    -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > +    -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> > +    guest_tso4=off,guest_tso6=off,guest_ecn=off
> > +    ```
> > +
> > +    Repeat the above parameters for multiple devices.
> > +
> > +2. Configure huge pages:
> > +    QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> > +    virtio-net device's virtual rings and packet buffers mapping the VM's
> > +    physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> > +    memory into their process address space, pass the following parameters
> > +    to QEMU:
> > +
> > +    `-mem-path /dev/hugepages -mem-prealloc`
>
> I guess you also need to request MAP_SHARED mappings - otherwise
> I think you won't be able to poke at them.

OK, it depends on the version of QEMU, so we can call that out.
We've tested with QEMU 1.6.2.

> > +
> > +DPDK vHost with standard vHost:
> > +-------------------------------
> > +
> > +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> > +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> > +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> > +if you wish to use kernel vhost in parallel, you can specify an
> > +alternative basename on the vswitchd command line like so:
> > +
> > +    `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> > +
> > +Note that the basename argument and associated string must be the first
> > +arguments after `--dpdk` and come before the EAL arguments.
> > +
> > +DPDK vHost VM configuration with standard vHost:
> > +------------------------------------------------
> > +
> > +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> > +the guest must be configured with virtio-net adapters and offloads
> > +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> > +argument:
> > +
> > +    ```
> > +    -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > +    vhostfd=<open_fd>
> > +    -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> > +    guest_tso4=off,guest_tso6=off,guest_ecn=off
> > +    ```
> > +
> > +    The open file descriptor must be passed to QEMU running as a child
> > +    process.
>
> You might as well tell people how to do this. E.g. with bash:
>
> vhostfd=42 42<>/path/to/vhost/chardev
>
> 42 is, of course, The Answer.

True - I think we need to distinguish better between the default and the
specified file.

> >
> > +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> > +
> > +    `-mem-path /dev/hugepages -mem-prealloc`
> > +
> > +3. (Optional) If you are using libvirt, you must enable libvirt to access
> > +the userspace device file by adding it to controllers cgroup for libvirtd
> > +using the following steps:
> > +
> > +    1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > +
> > +        ```
> > +        1) cgroup_controllers = [ ... "devices", ... ]
> > +        2) clear_emulator_capabilities = 0
> > +        3) user = "root"
> > +        4) group = "root"
> > +        5) cgroup_device_acl = [
> > +               "/dev/null", "/dev/full", "/dev/zero",
> > +               "/dev/random", "/dev/urandom",
> > +               "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > +               "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > +               "/dev/<devbase-name>-<index>",
> > +               "/dev/hugepages"]
> > +        ```
> > +
> > +    2. Disable SELinux or set to permissive mode
>
>
> It's a work-around, but the right thing to do is really
> to write up correct selinux policies.
> Any plans to do this?

There are no plans for this at present.

> > +    3. Mount cgroup device controller:
> > +
> > +        ```
> > +        mkdir /dev/cgroup
> > +        mount -t cgroup none /dev/cgroup -o devices
> > +        ```
> > +
> > +    4. Restart the libvirtd process
> > +        For example, on Fedora:
> > +
> > +        `systemctl restart libvirtd.service`
> > +
> > +The easiest way to set up a Guest that isn't using `/dev/vhost-net` is to
> > +use the `qemu-wrap.py` script located in utilities. This Python script
> > +automates the requirements specified above and can be used in conjunction
> > +with libvirt.
>
> I notice that new libvirt versions should have ability to specify everything
> directly in the conf, that would be preferable if available.
> Should be documented too?

It's not something we've looked at, but we will bring it up with the DPDK team.

> > +
> > +DPDK vHost VM configuration with QEMU wrapper:
>
> ...
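
For reference, the `vhostfd` hand-off discussed above (open the character device, then start QEMU as a child that inherits the descriptor) could be sketched in Python roughly as follows. This is only a sketch under assumptions and is not taken from `qemu-wrap.py`: the helper names `build_qemu_net_args` and `launch_qemu` are hypothetical, the character device path is whatever the caller supplies, and it relies on the `pass_fds` argument of `subprocess.Popen` (Python 3.2+) to keep the descriptor open under the same number in the child.

```python
import os
import subprocess


def build_qemu_net_args(ifname, mac, netdev_id="net1", vhost_fd=None):
    """Build the -netdev/-device arguments quoted in the patch text.

    Hypothetical helper: offloads are disabled as the documentation
    requires, and vhostfd is appended only when a descriptor is given.
    """
    netdev = "tap,id={0},script=no,downscript=no,ifname={1},vhost=on".format(
        netdev_id, ifname)
    if vhost_fd is not None:
        netdev += ",vhostfd={0}".format(vhost_fd)
    device = ("virtio-net-pci,netdev={0},mac={1},csum=off,gso=off,"
              "guest_tso4=off,guest_tso6=off,guest_ecn=off").format(
                  netdev_id, mac)
    return ["-netdev", netdev, "-device", device]


def launch_qemu(qemu_binary, chardev_path, ifname, mac, extra_args=()):
    """Open the vhost character device and start QEMU as a child process.

    chardev_path is an assumption supplied by the caller; the thread does
    not pin down the exact device name for a non-default basename.
    """
    fd = os.open(chardev_path, os.O_RDWR)
    try:
        args = [qemu_binary]
        args += build_qemu_net_args(ifname, mac, vhost_fd=fd)
        args += list(extra_args)
        # pass_fds keeps fd open (same number) in the child process.
        return subprocess.Popen(args, pass_fds=(fd,))
    finally:
        # The child has its own copy once Popen returns.
        os.close(fd)
```

The hugepage options from the patch (`-mem-path /dev/hugepages -mem-prealloc`) would go in `extra_args`, and `ifname` has to match the name given to `ovs-vsctl add-port`.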