On Thu, Jan 08, 2015 at 11:05:02PM +0000, Kevin Traynor wrote:
> This patch adds support for a new port type to userspace datapath
> called dpdkvhost. This allows KVM (QEMU) to offload the servicing
> of virtio-net devices to its associated dpdkvhost port. Instructions
> for use are in INSTALL.DPDK.
>
> This has been tested on Intel multi-core platforms and with clients
> that have virtio-net interfaces.
>
> ver 6:
> - rebased with master
> - modified to use DPDK v1.8.0 vhost library
> - reworked for review comments
> ver 5:
> - rebased against latest master
> ver 4:
> - added eventfd_link.h and eventfd_link.c to EXTRA_DIST in
> utilities/automake.mk
> - rebased with master to work with DPDK 1.7
> ver 3:
> - rebased with master
> ver 2:
> - rebased with master
>
> Signed-off-by: Ciara Loftus <ciara.lof...@intel.com>
> Signed-off-by: Kevin Traynor <kevin.tray...@intel.com>
> Signed-off-by: Maryam Tahhan <maryam.tah...@intel.com>
> ---
> INSTALL.DPDK.md | 236 +++++++++++++++++
> Makefile.am | 4 +
> lib/automake.mk | 1 +
> lib/netdev-dpdk.c | 649 +++++++++++++++++++++++++++++++++++++++--------
> lib/netdev.c | 3 +-
> utilities/automake.mk | 3 +-
> utilities/qemu-wrap.py | 389 ++++++++++++++++++++++++++++
> vswitchd/ovs-vswitchd.c | 4 +-
> 8 files changed, 1177 insertions(+), 112 deletions(-)
> mode change 100644 => 100755 lib/netdev-dpdk.c
> create mode 100755 utilities/qemu-wrap.py
>
> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> index 2cc7636..da8116d 100644
> --- a/INSTALL.DPDK.md
> +++ b/INSTALL.DPDK.md
> @@ -17,6 +17,7 @@ Building and Installing:
> ------------------------
>
> Required DPDK 1.7
> +Optional `fuse`, `fuse-devel`
>
> 1. Configure build & install DPDK:
> 1. Set `$DPDK_DIR`
> @@ -264,6 +265,241 @@ A general rule of thumb for better performance is that the client
> application should not be assigned the same dpdk core mask "-c" as
> the vswitchd.
>
> +DPDK vHost:
> +-----------
> +
> +Prerequisites:
> +1. DPDK 1.8 with vHost support enabled and recompile OVS as above.
> +
> + Update `config/common_linuxapp` so that DPDK is built with vHost
> + libraries:
> +
> + `CONFIG_RTE_LIBRTE_VHOST=y`
> +
> +2. Insert the Fuse module:
> +
> + `modprobe fuse`
> +
> +3. Build and insert the `eventfd_link` module:
> +
> + `cd $DPDK_DIR/lib/librte_vhost/eventfd_link/`
> + `make`
> + `insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko`
> +
> +4. Remove /dev/vhost-net character device:
> +
> + `rm -rf /dev/vhost-net`

I think it's not a good idea to tell people to do this. It would be best to
drop this section and put "with standard vhost" here instead.

> +
> +Following the steps above to create a bridge, you can now add DPDK vHost
> +as a port to the vswitch.
> +
> +`ovs-vsctl add-port br0 dpdkvhost0 -- set Interface dpdkvhost0 type=dpdkvhost`
> +
> +Unlike DPDK ring ports, DPDK vHost ports can have arbitrary names:
> +
> +`ovs-vsctl add-port br0 port123ABC -- set Interface port123ABC type=dpdkvhost`
> +
> +However, please note that when attaching userspace devices to QEMU, the
> +name provided during the add-port operation must match the ifname parameter
> +on the QEMU command line.
> +
> +DPDK vHost VM configuration:
> +----------------------------
> +
> +1. Configure virtio-net adaptors:
> + The guest must be configured with virtio-net adapters and offloads
> + MUST BE DISABLED.

Any plans to address this?

> + This means the following parameters should be passed
> + to the QEMU binary:
> +
> + ```
> + -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> + -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> + guest_tso4=off,guest_tso6=off,guest_ecn=off
> + ```
> +
> + Repeat the above parameters for multiple devices.
> +
> +2. Configure huge pages:
> + QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
> + virtio-net device's virtual rings and packet buffers mapping the VM's
> + physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> + memory into their process address space, pass the following paramters
> + to QEMU:
> +
> + `-mem-path /dev/hugepages -mem-prealloc`

I guess you also need to request MAP_SHARED mappings - otherwise I think you
won't be able to poke at them.

> +
> +DPDK vHost with standard vHost:
> +-------------------------------
> +
> +DPDK vHost ports use a Linux* character device to communicate with QEMU.
> +By default it is set to `/dev/vhost-net`. This conflicts with the kernel
> +vHost device, hence the need to remove `/dev/vhost-net` above. However,
> +if you wish to use kernel vhost in parallel, you can specify an
> +alternative basename on the vswitchd command line like so:
> +
> + `./vswitchd/ovs-vswitchd --dpdk --basename my-vhost-net -c 0x1 ...`
> +
> +Note that the basename arguement and associated string must be the first
> +arguements after `--dpdk` and come before the EAL arguements.
> +
> +DPDK vHost VM configuration with standard vHost:
> +------------------------------------------------
> +
> +1. As with the "normal" (i.e. using `/dev/vhost-net`) DPDK vHost setup,
> +the guest must be configured with virtio-net adapters and offloads
> +MUST BE DISABLED. However, this time you must also pass in a `vhostfd`
> +argument:
> +
> + ```
> + -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> + vhostfd=<open_fd>
> + -device virtio-net-pci,netdev=net1,mac=<mac>,csum=off,gso=off,
> + guest_tso4=off,guest_tso6=off,guest_ecn=off
> + ```
> +
> + The open file descriptor must be passed to QEMU running as a child
> + process.

You might as well tell people how to do this. E.g. with bash:

    vhostfd=42 42<>/path/to/vhost/chardev

42 is, of course, The Answer.

> +2. As above, QEMU must allocate the VM's memory on hugetlbfs:
> +
> + `-mem-path /dev/hugepages -mem-prealloc`
> +
> +3. (Optional) If you are using libvirt, you must enable libvirt to access
> +the userspace device file by adding it to controllers cgroup for libvirtd
> +using the following steps:
> +
> + 1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> +
> + ```
> + 1) cgroup_controllers = [ ... "devices", ... ]
> + 2) clear_emulator_capabilities = 0
> + 3) user = "root"
> + 4) group = "root"
> + 5) cgroup_device_acl = [
> + "/dev/null", "/dev/full", "/dev/zero",
> + "/dev/random", "/dev/urandom",
> + "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> + "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> + "/dev/<devbase-name>-<index>",
> + "/dev/hugepages"]
> + ```
> +
> + 2. Disable SELinux or set to permissive mode

It's a work-around, but the right thing to do is really to write up
correct SELinux policies. Any plans to do this?

> + 3. Mount cgroup device controller:
> +
> + ```
> + mkdir /dev/cgroup
> + mount -t cgroup none /dev/cgroup -o devices
> + ```
> +
> + 4. Restart the libvirtd process
> + For example, on Fedora:
> +
> + `systemctl restart libvirtd.service`
> +
> +The easiest way to setup a Guest that isn't using `/dev/vhost-net` is to
> +use the `qemu-wrap.py` script located in utilities. This Python script
> +automates the requirements specified above and can be used in conjunction
> +with libvirt.

I notice that new libvirt versions should have the ability to specify
everything directly in the conf; that would be preferable if available.
Should that be documented too?

> +
> +DPDK vHost VM configuration with QEMU wrapper:

...

_______________________________________________
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
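
P.S. On the `vhostfd` comment above: for a launcher that is a script rather than a shell one-liner, the same descriptor hand-off could look roughly like the sketch below. This is only an illustration under stated assumptions, not what `qemu-wrap.py` does. The helper name `spawn_with_inherited_fd` and its `make_argv` callback are hypothetical, and a temporary file stands in for the vhost character device so that the sketch runs anywhere. It relies on `pass_fds` from Python's `subprocess` module, which is POSIX only.

```python
import os
import subprocess
import sys
import tempfile


def spawn_with_inherited_fd(path, make_argv):
    """Open `path` read-write and run a child that inherits the descriptor.

    `make_argv` (hypothetical callback) receives the fd number and returns
    the child's argument vector; for QEMU it would embed `vhostfd=<fd>`.
    """
    fd = os.open(path, os.O_RDWR)
    try:
        # pass_fds keeps this descriptor open in the child under the same
        # number; all other non-standard descriptors are closed.
        return subprocess.run(
            make_argv(fd), pass_fds=(fd,), capture_output=True, text=True
        )
    finally:
        os.close(fd)  # the parent does not need its copy once the child exits


if __name__ == "__main__":
    # Assumption: a temporary file stands in for the vhost character device.
    with tempfile.NamedTemporaryFile() as stand_in:
        result = spawn_with_inherited_fd(
            stand_in.name,
            # The child checks that the inherited fd is valid, then echoes
            # the option string a real QEMU command line would carry.
            lambda fd: [
                sys.executable,
                "-c",
                f"import os; os.fstat({fd}); print('vhostfd={fd}')",
            ],
        )
        print(result.stdout.strip())
```

For a real guest, `make_argv` would presumably return the QEMU command line with `vhostfd=<fd>` appended to the `-netdev tap,...,vhost=on` option, and `path` would be the character device (`/path/to/vhost/chardev` in the bash example).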