On 1/11/2016 2:43 AM, Tan, Jianfeng wrote:
> This patchset provides a high performance networking interface (virtio)
> for container-based DPDK applications. How to start DPDK apps in containers
> with exclusive ownership of NIC devices is beyond its scope.
> The basic idea here is to present a new virtual device (named eth_cvio),
> which can be discovered and initialized in container-based DPDK apps using
> rte_eal_init(). To minimize the change, we reuse the already-existing virtio
> frontend driver code (driver/net/virtio/).
>
> Compared to the QEMU/VM case, the virtio device framework (which translates
> I/O port r/w operations into the unix socket/cuse protocol and is originally
> provided in QEMU) is integrated into the virtio frontend driver. So this
> converged driver plays both the role of the original frontend driver and the
> role of the QEMU device framework.
>
> The major difference lies in how the relative address for vhost is calculated.
> The principle of virtio is that, based on one or multiple shared memory
> segments, vhost maintains a reference system with the base address and length
> of each segment, so that an address coming from the VM (usually a GPA, Guest
> Physical Address) can be translated into a vhost-recognizable address (named
> VVA, Vhost Virtual Address). To decrease the overhead of address translation,
> we should maintain as few segments as possible. In the VM's case, GPA is
> always locally continuous. In the container's case, the CVA (Container
> Virtual Address) can be used. Specifically:
>   a. when set_base_addr is called, the CVA is used;
>   b. when preparing RX descriptors, the CVA is used;
>   c. when transmitting packets, the CVA is filled into TX descriptors;
>   d. in the TX and CQ headers, the CVA is used.
>
> How is memory shared? In the VM's case, qemu always shares the whole physical
> layout with the backend. But it is not feasible for a container, as a process,
> to share all of its virtual memory regions with the backend. So only specified
> virtual memory regions (of the shared type) are sent to the backend. It is a
> limitation that only addresses in these areas can be used to transmit or
> receive packets.
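Side note: to keep the number of shared segments small as described above, I
guess the container app could be started with the new --single-file EAL option
from this series, so that all hugepage memory is backed by one shared file. A
minimal sketch, assuming the option can simply be added to the EAL arguments
(the flag placement is my guess, not taken from the cover letter; see steps d
and f below for the full docker invocations):

   $: l2fwd -c 0x4 -n 4 -m 1024 --no-pci --single-file \
      --vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1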
> Known issues:
>
> a. When used with vhost-net, root privilege is required to create the tap
>    device inside the container.
> b. Control queue and multi-queue are not supported yet.
> c. When the --single-file option is used, the socket_id of the memory may be
>    wrong. (Use "numactl -N x -m x" to work around this for now.)
>
> How to use?
>
> a. Apply this patchset.
>
> b. To compile container apps:
>    $: make config RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>    $: make install RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>    $: make -C examples/l2fwd RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>    $: make -C examples/vhost RTE_SDK=`pwd` T=x86_64-native-linuxapp-gcc
>
> c. To build a docker image, use the Dockerfile below:
>    $: cat ./Dockerfile
>    FROM ubuntu:latest
>    WORKDIR /usr/src/dpdk
>    COPY . /usr/src/dpdk
>    ENV PATH "$PATH:/usr/src/dpdk/examples/l2fwd/build/"
>    $: docker build -t dpdk-app-l2fwd .
>
> d. Used with vhost-user:
>    $: ./examples/vhost/build/vhost-switch -c 3 -n 4 \
>       --socket-mem 1024,1024 -- -p 0x1 --stats 1
>    $: docker run -i -t -v <path_to_vhost_unix_socket>:/var/run/usvhost \
>       -v /dev/hugepages:/dev/hugepages \
>       dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
>       --vdev=eth_cvio0,path=/var/run/usvhost -- -p 0x1
>
> f. Used with vhost-net:
>    $: modprobe vhost
>    $: modprobe vhost-net
>    $: docker run -i -t --privileged \
>       -v /dev/vhost-net:/dev/vhost-net \
>       -v /dev/net/tun:/dev/net/tun \
>       -v /dev/hugepages:/dev/hugepages \
>       dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
>       --vdev=eth_cvio0,path=/dev/vhost-net -- -p 0x1
We'd better add an ifname argument, e.g.
--vdev=eth_cvio0,path=/dev/vhost-net,ifname=tap0, so that the user can add
the tap device to the bridge first.

Thanks,
Michael

> By the way, it's not necessary to run in a container.
>
> Signed-off-by: Huawei Xie <huawei.xie at intel.com>
> Signed-off-by: Jianfeng Tan <jianfeng.tan at intel.com>
>
> Jianfeng Tan (4):
>   mem: add --single-file to create single mem-backed file
>   mem: add API to obstain memory-backed file info
>   virtio/vdev: add ways to interact with vhost
>   virtio/vdev: add a new vdev named eth_cvio
>
>  config/common_linuxapp                     |   5 +
>  drivers/net/virtio/Makefile                |   4 +
>  drivers/net/virtio/vhost.c                 | 734 +++++++++++++++++++++++++++++
>  drivers/net/virtio/vhost.h                 | 192 ++++++++
>  drivers/net/virtio/virtio_ethdev.c         | 338 ++++++++++---
>  drivers/net/virtio/virtio_ethdev.h         |   4 +
>  drivers/net/virtio/virtio_pci.h            |  52 +-
>  drivers/net/virtio/virtio_rxtx.c           |  11 +-
>  drivers/net/virtio/virtio_rxtx_simple.c    |  14 +-
>  drivers/net/virtio/virtqueue.h             |  13 +-
>  lib/librte_eal/common/eal_common_options.c |  17 +
>  lib/librte_eal/common/eal_internal_cfg.h   |   1 +
>  lib/librte_eal/common/eal_options.h        |   2 +
>  lib/librte_eal/common/include/rte_memory.h |  16 +
>  lib/librte_eal/linuxapp/eal/eal_memory.c   |  82 +++-
>  15 files changed, 1392 insertions(+), 93 deletions(-)
>  create mode 100644 drivers/net/virtio/vhost.c
>  create mode 100644 drivers/net/virtio/vhost.h
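To illustrate the ifname suggestion above: a hypothetical invocation (the
ifname argument does not exist in this patchset yet, and tap0/br0 are made-up
names) that pre-creates the tap, attaches it to a bridge, and then points the
vdev at it:

   $: ip tuntap add dev tap0 mode tap
   $: ip link add name br0 type bridge
   $: ip link set dev tap0 master br0
   $: docker run -i -t --privileged \
      -v /dev/vhost-net:/dev/vhost-net \
      -v /dev/net/tun:/dev/net/tun \
      -v /dev/hugepages:/dev/hugepages \
      dpdk-app-l2fwd l2fwd -c 0x4 -n 4 -m 1024 --no-pci \
      --vdev=eth_cvio0,path=/dev/vhost-net,ifname=tap0 -- -p 0x1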