Hi Bhanuprakash,

Comments inline.
On Tue, Jun 7, 2016 at 6:45 PM, Bhanuprakash Bodireddy < bhanuprakash.bodire...@intel.com> wrote: > Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and > INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate the > novice user in setting up the OVS DPDK and running it out of box, the > ADVANCED document is targeted at expert users looking for the optimum > performance running dpdk datapath. > > This commit updates INSTALL.DPDK.md document. > > Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> > --- > INSTALL.DPDK.md | 1295 > ++++++++++++++++++------------------------------------- > 1 file changed, 420 insertions(+), 875 deletions(-) > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > index c2e32bf..027fb36 100644 > --- a/INSTALL.DPDK.md > +++ b/INSTALL.DPDK.md > @@ -1,1024 +1,569 @@ > -Using Open vSwitch with DPDK > -============================ > +OVS DPDK INSTALL GUIDE > +================================ > > -Open vSwitch can use Intel(R) DPDK lib to operate entirely in > -userspace. This file explains how to install and use Open vSwitch in > -such a mode. > +## Contents > > -The DPDK support of Open vSwitch is considered experimental. > -It has not been thoroughly tested. > +1. [Overview](#overview) > +2. [Building and Installation](#build) > +3. [Setup OVS DPDK datapath](#ovssetup) > +4. [DPDK in the VM](#builddpdk) > +5. [OVS Testcases](#ovstc) > +6. [Limitations ](#ovslimits) > > -This version of Open vSwitch should be built manually with `configure` > -and `make`. > +## <a name="overview"></a> 1. Overview > > -OVS needs a system with 1GB hugepages support. > +Open vSwitch can use DPDK lib to operate entirely in userspace. > +This file provides information on installation and use of Open vSwitch > +using DPDK datapath. This version of Open vSwitch should be built > manually > +with `configure` and `make`. > > -Building and Installing: > ------------------------- > +The DPDK support of Open vSwitch is considered 'experimental'. > > -Required: DPDK 16.04 > -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` > (`libfuse-dev` > -on Debian/Ubuntu) > +### Prerequisites > > -1. Configure build & install DPDK: > - 1. Set `$DPDK_DIR` > +* Required: DPDK 16.04 > +* Hardware: [DPDK Supported NICs] when physical ports in use > > - ``` > - export DPDK_DIR=/usr/src/dpdk-16.04 > - cd $DPDK_DIR > - ``` > - > - 2. Then run `make install` to build and install the library. > - For default install without IVSHMEM: > - > - `make install T=x86_64-native-linuxapp-gcc DESTDIR=install` > - > - To include IVSHMEM (shared memory): > - > - `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install` > - > - For further details refer to http://dpdk.org/ > - > -2. Configure & build the Linux kernel: > - > - Refer to intel-dpdk-getting-started-guide.pdf for understanding > - DPDK kernel requirement. > - > -3. Configure & build OVS: > - > - * Non IVSHMEM: > - > - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` > - > - * IVSHMEM: > - > - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` > - > - ``` > - cd $(OVS_DIR)/ > - ./boot.sh > - ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"] > - make > - ``` > - > - Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress > DPDK cast-align warnings. > - > -To have better performance one can enable aggressive compiler > optimizations and > -use the special instructions(popcnt, crc32) that may not be available on > all > -machines. 
Instead of typing `make`, type: > - > -`make CFLAGS='-O3 -march=native'` > - > -Refer to [INSTALL.userspace.md] for general requirements of building > userspace OVS. > - > -Using the DPDK with ovs-vswitchd: > ---------------------------------- > - > -1. Setup system boot > - Add the following options to the kernel bootline: > - > - `default_hugepagesz=1GB hugepagesz=1G hugepages=1` > - > -2. Setup DPDK devices: > - > - DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO > - modules. UIO requires inserting an out of tree driver igb_uio.ko that > is > - available in DPDK. Setup for both methods are described below. > - > - * UIO: > - 1. insert uio.ko: `modprobe uio` > - 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko` > - 3. Bind network device to igb_uio: > - `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1` > - > - * VFIO: > - > - VFIO needs to be supported in the kernel and the BIOS. More > information > - can be found in the [DPDK Linux GSG]. > - > - 1. Insert vfio-pci.ko: `modprobe vfio-pci` > - 2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x > /dev/vfio` > - and: `sudo /usr/bin/chmod 0666 /dev/vfio/*` > - 3. Bind network device to vfio-pci: > - `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1` > - > -3. Mount the hugetable filesystem > - > - `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages` > - > - Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup. > - > -4. Follow the instructions in [INSTALL.md] to install only the > - userspace daemons and utilities (via 'make install'). > - 1. First time only db creation (or clearing): > - > - ``` > - mkdir -p /usr/local/etc/openvswitch > - mkdir -p /usr/local/var/run/openvswitch > - rm /usr/local/etc/openvswitch/conf.db > - ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ > - /usr/local/share/openvswitch/vswitch.ovsschema > - ``` > - > - 2. Start ovsdb-server > - > - ``` > - ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ > - --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ > - --private-key=db:Open_vSwitch,SSL,private_key \ > - --certificate=Open_vSwitch,SSL,certificate \ > - --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile > --detach > - ``` > - > - 3. First time after db creation, initialize: > - > - ``` > - ovs-vsctl --no-wait init > - ``` > - > -5. Start vswitchd: > - > - DPDK configuration arguments can be passed to vswitchd via Open_vSwitch > - other_config column. The recognized configuration options are listed. > - Defaults will be provided for all values not explicitly set. > - > - * dpdk-init > - Specifies whether OVS should initialize and support DPDK ports. This is > - a boolean, and defaults to false. > - > - * dpdk-lcore-mask > - Specifies the CPU cores on which dpdk lcore threads should be spawned. > - The DPDK lcore threads are used for DPDK library tasks, such as > - library internal message processing, logging, etc. Value should be in > - the form of a hex string (so '0x123') similar to the 'taskset' mask > - input. > - If not specified, the value will be determined by choosing the lowest > - CPU core from initial cpu affinity list. Otherwise, the value will be > - passed directly to the DPDK library. > - For performance reasons, it is best to set this to a single core on > - the system, rather than allow lcore threads to float. > - > - * dpdk-alloc-mem > - This sets the total memory to preallocate from hugepages regardless of > - processor socket. It is recommended to use dpdk-socket-mem instead. 
> - > - * dpdk-socket-mem > - Comma separated list of memory to pre-allocate from hugepages on > specific > - sockets. > - > - * dpdk-hugepage-dir > - Directory where hugetlbfs is mounted > - > - * dpdk-extra > - Extra arguments to provide to DPDK EAL, as previously specified on the > - command line. Do not pass '--no-huge' to the system in this way. > Support > - for running the system without hugepages is nonexistent. > - > - * cuse-dev-name > - Option to set the vhost_cuse character device name. > - > - * vhost-sock-dir > - Option to set the path to the vhost_user unix socket files. > - > - NOTE: Changing any of these options requires restarting the > ovs-vswitchd > - application. > - > - Open vSwitch can be started as normal. DPDK will be initialized as long > - as the dpdk-init option has been set to 'true'. > - > - > - ``` > - export DB_SOCK=/usr/local/var/run/openvswitch/db.sock > - ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true > - ovs-vswitchd unix:$DB_SOCK --pidfile --detach > - ``` > - > - If allocated more than one GB hugepage (as for IVSHMEM), set amount and > - use NUMA node 0 memory: > - > - ``` > - ovs-vsctl --no-wait set Open_vSwitch . > other_config:dpdk-socket-mem="1024,0" > - ovs-vswitchd unix:$DB_SOCK --pidfile --detach > - ``` > - > -6. Add bridge & ports > - > - To use ovs-vswitchd with DPDK, create a bridge with datapath_type > - "netdev" in the configuration database. For example: > - > - `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev` > - > - Now you can add dpdk devices. OVS expects DPDK device names to start > with > - "dpdk" and end with a portid. vswitchd should print (in the log file) > the > - number of dpdk devices found. > - > - ``` > - ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk > - ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk > - ``` > - > - Once first DPDK port is added to vswitchd, it creates a Polling thread > and > - polls dpdk device in continuous loop. Therefore CPU utilization > - for that thread is always 100%. > - > - Note: creating bonds of DPDK interfaces is slightly different to > creating > - bonds of system interfaces. For DPDK, the interface type must be > explicitly > - set, for example: > - > - ``` > - ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 > type=dpdk -- set Interface dpdk1 type=dpdk > - ``` > - > -7. Add test flows > - > - Test flow script across NICs (assuming ovs in /usr/src/ovs): > - Execute script: > - > - ``` > - #! /bin/sh > - # Move to command directory > - cd /usr/src/ovs/utilities/ > - > - # Clear current flows > - ./ovs-ofctl del-flows br0 > - > - # Add flows between port 1 (dpdk0) to port 2 (dpdk1) > - ./ovs-ofctl add-flow br0 in_port=1,action=output:2 > - ./ovs-ofctl add-flow br0 in_port=2,action=output:1 > - ``` > - > -8. 
QoS usage example > - > - Assuming you have a vhost-user port transmitting traffic consisting of > - packets of size 64 bytes, the following command would limit the egress > - transmission rate of the port to ~1,000,000 packets per second: > - > - `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create qos > - type=egress-policer other-config:cir=46000000 other-config:cbs=2048` > - > - To examine the QoS configuration of the port: > - > - `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` > - > - To clear the QoS configuration from the port and ovsdb use the > following: > - > - `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos` > - > - For more details regarding egress-policer parameters please refer to > the > - vswitch.xml. > - > -9. Ingress Policing Example > - > - Assuming you have a vhost-user port receiving traffic consisting of > - packets of size 64 bytes, the following command would limit the > reception > - rate of the port to ~1,000,000 packets per second: > - > - `ovs-vsctl set interface vhost-user0 ingress_policing_rate=368000 > - ingress_policing_burst=1000` > - > - To examine the ingress policer configuration of the port: > - > - `ovs-vsctl list interface vhost-user0` > - > - To clear the ingress policer configuration from the port use the > following: > - > - `ovs-vsctl set interface vhost-user0 ingress_policing_rate=0` > - > - For more details regarding ingress-policer see the vswitch.xml. > - > -Performance Tuning: > -------------------- > - > -1. PMD affinitization > - > - A poll mode driver (pmd) thread handles the I/O of all DPDK > - interfaces assigned to it. A pmd thread will busy loop through > - the assigned port/rxq's polling for packets, switch the packets > - and send to a tx port if required. Typically, it is found that > - a pmd thread is CPU bound, meaning that the greater the CPU > - occupancy the pmd thread can get, the better the performance. To > - that end, it is good practice to ensure that a pmd thread has as > - many cycles on a core available to it as possible. This can be > - achieved by affinitizing the pmd thread with a core that has no > - other workload. See section 7 below for a description of how to > - isolate cores for this purpose also. > - > - The following command can be used to specify the affinity of the > - pmd thread(s). > - > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>` > +## <a name="build"></a> 2. Building and Installation > > - By setting a bit in the mask, a pmd thread is created and pinned > - to the corresponding CPU core. e.g. to run a pmd thread on core 1 > +### 2.1 Configure & build the Linux kernel > > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2` > +On Linux Distros running kernel version >= 3.0, kernel rebuild is not > required > +and only grub cmdline needs to be updated for enabling IOMMU [VFIO > support - 3.2]. > +For older kernels, check if kernel is built with UIO, HUGETLBFS, > PROC_PAGE_MONITOR, > +HPET, HPET_MMAP support. > > - For more information, please refer to the Open_vSwitch TABLE section in > +Detailed system requirements can be found at [DPDK requirements] and also > refer to > +advanced install guide [INSTALL.DPDK-ADVANCED.md] > > - `man ovs-vswitchd.conf.db` > +### 2.2 Install DPDK > + 1. [Download DPDK] and extract the file, for example in to /usr/src > + and set DPDK_DIR > > - Note, that a pmd thread on a NUMA node is only created if there is > - at least one DPDK interface from that NUMA node added to OVS. > - > -2. 
Multiple poll mode driver threads > - > - With pmd multi-threading support, OVS creates one pmd thread > - for each NUMA node by default. However, it can be seen that in cases > - where there are multiple ports/rxq's producing traffic, performance > - can be improved by creating multiple pmd threads running on separate > - cores. These pmd threads can then share the workload by each being > - responsible for different ports/rxq's. Assignment of ports/rxq's to > - pmd threads is done automatically. > - > - The following command can be used to specify the affinity of the > - pmd threads. > - > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex string>` > - > - A set bit in the mask means a pmd thread is created and pinned > - to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2 > - > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6` > - > - For more information, please refer to the Open_vSwitch TABLE section in > - > - `man ovs-vswitchd.conf.db` > - > - For example, when using dpdk and dpdkvhostuser ports in a > bi-directional > - VM loopback as shown below, spreading the workload over 2 or 4 pmd > - threads shows significant improvements as there will be more total CPU > - occupancy available. > - > - NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1 > - > - The following command can be used to confirm that the port/rxq > assignment > - to pmd threads is as required: > - > - `ovs-appctl dpif-netdev/pmd-rxq-show` > - > - This can also be checked with: > - > - ``` > - top -H > - taskset -p <pid_of_pmd> > - ``` > - > - To understand where most of the pmd thread time is spent and whether > the > - caches are being utilized, these commands can be used: > - > - ``` > - # Clear previous stats > - ovs-appctl dpif-netdev/pmd-stats-clear > - > - # Check current stats > - ovs-appctl dpif-netdev/pmd-stats-show > - ``` > - > -3. DPDK port Rx Queues > - > - `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>` > - > - The command above sets the number of rx queues for DPDK interface. > - The rx queues are assigned to pmd threads on the same NUMA node in a > - round-robin fashion. For more information, please refer to the > - Open_vSwitch TABLE section in > - > - `man ovs-vswitchd.conf.db` > - > -4. Exact Match Cache > - > - Each pmd thread contains one EMC. After initial flow setup in the > - datapath, the EMC contains a single table and provides the lowest level > - (fastest) switching for DPDK ports. If there is a miss in the EMC then > - the next level where switching will occur is the datapath classifier. > - Missing in the EMC and looking up in the datapath classifier incurs a > - significant performance penalty. If lookup misses occur in the EMC > - because it is too small to handle the number of flows, its size can > - be increased. The EMC size can be modified by editing the define > - EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c. > - > - As mentioned above an EMC is per pmd thread. So an alternative way of > - increasing the aggregate amount of possible flow entries in EMC and > - avoiding datapath classifier lookups is to have multiple pmd threads > - running. This can be done as described in section 2. > - > -5. Compiler options > - > - The default compiler optimization level is '-O2'. Changing this to > - more aggressive compiler optimizations such as '-O3' or > - '-Ofast -march=native' with gcc can produce performance gains. > - > -6. 
Simultaneous Multithreading (SMT) > - > - With SMT enabled, one physical core appears as two logical cores > - which can improve performance. > - > - SMT can be utilized to add additional pmd threads without consuming > - additional physical cores. Additional pmd threads may be added in the > - same manner as described in section 2. If trying to minimize the use > - of physical cores for pmd threads, care must be taken to set the > - correct bits in the pmd-cpu-mask to ensure that the pmd threads are > - pinned to SMT siblings. > - > - For example, when using 2x 10 core processors in a dual socket system > - with HT enabled, /proc/cpuinfo will report 40 logical cores. To use > - two logical cores which share the same physical core for pmd threads, > - the following command can be used to identify a pair of logical cores. > - > - `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list` > - > - where N is the logical core number. In this example, it would show that > - cores 1 and 21 share the same physical core. The pmd-cpu-mask to enable > - two pmd threads running on these two logical cores (one physical core) > - is. > - > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002` > - > - Note that SMT is enabled by the Hyper-Threading section in the > - BIOS, and as such will apply to the whole system. So the impact of > - enabling/disabling it for the whole system should be considered > - e.g. If workloads on the system can scale across multiple cores, > - SMT may very beneficial. However, if they do not and perform best > - on a single physical core, SMT may not be beneficial. > + ``` > + cd /usr/src/ > + wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip > + unzip dpdk-16.04.zip > > -7. The isolcpus kernel boot parameter > + export DPDK_DIR=/usr/src/dpdk-16.04 > + cd $DPDK_DIR > + ``` > > - isolcpus can be used on the kernel bootline to isolate cores from the > - kernel scheduler and hence dedicate them to OVS or other packet > - forwarding related workloads. For example a Linux kernel boot-line > - could be: > + 2. Configure and Install DPDK > > - ``` > - GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G hugepages=4 > - default_hugepagesz=1G 'intel_iommu=off' isolcpus=1-19" > - ``` > + Build and install the DPDK library. > > -8. NUMA/Cluster On Die > + ``` > + export DPDK_TARGET=x86_64-native-linuxapp-gcc > + export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET > + make install T=$DPDK_TARGET DESTDIR=install > + ``` > > - Ideally inter NUMA datapaths should be avoided where possible as > packets > - will go across QPI and there may be a slight performance penalty when > - compared with intra NUMA datapaths. On Intel Xeon Processor E5 v3, > - Cluster On Die is introduced on models that have 10 cores or more. > - This makes it possible to logically split a socket into two NUMA > regions > - and again it is preferred where possible to keep critical datapaths > - within the one cluster. > + Note: For IVSHMEM, Set `export > DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc` > > - It is good practice to ensure that threads that are in the datapath are > - pinned to cores in the same NUMA area. e.g. pmd threads and QEMU vCPUs > - responsible for forwarding. > +### 2.3 Install OVS > + OVS can be installed using different methods. For OVS to use DPDK > datapath, > + it has to be configured with DPDK support and is done by './configure > --with-dpdk'. 
> + This section focus on generic recipe that suits most cases and for > distribution > + specific instructions, refer [INSTALL.Fedora.md], [INSTALL.RHEL.md] and > + [INSTALL.Debian.md]. > > -9. Rx Mergeable buffers > + OVS can be downloaded in compressed format from the OVS release page > (or) > + cloned from git repository if user intends to develop and contribute > + patches upstream. > I think it is better just to have one download method, it keeps things simple. > > - Rx Mergeable buffers is a virtio feature that allows chaining of > multiple > - virtio descriptors to handle large packet sizes. As such, large packets > - are handled by reserving and chaining multiple free descriptors > - together. Mergeable buffer support is negotiated between the virtio - driver and virtio device and is supported by the DPDK vhost library. > - This behavior is typically supported and enabled by default, however > - in the case where the user knows that rx mergeable buffers are not > needed > - i.e. jumbo frames are not needed, it can be forced off by adding > - mrg_rxbuf=off to the QEMU command line options. By not reserving > multiple > - chains of descriptors it will make more individual virtio descriptors > - available for rx to the guest using dpdkvhost ports and this can > improve > - performance. > + - [Download OVS] tar ball and extract the file, for example in to > /usr/src > + and set OVS_DIR > > -10. Packet processing in the guest > + ``` > + wget -O ovs.tar https://github.com/openvswitch/ovs/tarball/master > + mkdir -p /usr/src/ovs > + tar -xvf ovs.tar -C /usr/src/ovs --strip-components=1 > + export OVS_DIR=/usr/src/ovs > + ``` > > - It is good practice whether simply forwarding packets from one > - interface to another or more complex packet processing in the guest, > - to ensure that the thread performing this work has as much CPU > - occupancy as possible. For example when the DPDK sample application > - `testpmd` is used to forward packets in the guest, multiple QEMU vCPU > - threads can be created. Taskset can then be used to affinitize the > - vCPU thread responsible for forwarding to a dedicated core not used > - for other general processing on the host system. > + - Clone the Git repository for OVS, for example in to /usr/src > > -11. DPDK virtio pmd in the guest > + ``` > + cd /usr/src/ > + git clone https://github.com/openvswitch/ovs.git > + export OVS_DIR=/usr/src/ovs > + ``` > > - dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the path > - to the guest using the DPDK vhost library. This library is compatible > with > - virtio-net drivers in the guest but significantly better performance > can > - be observed when using the DPDK virtio pmd driver in the guest. The > DPDK > - `testpmd` application can be used in the guest as an example > application > - that forwards packet from one DPDK vhost port to another. An example of > - running `testpmd` in the guest can be seen here. > + - Install OVS dependencies > > - ``` > - ./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i > --txqflags=0xf00 > - --disable-hw-vlan --forward-mode=io --auto-start > - ``` > + GNU make, GCC 4.x (or) Clang 3.4 (Mandatory) > + libssl, libcap-ng, Python 2.7 (Optional) > + More information can be found at [Build Requirements] > > - See below information on dpdkvhostcuse and dpdkvhostuser ports. > - See [DPDK Docs] for more information on `testpmd`. 
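One more thought on the dependencies list above: when building from a git clone, ./boot.sh also needs the autotools (autoconf, automake, libtool), which are not mentioned here. A rough sketch of what the dependency step could look like on Debian/Ubuntu (package names are assumed from memory and will differ on other distros):

```
# Debian/Ubuntu package names assumed; adjust for other distributions
apt-get install -y gcc make autoconf automake libtool \
                   libssl-dev libcap-ng-dev python2.7
```

Mentioning the autotools next to the [Build Requirements] link would save first-time users building from git a round trip.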
> + - Configure, Install OVS > > -DPDK Rings : > ------------- > + ``` > + cd $OVS_DIR > + ./boot.sh > + ./configure --with-dpdk=$DPDK_BUILD > + make install > + ``` > > -Following the steps above to create a bridge, you can now add dpdk rings > -as a port to the vswitch. OVS will expect the DPDK ring device name to > -start with dpdkr and end with a portid. > + Note: Passing DPDK_BUILD can be skipped if DPDK library is installed > in > + standard locations i.e `./configure --with-dpdk` should suffice. > > -`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr` > + Additional information can be found in [INSTALL.md]. > > -DPDK rings client test application > +## <a name="ovssetup"></a> 3. Setup OVS with DPDK datapath > > -Included in the test directory is a sample DPDK application for testing > -the rings. This is from the base dpdk directory and modified to work > -with the ring naming used within ovs. > +### 3.1 Setup Hugepages > > -location tests/ovs_client > + Allocate and mount 2M Huge pages: > > -To run the client : > + - For persistent allocation of huge pages, write to hugepages.conf file > + in /etc/sysctl.d > > -``` > -cd /usr/src/ovs/tests/ > -ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr" > -``` > + `echo 'vm.nr_hugepages=2048' > /etc/sysctl.d/hugepages.conf` > > -In the case of the dpdkr example above the "port id you gave dpdkr" is 0. > + - For run-time allocation of huge pages > > -It is essential to have --proc-type=secondary > + `sysctl -w vm.nr_hugepages=N` where N = No. of 2M huge pages allocated > > -The application simply receives an mbuf on the receive queue of the > -ethernet ring and then places that same mbuf on the transmit ring of > -the ethernet ring. It is a trivial loopback application. > + - To verify hugepage configuration > > -DPDK rings in VM (IVSHMEM shared memory communications) > -------------------------------------------------------- > + `grep HugePages_ /proc/meminfo` > > -In addition to executing the client in the host, you can execute it within > -a guest VM. To do so you will need a patched qemu. You can download the > -patch and getting started guide at : > + - Mount hugepages > > -https://01.org/packet-processing/downloads > + `mount -t hugetlbfs none /dev/hugepages` > > -A general rule of thumb for better performance is that the client > -application should not be assigned the same dpdk core mask "-c" as > -the vswitchd. > + Note: Mount hugepages if not already mounted by default. > > -DPDK vhost: > ------------ > +### 3.2 Setup DPDK devices using VFIO > > -DPDK 16.04 supports two types of vhost: > + - Supported with DPDK release >= 1.7 and kernel version >= 3.6 > It is already mentioned that DPDK 16.04 is required, then the comment about the DPDK version is not necessary. > + - VFIO needs support from BIOS and kernel. > + - BIOS changes: > > -1. vhost-user > -2. vhost-cuse > + Enable VT-d, can be verified from `dmesg | grep -e DMAR -e IOMMU` > output > > -Whatever type of vhost is enabled in the DPDK build specified, is the type > -that will be enabled in OVS. By default, vhost-user is enabled in DPDK. > -Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports > -will be enabled in OVS. > -Please note that support for vhost-cuse is intended to be deprecated in > OVS > -in a future release. 
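One comment on section 3.1 above: only the 2M hugepage setup is kept, but IVSHMEM needs 1GB pages (per the old Restrictions section) and the dpdk-socket-mem="1024,0" example in 3.3 refers to the "more than one GB hugepage" case. It may be worth keeping a short 1G variant along the lines of the removed text; something like the following, where hugepages=4 is just the old example value and 1G pages are typically reserved at boot time:

```
# Reserve 1G pages on the kernel command line, e.g.:
#   default_hugepagesz=1G hugepagesz=1G hugepages=4
# then mount and verify:
mount -t hugetlbfs -o pagesize=1G none /dev/hugepages
grep HugePages_ /proc/meminfo
```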
> + - GRUB bootline: > > -DPDK vhost-user: > ----------------- > + Add `iommu=pt intel_iommu=on`, can be verified from `cat > /proc/cmdline` output > > -The following sections describe the use of vhost-user 'dpdkvhostuser' > ports > -with OVS. > + - Load modules and bind the NIC to VFIO driver > > -DPDK vhost-user Prerequisites: > -------------------------- > + ``` > + modprobe vfio-pci > + sudo /usr/bin/chmod a+x /dev/vfio > + sudo /usr/bin/chmod 0666 /dev/vfio/* > + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1 > + $DPDK_DIR/tools/dpdk_nic_bind.py --status > + ``` > > -1. DPDK 16.04 with vhost support enabled as documented in the "Building > and > - Installing section" > + Note: If using older DPDK release (or) running kernels < 3.6 UIO > drivers to be used, > Same here for DPDK version > + please check section 4 (DPDK devices using UIO) for the steps. > > -2. QEMU version v2.1.0+ > +### 3.3 Setup OVS > > - QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if > providing > - your VM with memory greater than 1GB due to potential issues with > memory > - mapping larger areas. > + 1. DB creation (One time step) > > -Adding DPDK vhost-user ports to the Switch: > --------------------------------------- > + ``` > + mkdir -p /usr/local/etc/openvswitch > + mkdir -p /usr/local/var/run/openvswitch > + rm /usr/local/etc/openvswitch/conf.db > + ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ > + /usr/local/share/openvswitch/vswitch.ovsschema > + ``` > > -Following the steps above to create a bridge, you can now add DPDK > vhost-user > -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports > can > -have arbitrary names, except that forward and backward slashes are > prohibited > -in the names. > + 2. Start ovsdb-server > > - - For vhost-user, the name of the port type is `dpdkvhostuser` > + No SSL support > > ``` > - ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 > - type=dpdkvhostuser > + ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ > + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ > + --pidfile --detach > ``` > > - This action creates a socket located at > - `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide > - to your VM on the QEMU command line. More instructions on this can be > - found in the next section "DPDK vhost-user VM configuration" > - - If you wish for the vhost-user sockets to be created in a > sub-directory of > - `/usr/local/var/run/openvswitch`, you may specify this directory in > the > - ovsdb like so: > - > - `./utilities/ovs-vsctl --no-wait \ > - set Open_vSwitch . other_config:vhost-sock-dir=subdir` > - > -DPDK vhost-user VM configuration: > ---------------------------------- > -Follow the steps below to attach vhost-user port(s) to a VM. > + SSL support > > -1. Configure sockets. > - Pass the following parameters to QEMU to attach a vhost-user device: > + ``` > + ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \ > + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ > + --private-key=db:Open_vSwitch,SSL,private_key \ > + --certificate=Open_vSwitch,SSL,certificate \ > + --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile > --detach > + ``` > > - ``` > - -chardev > socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-user-1 > - -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce > - -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 > - ``` > + 3. 
Initialize DB (One time step) > > - ...where vhost-user-1 is the name of the vhost-user port added > - to the switch. > - Repeat the above parameters for multiple devices, changing the > - chardev path and id as necessary. Note that a separate and different > - chardev path needs to be specified for each vhost-user device. For > - example you have a second vhost-user port named 'vhost-user-2', you > - append your QEMU command line with an additional set of parameters: > + ``` > + ovs-vsctl --no-wait init > + ``` > > - ``` > - -chardev > socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2 > - -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce > - -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2 > - ``` > + 4. Start vswitchd > > -2. Configure huge pages. > - QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports > access > - a virtio-net device's virtual rings and packet buffers mapping the VM's > - physical memory on hugetlbfs. To enable vhost-user ports to map the > VM's > - memory into their process address space, pass the following paramters > - to QEMU: > + DPDK configuration arguments can be passed to vswitchd via > Open_vSwitch > + 'other_config' column. The important configuration options are > listed below. > + Defaults will be provided for all values not explicitly set. Refer > + ovs-vswitchd.conf.db(5) for additional information on configuration > options. > > - ``` > - -object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages, > - share=on > - -numa node,memdev=mem -mem-prealloc > - ``` > + * dpdk-init > + Specifies whether OVS should initialize and support DPDK ports. This > is > + a boolean, and defaults to false. > > -3. Optional: Enable multiqueue support > - The vhost-user interface must be configured in Open vSwitch with the > - desired amount of queues with: > + * dpdk-lcore-mask > + Specifies the CPU cores on which dpdk lcore threads should be > spawned and > + expects hex string (eg '0x123'). > > - ``` > - ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues> > - ``` > + * dpdk-socket-mem > + Comma separated list of memory to pre-allocate from hugepages on > specific > + sockets. > > - QEMU needs to be configured as well. > - The $q below should match the queues requested in OVS (if $q is more, > - packets will not be received). > - The $v is the number of vectors, which is '$q x 2 + 2'. > + * dpdk-hugepage-dir > + Directory where hugetlbfs is mounted > > - ``` > - -chardev > socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-user-2 > - -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce,queues=$q > - -device > virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v > - ``` > + * vhost-sock-dir > + Option to set the path to the vhost_user unix socket files. > > - If one wishes to use multiple queues for an interface in the guest, the > - driver in the guest operating system must be configured to do so. It is > - recommended that the number of queues configured be equal to '$q'. > + NOTE: Changing any of these options requires restarting the > ovs-vswitchd > + application. > > - For example, this can be done for the Linux kernel virtio-net driver > with: > + Open vSwitch can be started as normal. DPDK will be initialized as > long > + as the dpdk-init option has been set to 'true'. > > - ``` > - ethtool -L <DEV> combined <$q> > - ``` > + ``` > + export DB_SOCK=/usr/local/var/run/openvswitch/db.sock > + ovs-vsctl --no-wait set Open_vSwitch . 
other_config:dpdk-init=true > + ovs-vswitchd unix:$DB_SOCK --pidfile --detach > + ``` > > - A note on the command above: > + If allocated more than one GB hugepage (as for IVSHMEM), set amount > and > + use NUMA node 0 memory. For details on using ivshmem with DPDK, > refer to > + [OVS Testcases]. > > - `-L`: Changes the numbers of channels of the specified network device > + ``` > + ovs-vsctl --no-wait set Open_vSwitch . > other_config:dpdk-socket-mem="1024,0" > + ovs-vswitchd unix:$DB_SOCK --pidfile --detach > + ``` > > - `combined`: Changes the number of multi-purpose channels. > + To better scale the work loads across cores, Multiple pmd threads > can be > + created and pinned to CPU cores by explicity specifying pmd-cpu-mask. > + eg: To spawn 2 pmd threads and pin them to cores 1, 2 > > -DPDK vhost-cuse: > ----------------- > + ``` > + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6 > + ``` > > -The following sections describe the use of vhost-cuse 'dpdkvhostcuse' > ports > -with OVS. > + 5. Create bridge & add DPDK devices > > -DPDK vhost-cuse Prerequisites: > -------------------------- > + create a bridge with datapath_type "netdev" in the configuration > database > > -1. DPDK 16.04 with vhost support enabled as documented in the "Building > and > - Installing section" > - As an additional step, you must enable vhost-cuse in DPDK by setting > the > - following additional flag in `config/common_base`: > + `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev` > > - `CONFIG_RTE_LIBRTE_VHOST_USER=n` > + Now you can add DPDK devices. OVS expects DPDK device names to start > with > + "dpdk" and end with a portid. vswitchd should print (in the log > file) the > + number of dpdk devices found. > > - Following this, rebuild DPDK as per the instructions in the "Building > and > - Installing" section. Finally, rebuild OVS as per step 3 in the > "Building > - and Installing" section - OVS will detect that DPDK has vhost-cuse > libraries > - compiled and in turn will enable support for it in the switch and > disable > - vhost-user support. > + ``` > + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk > + ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk > + ``` > > -2. Insert the Cuse module: > + After the DPDK ports get added to switch, a polling thread > continuously polls > + DPDK devices and consumes 100% of the core as can be checked from > 'top' and 'ps' cmds. > > - `modprobe cuse` > + ``` > + top -H > + ps -eLo pid,psr,comm | grep pmd > + ``` > > -3. Build and insert the `eventfd_link` module: > + Note: creating bonds of DPDK interfaces is slightly different to > creating > + bonds of system interfaces. For DPDK, the interface type must be > explicitly > + set, for example: > > ``` > - cd $DPDK_DIR/lib/librte_vhost/eventfd_link/ > - make > - insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko > + ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 > type=dpdk -- set Interface dpdk1 type=dpdk > ``` > > -4. QEMU version v2.1.0+ > + 6. PMD thread statistics > > - vhost-cuse will work with QEMU v2.1.0 and above, however it is > recommended to > - use v2.2.0 if providing your VM with memory greater than 1GB due to > potential > - issues with memory mapping larger areas. > - Note: QEMU v1.6.2 will also work, with slightly different command line > parameters, > - which are specified later in this document. 
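A small suggestion on section 3.3: the dpdk-init, dpdk-socket-mem and pmd-cpu-mask settings end up spread across steps 4 and 5. It might help readers to see the whole bring-up in one block; for example, reusing only the values already shown above:

```
export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
ovs-vswitchd unix:$DB_SOCK --pidfile --detach

ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev
ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
```

Not a blocker, just something that would make the quick-start flow easier to follow.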
> + ``` > + # Check current stats > + ovs-appctl dpif-netdev/pmd-stats-show > > -Adding DPDK vhost-cuse ports to the Switch: > --------------------------------------- > + # Show port/rxq assignment > + ovs-appctl dpif-netdev/pmd-rxq-show > > -Following the steps above to create a bridge, you can now add DPDK > vhost-cuse > -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports > can have > -arbitrary names. > + # Clear previous stats > + ovs-appctl dpif-netdev/pmd-stats-clear > + ``` > > - - For vhost-cuse, the name of the port type is `dpdkvhostcuse` > + 7. Stop vswitchd & Delete bridge > > ``` > - ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1 > - type=dpdkvhostcuse > + ovs-appctl -t ovs-vswitchd exit > + ovs-appctl -t ovsdb-server exit > + ovs-vsctl del-br br0 > ``` > > - When attaching vhost-cuse ports to QEMU, the name provided during the > - add-port operation must match the ifname parameter on the QEMU > command > - line. More instructions on this can be found in the next section. > +## <a name="builddpdk"></a> 4. DPDK in the VM > > -DPDK vhost-cuse VM configuration: > ---------------------------------- > +DPDK 'testpmd' application can be run in the Guest VM for high speed > +packet forwarding between vhostuser ports. This needs DPDK, testpmd to be > +compiled along with kernel modules. I think that sentence is not clear. What do you mean by "testpmd to be compiled along with kernel modules" ? Below are the steps for setting up > +the testpmd application in the VM. More information on the vhostuser ports > +can be found in [Vhost Walkthrough]. > > - vhost-cuse ports use a Linux* character device to communicate with > QEMU. > - By default it is set to `/dev/vhost-net`. It is possible to reuse this > - standard device for DPDK vhost, which makes setup a little simpler but > it > - is better practice to specify an alternative character device in order > to > - avoid any conflicts if kernel vhost is to be used in parallel. > + * Instantiate the Guest > > -1. This step is only needed if using an alternative character device. > + ``` > + Qemu version >= 2.2.0 > > - The new character device filename must be specified in the ovsdb: > + export VM_NAME=Centos-vm > + export GUEST_MEM=3072M > + export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2 > + export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch > > - `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \ > - other_config:cuse-dev-name=my-vhost-net` > + qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m $GUEST_MEM > -object > memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on > -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive > file=$QCOW2_IMAGE -chardev > socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev > type=vhost-user,id=mynet1,chardev=char0,vhostforce -device > virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev > socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev > type=vhost-user,id=mynet2,chardev=char1,vhostforce -device > virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off > --nographic -snapshot > I would remove all the things that are not exclusively related to vhost user ports, for example "-name", "--nographic", "--snapshot". I am not sure if putting the disk image is a good idea. > + ``` > > - In the example above, the character device to be used will be > - `/dev/my-vhost-net`. > + * Copy the DPDK Srcs to VM and build DPDK > > -2. 
This step is only needed if reusing the standard character device. It > will > - conflict with the kernel vhost character device so the user must first > - remove it. > + ``` > + cd /root/dpdk/ > + wget http://dpdk.org/browse/dpdk/snapshot/dpdk-16.04.zip > + unzip dpdk-16.04.zip > + export DPDK_DIR=/root/dpdk/dpdk-16.04 > + cd $DPDK_DIR > + export DPDK_TARGET=x86_64-native-linuxapp-gcc > + export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET > + make install T=$DPDK_TARGET DESTDIR=install > + ``` > > - `rm -rf /dev/vhost-net` > + * Build the test-pmd application > > -3a. Configure virtio-net adaptors: > - The following parameters must be passed to the QEMU binary: > + ``` > + cd app/test-pmd > + make > I am asking without trying it first, Is not it necessary to set RTE_TARGET and RTE_SDK before running make? > + ``` > > - ``` > - -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on > - -device virtio-net-pci,netdev=net1,mac=<mac> > - ``` > + * Setup Huge pages and DPDK devices using UIO > > - Repeat the above parameters for multiple devices. > + ``` > + sysctl vm.nr_hugepages=1024 > + mkdir -p /dev/hugepages > + mount -t hugetlbfs hugetlbfs /dev/hugepages (only if not already > mounted) > + modprobe uio > + insmod $DPDK_BUILD/kmod/igb_uio.ko > + $DPDK_DIR/tools/dpdk_nic_bind.py --status > + $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 00:03.0 00:04.0 > + ``` > > - The DPDK vhost library will negiotiate its own features, so they > - need not be passed in as command line params. Note that as offloads > are > - disabled this is the equivalent of setting: > + vhost ports pci ids can be retrieved using `lspci | grep Ethernet` cmd. > > - `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off` > +## <a name="ovstc"></a> 5. OVS Testcases > > -3b. If using an alternative character device. It must be also explicitly > - passed to QEMU using the `vhostfd` argument: > + Below are few testcases and the list of steps to be followed. > > - ``` > - -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on, > - vhostfd=<open_fd> > - -device virtio-net-pci,netdev=net1,mac=<mac> > - ``` > +### 5.1 PHY-PHY > + > + The steps (1-5) in 3.3 section will create & initialize DB, start > vswitchd and also > + add DPDK devices to bridge 'br0'. > > - The open file descriptor must be passed to QEMU running as a child > - process. This could be done with a simple python script. > + 1. Add Test flows to forward packets betwen DPDK port 0 and port 1 > > ``` > - #!/usr/bin/python > - fd = os.open("/dev/usvhost", os.O_RDWR) > - subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\ > - vhost=on,vhostfd=" + fd +"...", shell=True) > - > - Alternatively the `qemu-wrap.py` script can be used to automate the > - requirements specified above and can be used in conjunction with > libvirt if > - desired. See the "DPDK vhost VM configuration with QEMU wrapper" > section > - below. > - > -4. Configure huge pages: > - QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a > - virtio-net device's virtual rings and packet buffers mapping the VM's > - physical memory on hugetlbfs. 
To enable vhost-ports to map the VM's > - memory into their process address space, pass the following parameters > - to QEMU: > - > - `-object > memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages, > - share=on -numa node,memdev=mem -mem-prealloc` > - > - Note: For use with an earlier QEMU version such as v1.6.2, use the > - following to configure hugepages instead: > - > - `-mem-path /dev/hugepages -mem-prealloc` > - > -DPDK vhost-cuse VM configuration with QEMU wrapper: > ---------------------------------------------------- > -The QEMU wrapper script automatically detects and calls QEMU with the > -necessary parameters. It performs the following actions: > - > - * Automatically detects the location of the hugetlbfs and inserts this > - into the command line parameters. > - * Automatically open file descriptors for each virtio-net device and > - inserts this into the command line parameters. > - * Calls QEMU passing both the command line parameters passed to the > - script itself and those it has auto-detected. > - > -Before use, you **must** edit the configuration parameters section of the > -script to point to the correct emulator location and set additional > -settings. Of these settings, `emul_path` and `us_vhost_path` **must** be > -set. All other settings are optional. > - > -To use directly from the command line simply pass the wrapper some of the > -QEMU parameters: it will configure the rest. For example: > - > -``` > -qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4 > - --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1, > - script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci, > - netdev=net1,mac=00:00:00:00:00:01 > -``` > - > -DPDK vhost-cuse VM configuration with libvirt: > ----------------------------------------------- > - > -If you are using libvirt, you must enable libvirt to access the character > -device by adding it to controllers cgroup for libvirtd using the following > -steps. > - > - 1. In `/etc/libvirt/qemu.conf` add/edit the following lines: > - > - ``` > - 1) clear_emulator_capabilities = 0 > - 2) user = "root" > - 3) group = "root" > - 4) cgroup_device_acl = [ > - "/dev/null", "/dev/full", "/dev/zero", > - "/dev/random", "/dev/urandom", > - "/dev/ptmx", "/dev/kvm", "/dev/kqemu", > - "/dev/rtc", "/dev/hpet", "/dev/net/tun", > - "/dev/<my-vhost-device>", > - "/dev/hugepages"] > - ``` > - > - <my-vhost-device> refers to "vhost-net" if using the > `/dev/vhost-net` > - device. If you have specificed a different name in the database > - using the "other_config:cuse-dev-name" parameter, please specify > that > - filename instead. > - > - 2. Disable SELinux or set to permissive mode > - > - 3. Restart the libvirtd process > - For example, on Fedora: > - > - `systemctl restart libvirtd.service` > - > -After successfully editing the configuration, you may launch your > -vhost-enabled VM. The XML describing the VM can be configured like so > -within the <qemu:commandline> section: > - > - 1. Set up shared hugepages: > + # Clear current flows > + ovs-ofctl del-flows br0 > > - ``` > - <qemu:arg value='-object'/> > - <qemu:arg > value='memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,share=on'/> > - <qemu:arg value='-numa'/> > - <qemu:arg value='node,memdev=mem'/> > - <qemu:arg value='-mem-prealloc'/> > - ``` > + # Add flows between port 1 (dpdk0) to port 2 (dpdk1) > + ovs-ofctl add-flow br0 in_port=1,action=output:2 > + ovs-ofctl add-flow br0 in_port=2,action=output:1 > + ``` > > - 2. 
Set up your tap devices: > +### 5.2 PHY-VM-PHY [VHOST LOOPBACK] > > - ``` > - <qemu:arg value='-netdev'/> > - <qemu:arg > value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'/> > - <qemu:arg value='-device'/> > - <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/> > - ``` > + The steps (1-5) in 3.3 section will create & initialize DB, start > vswitchd and also > + add DPDK devices to bridge 'br0'. > > - Repeat for as many devices as are desired, modifying the id, ifname > - and mac as necessary. > + 1. Add dpdkvhostuser ports to bridge 'br0'. More information on the > dpdkvhostuser ports > + can be found in [Vhost Walkthrough]. > > - Again, if you are using an alternative character device (other than > - `/dev/vhost-net`), please specify the file descriptor like so: > + ``` > + ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface > dpdkvhostuser0 type=dpdkvhostuser > + ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface > dpdkvhostuser1 type=dpdkvhostuser > + ``` > > - `<qemu:arg > value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,vhostfd=<open_fd>'/>` > + 2. Add Test flows to forward packets betwen DPDK devices and VM ports > > - Where <open_fd> refers to the open file descriptor of the character > device. > - Instructions of how to retrieve the file descriptor can be found in > the > - "DPDK vhost VM configuration" section. > - Alternatively, the process is automated with the qemu-wrap.py script, > - detailed in the next section. > + ``` > + # Clear current flows > + ovs-ofctl del-flows br0 > > -Now you may launch your VM using virt-manager, or like so: > + # Add flows > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=1,action=output:3 > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=3,action=output:1 > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=4,action=output:2 > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=2,action=output:4 > What is "idle_timeout=0" for?, is this strictly necessary? > - `virsh create my_vhost_vm.xml` > + # Dump flows > + ovs-ofctl dump-flows br0 > + ``` > > -DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper: > ----------------------------------------------------------- > + 3. Instantiate Guest VM using Qemu cmdline > > -To use the qemu-wrapper script in conjuntion with libvirt, follow the > -steps in the previous section before proceeding with the following steps: > + Guest Configuration > > - 1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH) > - Ideally in the same directory that the QEMU binary is located. > + ``` > + | configuration | values | comments > + |----------------------|--------|----------------- > + | qemu version | 2.2.0 | > + | qemu thread affinity | core 5 | taskset 0x20 > + | memory | 4GB | - > + | cores | 2 | - > + | Qcow2 image | CentOS7| - > + | mrg_rxbuf | off | - > + ``` > > - 2. Ensure that the script has the same owner/group and file permissions > - as the QEMU binary. > + Instantiate Guest > > - 3. Update the VM xml file using "virsh edit VM.xml" > + ``` > + export VM_NAME=vhost-vm > + export GUEST_MEM=3072M > + export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2 > + export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch > > - 1. Set the VM to use the launch script. > - Set the emulator path contained in the `<emulator><emulator/>` > tags. 
> - For example, replace: > + taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host > -enable-kvm -m $GUEST_MEM -object > memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on > -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive > file=$QCOW2_IMAGE -chardev > socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev > type=vhost-user,id=mynet1,chardev=char0,vhostforce -device > virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev > socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev > type=vhost-user,id=mynet2,chardev=char1,vhostforce -device > virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off > --nographic -snapshot > + ``` > > - `<emulator>/usr/bin/qemu-kvm<emulator/>` > + 4. Guest VM using libvirt > > - with: > + The below is a simple xml configuration of 'demovm' guest that can > be instantiated > + using 'virsh'. The guest uses a pair of vhostuser port and boots > with 4GB RAM and 2 cores. > + More information can be found in [Vhost Walkthrough]. > > - `<emulator>/usr/bin/qemu-wrap.py<emulator/>` > + ``` > + <domain type='kvm'> > + <name>demovm</name> > + <uuid>4a9b3f53-fa2a-47f3-a757-dd87720d9d1d</uuid> > + <memory unit='KiB'>4194304</memory> > + <currentMemory unit='KiB'>4194304</currentMemory> > + <memoryBacking> > + <hugepages> > + <page size='2' unit='M' nodeset='0'/> > + </hugepages> > + </memoryBacking> > + <vcpu placement='static'>2</vcpu> > + <cputune> > + <shares>4096</shares> > + <vcpupin vcpu='0' cpuset='4'/> > + <vcpupin vcpu='1' cpuset='5'/> > + <emulatorpin cpuset='4,5'/> > + </cputune> > + <os> > + <type arch='x86_64' machine='pc'>hvm</type> > + <boot dev='hd'/> > + </os> > + <features> > + <acpi/> > + <apic/> > + </features> > + <cpu mode='host-model'> > + <model fallback='allow'/> > + <topology sockets='2' cores='1' threads='1'/> > + <numa> > + <cell id='0' cpus='0-1' memory='4194304' unit='KiB' > memAccess='shared'/> > + </numa> > + </cpu> > + <on_poweroff>destroy</on_poweroff> > + <on_reboot>restart</on_reboot> > + <on_crash>destroy</on_crash> > + <devices> > + <emulator>/usr/bin/qemu-kvm</emulator> > + <disk type='file' device='disk'> > + <driver name='qemu' type='qcow2' cache='none'/> > + <source file='/root/CentOS7_x86_64.qcow2'/> > + <target dev='vda' bus='virtio'/> > + </disk> > + <disk type='dir' device='disk'> > + <driver name='qemu' type='fat'/> > + <source dir='/usr/src/dpdk-16.04'/> > + <target dev='vdb' bus='virtio'/> > + <readonly/> > + </disk> > + <interface type='vhostuser'> > + <mac address='00:00:00:00:00:01'/> > + <source type='unix' > path='/usr/local/var/run/openvswitch/dpdkvhostuser0' mode='client'/> > + <model type='virtio'/> > + <driver queues='2'> > + <host mrg_rxbuf='off'/> > + </driver> > + </interface> > + <interface type='vhostuser'> > + <mac address='00:00:00:00:00:02'/> > + <source type='unix' > path='/usr/local/var/run/openvswitch/dpdkvhostuser1' mode='client'/> > + <model type='virtio'/> > + <driver queues='2'> > + <host mrg_rxbuf='off'/> > + </driver> > + </interface> > + <serial type='pty'> > + <target port='0'/> > + </serial> > + <console type='pty'> > + <target type='serial' port='0'/> > + </console> > + </devices> > + </domain> > + ``` > > - 4. Edit the Configuration Parameters section of the script to point to > - the correct emulator location and set any additional options. If you are > - using a alternative character device name, please set "us_vhost_path" > to the > - location of that device. 
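Following up on my RTE_SDK/RTE_TARGET question in section 4: if they do turn out to be required for the standalone test-pmd build, the guest step might look like the sketch below (untested on my side, reusing the variables already exported in section 4):

```
# assumes DPDK_DIR and DPDK_TARGET are exported as in section 4
export RTE_SDK=$DPDK_DIR
export RTE_TARGET=$DPDK_TARGET
cd $DPDK_DIR/app/test-pmd
make
```

Either stating this explicitly or confirming it is not needed would avoid guesswork for users.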
The script will automatically detect and insert > - the correct "vhostfd" value in the QEMU command line arguments. > + 5. DPDK Packet forwarding in Guest VM > > - 5. Use virt-manager to launch the VM > + To accomplish this, DPDK and testpmd application have to be first > compiled > + on the VM and the steps are listed in [DPDK in the VM]. > > -Running ovs-vswitchd with DPDK backend inside a VM > --------------------------------------------------- > + * Run test-pmd application > > -Please note that additional configuration is required if you want to run > -ovs-vswitchd with DPDK backend inside a QEMU virtual machine. Ovs-vswitchd > -creates separate DPDK TX queues for each CPU core available. This > operation > -fails inside QEMU virtual machine because, by default, VirtIO NIC provided > -to the guest is configured to support only single TX queue and single RX > -queue. To change this behavior, you need to turn on 'mq' (multiqueue) > -property of all virtio-net-pci devices emulated by QEMU and used by DPDK. > -You may do it manually (by changing QEMU command line) or, if you use > Libvirt, > -by adding the following string: > + ``` > + cd $DPDK_DIR/app/test-pmd; > + ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- --burst=64 -i > --txqflags=0xf00 --disable-hw-vlan > + set fwd mac_retry > + start > + ``` > > -`<driver name='vhost' queues='N'/>` > + * Bind vNIC back to kernel once the test is completed. > > -to <interface> sections of all network devices used by DPDK. Parameter 'N' > -determines how many queues can be used by the guest. > + ``` > + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:03.0 > + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:04.0 > + ``` > + Note: Appropriate PCI IDs to be passed in above example. The PCI > IDs can be > + retrieved using '$DPDK_DIR/tools/dpdk_nic_bind.py --status' cmd. > > -Restrictions: > -------------- > +### 5.3 PHY-VM-PHY [IVSHMEM] > > - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. > - - Currently DPDK port does not make use any offload functionality. > - - DPDK-vHost support works with 1G huge pages. > + The steps for setup of IVSHMEM are covered in section 5.2(PVP - IVSHMEM) > + of [OVS Testcases] in ADVANCED install guide. > > - ivshmem: > - - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be > - unable to share any rings or mempools with a virtual machine. > - This is because the current implementation of ivshmem works by sharing > - a single 1GB huge page from the host operating system to any guest > - operating system through the Qemu ivshmem device. When using smaller > - page sizes, multiple pages may be required to hold the ring > descriptors > - and buffer pools. The Qemu ivshmem device does not allow you to share > - multiple file descriptors to the guest operating system. However, if > you > - want to share dpdkr rings with other processes on the host, you can do > - this with smaller page sizes. > +## <a name="ovslimits"></a> 6. Limitations > > - Platform and Network Interface: > - - By default with DPDK 16.04, a maximum of 64 TX queues can be used > with an > - Intel XL710 Network Interface on a platform with more than 64 logical > - cores. If a user attempts to add an XL710 interface as a DPDK port > type to > - a system as described above, an error will be reported that > initialization > - failed for the 65th queue. 
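One more nit on the testpmd run in 5.2: since `-i` starts testpmd in interactive mode, it may be clearer to show that `set fwd mac_retry` and `start` are entered at the testpmd prompt rather than in the shell, e.g.:

```
# inside the guest, at the interactive prompt opened by 'testpmd ... -i'
testpmd> set fwd mac_retry
testpmd> start
```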
OVS will then roll back to the previous > - successful queue initialization and use that value as the total > number of > - TX queues available with queue locking. If a user wishes to use more > than > - 64 queues and avoid locking, then the > - `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in DPDK > must be > - increased to the desired number of queues. Both DPDK and OVS must be > - recompiled for this change to take effect. > + - Supports MTU size 1500, MTU setting for DPDK netdevs will be in > future OVS release. > + - Currently DPDK ports does not use HW offload functionality. > > Bug Reporting: > -------------- > > Please report problems to b...@openvswitch.org. > > -[INSTALL.userspace.md]:INSTALL.userspace.md > + > +[DPDK requirements]: http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html > +[Download DPDK]: http://dpdk.org/browse/dpdk/refs/ > +[Download OVS]: http://openvswitch.org/releases/ > +[DPDK Supported NICs]: http://dpdk.org/doc/nics > +[Build Requirements]: > https://github.com/openvswitch/ovs/blob/master/INSTALL.md#build-requirements > +[INSTALL.DPDK-ADVANCED.md]: INSTALL.DPDK-ADVANCED.md > +[OVS Testcases]: INSTALL.DPDK-ADVANCED.md#ovstc > +[Vhost Walkthrough]: INSTALL.DPDK-ADVANCED.md#vhost > +[DPDK in the VM]: INSTALL.DPDK.md#builddpdk > [INSTALL.md]:INSTALL.md > -[DPDK Linux GSG]: > http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-unbinding-network-ports-to-from-the-igb-uioor-vfio-modules > -[DPDK Docs]: http://dpdk.org/doc > +[INSTALL.Fedora.md]:INSTALL.Fedora.md > +[INSTALL.RHEL.md]:INSTALL.RHEL.md > +[INSTALL.Debian.md]:INSTALL.Debian.md > -- > 2.4.11 > > Regards, Mauricio V, _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev