Thanks Aaron for reviewing the install guide. Please see my reply inline. > -----Original Message----- > From: Aaron Conole [mailto:acon...@redhat.com] > Sent: Friday, May 13, 2016 4:55 PM > To: Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com> > Cc: dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation > > Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> writes: > > > Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and > > INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate > the > > novice user in setting up the OVS DPDK and running it out of box, the > > ADVANCED document is targeted at expert users looking for the optimum > > performance running dpdk datapath. > > > > This commit updates INSTALL.DPDK.md document. > > > > Signed-off-by: Bhanuprakash Bodireddy > <bhanuprakash.bodire...@intel.com> > > --- > > INSTALL.DPDK.md | 1193 +++++++++++++++------------------------------------ > ---- > > 1 file changed, 331 insertions(+), 862 deletions(-) > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > > index 93f92e4..bf646bf 100644 > > --- a/INSTALL.DPDK.md > > +++ b/INSTALL.DPDK.md > > @@ -1,1001 +1,470 @@ > > -Using Open vSwitch with DPDK > > -============================ > > +OVS DPDK INSTALL GUIDE > > +================================ > > > > -Open vSwitch can use Intel(R) DPDK lib to operate entirely in > > -userspace. This file explains how to install and use Open vSwitch in > > -such a mode. > > +## Contents > > > > -The DPDK support of Open vSwitch is considered experimental. > > -It has not been thoroughly tested. > > +1. [Overview](#overview) > > +2. [Building and Installation](#build) > > +3. [Setup OVS DPDK datapath](#ovssetup) > > +4. [DPDK in the VM](#builddpdk) > > +5. [OVS Testcases](#ovstc) > > +6. [Limitations ](#ovslimits) > > > > -This version of Open vSwitch should be built manually with `configure` > > -and `make`. > > +## <a name="overview"></a> 1. Overview > > > > -OVS needs a system with 1GB hugepages support. > > +Open vSwitch can use DPDK lib to operate entirely in userspace. > > +This file provides information on installation and use of Open vSwitch > > +using DPDK datapath. This version of Open vSwitch should be built > manually > > +with `configure` and `make`. > > > > -Building and Installing: > > ------------------------- > > +The DPDK support of Open vSwitch is considered 'experimental'. > > > > -Required: DPDK 16.04 > > -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev` > > -on Debian/Ubuntu) > > +### Prerequisites > > > > -1. Configure build & install DPDK: > > - 1. Set `$DPDK_DIR` > > +* Required: DPDK 16.04 > > +* Hardware: [DPDK Supported NICs] when physical ports in use > > > > - ``` > > - export DPDK_DIR=/usr/src/dpdk-16.04 > > - cd $DPDK_DIR > > - ``` > > - > > - 2. Then run `make install` to build and install the library. > > - For default install without IVSHMEM: > > - > > - `make install T=x86_64-native-linuxapp-gcc DESTDIR=install` > > - > > - To include IVSHMEM (shared memory): > > - > > - `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install` > > - > > - For further details refer to http://dpdk.org/ > > - > > -2. Configure & build the Linux kernel: > > - > > - Refer to intel-dpdk-getting-started-guide.pdf for understanding > > - DPDK kernel requirement. > > - > > -3. 
Configure & build OVS: > > - > > - * Non IVSHMEM: > > - > > - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` > > - > > - * IVSHMEM: > > - > > - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` > > - > > - ``` > > - cd $(OVS_DIR)/ > > - ./boot.sh > > - ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast- > align"] > > - make > > - ``` > > - > > - Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress > DPDK cast-align warnings. > > - > > -To have better performance one can enable aggressive compiler > optimizations and > > -use the special instructions(popcnt, crc32) that may not be available on > > all > > -machines. Instead of typing `make`, type: > > - > > -`make CFLAGS='-O3 -march=native'` > > - > > -Refer to [INSTALL.userspace.md] for general requirements of building > userspace OVS. > > - > > -Using the DPDK with ovs-vswitchd: > > ---------------------------------- > > - > > -1. Setup system boot > > - Add the following options to the kernel bootline: > > - > > - `default_hugepagesz=1GB hugepagesz=1G hugepages=1` > > - > > -2. Setup DPDK devices: > > - > > - DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO > > - modules. UIO requires inserting an out of tree driver igb_uio.ko that is > > - available in DPDK. Setup for both methods are described below. > > - > > - * UIO: > > - 1. insert uio.ko: `modprobe uio` > > - 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko` > > - 3. Bind network device to igb_uio: > > - `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1` > > - > > - * VFIO: > > - > > - VFIO needs to be supported in the kernel and the BIOS. More > information > > - can be found in the [DPDK Linux GSG]. > > - > > - 1. Insert vfio-pci.ko: `modprobe vfio-pci` > > - 2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x > /dev/vfio` > > - and: `sudo /usr/bin/chmod 0666 /dev/vfio/*` > > - 3. Bind network device to vfio-pci: > > - `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1` > > - > > -3. Mount the hugetable filesystem > > - > > - `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages` > > - > > - Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup. > > - > > -4. Follow the instructions in [INSTALL.md] to install only the > > - userspace daemons and utilities (via 'make install'). > > - 1. First time only db creation (or clearing): > > - > > - ``` > > - mkdir -p /usr/local/etc/openvswitch > > - mkdir -p /usr/local/var/run/openvswitch > > - rm /usr/local/etc/openvswitch/conf.db > > - ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ > > - /usr/local/share/openvswitch/vswitch.ovsschema > > - ``` > > - > > - 2. Start ovsdb-server > > - > > - ``` > > - ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock > \ > > - --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ > > - --private-key=db:Open_vSwitch,SSL,private_key \ > > - --certificate=Open_vSwitch,SSL,certificate \ > > - --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile > > --detach > > - ``` > > - > > - 3. First time after db creation, initialize: > > - > > - ``` > > - ovs-vsctl --no-wait init > > - ``` > > - > > -5. Start vswitchd: > > - > > - DPDK configuration arguments can be passed to vswitchd via > Open_vSwitch > > - other_config column. The recognized configuration options are listed. > > - Defaults will be provided for all values not explicitly set. > > - > > - * dpdk-init > > - Specifies whether OVS should initialize and support DPDK ports. 
This is > > - a boolean, and defaults to false. > > - > > - * dpdk-lcore-mask > > - Specifies the CPU cores on which dpdk lcore threads should be spawned. > > - The DPDK lcore threads are used for DPDK library tasks, such as > > - library internal message processing, logging, etc. Value should be in > > - the form of a hex string (so '0x123') similar to the 'taskset' mask > > - input. > > - If not specified, the value will be determined by choosing the lowest > > - CPU core from initial cpu affinity list. Otherwise, the value will be > > - passed directly to the DPDK library. > > - For performance reasons, it is best to set this to a single core on > > - the system, rather than allow lcore threads to float. > > - > > - * dpdk-alloc-mem > > - This sets the total memory to preallocate from hugepages regardless of > > - processor socket. It is recommended to use dpdk-socket-mem instead. > > - > > - * dpdk-socket-mem > > - Comma separated list of memory to pre-allocate from hugepages on > specific > > - sockets. > > - > > - * dpdk-hugepage-dir > > - Directory where hugetlbfs is mounted > > - > > - * dpdk-extra > > - Extra arguments to provide to DPDK EAL, as previously specified on the > > - command line. Do not pass '--no-huge' to the system in this way. Support > > - for running the system without hugepages is nonexistent. > > - > > - * cuse-dev-name > > - Option to set the vhost_cuse character device name. > > - > > - * vhost-sock-dir > > - Option to set the path to the vhost_user unix socket files. > > - > > - NOTE: Changing any of these options requires restarting the ovs- > vswitchd > > - application. > > - > > - Open vSwitch can be started as normal. DPDK will be initialized as long > > - as the dpdk-init option has been set to 'true'. > > - > > - > > - ``` > > - export DB_SOCK=/usr/local/var/run/openvswitch/db.sock > > - ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true > > - ovs-vswitchd unix:$DB_SOCK --pidfile --detach > > - ``` > > - > > - If allocated more than one GB hugepage (as for IVSHMEM), set amount > and > > - use NUMA node 0 memory: > > - > > - ``` > > - ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket- > mem="1024,0" > > - ovs-vswitchd unix:$DB_SOCK --pidfile --detach > > - ``` > > - > > -6. Add bridge & ports > > - > > - To use ovs-vswitchd with DPDK, create a bridge with datapath_type > > - "netdev" in the configuration database. For example: > > - > > - `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev` > > - > > - Now you can add dpdk devices. OVS expects DPDK device names to start > with > > - "dpdk" and end with a portid. vswitchd should print (in the log file) > > the > > - number of dpdk devices found. > > - > > - ``` > > - ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk > > - ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk > > - ``` > > - > > - Once first DPDK port is added to vswitchd, it creates a Polling thread > > and > > - polls dpdk device in continuous loop. Therefore CPU utilization > > - for that thread is always 100%. > > - > > - Note: creating bonds of DPDK interfaces is slightly different to > > creating > > - bonds of system interfaces. For DPDK, the interface type must be > explicitly > > - set, for example: > > - > > - ``` > > - ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 > type=dpdk -- set Interface dpdk1 type=dpdk > > - ``` > > - > > -7. 
Add test flows > > - > > - Test flow script across NICs (assuming ovs in /usr/src/ovs): > > - Execute script: > > - > > - ``` > > - #! /bin/sh > > - # Move to command directory > > - cd /usr/src/ovs/utilities/ > > - > > - # Clear current flows > > - ./ovs-ofctl del-flows br0 > > - > > - # Add flows between port 1 (dpdk0) to port 2 (dpdk1) > > - ./ovs-ofctl add-flow br0 in_port=1,action=output:2 > > - ./ovs-ofctl add-flow br0 in_port=2,action=output:1 > > - ``` > > - > > -8. QoS usage example > > - > > - Assuming you have a vhost-user port transmitting traffic consisting of > > - packets of size 64 bytes, the following command would limit the egress > > - transmission rate of the port to ~1,000,000 packets per second: > > - > > - `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create > qos > > - type=egress-policer other-config:cir=46000000 other-config:cbs=2048` > > - > > - To examine the QoS configuration of the port: > > - > > - `ovs-appctl -t ovs-vswitchd qos/show vhost-user0` > > - > > - To clear the QoS configuration from the port and ovsdb use the > following: > > - > > - `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos` > > - > > - For more details regarding egress-policer parameters please refer to the > > - vswitch.xml. > > - > > -Performance Tuning: > > -------------------- > > - > > - 1. PMD affinitization > > - > > - A poll mode driver (pmd) thread handles the I/O of all DPDK > > - interfaces assigned to it. A pmd thread will busy loop through > > - the assigned port/rxq's polling for packets, switch the packets > > - and send to a tx port if required. Typically, it is found that > > - a pmd thread is CPU bound, meaning that the greater the CPU > > - occupancy the pmd thread can get, the better the performance. To > > - that end, it is good practice to ensure that a pmd thread has as > > - many cycles on a core available to it as possible. This can be > > - achieved by affinitizing the pmd thread with a core that has no > > - other workload. See section 7 below for a description of how to > > - isolate cores for this purpose also. > > - > > - The following command can be used to specify the affinity of the > > - pmd thread(s). > > - > > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex > string>` > > - > > - By setting a bit in the mask, a pmd thread is created and pinned > > - to the corresponding CPU core. e.g. to run a pmd thread on core 1 > > - > > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2` > > - > > - For more information, please refer to the Open_vSwitch TABLE > section in > > - > > - `man ovs-vswitchd.conf.db` > > - > > - Note, that a pmd thread on a NUMA node is only created if there is > > - at least one DPDK interface from that NUMA node added to OVS. > > - > > - 2. Multiple poll mode driver threads > > - > > - With pmd multi-threading support, OVS creates one pmd thread > > - for each NUMA node by default. However, it can be seen that in > cases > > - where there are multiple ports/rxq's producing traffic, performance > > - can be improved by creating multiple pmd threads running on > separate > > - cores. These pmd threads can then share the workload by each being > > - responsible for different ports/rxq's. Assignment of ports/rxq's to > > - pmd threads is done automatically. > > - > > - The following command can be used to specify the affinity of the > > - pmd threads. > > - > > - `ovs-vsctl set Open_vSwitch . 
other_config:pmd-cpu-mask=<hex > string>` > > - > > - A set bit in the mask means a pmd thread is created and pinned > > - to the corresponding CPU core. e.g. to run pmd threads on core 1 > and 2 > > - > > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6` > > - > > - For more information, please refer to the Open_vSwitch TABLE > section in > > - > > - `man ovs-vswitchd.conf.db` > > - > > - For example, when using dpdk and dpdkvhostuser ports in a bi- > directional > > - VM loopback as shown below, spreading the workload over 2 or 4 > pmd > > - threads shows significant improvements as there will be more total > CPU > > - occupancy available. > > - > > - NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1 > > - > > - The following command can be used to confirm that the port/rxq > assignment > > - to pmd threads is as required: > > - > > - `ovs-appctl dpif-netdev/pmd-rxq-show` > > +## <a name="build"></a> 2. Building and Installation > > > > - This can also be checked with: > > +### 2.1 Configure & build the Linux kernel > > > > - ``` > > - top -H > > - taskset -p <pid_of_pmd> > > - ``` > > +On Linux Distros running kernel version >= 3.0, kernel rebuild is not > required > > +and only grub cmdline needs to be updated for enabling IOMMU [VFIO > support - 3.2]. > > +For older kernels, check if kernel is built with UIO, HUGETLBFS, > PROC_PAGE_MONITOR, > > +HPET, HPET_MMAP support. > > > > - To understand where most of the pmd thread time is spent and > whether the > > - caches are being utilized, these commands can be used: > > - > > - ``` > > - # Clear previous stats > > - ovs-appctl dpif-netdev/pmd-stats-clear > > - > > - # Check current stats > > - ovs-appctl dpif-netdev/pmd-stats-show > > - ``` > > - > > - 3. DPDK port Rx Queues > > - > > - `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>` > > - > > - The command above sets the number of rx queues for DPDK > interface. > > - The rx queues are assigned to pmd threads on the same NUMA node > in a > > - round-robin fashion. For more information, please refer to the > > - Open_vSwitch TABLE section in > > - > > - `man ovs-vswitchd.conf.db` > > - > > - 4. Exact Match Cache > > - > > - Each pmd thread contains one EMC. After initial flow setup in the > > - datapath, the EMC contains a single table and provides the lowest > level > > - (fastest) switching for DPDK ports. If there is a miss in the EMC then > > - the next level where switching will occur is the datapath classifier. > > - Missing in the EMC and looking up in the datapath classifier incurs a > > - significant performance penalty. If lookup misses occur in the EMC > > - because it is too small to handle the number of flows, its size can > > - be increased. The EMC size can be modified by editing the define > > - EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c. > > - > > - As mentioned above an EMC is per pmd thread. So an alternative > way of > > - increasing the aggregate amount of possible flow entries in EMC and > > - avoiding datapath classifier lookups is to have multiple pmd threads > > - running. This can be done as described in section 2. > > - > > - 5. Compiler options > > - > > - The default compiler optimization level is '-O2'. Changing this to > > - more aggressive compiler optimizations such as '-O3' or > > - '-Ofast -march=native' with gcc can produce performance gains. > > - > > - 6. Simultaneous Multithreading (SMT) > > - > > - With SMT enabled, one physical core appears as two logical cores > > - which can improve performance. 
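A side note on the pmd-cpu-mask values used above: the mask is simply one bit per CPU core, so for anything beyond one or two cores it is less error-prone to compute the hex string than to hand-craft it. A minimal sketch (the CORES list and the helper itself are illustrative only, not part of the patch):

```
# illustrative: build the pmd-cpu-mask hex string from a list of core ids
# (cores 1 and 2 give 0x6, matching the example above)
CORES="1 2"
MASK=0
for c in $CORES; do
    MASK=$((MASK | (1 << c)))
done
printf 'pmd-cpu-mask=0x%x\n' "$MASK"
ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask="$(printf '0x%x' "$MASK")"
```

The quoted examples write the value both with and without the leading 0x.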
> > - > > - SMT can be utilized to add additional pmd threads without > consuming > > - additional physical cores. Additional pmd threads may be added in > the > > - same manner as described in section 2. If trying to minimize the use > > - of physical cores for pmd threads, care must be taken to set the > > - correct bits in the pmd-cpu-mask to ensure that the pmd threads are > > - pinned to SMT siblings. > > - > > - For example, when using 2x 10 core processors in a dual socket > system > > - with HT enabled, /proc/cpuinfo will report 40 logical cores. To use > > - two logical cores which share the same physical core for pmd > threads, > > - the following command can be used to identify a pair of logical cores. > > - > > - `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list` > > - > > - where N is the logical core number. In this example, it would show > that > > - cores 1 and 21 share the same physical core. The pmd-cpu-mask to > enable > > - two pmd threads running on these two logical cores (one physical > core) > > - is. > > - > > - `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002` > > - > > - Note that SMT is enabled by the Hyper-Threading section in the > > - BIOS, and as such will apply to the whole system. So the impact of > > - enabling/disabling it for the whole system should be considered > > - e.g. If workloads on the system can scale across multiple cores, > > - SMT may very beneficial. However, if they do not and perform best > > - on a single physical core, SMT may not be beneficial. > > - > > - 7. The isolcpus kernel boot parameter > > - > > - isolcpus can be used on the kernel bootline to isolate cores from the > > - kernel scheduler and hence dedicate them to OVS or other packet > > - forwarding related workloads. For example a Linux kernel boot-line > > - could be: > > - > > - 'GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G > hugepages=4 default_hugepagesz=1G 'intel_iommu=off' isolcpus=1-19"' > > - > > - 8. NUMA/Cluster On Die > > - > > - Ideally inter NUMA datapaths should be avoided where possible as > packets > > - will go across QPI and there may be a slight performance penalty > when > > - compared with intra NUMA datapaths. On Intel Xeon Processor E5 > v3, > > - Cluster On Die is introduced on models that have 10 cores or more. > > - This makes it possible to logically split a socket into two NUMA > regions > > - and again it is preferred where possible to keep critical datapaths > > - within the one cluster. > > - > > - It is good practice to ensure that threads that are in the datapath are > > - pinned to cores in the same NUMA area. e.g. pmd threads and > QEMU vCPUs > > - responsible for forwarding. > > - > > - 9. Rx Mergeable buffers > > - > > - Rx Mergeable buffers is a virtio feature that allows chaining of > multiple > > - virtio descriptors to handle large packet sizes. As such, large packets > > - are handled by reserving and chaining multiple free descriptors > > - together. Mergeable buffer support is negotiated between the virtio > > - driver and virtio device and is supported by the DPDK vhost library. > > - This behavior is typically supported and enabled by default, however > > - in the case where the user knows that rx mergeable buffers are not > needed > > - i.e. jumbo frames are not needed, it can be forced off by adding > > - mrg_rxbuf=off to the QEMU command line options. 
By not reserving > multiple > > - chains of descriptors it will make more individual virtio descriptors > > - available for rx to the guest using dpdkvhost ports and this can > improve > > - performance. > > - > > - 10. Packet processing in the guest > > - > > - It is good practice whether simply forwarding packets from one > > - interface to another or more complex packet processing in the guest, > > - to ensure that the thread performing this work has as much CPU > > - occupancy as possible. For example when the DPDK sample > application > > - `testpmd` is used to forward packets in the guest, multiple QEMU > vCPU > > - threads can be created. Taskset can then be used to affinitize the > > - vCPU thread responsible for forwarding to a dedicated core not used > > - for other general processing on the host system. > > - > > - 11. DPDK virtio pmd in the guest > > - > > - dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the > path > > - to the guest using the DPDK vhost library. This library is compatible > with > > - virtio-net drivers in the guest but significantly better performance > can > > - be observed when using the DPDK virtio pmd driver in the guest. The > DPDK > > - `testpmd` application can be used in the guest as an example > application > > - that forwards packet from one DPDK vhost port to another. An > example of > > - running `testpmd` in the guest can be seen here. > > - > > - `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i -- > txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start` > > - > > - See below information on dpdkvhostcuse and dpdkvhostuser ports. > > - See [DPDK Docs] for more information on `testpmd`. > > +Details system requirements can be found at [DPDK requirements] > > > > +### 2.2 Install DPDK > > + 1. [Download DPDK] and extract the file, for example in to /usr/src > > + and set DPDK_DIR > > > > + ``` > > + cd /usr/src/ > > + unzip dpdk-16.04.zip > > > > -DPDK Rings : > > ------------- > > + export DPDK_DIR=/usr/src/dpdk-16.04 > > + cd $DPDK_DIR > > + ``` > > > > -Following the steps above to create a bridge, you can now add dpdk rings > > -as a port to the vswitch. OVS will expect the DPDK ring device name to > > -start with dpdkr and end with a portid. > > + 2. Configure, Install DPDK > > > > -`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr` > > + Build and install the DPDK library. > > > > -DPDK rings client test application > > + ``` > > + export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc > > + make install T=x86_64-native-linuxapp-gcc DESTDIR=install > > + ``` > > > > -Included in the test directory is a sample DPDK application for testing > > -the rings. This is from the base dpdk directory and modified to work > > -with the ring naming used within ovs. > > + Note: For previous DPDK releases, Set > `CONFIG_RTE_BUILD_COMBINE_LIBS=y` in > > + `config/common_linuxapp` to generate single library file. > > > > -location tests/ovs_client > > +### 2.3 Install OVS > > + OVS can be downloaded in compressed format from the OVS release > page (or) > > + cloned from git repository if user intends to develop and contribute > > + patches upstream. 
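Since the `./configure` step in 2.3 consumes the DPDK build produced in 2.2, it can help to see the two sections as one sequence. A condensed sketch, assuming the example /usr/src locations above and the configure form used in the current guide (`--with-dpdk=$DPDK_BUILD`):

```
# condensed recap of 2.2 + 2.3 (paths and build target are the example values)
export DPDK_DIR=/usr/src/dpdk-16.04
export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc
cd $DPDK_DIR
make install T=x86_64-native-linuxapp-gcc DESTDIR=install

export OVS_DIR=/usr/src/ovs
cd $OVS_DIR
./boot.sh
./configure --with-dpdk=$DPDK_BUILD
make && make install
```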
> > > > -To run the client : > > + - [Download OVS] tar ball and extract the file, for example in to > > /usr/src > > + and set OVS_DIR > > > > -``` > > -cd /usr/src/ovs/tests/ > > -ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr" > > -``` > > + ``` > > + cd /usr/src/ > > + tar -zxvf openvswitch-2.5.0.tar.gz > > + export OVS_DIR=/usr/src/openvswitch-2.5.0 > > + ``` > > > > -In the case of the dpdkr example above the "port id you gave dpdkr" is 0. > > + - Clone the Git repository for OVS, for example in to /usr/src > > > > -It is essential to have --proc-type=secondary > > + ``` > > + cd /usr/src/ > > + git clone https://github.com/openvswitch/ovs.git > > + export OVS_DIR=/usr/src/ovs > > + ``` > > > > -The application simply receives an mbuf on the receive queue of the > > -ethernet ring and then places that same mbuf on the transmit ring of > > -the ethernet ring. It is a trivial loopback application. > > + - Install OVS dependencies > > > > -DPDK rings in VM (IVSHMEM shared memory communications) > > -------------------------------------------------------- > > + GNU make, GCC 4.x (or) Clang 3.4 (Mandatory) > > + libssl, libcap-ng, Python 2.7 (Optional) > > + More information can be found at [Build Requirements] > > > > -In addition to executing the client in the host, you can execute it within > > -a guest VM. To do so you will need a patched qemu. You can download > the > > -patch and getting started guide at : > > + - Configure, Install OVS > > > > -https://01.org/packet-processing/downloads > > + ``` > > + cd $OVS_DIR > > + ./boot.sh > > + ./configure --with-dpdk > > + make install > > + ``` > > > > -A general rule of thumb for better performance is that the client > > -application should not be assigned the same dpdk core mask "-c" as > > -the vswitchd. > > +## <a name="ovssetup"></a> 3. Setup OVS with DPDK datapath > > > > -DPDK vhost: > > ------------ > > +### 3.1 Setup Hugepages > > I'd just move the section from the ADVANCED doc to here (at least the 2mb > huge pages, and 2mb huge pages persistence information). It doesn't make > sense I think to repeat it. The 1G and others could be left in advanced > as a performance tuning option (but there's really not much performance > difference between them, afaict).
Agree. I would bring in 2MB hugepages here(persistent and runtime allocation) and remove 2MB info from Advance Guide. Also for persistent allocation I would change it to use sysctl mechanism(writing to /etc/sysctl.d/hugepages.conf) instead of updating the grub cmdline in the next version. > > > -DPDK 16.04 supports two types of vhost: > > + Allocate and mount 2M Huge pages > > > > -1. vhost-user > > -2. vhost-cuse > > + ``` > > + echo N > /proc/sys/vm/nr_hugepages, where N = No. of huge pages > allocated > > + mount -t hugetlbfs none /dev/hugepages > > + ``` > > > > -Whatever type of vhost is enabled in the DPDK build specified, is the type > > -that will be enabled in OVS. By default, vhost-user is enabled in DPDK. > > -Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports > > -will be enabled in OVS. > > -Please note that support for vhost-cuse is intended to be deprecated in > OVS > > -in a future release. > > +### 3.2 Setup DPDK devices using VFIO > > > > -DPDK vhost-user: > > ----------------- > > + - Supported with DPDK release >= 1.7 and kernel version >= 3.6 > > + - VFIO needs support from BIOS and kernel. > > + - BIOS changes: > > > > -The following sections describe the use of vhost-user 'dpdkvhostuser' > ports > > -with OVS. > > + Enable VT-d, can be verified from `dmesg | grep -e DMAR -e IOMMU` > output > > > > -DPDK vhost-user Prerequisites: > > -------------------------- > > + - GRUB bootline: > > > > -1. DPDK 16.04 with vhost support enabled as documented in the "Building > and > > - Installing section" > > + Add `iommu=pt intel_iommu=on`, can be verified from `cat > /proc/cmdline` output > > > > -2. QEMU version v2.1.0+ > > + - Load modules and bind the NIC to VFIO driver > > > > - QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if > > providing > > - your VM with memory greater than 1GB due to potential issues with > memory > > - mapping larger areas. > > + ``` > > + modprobe vfio-pci > > + sudo /usr/bin/chmod a+x /dev/vfio > > + sudo /usr/bin/chmod 0666 /dev/vfio/* > > + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1 > > + $DPDK_DIR/tools/dpdk_nic_bind.py --status > > + ``` > > > > -Adding DPDK vhost-user ports to the Switch: > > --------------------------------------- > > + Note: If using older DPDK release (or) running kernels < 3.6 UIO drivers > to be used, > > + please check section 4 (DPDK devices using UIO) for the steps. > > > > -Following the steps above to create a bridge, you can now add DPDK > vhost-user > > -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can > > -have arbitrary names, except that forward and backward slashes are > prohibited > > -in the names. > > +### 3.3 Setup OVS > > > > - - For vhost-user, the name of the port type is `dpdkvhostuser` > > + 1. DB creation (One time step) > > > > ``` > > - ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1 > > - type=dpdkvhostuser > > + mkdir -p /usr/local/etc/openvswitch > > + mkdir -p /usr/local/var/run/openvswitch > > + rm /usr/local/etc/openvswitch/conf.db > > + ovsdb-tool create /usr/local/etc/openvswitch/conf.db \ > > + /usr/local/share/openvswitch/vswitch.ovsschema > > ``` > > > > - This action creates a socket located at > > - `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide > > - to your VM on the QEMU command line. 
More instructions on this can > be > > - found in the next section "DPDK vhost-user VM configuration" > > - - If you wish for the vhost-user sockets to be created in a > > sub-directory of > > - `/usr/local/var/run/openvswitch`, you may specify this directory in the > > - ovsdb like so: > > - > > - `./utilities/ovs-vsctl --no-wait \ > > - set Open_vSwitch . other_config:vhost-sock-dir=subdir` > > - > > -DPDK vhost-user VM configuration: > > ---------------------------------- > > -Follow the steps below to attach vhost-user port(s) to a VM. > > - > > -1. Configure sockets. > > - Pass the following parameters to QEMU to attach a vhost-user device: > > - > > - ``` > > - -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost- > user-1 > > - -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce > > - -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1 > > - ``` > > - > > - ...where vhost-user-1 is the name of the vhost-user port added > > - to the switch. > > - Repeat the above parameters for multiple devices, changing the > > - chardev path and id as necessary. Note that a separate and different > > - chardev path needs to be specified for each vhost-user device. For > > - example you have a second vhost-user port named 'vhost-user-2', you > > - append your QEMU command line with an additional set of parameters: > > + 2. Start ovsdb-server > > > > - ``` > > - -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost- > user-2 > > - -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce > > - -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2 > > - ``` > > + No SSL support > > > > -2. Configure huge pages. > > - QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports > access > > - a virtio-net device's virtual rings and packet buffers mapping the VM's > > - physical memory on hugetlbfs. To enable vhost-user ports to map the > VM's > > - memory into their process address space, pass the following paramters > > - to QEMU: > > - > > - ``` > > - -object memory-backend-file,id=mem,size=4096M,mem- > path=/dev/hugepages, > > - share=on > > - -numa node,memdev=mem -mem-prealloc > > - ``` > > - > > -3. Optional: Enable multiqueue support > > - The vhost-user interface must be configured in Open vSwitch with the > > - desired amount of queues with: > > - > > - ``` > > - ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues> > > - ``` > > - > > - QEMU needs to be configured as well. > > - The $q below should match the queues requested in OVS (if $q is more, > > - packets will not be received). > > - The $v is the number of vectors, which is '$q x 2 + 2'. > > + ``` > > + ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock > \ > > + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ > > + --pidfile --detach > > + ``` > > > > - ``` > > - -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost- > user-2 > > - -netdev type=vhost- > user,id=mynet2,chardev=char2,vhostforce,queues=$q > > - -device virtio-net- > pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v > > - ``` > > + SSL support > > > > - If one wishes to use multiple queues for an interface in the guest, the > > - driver in the guest operating system must be configured to do so. It is > > - recommended that the number of queues configured be equal to '$q'. 
> > + ``` > > + ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock > \ > > + --remote=db:Open_vSwitch,Open_vSwitch,manager_options \ > > + --private-key=db:Open_vSwitch,SSL,private_key \ > > + --certificate=Open_vSwitch,SSL,certificate \ > > + --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach > > + ``` > > > > - For example, this can be done for the Linux kernel virtio-net driver > > with: > > + 3. Initialize DB (One time step) > > > > - ``` > > - ethtool -L <DEV> combined <$q> > > - ``` > > + ``` > > + ovs-vsctl --no-wait init > > + ``` > > > > - A note on the command above: > > + 4. Start vswitchd > > > > - `-L`: Changes the numbers of channels of the specified network device > > + DPDK configuration arguments can be passed to vswitchd via > Open_vSwitch > > + other_config column. The recognized configuration options are listed. > > + Defaults will be provided for all values not explicitly set. > > > > - `combined`: Changes the number of multi-purpose channels. > > + * dpdk-init > > + Specifies whether OVS should initialize and support DPDK ports. This > > is > > + a boolean, and defaults to false. > > > > -DPDK vhost-cuse: > > ----------------- > > + * dpdk-lcore-mask > > + Specifies the CPU cores on which dpdk lcore threads should be > spawned. > > + The DPDK lcore threads are used for DPDK library tasks, such as > > + library internal message processing, logging, etc. Value should be in > > + the form of a hex string (so '0x123') similar to the 'taskset' mask > > + input. > > + If not specified, the value will be determined by choosing the lowest > > + CPU core from initial cpu affinity list. Otherwise, the value will be > > + passed directly to the DPDK library. > > + For performance reasons, it is best to set this to a single core on > > + the system, rather than allow lcore threads to float. > > > > -The following sections describe the use of vhost-cuse 'dpdkvhostcuse' > ports > > -with OVS. > > + * dpdk-alloc-mem > > + This sets the total memory to preallocate from hugepages regardless of > > + processor socket. It is recommended to use dpdk-socket-mem instead. > > > > -DPDK vhost-cuse Prerequisites: > > -------------------------- > > + * dpdk-socket-mem > > + Comma separated list of memory to pre-allocate from hugepages on > specific > > + sockets. > > > > -1. DPDK 16.04 with vhost support enabled as documented in the "Building > and > > - Installing section" > > - As an additional step, you must enable vhost-cuse in DPDK by setting the > > - following additional flag in `config/common_base`: > > + * dpdk-hugepage-dir > > + Directory where hugetlbfs is mounted > > > > - `CONFIG_RTE_LIBRTE_VHOST_USER=n` > > + * dpdk-extra > > + Extra arguments to provide to DPDK EAL, as previously specified on the > > + command line. Do not pass '--no-huge' to the system in this way. > Support > > + for running the system without hugepages is nonexistent. > > > > - Following this, rebuild DPDK as per the instructions in the "Building > > and > > - Installing" section. Finally, rebuild OVS as per step 3 in the "Building > > - and Installing" section - OVS will detect that DPDK has vhost-cuse > libraries > > - compiled and in turn will enable support for it in the switch and > > disable > > - vhost-user support. > > + * cuse-dev-name > > + Option to set the vhost_cuse character device name. > > > > -2. Insert the Cuse module: > > + * vhost-sock-dir > > + Option to set the path to the vhost_user unix socket files. 
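As a concrete illustration of the options listed above, the most commonly used ones can be set together before vswitchd is started; the values below are examples only and should be chosen for the actual system:

```
# example values: one lcore thread on core 0, 1024 MB of hugepage memory
# on NUMA node 0, hugetlbfs mounted on /dev/hugepages
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=0x1
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-hugepage-dir=/dev/hugepages
```

As the note below says, changing any of these afterwards requires an ovs-vswitchd restart.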
> > > > - `modprobe cuse` > > + NOTE: Changing any of these options requires restarting the ovs- > vswitchd > > + application. > > > > -3. Build and insert the `eventfd_link` module: > > + Open vSwitch can be started as normal. DPDK will be initialized as > > long > > + as the dpdk-init option has been set to 'true'. > > > > ``` > > - cd $DPDK_DIR/lib/librte_vhost/eventfd_link/ > > - make > > - insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko > > + export DB_SOCK=/usr/local/var/run/openvswitch/db.sock > > + ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true > > + ovs-vswitchd unix:$DB_SOCK --pidfile --detach > > ``` > > > > -4. QEMU version v2.1.0+ > > - > > - vhost-cuse will work with QEMU v2.1.0 and above, however it is > recommended to > > - use v2.2.0 if providing your VM with memory greater than 1GB due to > potential > > - issues with memory mapping larger areas. > > - Note: QEMU v1.6.2 will also work, with slightly different command line > parameters, > > - which are specified later in this document. > > - > > -Adding DPDK vhost-cuse ports to the Switch: > > --------------------------------------- > > - > > -Following the steps above to create a bridge, you can now add DPDK > vhost-cuse > > -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports > can have > > -arbitrary names. > > - > > - - For vhost-cuse, the name of the port type is `dpdkvhostcuse` > > + If allocated more than one GB hugepage (as for IVSHMEM), set amount > and > > + use NUMA node 0 memory: > > > > ``` > > - ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1 > > - type=dpdkvhostcuse > > + ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket- > mem="1024,0" > > + ovs-vswitchd unix:$DB_SOCK --pidfile --detach > > ``` > > > > - When attaching vhost-cuse ports to QEMU, the name provided during > the > > - add-port operation must match the ifname parameter on the QEMU > command > > - line. More instructions on this can be found in the next section. > > - > > -DPDK vhost-cuse VM configuration: > > ---------------------------------- > > - > > - vhost-cuse ports use a Linux* character device to communicate with > QEMU. > > - By default it is set to `/dev/vhost-net`. It is possible to reuse this > > - standard device for DPDK vhost, which makes setup a little simpler but > > it > > - is better practice to specify an alternative character device in order > > to > > - avoid any conflicts if kernel vhost is to be used in parallel. > > + To better scale the work loads across cores, Multiple pmd threads can > be > > + created and pinned to CPU cores by explicity specifying pmd-cpu-mask. > > + eg: To spawn 2 pmd threads and pin them to cores 1, 2 > > > > -1. This step is only needed if using an alternative character device. > > - > > - The new character device filename must be specified in the ovsdb: > > - > > - `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \ > > - other_config:cuse-dev-name=my-vhost-net` > > + ``` > > + ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6 > > + ``` > > > > - In the example above, the character device to be used will be > > - `/dev/my-vhost-net`. > > + 5. Create bridge & add DPDK devices > > > > -2. This step is only needed if reusing the standard character device. It > > will > > - conflict with the kernel vhost character device so the user must first > > - remove it. 
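Once vswitchd is up with the settings quoted above, two quick sanity checks confirm that the hugepage memory was actually reserved and that one pmd thread exists per bit set in pmd-cpu-mask (standard Linux/OVS commands, shown only for convenience):

```
# HugePages_Free/HugePages_Rsvd in /proc/meminfo should reflect the
# dpdk-socket-mem amount once DPDK has been initialized
grep Huge /proc/meminfo

# one pmd thread per bit set in pmd-cpu-mask, pinned to that core (psr column)
ps -eLo pid,psr,comm | grep pmd
```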
> > + create a bridge with datapath_type "netdev" in the configuration > database > > > > - `rm -rf /dev/vhost-net` > > + `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev` > > > > -3a. Configure virtio-net adaptors: > > - The following parameters must be passed to the QEMU binary: > > + Now you can add DPDK devices. OVS expects DPDK device names to > start with > > + "dpdk" and end with a portid. vswitchd should print (in the log file) > > the > > + number of dpdk devices found. > > > > ``` > > - -netdev > tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on > > - -device virtio-net-pci,netdev=net1,mac=<mac> > > + ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk > > + ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk > > ``` > > > > - Repeat the above parameters for multiple devices. > > - > > - The DPDK vhost library will negiotiate its own features, so they > > - need not be passed in as command line params. Note that as offloads > are > > - disabled this is the equivalent of setting: > > - > > - `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off` > > - > > -3b. If using an alternative character device. It must be also explicitly > > - passed to QEMU using the `vhostfd` argument: > > + After the DPDK ports get added to switch, a polling thread > > continuously > polls > > + DPDK devices and consumes 100% of the core as can be checked from > 'top' and 'ps' cmds. > > > > ``` > > - -netdev > tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on, > > - vhostfd=<open_fd> > > - -device virtio-net-pci,netdev=net1,mac=<mac> > > + top -H > > + ps -eLo pid,psr,comm | grep pmd > > ``` > > > > - The open file descriptor must be passed to QEMU running as a child > > - process. This could be done with a simple python script. > > + Note: creating bonds of DPDK interfaces is slightly different to > > creating > > + bonds of system interfaces. For DPDK, the interface type must be > explicitly > > + set, for example: > > > > - ``` > > - #!/usr/bin/python > > - fd = os.open("/dev/usvhost", os.O_RDWR) > > - subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\ > > - vhost=on,vhostfd=" + fd +"...", shell=True) > > + ``` > > + ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 > type=dpdk -- set Interface dpdk1 type=dpdk > > + ``` > > > > - Alternatively the `qemu-wrap.py` script can be used to automate the > > - requirements specified above and can be used in conjunction with libvirt > if > > - desired. See the "DPDK vhost VM configuration with QEMU wrapper" > section > > - below. > > + 6. PMD thread statistics > > > > -4. Configure huge pages: > > - QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access > a > > - virtio-net device's virtual rings and packet buffers mapping the VM's > > - physical memory on hugetlbfs. To enable vhost-ports to map the VM's > > - memory into their process address space, pass the following parameters > > - to QEMU: > > + ``` > > + # Check current stats > > + ovs-appctl dpif-netdev/pmd-stats-show > > > > - `-object memory-backend-file,id=mem,size=4096M,mem- > path=/dev/hugepages, > > - share=on -numa node,memdev=mem -mem-prealloc` > > + # Show port/rxq assignment > > + ovs-appctl dpif-netdev/pmd-rxq-show > > > > - Note: For use with an earlier QEMU version such as v1.6.2, use the > > - following to configure hugepages instead: > > + # Clear previous stats > > + ovs-appctl dpif-netdev/pmd-stats-clear > > + ``` > > > > - `-mem-path /dev/hugepages -mem-prealloc` > > + 7. 
Stop vswitchd & Delete bridge > > > > -DPDK vhost-cuse VM configuration with QEMU wrapper: > > ---------------------------------------------------- > > -The QEMU wrapper script automatically detects and calls QEMU with the > > -necessary parameters. It performs the following actions: > > + ``` > > + ovs-appctl -t ovs-vswitchd exit > > + ovs-appctl -t ovsdb-server exit > > + ovs-vsctl del-br br0 > > + ``` > > > > - * Automatically detects the location of the hugetlbfs and inserts this > > - into the command line parameters. > > - * Automatically open file descriptors for each virtio-net device and > > - inserts this into the command line parameters. > > - * Calls QEMU passing both the command line parameters passed to the > > - script itself and those it has auto-detected. > > +## <a name="builddpdk"></a> 4. DPDK in the VM > > > > -Before use, you **must** edit the configuration parameters section of > the > > -script to point to the correct emulator location and set additional > > -settings. Of these settings, `emul_path` and `us_vhost_path` **must** > be > > -set. All other settings are optional. > > +DPDK 'testpmd' application can be run in the Guest VM for high speed > > +packet forwarding between vhost ports. This needs DPDK, testpmd to be > > +compiled along with kernel modules. Below are the steps to be followed > > +for running testpmd application in the VM > > > > -To use directly from the command line simply pass the wrapper some of > the > > -QEMU parameters: it will configure the rest. For example: > > + * Export the DPDK loc $DPDK_LOC to the Guest VM(/dev/sdb on VM) > > + and instantiate the Guest. > > > > -``` > > -qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4 > > - --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1, > > - script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci, > > - netdev=net1,mac=00:00:00:00:00:01 > > -``` > > + ``` > > + export VM_NAME=Centos-vm > > + export GUEST_MEM=4096M > > + export QCOW2_LOC=<Dir of Qcow2> > > + export QCOW2_IMAGE=$QCOW2_LOC/CentOS7_x86_64.qcow2 > > + export DPDK_LOC=/usr/src/dpdk-16.04 > > + export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch > > > > -DPDK vhost-cuse VM configuration with libvirt: > > ----------------------------------------------- > > + qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m > $GUEST_MEM -object memory-backend- > file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on - > numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive > file=$QCOW2_IMAGE -drive file=fat:rw:$DPDK_LOC,snapshot=off -chardev > socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev > type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net- > pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev > socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev > type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net- > pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic - > snapshot > > + ``` > > > > -If you are using libvirt, you must enable libvirt to access the character > > -device by adding it to controllers cgroup for libvirtd using the following > > -steps. > > + * Copy the DPDK Srcs to VM and build DPDK > > > > - 1. 
In `/etc/libvirt/qemu.conf` add/edit the following lines: > > + ``` > > + mkdir -p /mnt/dpdk > > + mount -o iocharset=utf8 /dev/sdb1 /mnt/dpdk > > + cp -a /mnt/dpdk /root/dpdk > > + cd /root/dpdk/ > > + export DPDK_DIR=/root/dpdk/ > > + export DPDK_BUILD=/root/dpdk/x86_64-native-linuxapp-gcc > > + make install T=x86_64-native-linuxapp-gcc DESTDIR=install > > + ``` > > > > - ``` > > - 1) clear_emulator_capabilities = 0 > > - 2) user = "root" > > - 3) group = "root" > > - 4) cgroup_device_acl = [ > > - "/dev/null", "/dev/full", "/dev/zero", > > - "/dev/random", "/dev/urandom", > > - "/dev/ptmx", "/dev/kvm", "/dev/kqemu", > > - "/dev/rtc", "/dev/hpet", "/dev/net/tun", > > - "/dev/<my-vhost-device>", > > - "/dev/hugepages"] > > - ``` > > + * Build the test-pmd application > > > > - <my-vhost-device> refers to "vhost-net" if using the > > `/dev/vhost-net` > > - device. If you have specificed a different name in the database > > - using the "other_config:cuse-dev-name" parameter, please specify > that > > - filename instead. > > + ``` > > + cd app/test-pmd > > + export RTE_SDK=/root/dpdk > > + export RTE_TARGET=x86_64-native-linuxapp-gcc > > + make > > + ``` > > > > - 2. Disable SELinux or set to permissive mode > > + * Setup Huge pages and DPDK devices using UIO > > > > - 3. Restart the libvirtd process > > - For example, on Fedora: > > + ``` > > + sysctl vm.nr_hugepages=1024 > > + mkdir -p /dev/hugepages > > + mount -t hugetlbfs hugetlbfs /dev/hugepages > > + modprobe uio > > + insmod $DPDK_BUILD/kmod/igb_uio.ko > > + $DPDK_DIR/tools/dpdk_nic_bind.py --status > > + $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 00:03.0 00:04.0 > > + ``` > > > > - `systemctl restart libvirtd.service` > > + vhost ports pci ids can be retrieved using `lspci | grep Ethernet` cmd. > > > > -After successfully editing the configuration, you may launch your > > -vhost-enabled VM. The XML describing the VM can be configured like so > > -within the <qemu:commandline> section: > > +## <a name="ovstc"></a> 5. OVS Testcases > > > > - 1. Set up shared hugepages: > > + Below are few testcases and the list of steps to be followed. > > > > - ``` > > - <qemu:arg value='-object'/> > > - <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem- > path=/dev/hugepages,share=on'/> > > - <qemu:arg value='-numa'/> > > - <qemu:arg value='node,memdev=mem'/> > > - <qemu:arg value='-mem-prealloc'/> > > - ``` > > +### 5.1 PHY-PHY > > > > - 2. Set up your tap devices: > > + The steps (1-5) in 3.3 section will create & initialize DB, start > > vswitchd and > also > > + add DPDK devices to bridge 'br0'. > > > > - ``` > > - <qemu:arg value='-netdev'/> > > - <qemu:arg > value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on' > /> > > - <qemu:arg value='-device'/> > > - <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/> > > - ``` > > + 1. Add Test flows to forward packets betwen DPDK port 0 and port 1 > > > > - Repeat for as many devices as are desired, modifying the id, ifname > > - and mac as necessary. 
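A small aside on the PHY-PHY flows that follow: the in_port values assume dpdk0 and dpdk1 came up as OpenFlow ports 1 and 2, which depends on the order in which the ports were added. That mapping, and later the per-port packet counters, can be checked with:

```
# confirm which OpenFlow port numbers dpdk0 and dpdk1 were assigned
ovs-ofctl show br0

# while traffic is running, the rx/tx counters of both ports should increase
ovs-ofctl dump-ports br0
```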
> > + ``` > > + # Clear current flows > > + ovs-ofctl del-flows br0 > > > > - Again, if you are using an alternative character device (other than > > - `/dev/vhost-net`), please specify the file descriptor like so: > > + # Add flows between port 1 (dpdk0) to port 2 (dpdk1) > > + ovs-ofctl add-flow br0 in_port=1,action=output:2 > > + ovs-ofctl add-flow br0 in_port=2,action=output:1 > > + ``` > > > > - `<qemu:arg > value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on, > vhostfd=<open_fd>'/>` > > +### 5.2 PHY-VM-PHY [VHOST LOOPBACK] > > > > - Where <open_fd> refers to the open file descriptor of the character > device. > > - Instructions of how to retrieve the file descriptor can be found in > > the > > - "DPDK vhost VM configuration" section. > > - Alternatively, the process is automated with the qemu-wrap.py script, > > - detailed in the next section. > > + The steps (1-5) in 3.3 section will create & initialize DB, start > > vswitchd and > also > > + add DPDK devices to bridge 'br0'. > > > > -Now you may launch your VM using virt-manager, or like so: > > + 1. Add dpdkvhostuser ports to bridge 'br0' > > > > - `virsh create my_vhost_vm.xml` > > + ``` > > + ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface > dpdkvhostuser0 type=dpdkvhostuser > > + ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface > dpdkvhostuser1 type=dpdkvhostuser > > + ``` > > > > -DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper: > > ----------------------------------------------------------- > > + 2. Add Test flows to forward packets betwen DPDK devices and VM ports > > > > -To use the qemu-wrapper script in conjuntion with libvirt, follow the > > -steps in the previous section before proceeding with the following steps: > > + ``` > > + # Clear current flows > > + ovs-ofctl del-flows br0 > > > > - 1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH) > > - Ideally in the same directory that the QEMU binary is located. > > + # Add flows > > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=1,action=output:3 > > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=3,action=output:1 > > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=4,action=output:2 > > + ovs-ofctl add-flow br0 idle_timeout=0,in_port=2,action=output:4 > > > > - 2. Ensure that the script has the same owner/group and file permissions > > - as the QEMU binary. > > + # Dump flows > > + ovs-ofctl dump-flows br0 > > + ``` > > > > - 3. Update the VM xml file using "virsh edit VM.xml" > > + 3. start Guest VM > > > > - 1. Set the VM to use the launch script. > > - Set the emulator path contained in the `<emulator><emulator/>` > tags. 
> > - For example, replace: > > + Guest Configuration > > > > - `<emulator>/usr/bin/qemu-kvm<emulator/>` > > + ``` > > + | configuration | values | comments > > + |----------------------|--------|----------------- > > + | qemu thread affinity | core 5 | taskset 0x20 > > + | memory | 4GB | - > > + | cores | 2 | - > > + | Qcow2 image | CentOS7| - > > + | mrg_rxbuf | off | - > > + | export DPDK sources | yes | -drive file=fat:rw:$DPDK_LOC(seen > > as > /dev/sdb in VM) > > + ``` > > > > - with: > > + ``` > > + export VM_NAME=vhost-vm > > + export GUEST_MEM=4096M > > + export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2 > > + export DPDK_LOC=/usr/src/dpdk-16.04 > > + export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch > > > > - `<emulator>/usr/bin/qemu-wrap.py<emulator/>` > > + taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host - > enable-kvm -m $GUEST_MEM -object memory-backend- > file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on - > numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive > file=$QCOW2_IMAGE -drive file=fat:rw:$DPDK_LOC,snapshot=off -chardev > socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev > type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net- > pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev > socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev > type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net- > pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic - > snapshot > > + ``` > > > > - 4. Edit the Configuration Parameters section of the script to point to > > - the correct emulator location and set any additional options. If you are > > - using a alternative character device name, please set "us_vhost_path" to > the > > - location of that device. The script will automatically detect and insert > > - the correct "vhostfd" value in the QEMU command line arguments. > > + 4. DPDK Packet forwarding in Guest VM > > > > - 5. Use virt-manager to launch the VM > > + To accomplish this, DPDK and testpmd application has to be first > compiled > > + on the VM and the steps have been listed in section 4(DPDK in the > VM). > > > > -Running ovs-vswitchd with DPDK backend inside a VM > > --------------------------------------------------- > > + * Run test-pmd application > > > > -Please note that additional configuration is required if you want to run > > -ovs-vswitchd with DPDK backend inside a QEMU virtual machine. Ovs- > vswitchd > > -creates separate DPDK TX queues for each CPU core available. This > operation > > -fails inside QEMU virtual machine because, by default, VirtIO NIC provided > > -to the guest is configured to support only single TX queue and single RX > > -queue. To change this behavior, you need to turn on 'mq' (multiqueue) > > -property of all virtio-net-pci devices emulated by QEMU and used by > DPDK. > > -You may do it manually (by changing QEMU command line) or, if you use > Libvirt, > > -by adding the following string: > > + ``` > > + cd $DPDK_DIR/app/test-pmd; > > + ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- --burst=64 -i -- > txqflags=0xf00 --disable-hw-vlan > > + set fwd mac_retry > > + start > > + ``` > > > > -`<driver name='vhost' queues='N'/>` > > + * Bind vNIC back to kernel once the test is completed. > > > > -to <interface> sections of all network devices used by DPDK. Parameter > 'N' > > -determines how many queues can be used by the guest. 
> > + ``` > > + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:03.0 > > + $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:04.0 > > + ``` > > + Note: Appropriate PCI IDs to be passed in above example. The PCI IDs > can be > > + retrieved using '$DPDK_DIR/tools/dpdk_nic_bind.py --status' cmd. > > > > -Restrictions: > > -------------- > > +### 5.3 PHY-VM-PHY [IVSHMEM] > > > > - - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue. > > - - Currently DPDK port does not make use any offload functionality. > > - - DPDK-vHost support works with 1G huge pages. > > + IVSHMEM is supported only with 1GB huge pages. The steps for this > testcase are listed > > + in section 5.2(PVP - IVSHMEM) ADVANCED DPDK install guide. > > > > - ivshmem: > > - - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be > > - unable to share any rings or mempools with a virtual machine. > > - This is because the current implementation of ivshmem works by > sharing > > - a single 1GB huge page from the host operating system to any guest > > - operating system through the Qemu ivshmem device. When using > smaller > > - page sizes, multiple pages may be required to hold the ring descriptors > > - and buffer pools. The Qemu ivshmem device does not allow you to > share > > - multiple file descriptors to the guest operating system. However, if > > you > > - want to share dpdkr rings with other processes on the host, you can do > > - this with smaller page sizes. > > +## <a name="ovslimits"></a> 6. Limitations > > > > - Platform and Network Interface: > > - - By default with DPDK 16.04, a maximum of 64 TX queues can be used > with an > > - Intel XL710 Network Interface on a platform with more than 64 logical > > - cores. If a user attempts to add an XL710 interface as a DPDK port type > to > > - a system as described above, an error will be reported that > > initialization > > - failed for the 65th queue. OVS will then roll back to the previous > > - successful queue initialization and use that value as the total number > > of > > - TX queues available with queue locking. If a user wishes to use more > than > > - 64 queues and avoid locking, then the > > - `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in > DPDK must be > > - increased to the desired number of queues. Both DPDK and OVS must > be > > - recompiled for this change to take effect. > > + - Supports MTU size 1500, needs few changes in DPDK lib to fix this > > issue. > > + - Currently DPDK ports does not use HW offload functionality. > > + - DPDK IVSHMEM support works with 1G huge pages. > > > > Bug Reporting: > > -------------- > > > > Please report problems to b...@openvswitch.org. > > > > -[INSTALL.userspace.md]:INSTALL.userspace.md > > -[INSTALL.md]:INSTALL.md > > -[DPDK Linux GSG]: > http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and- > unbinding-network-ports-to-from-the-igb-uioor-vfio-modules > > -[DPDK Docs]: http://dpdk.org/doc > > + > > +[DPDK requirements]: > http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html > > +[Download DPDK]: http://dpdk.org/browse/dpdk/refs/ > > +[Download OVS]: http://openvswitch.org/releases/ > > +[DPDK Supported NICs]: http://dpdk.org/doc/nics > > +[Build Requirements]: > https://github.com/openvswitch/ovs/blob/master/INSTALL.md#build- > requirements _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev