Thanks, Aaron, for reviewing the install guide. Please see my reply inline.

> -----Original Message-----
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Friday, May 13, 2016 4:55 PM
> To: Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com>
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
> 
> Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> writes:
> 
> > Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and
> > INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate
> the
> > novice user in setting up the OVS DPDK and running it out of box, the
> > ADVANCED document is targeted at expert users looking for the optimum
> > performance running dpdk datapath.
> >
> > This commit updates INSTALL.DPDK.md document.
> >
> > Signed-off-by: Bhanuprakash Bodireddy
> <bhanuprakash.bodire...@intel.com>
> > ---
> >  INSTALL.DPDK.md | 1193 +++++++++++++++------------------------------------
> ----
> >  1 file changed, 331 insertions(+), 862 deletions(-)
> >
> > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
> > index 93f92e4..bf646bf 100644
> > --- a/INSTALL.DPDK.md
> > +++ b/INSTALL.DPDK.md
> > @@ -1,1001 +1,470 @@
> > -Using Open vSwitch with DPDK
> > -============================
> > +OVS DPDK INSTALL GUIDE
> > +================================
> >
> > -Open vSwitch can use Intel(R) DPDK lib to operate entirely in
> > -userspace. This file explains how to install and use Open vSwitch in
> > -such a mode.
> > +## Contents
> >
> > -The DPDK support of Open vSwitch is considered experimental.
> > -It has not been thoroughly tested.
> > +1. [Overview](#overview)
> > +2. [Building and Installation](#build)
> > +3. [Setup OVS DPDK datapath](#ovssetup)
> > +4. [DPDK in the VM](#builddpdk)
> > +5. [OVS Testcases](#ovstc)
> > +6. [Limitations ](#ovslimits)
> >
> > -This version of Open vSwitch should be built manually with `configure`
> > -and `make`.
> > +## <a name="overview"></a> 1. Overview
> >
> > -OVS needs a system with 1GB hugepages support.
> > +Open vSwitch can use DPDK lib to operate entirely in userspace.
> > +This file provides information on installation and use of Open vSwitch
> > +using DPDK datapath.  This version of Open vSwitch should be built
> manually
> > +with `configure` and `make`.
> >
> > -Building and Installing:
> > -------------------------
> > +The DPDK support of Open vSwitch is considered 'experimental'.
> >
> > -Required: DPDK 16.04
> > -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
> > -on Debian/Ubuntu)
> > +### Prerequisites
> >
> > -1. Configure build & install DPDK:
> > -  1. Set `$DPDK_DIR`
> > +* Required: DPDK 16.04
> > +* Hardware: [DPDK Supported NICs] when physical ports in use
> >
> > -     ```
> > -     export DPDK_DIR=/usr/src/dpdk-16.04
> > -     cd $DPDK_DIR
> > -     ```
> > -
> > -  2. Then run `make install` to build and install the library.
> > -     For default install without IVSHMEM:
> > -
> > -     `make install T=x86_64-native-linuxapp-gcc DESTDIR=install`
> > -
> > -     To include IVSHMEM (shared memory):
> > -
> > -     `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install`
> > -
> > -     For further details refer to http://dpdk.org/
> > -
> > -2. Configure & build the Linux kernel:
> > -
> > -   Refer to intel-dpdk-getting-started-guide.pdf for understanding
> > -   DPDK kernel requirement.
> > -
> > -3. Configure & build OVS:
> > -
> > -   * Non IVSHMEM:
> > -
> > -     `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/`
> > -
> > -   * IVSHMEM:
> > -
> > -     `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/`
> > -
> > -   ```
> > -   cd $(OVS_DIR)/
> > -   ./boot.sh
> > -   ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-
> align"]
> > -   make
> > -   ```
> > -
> > -   Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress
> DPDK cast-align warnings.
> > -
> > -To have better performance one can enable aggressive compiler
> optimizations and
> > -use the special instructions(popcnt, crc32) that may not be available on 
> > all
> > -machines. Instead of typing `make`, type:
> > -
> > -`make CFLAGS='-O3 -march=native'`
> > -
> > -Refer to [INSTALL.userspace.md] for general requirements of building
> userspace OVS.
> > -
> > -Using the DPDK with ovs-vswitchd:
> > ----------------------------------
> > -
> > -1. Setup system boot
> > -   Add the following options to the kernel bootline:
> > -
> > -   `default_hugepagesz=1GB hugepagesz=1G hugepages=1`
> > -
> > -2. Setup DPDK devices:
> > -
> > -   DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO
> > -   modules. UIO requires inserting an out of tree driver igb_uio.ko that is
> > -   available in DPDK. Setup for both methods are described below.
> > -
> > -   * UIO:
> > -     1. insert uio.ko: `modprobe uio`
> > -     2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko`
> > -     3. Bind network device to igb_uio:
> > -         `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1`
> > -
> > -   * VFIO:
> > -
> > -     VFIO needs to be supported in the kernel and the BIOS. More
> information
> > -     can be found in the [DPDK Linux GSG].
> > -
> > -     1. Insert vfio-pci.ko: `modprobe vfio-pci`
> > -     2. Set correct permissions on vfio device: `sudo /usr/bin/chmod a+x
> /dev/vfio`
> > -        and: `sudo /usr/bin/chmod 0666 /dev/vfio/*`
> > -     3. Bind network device to vfio-pci:
> > -        `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1`
> > -
> > -3. Mount the hugetable filesystem
> > -
> > -   `mount -t hugetlbfs -o pagesize=1G none /dev/hugepages`
> > -
> > -   Ref to http://www.dpdk.org/doc/quick-start for verifying DPDK setup.
> > -
> > -4. Follow the instructions in [INSTALL.md] to install only the
> > -   userspace daemons and utilities (via 'make install').
> > -   1. First time only db creation (or clearing):
> > -
> > -      ```
> > -      mkdir -p /usr/local/etc/openvswitch
> > -      mkdir -p /usr/local/var/run/openvswitch
> > -      rm /usr/local/etc/openvswitch/conf.db
> > -      ovsdb-tool create /usr/local/etc/openvswitch/conf.db  \
> > -             /usr/local/share/openvswitch/vswitch.ovsschema
> > -      ```
> > -
> > -   2. Start ovsdb-server
> > -
> > -      ```
> > -      ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
> \
> > -          --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
> > -          --private-key=db:Open_vSwitch,SSL,private_key \
> > -          --certificate=Open_vSwitch,SSL,certificate \
> > -          --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile 
> > --detach
> > -      ```
> > -
> > -    3. First time after db creation, initialize:
> > -
> > -       ```
> > -       ovs-vsctl --no-wait init
> > -       ```
> > -
> > -5. Start vswitchd:
> > -
> > -   DPDK configuration arguments can be passed to vswitchd via
> Open_vSwitch
> > -   other_config column. The recognized configuration options are listed.
> > -   Defaults will be provided for all values not explicitly set.
> > -
> > -   * dpdk-init
> > -   Specifies whether OVS should initialize and support DPDK ports. This is
> > -   a boolean, and defaults to false.
> > -
> > -   * dpdk-lcore-mask
> > -   Specifies the CPU cores on which dpdk lcore threads should be spawned.
> > -   The DPDK lcore threads are used for DPDK library tasks, such as
> > -   library internal message processing, logging, etc. Value should be in
> > -   the form of a hex string (so '0x123') similar to the 'taskset' mask
> > -   input.
> > -   If not specified, the value will be determined by choosing the lowest
> > -   CPU core from initial cpu affinity list. Otherwise, the value will be
> > -   passed directly to the DPDK library.
> > -   For performance reasons, it is best to set this to a single core on
> > -   the system, rather than allow lcore threads to float.
> > -
> > -   * dpdk-alloc-mem
> > -   This sets the total memory to preallocate from hugepages regardless of
> > -   processor socket. It is recommended to use dpdk-socket-mem instead.
> > -
> > -   * dpdk-socket-mem
> > -   Comma separated list of memory to pre-allocate from hugepages on
> specific
> > -   sockets.
> > -
> > -   * dpdk-hugepage-dir
> > -   Directory where hugetlbfs is mounted
> > -
> > -   * dpdk-extra
> > -   Extra arguments to provide to DPDK EAL, as previously specified on the
> > -   command line. Do not pass '--no-huge' to the system in this way. Support
> > -   for running the system without hugepages is nonexistent.
> > -
> > -   * cuse-dev-name
> > -   Option to set the vhost_cuse character device name.
> > -
> > -   * vhost-sock-dir
> > -   Option to set the path to the vhost_user unix socket files.
> > -
> > -   NOTE: Changing any of these options requires restarting the ovs-
> vswitchd
> > -   application.
> > -
> > -   Open vSwitch can be started as normal. DPDK will be initialized as long
> > -   as the dpdk-init option has been set to 'true'.
> > -
> > -
> > -   ```
> > -   export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
> > -   ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > -   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> > -   ```
> > -
> > -   If allocated more than one GB hugepage (as for IVSHMEM), set amount
> and
> > -   use NUMA node 0 memory:
> > -
> > -   ```
> > -   ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> mem="1024,0"
> > -   ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> > -   ```
> > -
> > -6. Add bridge & ports
> > -
> > -   To use ovs-vswitchd with DPDK, create a bridge with datapath_type
> > -   "netdev" in the configuration database.  For example:
> > -
> > -   `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`
> > -
> > -   Now you can add dpdk devices. OVS expects DPDK device names to start
> with
> > -   "dpdk" and end with a portid. vswitchd should print (in the log file) 
> > the
> > -   number of dpdk devices found.
> > -
> > -   ```
> > -   ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
> > -   ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
> > -   ```
> > -
> > -   Once first DPDK port is added to vswitchd, it creates a Polling thread 
> > and
> > -   polls dpdk device in continuous loop. Therefore CPU utilization
> > -   for that thread is always 100%.
> > -
> > -   Note: creating bonds of DPDK interfaces is slightly different to 
> > creating
> > -   bonds of system interfaces.  For DPDK, the interface type must be
> explicitly
> > -   set, for example:
> > -
> > -   ```
> > -   ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0
> type=dpdk -- set Interface dpdk1 type=dpdk
> > -   ```
> > -
> > -7. Add test flows
> > -
> > -   Test flow script across NICs (assuming ovs in /usr/src/ovs):
> > -   Execute script:
> > -
> > -   ```
> > -   #! /bin/sh
> > -   # Move to command directory
> > -   cd /usr/src/ovs/utilities/
> > -
> > -   # Clear current flows
> > -   ./ovs-ofctl del-flows br0
> > -
> > -   # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
> > -   ./ovs-ofctl add-flow br0 in_port=1,action=output:2
> > -   ./ovs-ofctl add-flow br0 in_port=2,action=output:1
> > -   ```
> > -
> > -8. QoS usage example
> > -
> > -   Assuming you have a vhost-user port transmitting traffic consisting of
> > -   packets of size 64 bytes, the following command would limit the egress
> > -   transmission rate of the port to ~1,000,000 packets per second:
> > -
> > -   `ovs-vsctl set port vhost-user0 qos=@newqos -- --id=@newqos create
> qos
> > -   type=egress-policer other-config:cir=46000000 other-config:cbs=2048`
> > -
> > -   To examine the QoS configuration of the port:
> > -
> > -   `ovs-appctl -t ovs-vswitchd qos/show vhost-user0`
> > -
> > -   To clear the QoS configuration from the port and ovsdb use the
> following:
> > -
> > -   `ovs-vsctl destroy QoS vhost-user0 -- clear Port vhost-user0 qos`
> > -
> > -   For more details regarding egress-policer parameters please refer to the
> > -   vswitch.xml.
> > -
> > -Performance Tuning:
> > --------------------
> > -
> > -  1. PMD affinitization
> > -
> > -   A poll mode driver (pmd) thread handles the I/O of all DPDK
> > -   interfaces assigned to it. A pmd thread will busy loop through
> > -   the assigned port/rxq's polling for packets, switch the packets
> > -   and send to a tx port if required. Typically, it is found that
> > -   a pmd thread is CPU bound, meaning that the greater the CPU
> > -   occupancy the pmd thread can get, the better the performance. To
> > -   that end, it is good practice to ensure that a pmd thread has as
> > -   many cycles on a core available to it as possible. This can be
> > -   achieved by affinitizing the pmd thread with a core that has no
> > -   other workload. See section 7 below for a description of how to
> > -   isolate cores for this purpose also.
> > -
> > -   The following command can be used to specify the affinity of the
> > -   pmd thread(s).
> > -
> > -   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex
> string>`
> > -
> > -   By setting a bit in the mask, a pmd thread is created and pinned
> > -   to the corresponding CPU core. e.g. to run a pmd thread on core 1
> > -
> > -   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=2`
> > -
> > -   For more information, please refer to the Open_vSwitch TABLE
> section in
> > -
> > -   `man ovs-vswitchd.conf.db`
> > -
> > -   Note, that a pmd thread on a NUMA node is only created if there is
> > -   at least one DPDK interface from that NUMA node added to OVS.
> > -
> > -  2. Multiple poll mode driver threads
> > -
> > -   With pmd multi-threading support, OVS creates one pmd thread
> > -   for each NUMA node by default. However, it can be seen that in
> cases
> > -   where there are multiple ports/rxq's producing traffic, performance
> > -   can be improved by creating multiple pmd threads running on
> separate
> > -   cores. These pmd threads can then share the workload by each being
> > -   responsible for different ports/rxq's. Assignment of ports/rxq's to
> > -   pmd threads is done automatically.
> > -
> > -   The following command can be used to specify the affinity of the
> > -   pmd threads.
> > -
> > -   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=<hex
> string>`
> > -
> > -   A set bit in the mask means a pmd thread is created and pinned
> > -   to the corresponding CPU core. e.g. to run pmd threads on core 1
> and 2
> > -
> > -   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
> > -
> > -   For more information, please refer to the Open_vSwitch TABLE
> section in
> > -
> > -   `man ovs-vswitchd.conf.db`
> > -
> > -   For example, when using dpdk and dpdkvhostuser ports in a bi-
> directional
> > -   VM loopback as shown below, spreading the workload over 2 or 4
> pmd
> > -   threads shows significant improvements as there will be more total
> CPU
> > -   occupancy available.
> > -
> > -   NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
> > -
> > -   The following command can be used to confirm that the port/rxq
> assignment
> > -   to pmd threads is as required:
> > -
> > -   `ovs-appctl dpif-netdev/pmd-rxq-show`
> > +## <a name="build"></a> 2. Building and Installation
> >
> > -   This can also be checked with:
> > +### 2.1 Configure & build the Linux kernel
> >
> > -   ```
> > -   top -H
> > -   taskset -p <pid_of_pmd>
> > -   ```
> > +On Linux distros running kernel version >= 3.0, a kernel rebuild is not
> required
> > +and only the grub cmdline needs to be updated to enable IOMMU [VFIO
> support - 3.2].
> > +For older kernels, check that the kernel is built with UIO, HUGETLBFS,
> PROC_PAGE_MONITOR,
> > +HPET and HPET_MMAP support.
> >
> > -   To understand where most of the pmd thread time is spent and
> whether the
> > -   caches are being utilized, these commands can be used:
> > -
> > -   ```
> > -   # Clear previous stats
> > -   ovs-appctl dpif-netdev/pmd-stats-clear
> > -
> > -   # Check current stats
> > -   ovs-appctl dpif-netdev/pmd-stats-show
> > -   ```
> > -
> > -  3. DPDK port Rx Queues
> > -
> > -   `ovs-vsctl set Interface <DPDK interface> options:n_rxq=<integer>`
> > -
> > -   The command above sets the number of rx queues for DPDK
> interface.
> > -   The rx queues are assigned to pmd threads on the same NUMA node
> in a
> > -   round-robin fashion.  For more information, please refer to the
> > -   Open_vSwitch TABLE section in
> > -
> > -   `man ovs-vswitchd.conf.db`
> > -
> > -  4. Exact Match Cache
> > -
> > -   Each pmd thread contains one EMC. After initial flow setup in the
> > -   datapath, the EMC contains a single table and provides the lowest
> level
> > -   (fastest) switching for DPDK ports. If there is a miss in the EMC then
> > -   the next level where switching will occur is the datapath classifier.
> > -   Missing in the EMC and looking up in the datapath classifier incurs a
> > -   significant performance penalty. If lookup misses occur in the EMC
> > -   because it is too small to handle the number of flows, its size can
> > -   be increased. The EMC size can be modified by editing the define
> > -   EM_FLOW_HASH_SHIFT in lib/dpif-netdev.c.
> > -
> > -   As mentioned above an EMC is per pmd thread. So an alternative
> way of
> > -   increasing the aggregate amount of possible flow entries in EMC and
> > -   avoiding datapath classifier lookups is to have multiple pmd threads
> > -   running. This can be done as described in section 2.
> > -
> > -  5. Compiler options
> > -
> > -   The default compiler optimization level is '-O2'. Changing this to
> > -   more aggressive compiler optimizations such as '-O3' or
> > -   '-Ofast -march=native' with gcc can produce performance gains.
> > -
> > -  6. Simultaneous Multithreading (SMT)
> > -
> > -   With SMT enabled, one physical core appears as two logical cores
> > -   which can improve performance.
> > -
> > -   SMT can be utilized to add additional pmd threads without
> consuming
> > -   additional physical cores. Additional pmd threads may be added in
> the
> > -   same manner as described in section 2. If trying to minimize the use
> > -   of physical cores for pmd threads, care must be taken to set the
> > -   correct bits in the pmd-cpu-mask to ensure that the pmd threads are
> > -   pinned to SMT siblings.
> > -
> > -   For example, when using 2x 10 core processors in a dual socket
> system
> > -   with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
> > -   two logical cores which share the same physical core for pmd
> threads,
> > -   the following command can be used to identify a pair of logical cores.
> > -
> > -   `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`
> > -
> > -   where N is the logical core number. In this example, it would show
> that
> > -   cores 1 and 21 share the same physical core. The pmd-cpu-mask to
> enable
> > -   two pmd threads running on these two logical cores (one physical
> core)
> > -   is.
> > -
> > -   `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=100002`
> > -
> > -   Note that SMT is enabled by the Hyper-Threading section in the
> > -   BIOS, and as such will apply to the whole system. So the impact of
> > -   enabling/disabling it for the whole system should be considered
> > -   e.g. If workloads on the system can scale across multiple cores,
> > -   SMT may very beneficial. However, if they do not and perform best
> > -   on a single physical core, SMT may not be beneficial.
> > -
> > -  7. The isolcpus kernel boot parameter
> > -
> > -   isolcpus can be used on the kernel bootline to isolate cores from the
> > -   kernel scheduler and hence dedicate them to OVS or other packet
> > -   forwarding related workloads. For example a Linux kernel boot-line
> > -   could be:
> > -
> > -   'GRUB_CMDLINE_LINUX_DEFAULT="quiet hugepagesz=1G
> hugepages=4 default_hugepagesz=1G 'intel_iommu=off' isolcpus=1-19"'
> > -
> > -  8. NUMA/Cluster On Die
> > -
> > -   Ideally inter NUMA datapaths should be avoided where possible as
> packets
> > -   will go across QPI and there may be a slight performance penalty
> when
> > -   compared with intra NUMA datapaths. On Intel Xeon Processor E5
> v3,
> > -   Cluster On Die is introduced on models that have 10 cores or more.
> > -   This makes it possible to logically split a socket into two NUMA
> regions
> > -   and again it is preferred where possible to keep critical datapaths
> > -   within the one cluster.
> > -
> > -   It is good practice to ensure that threads that are in the datapath are
> > -   pinned to cores in the same NUMA area. e.g. pmd threads and
> QEMU vCPUs
> > -   responsible for forwarding.
> > -
> > -  9. Rx Mergeable buffers
> > -
> > -   Rx Mergeable buffers is a virtio feature that allows chaining of
> multiple
> > -   virtio descriptors to handle large packet sizes. As such, large packets
> > -   are handled by reserving and chaining multiple free descriptors
> > -   together. Mergeable buffer support is negotiated between the virtio
> > -   driver and virtio device and is supported by the DPDK vhost library.
> > -   This behavior is typically supported and enabled by default, however
> > -   in the case where the user knows that rx mergeable buffers are not
> needed
> > -   i.e. jumbo frames are not needed, it can be forced off by adding
> > -   mrg_rxbuf=off to the QEMU command line options. By not reserving
> multiple
> > -   chains of descriptors it will make more individual virtio descriptors
> > -   available for rx to the guest using dpdkvhost ports and this can
> improve
> > -   performance.
> > -
> > -  10. Packet processing in the guest
> > -
> > -   It is good practice whether simply forwarding packets from one
> > -   interface to another or more complex packet processing in the guest,
> > -   to ensure that the thread performing this work has as much CPU
> > -   occupancy as possible. For example when the DPDK sample
> application
> > -   `testpmd` is used to forward packets in the guest, multiple QEMU
> vCPU
> > -   threads can be created. Taskset can then be used to affinitize the
> > -   vCPU thread responsible for forwarding to a dedicated core not used
> > -   for other general processing on the host system.
> > -
> > -  11. DPDK virtio pmd in the guest
> > -
> > -   dpdkvhostcuse or dpdkvhostuser ports can be used to accelerate the
> path
> > -   to the guest using the DPDK vhost library. This library is compatible
> with
> > -   virtio-net drivers in the guest but significantly better performance
> can
> > -   be observed when using the DPDK virtio pmd driver in the guest. The
> DPDK
> > -   `testpmd` application can be used in the guest as an example
> application
> > -   that forwards packet from one DPDK vhost port to another. An
> example of
> > -   running `testpmd` in the guest can be seen here.
> > -
> > -   `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --
> txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`
> > -
> > -   See below information on dpdkvhostcuse and dpdkvhostuser ports.
> > -   See [DPDK Docs] for more information on `testpmd`.
> > +Detailed system requirements can be found at [DPDK requirements]
> >
> > +### 2.2 Install DPDK
> > +  1. [Download DPDK] and extract the file, for example into /usr/src
> > +     and set DPDK_DIR
> >
> > +     ```
> > +     cd /usr/src/
> > +     unzip dpdk-16.04.zip
> >
> > -DPDK Rings :
> > -------------
> > +     export DPDK_DIR=/usr/src/dpdk-16.04
> > +     cd $DPDK_DIR
> > +     ```
> >
> > -Following the steps above to create a bridge, you can now add dpdk rings
> > -as a port to the vswitch.  OVS will expect the DPDK ring device name to
> > -start with dpdkr and end with a portid.
> > +  2. Configure, Install DPDK
> >
> > -`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`
> > +     Build and install the DPDK library.
> >
> > -DPDK rings client test application
> > +     ```
> > +     export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc
> > +     make install T=x86_64-native-linuxapp-gcc DESTDIR=install
> > +     ```
> >
> > -Included in the test directory is a sample DPDK application for testing
> > -the rings.  This is from the base dpdk directory and modified to work
> > -with the ring naming used within ovs.
> > +     Note: For previous DPDK releases, set
> `CONFIG_RTE_BUILD_COMBINE_LIBS=y` in
> > +     `config/common_linuxapp` to generate a single library file.
> >
> > -location tests/ovs_client
> > +### 2.3 Install OVS
> > +  OVS can be downloaded in compressed format from the OVS release
> page (or)
> > +  cloned from git repository if user intends to develop and contribute
> > +  patches upstream.
> >
> > -To run the client :
> > +  - [Download OVS] tarball and extract the file, for example into
> > /usr/src
> > +     and set OVS_DIR
> >
> > -```
> > -cd /usr/src/ovs/tests/
> > -ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
> > -```
> > +     ```
> > +     cd /usr/src/
> > +     tar -zxvf openvswitch-2.5.0.tar.gz
> > +     export OVS_DIR=/usr/src/openvswitch-2.5.0
> > +     ```
> >
> > -In the case of the dpdkr example above the "port id you gave dpdkr" is 0.
> > +  - Clone the Git repository for OVS, for example into /usr/src
> >
> > -It is essential to have --proc-type=secondary
> > +     ```
> > +     cd /usr/src/
> > +     git clone https://github.com/openvswitch/ovs.git
> > +     export OVS_DIR=/usr/src/ovs
> > +     ```
> >
> > -The application simply receives an mbuf on the receive queue of the
> > -ethernet ring and then places that same mbuf on the transmit ring of
> > -the ethernet ring.  It is a trivial loopback application.
> > +  - Install OVS dependencies
> >
> > -DPDK rings in VM (IVSHMEM shared memory communications)
> > --------------------------------------------------------
> > +     GNU make, GCC 4.x (or) Clang 3.4  (Mandatory)
> > +     libssl, libcap-ng, Python 2.7  (Optional)
> > +     More information can be found at [Build Requirements]
> >
> > -In addition to executing the client in the host, you can execute it within
> > -a guest VM. To do so you will need a patched qemu.  You can download
> the
> > -patch and getting started guide at :
> > +  - Configure, Install OVS
> >
> > -https://01.org/packet-processing/downloads
> > +     ```
> > +     cd $OVS_DIR
> > +     ./boot.sh
> > +     ./configure --with-dpdk
> > +     make install
> > +     ```
> >
> > -A general rule of thumb for better performance is that the client
> > -application should not be assigned the same dpdk core mask "-c" as
> > -the vswitchd.
> > +## <a name="ovssetup"></a> 3. Setup OVS with DPDK datapath
> >
> > -DPDK vhost:
> > ------------
> > +### 3.1 Setup Hugepages
> 
> I'd just move the section from the ADVANCED doc to here (at least the 2mb
> huge pages, and 2mb huge pages persistence information). It doesn't make
> sense I think to repeat it. The 1G and others could be left in advanced
> as a performance tuning option (but there's really not much performance
> difference between them, afaict).

Agree. In the next version I will bring the 2MB hugepage information (persistent
and runtime allocation) into this guide and remove the 2MB details from the
Advanced guide. For persistent allocation I will also switch to the sysctl
mechanism (writing to /etc/sysctl.d/hugepages.conf) instead of updating the
grub cmdline.
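
Roughly along these lines for the persistent allocation (a minimal sketch; the
file name and the page count of 2048 are only examples):

```
# /etc/sysctl.d/hugepages.conf
# Reserve huge pages of the default size (2M on x86) across reboots
vm.nr_hugepages = 2048
```

The setting can then be applied without a reboot and verified with
`sysctl -p /etc/sysctl.d/hugepages.conf` and `grep HugePages_Total /proc/meminfo`.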

> 
> > -DPDK 16.04 supports two types of vhost:
> > +  Allocate and mount 2M Huge pages
> >
> > -1. vhost-user
> > -2. vhost-cuse
> > +  ```
> > +  echo N > /proc/sys/vm/nr_hugepages, where N = No. of huge pages
> allocated
> > +  mount -t hugetlbfs none /dev/hugepages
> > +  ```
> >
> > -Whatever type of vhost is enabled in the DPDK build specified, is the type
> > -that will be enabled in OVS. By default, vhost-user is enabled in DPDK.
> > -Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports
> > -will be enabled in OVS.
> > -Please note that support for vhost-cuse is intended to be deprecated in
> OVS
> > -in a future release.
> > +### 3.2 Setup DPDK devices using VFIO
> >
> > -DPDK vhost-user:
> > -----------------
> > +  - Supported with DPDK release >= 1.7 and kernel version >= 3.6
> > +  - VFIO needs support from BIOS and kernel.
> > +  - BIOS changes:
> >
> > -The following sections describe the use of vhost-user 'dpdkvhostuser'
> ports
> > -with OVS.
> > +    Enable VT-d, can be verified from `dmesg | grep -e DMAR -e IOMMU`
> output
> >
> > -DPDK vhost-user Prerequisites:
> > --------------------------
> > +  - GRUB bootline:
> >
> > -1. DPDK 16.04 with vhost support enabled as documented in the "Building
> and
> > -   Installing section"
> > +    Add `iommu=pt intel_iommu=on`, can be verified from `cat
> /proc/cmdline` output
> >
> > -2. QEMU version v2.1.0+
> > +  - Load modules and bind the NIC to VFIO driver
> >
> > -   QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if 
> > providing
> > -   your VM with memory greater than 1GB due to potential issues with
> memory
> > -   mapping larger areas.
> > +    ```
> > +    modprobe vfio-pci
> > +    sudo /usr/bin/chmod a+x /dev/vfio
> > +    sudo /usr/bin/chmod 0666 /dev/vfio/*
> > +    $DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci eth1
> > +    $DPDK_DIR/tools/dpdk_nic_bind.py --status
> > +    ```
> >
> > -Adding DPDK vhost-user ports to the Switch:
> > ---------------------------------------
> > +  Note: If using an older DPDK release or running a kernel < 3.6, UIO drivers
> > +  are to be used; please check section 4 (DPDK devices using UIO) for the
> > +  steps.
> >
> > -Following the steps above to create a bridge, you can now add DPDK
> vhost-user
> > -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-user ports can
> > -have arbitrary names, except that forward and backward slashes are
> prohibited
> > -in the names.
> > +### 3.3 Setup OVS
> >
> > -  -  For vhost-user, the name of the port type is `dpdkvhostuser`
> > +  1. DB creation (One time step)
> >
> >       ```
> > -     ovs-vsctl add-port br0 vhost-user-1 -- set Interface vhost-user-1
> > -     type=dpdkvhostuser
> > +     mkdir -p /usr/local/etc/openvswitch
> > +     mkdir -p /usr/local/var/run/openvswitch
> > +     rm /usr/local/etc/openvswitch/conf.db
> > +     ovsdb-tool create /usr/local/etc/openvswitch/conf.db  \
> > +            /usr/local/share/openvswitch/vswitch.ovsschema
> >       ```
> >
> > -     This action creates a socket located at
> > -     `/usr/local/var/run/openvswitch/vhost-user-1`, which you must provide
> > -     to your VM on the QEMU command line. More instructions on this can
> be
> > -     found in the next section "DPDK vhost-user VM configuration"
> > -  - If you wish for the vhost-user sockets to be created in a 
> > sub-directory of
> > -    `/usr/local/var/run/openvswitch`, you may specify this directory in the
> > -    ovsdb like so:
> > -
> > -      `./utilities/ovs-vsctl --no-wait \
> > -        set Open_vSwitch . other_config:vhost-sock-dir=subdir`
> > -
> > -DPDK vhost-user VM configuration:
> > ----------------------------------
> > -Follow the steps below to attach vhost-user port(s) to a VM.
> > -
> > -1. Configure sockets.
> > -   Pass the following parameters to QEMU to attach a vhost-user device:
> > -
> > -   ```
> > -   -chardev socket,id=char1,path=/usr/local/var/run/openvswitch/vhost-
> user-1
> > -   -netdev type=vhost-user,id=mynet1,chardev=char1,vhostforce
> > -   -device virtio-net-pci,mac=00:00:00:00:00:01,netdev=mynet1
> > -   ```
> > -
> > -   ...where vhost-user-1 is the name of the vhost-user port added
> > -   to the switch.
> > -   Repeat the above parameters for multiple devices, changing the
> > -   chardev path and id as necessary. Note that a separate and different
> > -   chardev path needs to be specified for each vhost-user device. For
> > -   example you have a second vhost-user port named 'vhost-user-2', you
> > -   append your QEMU command line with an additional set of parameters:
> > +  2. Start ovsdb-server
> >
> > -   ```
> > -   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-
> user-2
> > -   -netdev type=vhost-user,id=mynet2,chardev=char2,vhostforce
> > -   -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2
> > -   ```
> > +     No SSL support
> >
> > -2. Configure huge pages.
> > -   QEMU must allocate the VM's memory on hugetlbfs. vhost-user ports
> access
> > -   a virtio-net device's virtual rings and packet buffers mapping the VM's
> > -   physical memory on hugetlbfs. To enable vhost-user ports to map the
> VM's
> > -   memory into their process address space, pass the following paramters
> > -   to QEMU:
> > -
> > -   ```
> > -   -object memory-backend-file,id=mem,size=4096M,mem-
> path=/dev/hugepages,
> > -   share=on
> > -   -numa node,memdev=mem -mem-prealloc
> > -   ```
> > -
> > -3. Optional: Enable multiqueue support
> > -   The vhost-user interface must be configured in Open vSwitch with the
> > -   desired amount of queues with:
> > -
> > -   ```
> > -   ovs-vsctl set Interface vhost-user-2 options:n_rxq=<requested queues>
> > -   ```
> > -
> > -   QEMU needs to be configured as well.
> > -   The $q below should match the queues requested in OVS (if $q is more,
> > -   packets will not be received).
> > -   The $v is the number of vectors, which is '$q x 2 + 2'.
> > +     ```
> > +     ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
> \
> > +         --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
> > +         --pidfile --detach
> > +     ```
> >
> > -   ```
> > -   -chardev socket,id=char2,path=/usr/local/var/run/openvswitch/vhost-
> user-2
> > -   -netdev type=vhost-
> user,id=mynet2,chardev=char2,vhostforce,queues=$q
> > -   -device virtio-net-
> pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
> > -   ```
> > +     SSL support
> >
> > -   If one wishes to use multiple queues for an interface in the guest, the
> > -   driver in the guest operating system must be configured to do so. It is
> > -   recommended that the number of queues configured be equal to '$q'.
> > +     ```
> > +     ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
> \
> > +         --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
> > +         --private-key=db:Open_vSwitch,SSL,private_key \
> > +         --certificate=Open_vSwitch,SSL,certificate \
> > +         --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
> > +     ```
> >
> > -   For example, this can be done for the Linux kernel virtio-net driver 
> > with:
> > +  3. Initialize DB (One time step)
> >
> > -   ```
> > -   ethtool -L <DEV> combined <$q>
> > -   ```
> > +     ```
> > +     ovs-vsctl --no-wait init
> > +     ```
> >
> > -   A note on the command above:
> > +  4. Start vswitchd
> >
> > -   `-L`: Changes the numbers of channels of the specified network device
> > +     DPDK configuration arguments can be passed to vswitchd via
> Open_vSwitch
> > +     other_config column. The recognized configuration options are listed.
> > +     Defaults will be provided for all values not explicitly set.
> >
> > -   `combined`: Changes the number of multi-purpose channels.
> > +     * dpdk-init
> > +     Specifies whether OVS should initialize and support DPDK ports. This 
> > is
> > +     a boolean, and defaults to false.
> >
> > -DPDK vhost-cuse:
> > -----------------
> > +     * dpdk-lcore-mask
> > +     Specifies the CPU cores on which dpdk lcore threads should be
> spawned.
> > +     The DPDK lcore threads are used for DPDK library tasks, such as
> > +     library internal message processing, logging, etc. Value should be in
> > +     the form of a hex string (so '0x123') similar to the 'taskset' mask
> > +     input.
> > +     If not specified, the value will be determined by choosing the lowest
> > +     CPU core from initial cpu affinity list. Otherwise, the value will be
> > +     passed directly to the DPDK library.
> > +     For performance reasons, it is best to set this to a single core on
> > +     the system, rather than allow lcore threads to float.
> >
> > -The following sections describe the use of vhost-cuse 'dpdkvhostcuse'
> ports
> > -with OVS.
> > +     * dpdk-alloc-mem
> > +     This sets the total memory to preallocate from hugepages regardless of
> > +     processor socket. It is recommended to use dpdk-socket-mem instead.
> >
> > -DPDK vhost-cuse Prerequisites:
> > --------------------------
> > +     * dpdk-socket-mem
> > +     Comma separated list of memory to pre-allocate from hugepages on
> specific
> > +     sockets.
> >
> > -1. DPDK 16.04 with vhost support enabled as documented in the "Building
> and
> > -   Installing section"
> > -   As an additional step, you must enable vhost-cuse in DPDK by setting the
> > -   following additional flag in `config/common_base`:
> > +     * dpdk-hugepage-dir
> > +     Directory where hugetlbfs is mounted
> >
> > -   `CONFIG_RTE_LIBRTE_VHOST_USER=n`
> > +     * dpdk-extra
> > +     Extra arguments to provide to DPDK EAL, as previously specified on the
> > +     command line. Do not pass '--no-huge' to the system in this way.
> Support
> > +     for running the system without hugepages is nonexistent.
> >
> > -   Following this, rebuild DPDK as per the instructions in the "Building 
> > and
> > -   Installing" section. Finally, rebuild OVS as per step 3 in the "Building
> > -   and Installing" section - OVS will detect that DPDK has vhost-cuse
> libraries
> > -   compiled and in turn will enable support for it in the switch and 
> > disable
> > -   vhost-user support.
> > +     * cuse-dev-name
> > +     Option to set the vhost_cuse character device name.
> >
> > -2. Insert the Cuse module:
> > +     * vhost-sock-dir
> > +     Option to set the path to the vhost_user unix socket files.
> >
> > -     `modprobe cuse`
> > +     NOTE: Changing any of these options requires restarting the ovs-
> vswitchd
> > +     application.
> >
> > -3. Build and insert the `eventfd_link` module:
> > +     Open vSwitch can be started as normal. DPDK will be initialized as 
> > long
> > +     as the dpdk-init option has been set to 'true'.
> >
> >       ```
> > -     cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
> > -     make
> > -     insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
> > +     export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
> > +     ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
> > +     ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> >       ```
> >
> > -4. QEMU version v2.1.0+
> > -
> > -   vhost-cuse will work with QEMU v2.1.0 and above, however it is
> recommended to
> > -   use v2.2.0 if providing your VM with memory greater than 1GB due to
> potential
> > -   issues with memory mapping larger areas.
> > -   Note: QEMU v1.6.2 will also work, with slightly different command line
> parameters,
> > -   which are specified later in this document.
> > -
> > -Adding DPDK vhost-cuse ports to the Switch:
> > ---------------------------------------
> > -
> > -Following the steps above to create a bridge, you can now add DPDK
> vhost-cuse
> > -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports
> can have
> > -arbitrary names.
> > -
> > -  -  For vhost-cuse, the name of the port type is `dpdkvhostcuse`
> > +     If allocated more than one GB hugepage (as for IVSHMEM), set amount
> and
> > +     use NUMA node 0 memory:
> >
> >       ```
> > -     ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
> > -     type=dpdkvhostcuse
> > +     ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-
> mem="1024,0"
> > +     ovs-vswitchd unix:$DB_SOCK --pidfile --detach
> >       ```
> >
> > -     When attaching vhost-cuse ports to QEMU, the name provided during
> the
> > -     add-port operation must match the ifname parameter on the QEMU
> command
> > -     line. More instructions on this can be found in the next section.
> > -
> > -DPDK vhost-cuse VM configuration:
> > ----------------------------------
> > -
> > -   vhost-cuse ports use a Linux* character device to communicate with
> QEMU.
> > -   By default it is set to `/dev/vhost-net`. It is possible to reuse this
> > -   standard device for DPDK vhost, which makes setup a little simpler but 
> > it
> > -   is better practice to specify an alternative character device in order 
> > to
> > -   avoid any conflicts if kernel vhost is to be used in parallel.
> > +     To better scale the workloads across cores, multiple pmd threads can
> be
> > +     created and pinned to CPU cores by explicitly specifying pmd-cpu-mask,
> > +     e.g. to spawn 2 pmd threads and pin them to cores 1, 2:
> >
> > -1. This step is only needed if using an alternative character device.
> > -
> > -   The new character device filename must be specified in the ovsdb:
> > -
> > -        `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
> > -                          other_config:cuse-dev-name=my-vhost-net`
> > +     ```
> > +     ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> > +     ```
> >
> > -   In the example above, the character device to be used will be
> > -   `/dev/my-vhost-net`.
> > +  5. Create bridge & add DPDK devices
> >
> > -2. This step is only needed if reusing the standard character device. It 
> > will
> > -   conflict with the kernel vhost character device so the user must first
> > -   remove it.
> > +     create a bridge with datapath_type "netdev" in the configuration
> database
> >
> > -       `rm -rf /dev/vhost-net`
> > +     `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`
> >
> > -3a. Configure virtio-net adaptors:
> > -   The following parameters must be passed to the QEMU binary:
> > +     Now you can add DPDK devices. OVS expects DPDK device names to
> start with
> > +     "dpdk" and end with a portid. vswitchd should print (in the log file) 
> > the
> > +     number of dpdk devices found.
> >
> >       ```
> > -     -netdev
> tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
> > -     -device virtio-net-pci,netdev=net1,mac=<mac>
> > +     ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
> > +     ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
> >       ```
> >
> > -     Repeat the above parameters for multiple devices.
> > -
> > -     The DPDK vhost library will negiotiate its own features, so they
> > -     need not be passed in as command line params. Note that as offloads
> are
> > -     disabled this is the equivalent of setting:
> > -
> > -     `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`
> > -
> > -3b. If using an alternative character device. It must be also explicitly
> > -    passed to QEMU using the `vhostfd` argument:
> > +     After the DPDK ports get added to switch, a polling thread 
> > continuously
> polls
> > +     DPDK devices and consumes 100% of the core as can be checked from
> 'top' and 'ps' cmds.
> >
> >       ```
> > -     -netdev
> tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
> > -     vhostfd=<open_fd>
> > -     -device virtio-net-pci,netdev=net1,mac=<mac>
> > +     top -H
> > +     ps -eLo pid,psr,comm | grep pmd
> >       ```
> >
> > -     The open file descriptor must be passed to QEMU running as a child
> > -     process. This could be done with a simple python script.
> > +     Note: creating bonds of DPDK interfaces is slightly different to 
> > creating
> > +     bonds of system interfaces.  For DPDK, the interface type must be
> explicitly
> > +     set, for example:
> >
> > -       ```
> > -       #!/usr/bin/python
> > -       fd = os.open("/dev/usvhost", os.O_RDWR)
> > -       subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\
> > -                        vhost=on,vhostfd=" + fd +"...", shell=True)
> > +     ```
> > +     ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0
> type=dpdk -- set Interface dpdk1 type=dpdk
> > +     ```
> >
> > -   Alternatively the `qemu-wrap.py` script can be used to automate the
> > -   requirements specified above and can be used in conjunction with libvirt
> if
> > -   desired. See the "DPDK vhost VM configuration with QEMU wrapper"
> section
> > -   below.
> > +  6. PMD thread statistics
> >
> > -4. Configure huge pages:
> > -   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access
> a
> > -   virtio-net device's virtual rings and packet buffers mapping the VM's
> > -   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
> > -   memory into their process address space, pass the following parameters
> > -   to QEMU:
> > +     ```
> > +     # Check current stats
> > +       ovs-appctl dpif-netdev/pmd-stats-show
> >
> > -     `-object memory-backend-file,id=mem,size=4096M,mem-
> path=/dev/hugepages,
> > -      share=on -numa node,memdev=mem -mem-prealloc`
> > +     # Show port/rxq assignment
> > +       ovs-appctl dpif-netdev/pmd-rxq-show
> >
> > -   Note: For use with an earlier QEMU version such as v1.6.2, use the
> > -   following to configure hugepages instead:
> > +     # Clear previous stats
> > +       ovs-appctl dpif-netdev/pmd-stats-clear
> > +     ```
> >
> > -     `-mem-path /dev/hugepages -mem-prealloc`
> > +  7. Stop vswitchd & Delete bridge
> >
> > -DPDK vhost-cuse VM configuration with QEMU wrapper:
> > ----------------------------------------------------
> > -The QEMU wrapper script automatically detects and calls QEMU with the
> > -necessary parameters. It performs the following actions:
> > +     ```
> > +     ovs-appctl -t ovs-vswitchd exit
> > +     ovs-appctl -t ovsdb-server exit
> > +     ovs-vsctl del-br br0
> > +     ```
> >
> > -  * Automatically detects the location of the hugetlbfs and inserts this
> > -    into the command line parameters.
> > -  * Automatically open file descriptors for each virtio-net device and
> > -    inserts this into the command line parameters.
> > -  * Calls QEMU passing both the command line parameters passed to the
> > -    script itself and those it has auto-detected.
> > +## <a name="builddpdk"></a> 4. DPDK in the VM
> >
> > -Before use, you **must** edit the configuration parameters section of
> the
> > -script to point to the correct emulator location and set additional
> > -settings. Of these settings, `emul_path` and `us_vhost_path` **must**
> be
> > -set. All other settings are optional.
> > +The DPDK 'testpmd' application can be run in the guest VM for high speed
> > +packet forwarding between vhost ports. This needs DPDK and testpmd to be
> > +compiled in the VM along with the kernel modules. Below are the steps to be
> > +followed for running the testpmd application in the VM.
> >
> > -To use directly from the command line simply pass the wrapper some of
> the
> > -QEMU parameters: it will configure the rest. For example:
> > +  * Export the DPDK location $DPDK_LOC to the guest VM (/dev/sdb in the VM)
> > +    and instantiate the Guest.
> >
> > -```
> > -qemu-wrap.py -cpu host -boot c -hda <disk image> -m 4096 -smp 4
> > -  --enable-kvm -nographic -vnc none -net none -netdev tap,id=net1,
> > -  script=no,downscript=no,ifname=if1,vhost=on -device virtio-net-pci,
> > -  netdev=net1,mac=00:00:00:00:00:01
> > -```
> > +  ```
> > +  export VM_NAME=Centos-vm
> > +  export GUEST_MEM=4096M
> > +  export QCOW2_LOC=<Dir of Qcow2>
> > +  export QCOW2_IMAGE=$QCOW2_LOC/CentOS7_x86_64.qcow2
> > +  export DPDK_LOC=/usr/src/dpdk-16.04
> > +  export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
> >
> > -DPDK vhost-cuse VM configuration with libvirt:
> > -----------------------------------------------
> > +  qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m
> $GUEST_MEM -object memory-backend-
> file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -
> numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive
> file=$QCOW2_IMAGE -drive file=fat:rw:$DPDK_LOC,snapshot=off -chardev
> socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev
> type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-
> pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev
> socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev
> type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-
> pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic -
> snapshot
> > +  ```
> >
> > -If you are using libvirt, you must enable libvirt to access the character
> > -device by adding it to controllers cgroup for libvirtd using the following
> > -steps.
> > +  * Copy the DPDK sources to the VM and build DPDK
> >
> > -     1. In `/etc/libvirt/qemu.conf` add/edit the following lines:
> > +  ```
> > +  mkdir -p /mnt/dpdk
> > +  mount -o iocharset=utf8 /dev/sdb1 /mnt/dpdk
> > +  cp -a /mnt/dpdk /root/dpdk
> > +  cd /root/dpdk/
> > +  export DPDK_DIR=/root/dpdk/
> > +  export DPDK_BUILD=/root/dpdk/x86_64-native-linuxapp-gcc
> > +  make install T=x86_64-native-linuxapp-gcc DESTDIR=install
> > +  ```
> >
> > -        ```
> > -        1) clear_emulator_capabilities = 0
> > -        2) user = "root"
> > -        3) group = "root"
> > -        4) cgroup_device_acl = [
> > -               "/dev/null", "/dev/full", "/dev/zero",
> > -               "/dev/random", "/dev/urandom",
> > -               "/dev/ptmx", "/dev/kvm", "/dev/kqemu",
> > -               "/dev/rtc", "/dev/hpet", "/dev/net/tun",
> > -               "/dev/<my-vhost-device>",
> > -               "/dev/hugepages"]
> > -        ```
> > +  * Build the test-pmd application
> >
> > -        <my-vhost-device> refers to "vhost-net" if using the 
> > `/dev/vhost-net`
> > -        device. If you have specificed a different name in the database
> > -        using the "other_config:cuse-dev-name" parameter, please specify
> that
> > -        filename instead.
> > +  ```
> > +  cd app/test-pmd
> > +  export RTE_SDK=/root/dpdk
> > +  export RTE_TARGET=x86_64-native-linuxapp-gcc
> > +  make
> > +  ```
> >
> > -     2. Disable SELinux or set to permissive mode
> > +  * Setup Huge pages and DPDK devices using UIO
> >
> > -     3. Restart the libvirtd process
> > -        For example, on Fedora:
> > +  ```
> > +  sysctl vm.nr_hugepages=1024
> > +  mkdir -p /dev/hugepages
> > +  mount -t hugetlbfs hugetlbfs /dev/hugepages
> > +  modprobe uio
> > +  insmod $DPDK_BUILD/kmod/igb_uio.ko
> > +  $DPDK_DIR/tools/dpdk_nic_bind.py --status
> > +  $DPDK_DIR/tools/dpdk_nic_bind.py -b igb_uio 00:03.0 00:04.0
> > +  ```
> >
> > -          `systemctl restart libvirtd.service`
> > +  The vhost ports' PCI IDs can be retrieved using the `lspci | grep Ethernet` command.
> >
> > -After successfully editing the configuration, you may launch your
> > -vhost-enabled VM. The XML describing the VM can be configured like so
> > -within the <qemu:commandline> section:
> > +## <a name="ovstc"></a> 5. OVS Testcases
> >
> > -     1. Set up shared hugepages:
> > +  Below are a few testcases and the steps to be followed.
> >
> > -     ```
> > -     <qemu:arg value='-object'/>
> > -     <qemu:arg value='memory-backend-file,id=mem,size=4096M,mem-
> path=/dev/hugepages,share=on'/>
> > -     <qemu:arg value='-numa'/>
> > -     <qemu:arg value='node,memdev=mem'/>
> > -     <qemu:arg value='-mem-prealloc'/>
> > -     ```
> > +### 5.1 PHY-PHY
> >
> > -     2. Set up your tap devices:
> > +  The steps (1-5) in section 3.3 will create & initialize the DB, start
> > vswitchd and
> also
> > +  add DPDK devices to bridge 'br0'.
> >
> > -     ```
> > -     <qemu:arg value='-netdev'/>
> > -     <qemu:arg
> value='type=tap,id=net1,script=no,downscript=no,ifname=vhost0,vhost=on'
> />
> > -     <qemu:arg value='-device'/>
> > -     <qemu:arg value='virtio-net-pci,netdev=net1,mac=00:00:00:00:00:01'/>
> > -     ```
> > +  1. Add test flows to forward packets between DPDK port 0 and port 1
> >
> > -     Repeat for as many devices as are desired, modifying the id, ifname
> > -     and mac as necessary.
> > +       ```
> > +       # Clear current flows
> > +       ovs-ofctl del-flows br0
> >
> > -     Again, if you are using an alternative character device (other than
> > -     `/dev/vhost-net`), please specify the file descriptor like so:
> > +       # Add flows between port 1 (dpdk0) to port 2 (dpdk1)
> > +       ovs-ofctl add-flow br0 in_port=1,action=output:2
> > +       ovs-ofctl add-flow br0 in_port=2,action=output:1
> > +       ```
> >
> > -     `<qemu:arg
> value='type=tap,id=net3,script=no,downscript=no,ifname=vhost0,vhost=on,
> vhostfd=<open_fd>'/>`
> > +### 5.2 PHY-VM-PHY [VHOST LOOPBACK]
> >
> > -     Where <open_fd> refers to the open file descriptor of the character
> device.
> > -     Instructions of how to retrieve the file descriptor can be found in 
> > the
> > -     "DPDK vhost VM configuration" section.
> > -     Alternatively, the process is automated with the qemu-wrap.py script,
> > -     detailed in the next section.
> > +  The steps (1-5) in section 3.3 will create & initialize the DB, start
> > vswitchd and
> also
> > +  add DPDK devices to bridge 'br0'.
> >
> > -Now you may launch your VM using virt-manager, or like so:
> > +  1. Add dpdkvhostuser ports to bridge 'br0'
> >
> > -    `virsh create my_vhost_vm.xml`
> > +       ```
> > +       ovs-vsctl add-port br0 dpdkvhostuser0 -- set Interface
> dpdkvhostuser0 type=dpdkvhostuser
> > +       ovs-vsctl add-port br0 dpdkvhostuser1 -- set Interface
> dpdkvhostuser1 type=dpdkvhostuser
> > +       ```
> >
> > -DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
> > -----------------------------------------------------------
> > +  2. Add test flows to forward packets between DPDK devices and VM ports
> >
> > -To use the qemu-wrapper script in conjuntion with libvirt, follow the
> > -steps in the previous section before proceeding with the following steps:
> > +       ```
> > +       # Clear current flows
> > +       ovs-ofctl del-flows br0
> >
> > -  1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH)
> > -     Ideally in the same directory that the QEMU binary is located.
> > +       # Add flows
> > +       ovs-ofctl add-flow br0 idle_timeout=0,in_port=1,action=output:3
> > +       ovs-ofctl add-flow br0 idle_timeout=0,in_port=3,action=output:1
> > +       ovs-ofctl add-flow br0 idle_timeout=0,in_port=4,action=output:2
> > +       ovs-ofctl add-flow br0 idle_timeout=0,in_port=2,action=output:4
> >
> > -  2. Ensure that the script has the same owner/group and file permissions
> > -     as the QEMU binary.
> > +       # Dump flows
> > +       ovs-ofctl dump-flows br0
> > +       ```
> >
> > -  3. Update the VM xml file using "virsh edit VM.xml"
> > +  3. Start the guest VM
> >
> > -       1. Set the VM to use the launch script.
> > -          Set the emulator path contained in the `<emulator><emulator/>`
> tags.
> > -          For example, replace:
> > +       Guest Configuration
> >
> > -            `<emulator>/usr/bin/qemu-kvm<emulator/>`
> > +       ```
> > +       | configuration        | values | comments
> > +       |----------------------|--------|-----------------
> > +       | qemu thread affinity | core 5 | taskset 0x20
> > +       | memory               | 4GB    | -
> > +       | cores                | 2      | -
> > +       | Qcow2 image          | CentOS7| -
> > +       | mrg_rxbuf            | off    | -
> > +       | export DPDK sources  | yes    | -drive file=fat:rw:$DPDK_LOC(seen 
> > as
> /dev/sdb in VM)
> > +       ```
> >
> > -            with:
> > +       ```
> > +       export VM_NAME=vhost-vm
> > +       export GUEST_MEM=4096M
> > +       export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
> > +       export DPDK_LOC=/usr/src/dpdk-16.04
> > +       export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
> >
> > -            `<emulator>/usr/bin/qemu-wrap.py<emulator/>`
> > +       taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -
> enable-kvm -m $GUEST_MEM -object memory-backend-
> file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -
> numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive
> file=$QCOW2_IMAGE -drive file=fat:rw:$DPDK_LOC,snapshot=off -chardev
> socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev
> type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-
> pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev
> socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev
> type=vhost-user,id=mynet2,chardev=char1,vhostforce -device virtio-net-
> pci,mac=00:00:00:00:00:02,netdev=mynet2,mrg_rxbuf=off --nographic -
> snapshot
> > +       ```
> >
> > -  4. Edit the Configuration Parameters section of the script to point to
> > -  the correct emulator location and set any additional options. If you are
> > -  using a alternative character device name, please set "us_vhost_path" to
> the
> > -  location of that device. The script will automatically detect and insert
> > -  the correct "vhostfd" value in the QEMU command line arguments.
> > +  4. DPDK Packet forwarding in Guest VM
> >
> > -  5. Use virt-manager to launch the VM
> > +     To accomplish this, DPDK and the testpmd application have to be first
> compiled
> > +     on the VM; the steps are listed in section 4 (DPDK in the
> VM).
> >
> > -Running ovs-vswitchd with DPDK backend inside a VM
> > ---------------------------------------------------
> > +       * Run test-pmd application
> >
> > -Please note that additional configuration is required if you want to run
> > -ovs-vswitchd with DPDK backend inside a QEMU virtual machine. Ovs-
> vswitchd
> > -creates separate DPDK TX queues for each CPU core available. This
> operation
> > -fails inside QEMU virtual machine because, by default, VirtIO NIC provided
> > -to the guest is configured to support only single TX queue and single RX
> > -queue. To change this behavior, you need to turn on 'mq' (multiqueue)
> > -property of all virtio-net-pci devices emulated by QEMU and used by
> DPDK.
> > -You may do it manually (by changing QEMU command line) or, if you use
> Libvirt,
> > -by adding the following string:
> > +       ```
> > +       cd $DPDK_DIR/app/test-pmd;
> > +       ./testpmd -c 0x3 -n 4 --socket-mem 1024 -- --burst=64 -i --
> txqflags=0xf00 --disable-hw-vlan
> > +       set fwd mac_retry
> > +       start
> > +       ```
> >
> > -`<driver name='vhost' queues='N'/>`
> > +       * Bind vNIC back to kernel once the test is completed.
> >
> > -to <interface> sections of all network devices used by DPDK. Parameter
> 'N'
> > -determines how many queues can be used by the guest.
> > +       ```
> > +       $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:03.0
> > +       $DPDK_DIR/tools/dpdk_nic_bind.py --bind=virtio-pci 0000:00:04.0
> > +       ```
> > +       Note: The appropriate PCI IDs must be passed in the above example. The PCI IDs
> can be
> > +       retrieved using '$DPDK_DIR/tools/dpdk_nic_bind.py --status' cmd.
> >
> > -Restrictions:
> > --------------
> > +### 5.3 PHY-VM-PHY [IVSHMEM]
> >
> > -  - Work with 1500 MTU, needs few changes in DPDK lib to fix this issue.
> > -  - Currently DPDK port does not make use any offload functionality.
> > -  - DPDK-vHost support works with 1G huge pages.
> > +  IVSHMEM is supported only with 1GB huge pages. The steps for this
> testcase are listed
> > +  in section 5.2 (PVP - IVSHMEM) of the ADVANCED DPDK install guide.
> >
> > -  ivshmem:
> > -  - If you run Open vSwitch with smaller page sizes (e.g. 2MB), you may be
> > -    unable to share any rings or mempools with a virtual machine.
> > -    This is because the current implementation of ivshmem works by
> sharing
> > -    a single 1GB huge page from the host operating system to any guest
> > -    operating system through the Qemu ivshmem device. When using
> smaller
> > -    page sizes, multiple pages may be required to hold the ring descriptors
> > -    and buffer pools. The Qemu ivshmem device does not allow you to
> share
> > -    multiple file descriptors to the guest operating system. However, if 
> > you
> > -    want to share dpdkr rings with other processes on the host, you can do
> > -    this with smaller page sizes.
> > +## <a name="ovslimits"></a> 6. Limitations
> >
> > -  Platform and Network Interface:
> > -  - By default with DPDK 16.04, a maximum of 64 TX queues can be used
> with an
> > -    Intel XL710 Network Interface on a platform with more than 64 logical
> > -    cores. If a user attempts to add an XL710 interface as a DPDK port type
> to
> > -    a system as described above, an error will be reported that 
> > initialization
> > -    failed for the 65th queue. OVS will then roll back to the previous
> > -    successful queue initialization and use that value as the total number 
> > of
> > -    TX queues available with queue locking. If a user wishes to use more
> than
> > -    64 queues and avoid locking, then the
> > -    `CONFIG_RTE_LIBRTE_I40E_QUEUE_NUM_PER_PF` config parameter in
> DPDK must be
> > -    increased to the desired number of queues. Both DPDK and OVS must
> be
> > -    recompiled for this change to take effect.
> > +  - Supports MTU size 1500 only; a few changes are needed in the DPDK lib to
> > +    fix this issue.
> > +  - Currently DPDK ports do not use HW offload functionality.
> > +  - DPDK IVSHMEM support works with 1G huge pages.
> >
> >  Bug Reporting:
> >  --------------
> >
> >  Please report problems to b...@openvswitch.org.
> >
> > -[INSTALL.userspace.md]:INSTALL.userspace.md
> > -[INSTALL.md]:INSTALL.md
> > -[DPDK Linux GSG]:
> http://www.dpdk.org/doc/guides/linux_gsg/build_dpdk.html#binding-and-
> unbinding-network-ports-to-from-the-igb-uioor-vfio-modules
> > -[DPDK Docs]: http://dpdk.org/doc
> > +
> > +[DPDK requirements]:
> http://dpdk.org/doc/guides/linux_gsg/sys_reqs.html
> > +[Download DPDK]: http://dpdk.org/browse/dpdk/refs/
> > +[Download OVS]: http://openvswitch.org/releases/
> > +[DPDK Supported NICs]: http://dpdk.org/doc/nics
> > +[Build Requirements]:
> https://github.com/openvswitch/ovs/blob/master/INSTALL.md#build-
> requirements