Thanks, Flavio, for reviewing the install guide in detail. My comments are inline.
>-----Original Message-----
>From: Flavio Leitner [mailto:f...@sysclose.org]
>Sent: Tuesday, May 31, 2016 9:44 PM
>To: Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com>
>Cc: dev@openvswitch.org; Traynor, Kevin <kevin.tray...@intel.com>
>Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
>
>Hi,
>
>Thanks for doing this.
>I have some comments inline.
>fbl
>
>On Thu, May 26, 2016 at 01:46:42PM +0100, Bhanuprakash Bodireddy wrote:
>> Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and
>> INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate the
>> novice user in setting up the OVS DPDK and running it out of box, the
>> ADVANCED document is targeted at expert users looking for the optimum
>> performance running dpdk datapath.
>>
>> This commit updates INSTALL.DPDK.md document.
>>
>> Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
>> ---
>>  INSTALL.DPDK.md | 1299 ++++++++++++++++++-------------------------------------
>>  1 file changed, 429 insertions(+), 870 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 68735cc..561631f 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -1,1020 +1,579 @@
>> -Using Open vSwitch with DPDK
>> -============================
>> +OVS DPDK INSTALL GUIDE
>> +================================
>>
>> -  `./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`
>> +  Note: For IVSHMEM, Set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
>>
>> -  See below information on dpdkvhostcuse and dpdkvhostuser ports.
>> -  See [DPDK Docs] for more information on `testpmd`.
>> +### 2.3 Install OVS
>
>It seems to me that this section could be better. We have a good INSTALL.md
>file covering all options, additional details and also have pointers to more
>specifics like how to do in Fedora or Debian.
>
>For instance, Fedora spec file in branch master allows you to build with
>DPDK support with a simple command line:
>
>  $ make rpm-fedora RPMBUILD_OPT="--with dpdk"
>
>Nothing wrong documenting a generic recipe, but I missed the other
>options. Perhaps something like:
>
>2.3 Install OVS
>  OVS can be installed using different methods. The only requirement to install
>with DPDK support enabled is to pass an extra argument to ./configure. You can find
>additional information in INSTALL.md or more specific instructions for a distribution
>in the other INSTALL.*.md files available in the repository. This documents focus on
>a generic recipe that should work for most cases....

Good point; I will rework this section with your comments in mind. I will also add
hyperlinks to INSTALL.md and redirect users doing distribution-specific builds to
the respective pages.

>
>I am sure it can be reworded in a better way, but it shows my point.
>
>
>> +  OVS can be downloaded in compressed format from the OVS release page (or)
>> +  cloned from git repository if user intends to develop and contribute
>> +  patches upstream.
>>
>> +  - [Download OVS] tar ball and extract the file, for example in to /usr/src
>> +    and set OVS_DIR
>>
>> +  ```
>> +  wget -O ovs.tar https://github.com/openvswitch/ovs/tarball/master
>> +  mkdir -p /usr/src/ovs
>> +  tar -xvf ovs.tar -C /usr/src/ovs --strip-components=1
>> +  export OVS_DIR=/usr/src/ovs
>> +  ```
>>
>> -DPDK Rings :
>> ------------
>> +  - Clone the Git repository for OVS, for example in to /usr/src
>>
>> -Following the steps above to create a bridge, you can now add dpdk rings
>> -as a port to the vswitch. OVS will expect the DPDK ring device name to
>> -start with dpdkr and end with a portid.
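As an aside, a quick way to see what the `--strip-components=1` extract step in the hunk above does, using a locally built throwaway archive instead of the GitHub download (the directory name `openvswitch-ovs-abc123` is made up here to mimic GitHub's tarball layout):

```shell
# Offline sketch: GitHub tarballs wrap everything in one versioned
# top-level directory; --strip-components=1 drops it on extraction.
srcdir=$(mktemp -d)
mkdir -p "$srcdir/openvswitch-ovs-abc123"
touch "$srcdir/openvswitch-ovs-abc123/configure.ac"
tar -cf "$srcdir/ovs.tar" -C "$srcdir" openvswitch-ovs-abc123

OVS_DIR=$(mktemp -d)
# The source tree lands directly in $OVS_DIR, as the guide expects.
tar -xf "$srcdir/ovs.tar" -C "$OVS_DIR" --strip-components=1
ls "$OVS_DIR"
```

Without `--strip-components=1`, the user would end up with `$OVS_DIR/openvswitch-*/` and the later `cd $OVS_DIR; ./boot.sh` step would fail.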
>> +  ```
>> +  cd /usr/src/
>> +  git clone https://github.com/openvswitch/ovs.git
>> +  export OVS_DIR=/usr/src/ovs
>> +  ```
>>
>> -`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`
>> +  - Install OVS dependencies
>>
>> -DPDK rings client test application
>> +    GNU make, GCC 4.x (or) Clang 3.4 (Mandatory)
>> +    libssl, libcap-ng, Python 2.7 (Optional)
>> +    More information can be found at [Build Requirements]
>>
>> -Included in the test directory is a sample DPDK application for testing
>> -the rings. This is from the base dpdk directory and modified to work
>> -with the ring naming used within ovs.
>> +  - Configure, Install OVS
>>
>> -location tests/ovs_client
>> +  ```
>> +  cd $OVS_DIR
>> +  ./boot.sh
>> +  ./configure --with-dpdk=$DPDK_BUILD
>> +  make install
>> +  ```
>>
>> -To run the client :
>> +  Note: Passing DPDK_BUILD can be skipped if DPDK library is installed in
>> +  standard locations i.e `./configure --with-dpdk` should suffice.
>>
>> -```
>> -cd /usr/src/ovs/tests/
>> -ovsclient -c 1 -n 4 --proc-type=secondary -- -n "port id you gave dpdkr"
>> -```
>> +## <a name="ovssetup"></a> 3. Setup OVS with DPDK datapath
>>
>> -In the case of the dpdkr example above the "port id you gave dpdkr" is 0.
>> +### 3.1 Setup Hugepages
>>
>> -It is essential to have --proc-type=secondary
>> +  Allocate and mount 2M Huge pages:
>>
>> -The application simply receives an mbuf on the receive queue of the
>> -ethernet ring and then places that same mbuf on the transmit ring of
>> -the ethernet ring. It is a trivial loopback application.
>> +  - For persistent allocation of huge pages, write to hugepages.conf file
>> +    in /etc/sysctl.d
>>
>> -DPDK rings in VM (IVSHMEM shared memory communications)
>> --------------------------------------------------------
>> +    `echo 'vm.nr_hugepages=2048' > /etc/sysctl.d/hugepages.conf`
>>
>> -In addition to executing the client in the host, you can execute it within
>> -a guest VM. To do so you will need a patched qemu. You can download the
>> -patch and getting started guide at :
>> +  - For run-time allocation of huge pages
>>
>> -https://01.org/packet-processing/downloads
>> +    `sysctl -w vm.nr_hugepages=N` where N = No. of 2M huge pages allocated
>>
>> -A general rule of thumb for better performance is that the client
>> -application should not be assigned the same dpdk core mask "-c" as
>> -the vswitchd.
>> +  - To verify hugepage configuration
>>
>> -DPDK vhost:
>> ------------
>> +    `grep HugePages_ /proc/meminfo`
>>
>> -DPDK 16.04 supports two types of vhost:
>> +  - Mount hugepages
>
>I'd say something 'Mount hugepages if not already mounted by default'
>otherwise it can be double mounted and that would hide libvirt dir.

Agree. I will add this.

>
>>
>> -1. vhost-user
>> -2. vhost-cuse
>> +    `mount -t hugetlbfs none /dev/hugepages`
>>
>> -Whatever type of vhost is enabled in the DPDK build specified, is the type
>> -that will be enabled in OVS. By default, vhost-user is enabled in DPDK.
>> -Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports
>> -will be enabled in OVS.
>> -  ```
>> +  SSL support
>>
>> -  If one wishes to use multiple queues for an interface in the guest, the
>> -  driver in the guest operating system must be configured to do so. It is
>> -  recommended that the number of queues configured be equal to '$q'.
>> +  ```
>> +  ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock \
>> +      --remote=db:Open_vSwitch,Open_vSwitch,manager_options \
>> +      --private-key=db:Open_vSwitch,SSL,private_key \
>> +      --certificate=Open_vSwitch,SSL,certificate \
>> +      --bootstrap-ca-cert=db:Open_vSwitch,SSL,ca_cert --pidfile --detach
>> +  ```
>>
>> -  For example, this can be done for the Linux kernel virtio-net driver with:
>> +  3. Initialize DB (One time step)
>>
>> -  ```
>> -  ethtool -L <DEV> combined <$q>
>> -  ```
>> +  ```
>> +  ovs-vsctl --no-wait init
>> +  ```
>>
>> -  A note on the command above:
>> +  4. Start vswitchd
>>
>
>This section can be simplified by just listing the main options and
>pointing to ovs-vswitchd.conf.db(5) for descriptions.

In fact, I had the same impression when I carried these changes over to the
new install guide. I will rework and simplify this section.

>
>>
>> -  `-L`: Changes the numbers of channels of the specified network device
>> +  DPDK configuration arguments can be passed to vswitchd via Open_vSwitch
>> +  other_config column. The recognized configuration options are listed.
>> +  Defaults will be provided for all values not explicitly set.
>>
>> -  `combined`: Changes the number of multi-purpose channels.
>> +  * dpdk-init
>> +  Specifies whether OVS should initialize and support DPDK ports. This is
>> +  a boolean, and defaults to false.
>>
>> -DPDK vhost-cuse:
>> ----------------
>> +  * dpdk-lcore-mask
>> +  Specifies the CPU cores on which dpdk lcore threads should be spawned.
>> +  The DPDK lcore threads are used for DPDK library tasks, such as
>> +  library internal message processing, logging, etc. Value should be in
>> +  the form of a hex string (so '0x123') similar to the 'taskset' mask
>> +  input.
>> +  If not specified, the value will be determined by choosing the lowest
>> +  CPU core from initial cpu affinity list. Otherwise, the value will be
>> +  passed directly to the DPDK library.
>> +  For performance reasons, it is best to set this to a single core on
>> +  the system, rather than allow lcore threads to float.
>>
>> -The following sections describe the use of vhost-cuse 'dpdkvhostcuse' ports
>> -with OVS.
>> +  * dpdk-alloc-mem
>> +  This sets the total memory to preallocate from hugepages regardless of
>> +  processor socket. It is recommended to use dpdk-socket-mem instead.
>>
>> -DPDK vhost-cuse Prerequisites:
>> -------------------------
>> +  * dpdk-socket-mem
>> +  Comma separated list of memory to pre-allocate from hugepages on specific
>> +  sockets.
>>
>> -1. DPDK 16.04 with vhost support enabled as documented in the "Building and
>> -   Installing section"
>> -   As an additional step, you must enable vhost-cuse in DPDK by setting the
>> -   following additional flag in `config/common_base`:
>> +  * dpdk-hugepage-dir
>> +  Directory where hugetlbfs is mounted
>>
>> -   `CONFIG_RTE_LIBRTE_VHOST_USER=n`
>> +  * dpdk-extra
>> +  Extra arguments to provide to DPDK EAL, as previously specified on the
>> +  command line. Do not pass '--no-huge' to the system in this way. Support
>> +  for running the system without hugepages is nonexistent.
>>
>> -   Following this, rebuild DPDK as per the instructions in the "Building and
>> -   Installing" section. Finally, rebuild OVS as per step 3 in the "Building
>> -   and Installing" section - OVS will detect that DPDK has vhost-cuse libraries
>> -   compiled and in turn will enable support for it in the switch and disable
>> -   vhost-user support.
>> +  * cuse-dev-name
>> +  Option to set the vhost_cuse character device name.
>>
>> -2. Insert the Cuse module:
>> +  * vhost-sock-dir
>> +  Option to set the path to the vhost_user unix socket files.
>>
>> -   `modprobe cuse`
>> +  NOTE: Changing any of these options requires restarting the ovs-vswitchd
>> +  application.
>>
>> -3. Build and insert the `eventfd_link` module:
>> +  Open vSwitch can be started as normal. DPDK will be initialized as long
>> +  as the dpdk-init option has been set to 'true'.
>>
>>    ```
>> -  cd $DPDK_DIR/lib/librte_vhost/eventfd_link/
>> -  make
>> -  insmod $DPDK_DIR/lib/librte_vhost/eventfd_link.ko
>> +  export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
>> +  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>> +  ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>>    ```
>>
>> -4. QEMU version v2.1.0+
>> -
>> -   vhost-cuse will work with QEMU v2.1.0 and above, however it is recommended to
>> -   use v2.2.0 if providing your VM with memory greater than 1GB due to potential
>> -   issues with memory mapping larger areas.
>> -   Note: QEMU v1.6.2 will also work, with slightly different command line parameters,
>> -   which are specified later in this document.
>> -
>> -Adding DPDK vhost-cuse ports to the Switch:
>> --------------------------------------
>> -
>> -Following the steps above to create a bridge, you can now add DPDK vhost-cuse
>> -as a port to the vswitch. Unlike DPDK ring ports, DPDK vhost-cuse ports can have
>> -arbitrary names.
>> -
>> - - For vhost-cuse, the name of the port type is `dpdkvhostcuse`
>> +  If allocated more than one GB hugepage (as for IVSHMEM), set amount and
>> +  use NUMA node 0 memory. For details on using ivshmem with DPDK, refer to
>> +  [OVS Testcases].
>>
>>    ```
>> -  ovs-vsctl add-port br0 vhost-cuse-1 -- set Interface vhost-cuse-1
>> -  type=dpdkvhostcuse
>> +  ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-socket-mem="1024,0"
>> +  ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>>    ```
>>
>> -  When attaching vhost-cuse ports to QEMU, the name provided during the
>> -  add-port operation must match the ifname parameter on the QEMU command
>> -  line. More instructions on this can be found in the next section.
>> -
>> -DPDK vhost-cuse VM configuration:
>> ---------------------------------
>> -
>> -  vhost-cuse ports use a Linux* character device to communicate with QEMU.
>> -  By default it is set to `/dev/vhost-net`. It is possible to reuse this
>> -  standard device for DPDK vhost, which makes setup a little simpler but it
>> -  is better practice to specify an alternative character device in order to
>> -  avoid any conflicts if kernel vhost is to be used in parallel.
>> +  To better scale the work loads across cores, Multiple pmd threads can be
>> +  created and pinned to CPU cores by explicity specifying pmd-cpu-mask.
>> +  eg: To spawn 2 pmd threads and pin them to cores 1, 2
>>
>> -1. This step is only needed if using an alternative character device.
>>
>>    ```
>> +  ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
>> +  ```
>>
>> -  The new character device filename must be specified in the ovsdb:
>> +  5. Create bridge & add DPDK devices
>>
>> -  `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \
>> -  other_config:cuse-dev-name=my-vhost-net`
>> +  create a bridge with datapath_type "netdev" in the configuration database
>>
>> -  In the example above, the character device to be used will be
>> -  `/dev/my-vhost-net`.
>> +  `ovs-vsctl add-br br0 -- set bridge br0 datapath_type=netdev`
>>
>> -2. This step is only needed if reusing the standard character device. It will
>> -   conflict with the kernel vhost character device so the user must first
>> -   remove it.
>> +  Now you can add DPDK devices. OVS expects DPDK device names to start with
>> +  "dpdk" and end with a portid. vswitchd should print (in the log file) the
>> +  number of dpdk devices found.
>>
>> -   `rm -rf /dev/vhost-net`
>> +  ```
>> +  ovs-vsctl add-port br0 dpdk0 -- set Interface dpdk0 type=dpdk
>> +  ovs-vsctl add-port br0 dpdk1 -- set Interface dpdk1 type=dpdk
>> +  ```
>>
>> -3a. Configure virtio-net adaptors:
>> -    The following parameters must be passed to the QEMU binary:
>> +  After the DPDK ports get added to switch, a polling thread continuously polls
>> +  DPDK devices and consumes 100% of the core as can be checked from 'top' and 'ps' cmds.
>>
>>    ```
>> -  -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on
>> -  -device virtio-net-pci,netdev=net1,mac=<mac>
>> +  top -H
>> +  ps -eLo pid,psr,comm | grep pmd
>>    ```
>>
>> -  Repeat the above parameters for multiple devices.
>> -
>> -  The DPDK vhost library will negiotiate its own features, so they
>> -  need not be passed in as command line params. Note that as offloads are
>> -  disabled this is the equivalent of setting:
>> +  Note: creating bonds of DPDK interfaces is slightly different to creating
>> +  bonds of system interfaces. For DPDK, the interface type must be explicitly
>> +  set, for example:
>>
>> -  `csum=off,gso=off,guest_tso4=off,guest_tso6=off,guest_ecn=off`
>> +  ```
>> +  ovs-vsctl add-bond br0 dpdkbond dpdk0 dpdk1 -- set Interface dpdk0 type=dpdk -- set Interface dpdk1 type=dpdk
>> +  ```
>>
>> -3b. If using an alternative character device. It must be also explicitly
>> -    passed to QEMU using the `vhostfd` argument:
>> +  6. PMD thread statistics
>>
>>    ```
>> -  -netdev tap,id=<id>,script=no,downscript=no,ifname=<name>,vhost=on,
>> -  vhostfd=<open_fd>
>> -  -device virtio-net-pci,netdev=net1,mac=<mac>
>> -  ```
>> +  # Check current stats
>> +  ovs-appctl dpif-netdev/pmd-stats-show
>>
>> -  The open file descriptor must be passed to QEMU running as a child
>> -  process. This could be done with a simple python script.
>> +  # Show port/rxq assignment
>> +  ovs-appctl dpif-netdev/pmd-rxq-show
>>
>> -  ```
>> -  #!/usr/bin/python
>> -  fd = os.open("/dev/usvhost", os.O_RDWR)
>> -  subprocess.call("qemu-system-x86_64 .... -netdev tap,id=vhostnet0,\
>> -  vhost=on,vhostfd=" + fd +"...", shell=True)
>> +  # Clear previous stats
>> +  ovs-appctl dpif-netdev/pmd-stats-clear
>> +  ```
>>
>> -  Alternatively the `qemu-wrap.py` script can be used to automate the
>> -  requirements specified above and can be used in conjunction with libvirt if
>> -  desired. See the "DPDK vhost VM configuration with QEMU wrapper" section
>> -  below.
>> +  7. Stop vswitchd & Delete bridge
>>
>> -4. Configure huge pages:
>> -   QEMU must allocate the VM's memory on hugetlbfs. Vhost ports access a
>> -   virtio-net device's virtual rings and packet buffers mapping the VM's
>> -   physical memory on hugetlbfs. To enable vhost-ports to map the VM's
>> -   memory into their process address space, pass the following parameters
>> -   to QEMU:
>> +  ```
>> +  ovs-appctl -t ovs-vswitchd exit
>> +  ovs-appctl -t ovsdb-server exit
>> +  ovs-vsctl del-br br0
>> +  ```
>
>
>I think you need to delete br0 before stopping ovsdb-server.
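On the pmd-cpu-mask example quoted above (2 pmd threads pinned to cores 1 and 2, mask 6): the mask is simply one bit set per core ID. A small offline sketch of that arithmetic, runnable anywhere:

```shell
# Build a pmd-cpu-mask value by setting bit <core> for each core ID.
# Cores 1 and 2 give binary 110, i.e. 0x6, matching the quoted
# `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6` example.
mask=0
for core in 1 2; do
    mask=$((mask | (1 << core)))
done
printf 'pmd-cpu-mask=%#x\n' "$mask"
```

The same recipe extends to any core list, e.g. cores 2 and 4 would yield 0x14.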
>
>>
>> -   `-object memory-backend-file,id=mem,size=4096M,mem-path=/dev/hugepages,
>> -   share=on -numa node,memdev=mem -mem-prealloc`
>> +## <a name="builddpdk"></a> 4. DPDK in the VM
>>
>> -   Note: For use with an earlier QEMU version such as v1.6.2, use the
>> -   following to configure hugepages instead:
>> +DPDK 'testpmd' application can be run in the Guest VM for high speed
>> +packet forwarding between vhostuser ports. This needs DPDK, testpmd to be
>> +compiled along with kernel modules. Below are the steps for setting up
>> +the testpmd application in the VM. More information on the vhostuser ports
>> +can be found in [Vhost Walkthrough].
>
>
>This looks way too complicated for a beginners guide. I think you can
>assume that the VM has networking connectivity or even better that the
>user knows how to put a tarball inside of the VM and then take from there.

Point taken. Will simplify this.

>
>>
>> -   `-mem-path /dev/hugepages -mem-prealloc`
>> +  * Export the DPDK loc $DPDK_LOC to the Guest VM (/dev/sdb on VM)
>> +    and instantiate the Guest.
>> -DPDK vhost-cuse VM configuration with libvirt and QEMU wrapper:
>> ----------------------------------------------------------
>> +  # Dump flows
>> +  ovs-ofctl dump-flows br0
>> +  ```
>>
>> -To use the qemu-wrapper script in conjuntion with libvirt, follow the
>> -steps in the previous section before proceeding with the following steps:
>> +  3. Instantiate Guest VM using Qemu cmdline
>>
>> -  1. Place `qemu-wrap.py` in libvirtd's binary search PATH ($PATH)
>> -     Ideally in the same directory that the QEMU binary is located.
>> +  Guest Configuration
>>
>> -  2. Ensure that the script has the same owner/group and file permissions
>> -     as the QEMU binary.
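Related to the hugepage-backed guest memory quoted above (`-object memory-backend-file,...,size=4096M,mem-path=/dev/hugepages,...`): the host must have reserved enough hugepages to back the whole guest, or QEMU fails to start. A quick sketch of the sizing arithmetic, assuming the 2 MB page size the guide uses:

```shell
# Pages needed to back a guest = guest memory / hugepage size, rounded up.
guest_mem_mb=4096      # e.g. the size=4096M from the example above
page_size_mb=2         # default x86_64 hugepage size assumed here
pages_needed=$(( (guest_mem_mb + page_size_mb - 1) / page_size_mb ))
echo "need vm.nr_hugepages >= $pages_needed"
```

This also shows why the earlier `vm.nr_hugepages=2048` suggestion pairs naturally with a single 4 GB guest; running OVS-DPDK and the guest from the same pool needs correspondingly more pages.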
>> +  ```
>> +  | configuration        | values  | comments
>> +  |----------------------|---------|---------------------------------------
>> +  | qemu version         | 2.2.0   |
>> +  | qemu thread affinity | core 5  | taskset 0x20
>> +  | memory               | 4GB     | -
>> +  | cores                | 2       | -
>> +  | Qcow2 image          | CentOS7 | -
>> +  | mrg_rxbuf            | off     | -
>> +  | export DPDK sources  | yes     | -drive file=fat:rw:$DPDK_LOC (seen as /dev/sdb in VM)
>> +  ```
>>
>> -  3. Update the VM xml file using "virsh edit VM.xml"
>> +  ```
>
>You had a subsection called 'Guest configuration', I think here
>deserves another subsection, e.g.: 'Guest Starting Command'

Good observation. I will create another subsection here.

>
>> +  export VM_NAME=vhost-vm
>> +  export GUEST_MEM=3072M
>> +  export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2
>> +  export DPDK_LOC=/usr/src/dpdk-16.04
>> +  export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch
>>
>> -  1. Set the VM to use the launch script.
>> -     Set the emulator path contained in the `<emulator><emulator/>` tags.
>> -     For example, replace:
>> +  taskset 0x20 qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m $GUEST_MEM -object memory-backend-file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on -numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive file=$QCOW2_IMAGE -drive file=fat:rw:$DPDK_LOC,snapshot=off -chardev socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net-

Regards,
Bhanu Prakash.