Re: [ovs-dev] [PATCH] netdev-dpdk: Set pmd thread priority
Thanks for looking into the patch, Kevin. Please see my replies inline.

> -----Original Message-----
> From: Traynor, Kevin
> Sent: Monday, April 25, 2016 6:08 PM
> To: Bodireddy, Bhanuprakash; dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH] netdev-dpdk: Set pmd thread priority
>
> On 21/04/2016 16:16, Bhanuprakash Bodireddy wrote:
> > Set the DPDK pmd thread scheduling policy to SCHED_RR and static
> > priority to highest priority value of the policy. This is to deal with
> > pmd thread starvation case where another cpu hogging process can get
> > scheduled/affinitized to the same core where pmd is running there by
> > significantly impacting the datapath performance.
> >
> > The realtime scheduling policy is applied only when CPU mask is passed
> > to 'pmd-cpu-mask'. The exception to this is 'pmd-cpu-mask=1', where
> > the policy and priority shall not be applied to pmd thread spawned on
> > core0.
> > For example:
> >
> > * In the absence of pmd-cpu-mask or if pmd-cpu-mask=1, one pmd
> >   thread shall be created and affinitized to 'core 0' with default
> >   scheduling policy and priority applied.
> >
> > * If pmd-cpu-mask is specified with CPU mask > 1, one or more pmd
> >   threads shall be spawned on the corresponding core(s) in the mask
> >   and real time scheduling policy SCHED_RR and highest static
> >   priority is applied to the pmd thread(s).
> >
> > To reproduce use following commands:
> >
> > ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> > taskset 0x2 cat /dev/zero > /dev/null &
>
> Even though it seems the most likely case - I'm not sure that we can always
> assume the user who put the non-OVS process on the core did so by mistake
> and would want us to increase our priority.

You've got a point. When the user explicitly sets pmd-cpu-mask, they are
most likely after performance: they want to pin the pmd threads to dedicated
cores and are unlikely to run any other process on those cores.
Having said that, I have come across cases with HT enabled where users had
trouble interpreting 'thread_siblings_list' and wrongly affinitized qemu
threads, using 'taskset', to the cores running the pmds, thereby
significantly impacting the datapath performance. This patch will mitigate
such issues.

> >
> > Signed-off-by: Bhanuprakash Bodireddy
> > ---
> > lib/dpif-netdev.c | 9 +
> > lib/netdev-dpdk.c | 14 ++
> > lib/netdev-dpdk.h | 1 +
> > 3 files changed, 24 insertions(+)
> >
> > diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
> > index 1e8a37c..4a46816 100644
> > --- a/lib/dpif-netdev.c
> > +++ b/lib/dpif-netdev.c
> > @@ -2670,6 +2670,15 @@ pmd_thread_main(void *f_)
> >      /* Stores the pmd thread's 'pmd' to 'per_pmd_key'. */
> >      ovsthread_setspecific(pmd->dp->per_pmd_key, pmd);
> >      pmd_thread_setaffinity_cpu(pmd->core_id);
> > +
> > +#ifdef DPDK_NETDEV
> > +    /* Set pmd thread's scheduling policy to SCHED_RR and priority to
> > +     * highest priority of SCHED_RR policy, In absence of pmd-cpu-mask (or)
> > +     * pmd-cpu-mask=1, default scheduling policy and priority shall
> > +     * apply to pmd thread */
> > +    if (pmd->core_id)
> > +        pmd_thread_setpriority();
>
> Similar to above, I don't think we can assume anything special about
> core 0. This type of change sounds like something that would be better
> done at a layer above vswitch which has more system wide knowledge.

I stand to be corrected, but it is very uncommon to isolate core 0 or to run
HPC apps/threads on it. Also, on multicore systems IRQs are often explicitly
affinitized to core 0 to improve application performance and keep interrupts
away from the other cores. For these reasons I treat core 0 as special.

> fwiw, it would be cleaner to remove the #ifdef from here and create a
> dummy fn in netdev-dpdk.h, also the 'if' needs {}

Agreed.
> > +#endif
> >  reload:
> >      emc_cache_init(&pmd->flow_cache);
> >
> > diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
> > index 208c5f5..6518c87 100644
> > --- a/lib/netdev-dpdk.c
> > +++ b/lib/netdev-dpdk.c
> > @@ -2926,6 +2926,20 @@ pmd_thread_setaffinity_cpu(unsigned cpu)
> >      return 0;
> >  }
> >
> > +void
> > +pmd_thread_setpriority(void)
> > +{
> > +    struct sched_param threadparam;
> > +    int err;
> > +
> > +    memset(&threadparam, 0, sizeof(threadparam));
> > +    threadparam.sched
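
To make the policy discussed in this thread concrete, here is a minimal sketch of it, not the code from the patch (which is cut off above). It assumes the priority is requested through the standard pthread_setschedparam() call; the helper names mask_to_cores(), core_wants_rt() and thread_set_rr_max() are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Policy described in the thread: core 0 keeps the default scheduling
 * policy, every other pmd core is a candidate for SCHED_RR. */
int core_wants_rt(unsigned core_id)
{
    return core_id != 0;
}

/* Hypothetical helper: expand a pmd-cpu-mask value into core ids.
 * Writes at most 'max' ids to 'cores' and returns how many were written. */
size_t mask_to_cores(uint64_t mask, unsigned cores[], size_t max)
{
    size_t n = 0;

    assert(cores != NULL);
    for (unsigned id = 0; id < 64 && n < max; id++) {
        if (mask & (UINT64_C(1) << id)) {
            cores[n++] = id;
        }
    }
    return n;
}

/* Hypothetical helper: request SCHED_RR at the highest static priority
 * for 'thread'. Returns 0 on success or an errno value; EPERM is the
 * usual result when the process lacks the privilege to go realtime. */
int thread_set_rr_max(pthread_t thread)
{
    struct sched_param param;
    int prio = sched_get_priority_max(SCHED_RR);

    if (prio == -1) {
        return errno;
    }
    memset(&param, 0, sizeof param);
    param.sched_priority = prio;
    return pthread_setschedparam(thread, SCHED_RR, &param);
}
```

With pmd-cpu-mask=6 from the reproduction steps, mask_to_cores() yields cores 1 and 2, and both pass core_wants_rt(). A pmd thread would then call thread_set_rr_max(pthread_self()) after setting its affinity, and should treat a non-zero return as a warning rather than a fatal error.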
Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
Thanks Herbert for reviewing the DPDK install guide in detail. My comments inline. > -Original Message- > From: Thomas F Herbert [mailto:thomasfherb...@gmail.com] > Sent: Thursday, May 12, 2016 6:21 PM > To: Bodireddy, Bhanuprakash ; > dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation > > On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote: > > Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and > > INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate > the > > novice user in setting up the OVS DPDK and running it out of box, the > > ADVANCED document is targeted at expert users looking for the optimum > > performance running dpdk datapath. > > > > This commit updates INSTALL.DPDK.md document. > > > > Signed-off-by: Bhanuprakash Bodireddy > > > --- > > INSTALL.DPDK.md | 1193 +++--- > - > > 1 file changed, 331 insertions(+), 862 deletions(-) > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > > index 93f92e4..bf646bf 100644 > > --- a/INSTALL.DPDK.md > > +++ b/INSTALL.DPDK.md > > @@ -1,1001 +1,470 @@ > > -Using Open vSwitch with DPDK > > - > > +OVS DPDK INSTALL GUIDE > > + > > > > -Open vSwitch can use Intel(R) DPDK lib to operate entirely in > > -userspace. This file explains how to install and use Open vSwitch in > > -such a mode. > > +## Contents > > > > -The DPDK support of Open vSwitch is considered experimental. > > -It has not been thoroughly tested. > > +1. [Overview](#overview) > > +2. [Building and Installation](#build) > > +3. [Setup OVS DPDK datapath](#ovssetup) > I wonder if the following 3 sections be in the advanced guide? with a > note here to refer to the advanced guide for configuration in the VM, > testcases and limitations? 
While I agree with your suggestion, I have seen a few questions on the ML
where beginners had issues setting up the guest and running basic test cases
with OVS DPDK. Hence I thought it better to keep the test cases and
limitations in the beginner document instead of redirecting the user to the
Advanced guide.

> > +4. [DPDK in the VM](#builddpdk)
> > +5. [OVS Testcases](#ovstc)
> > +6. [Limitations ](#ovslimits)
> >
> > -This version of Open vSwitch should be built manually with `configure`
> > -and `make`.
> > +## 1. Overview
> >
> > -OVS needs a system with 1GB hugepages support.
> > +Open vSwitch can use DPDK lib to operate entirely in userspace.
> > +This file provides information on installation and use of Open vSwitch
> > +using DPDK datapath. This version of Open vSwitch should be built manually
> > +with `configure` and `make`.
> >
> > -Building and Installing:
> > -
> > +The DPDK support of Open vSwitch is considered 'experimental'.
> Isn't it time to remove this statement and not just put the word in quotes?

I too would like this line to be removed, and would appreciate feedback from
the community on this.

> >
> > -Required: DPDK 16.04
> > -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev`
> > -on Debian/Ubuntu)
> > +### Prerequisites
> >
> > -1. Configure build & install DPDK:
> > -  1. Set `$DPDK_DIR`
> > +* Required: DPDK 16.04
> > +* Hardware: [DPDK Supported NICs] when physical ports in use
> >
> > -     ```
> > -     export DPDK_DIR=/usr/src/dpdk-16.04
> > -     cd $DPDK_DIR
> > -     ```
> > -
> > -  2. Then run `make install` to build and install the library.
> > -     For default install without IVSHMEM:
> > -
> > -     `make install T=x86_64-native-linuxapp-gcc DESTDIR=install`
> > -
> > -     To include IVSHMEM (shared memory):
> > -
> > -     `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install`
> > -
> > -     For further details refer to http://dpdk.org/
> > -
> > -2.
Configure & build the Linux kernel: > > - > > - Refer to intel-dpdk-getting-started-guide.pdf for understanding > > - DPDK kernel requirement. > > - > > -3. Configure & build OVS: > > - > > - * Non IVSHMEM: > > - > > - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` > > - > > - * IVSHMEM: > > - > > - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` > > - > > - ``` > > - cd $(OVS_DIR)/ > > - ./boot.sh > > - ./configure --with-d
Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
Thanks Herbert for the review, please see my reply inline. > -Original Message- > From: Thomas F Herbert [mailto:thomasfherb...@gmail.com] > Sent: Thursday, May 12, 2016 6:56 PM > To: Bodireddy, Bhanuprakash ; > dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add > ADVANCED doc > > On 5/9/16 2:32 AM, Bhanuprakash Bodireddy wrote: > > Add INSTALL.DPDK-ADVANCED document that is forked off from original > > INSTALL.DPDK guide. This document is targeted at users looking for > > optimum performance on OVS using dpdk datapath. > Thanks for this effort. > > > > Signed-off-by: Bhanuprakash Bodireddy > > > > --- > > INSTALL.DPDK-ADVANCED.md | 809 > +++ > > 1 file changed, 809 insertions(+) > > create mode 100644 INSTALL.DPDK-ADVANCED.md > > > > diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md > new > > file mode 100644 index 000..dd09d36 > > --- /dev/null > > +++ b/INSTALL.DPDK-ADVANCED.md > > @@ -0,0 +1,809 @@ > > +OVS DPDK ADVANCED INSTALL GUIDE > > += > > + > > +## Contents > > + > > +1. [Overview](#overview) > > +2. [Building Shared Library](#build) > > +3. [System configuration](#sysconf) > > +4. [Performance Tuning](#perftune) > > +5. [OVS Testcases](#ovstc) > > +6. [Vhost Walkthrough](#vhost) > > +7. [QOS](#qos) > > +8. [Static Code Analysis](#staticanalyzer) 9. [Vsperf](#vsperf) > > + > > +## 1. Overview > > + > > +The Advanced Install Guide explains how to improve OVS performance > > +using DPDK datapath. This guide also provides information on tuning, > > +system configuration, troubleshooting, static code analysis and testcases. > > + > > +## 2. Building Shared Library > > + > > +DPDK can be built as static or shared library and shall be linked by > > +applications using DPDK datapath. The section lists steps to build > > +shared library and dynamically link DPDK against OVS. > > + > > +Note: Minor performance loss is seen with OVS when using shared DPDK > > +library as compared to static library. 
> > + > > +Check section 2.2, 2.3 of INSTALL.DPDK on download instructions for > > +DPDK and OVS. > > + > > + * Configure the DPDK library > > + > > + Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base` to > > + generate shared DPDK library > > + > > + > > + * Build and install DPDK > > + > > +For Default install (without IVSHMEM), set `export > DPDK_TARGET=x86_64-native-linuxapp-gcc` > > +For IVSHMEM case, set `export > > + DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc` > > + > > +``` > > +export DPDK_DIR=/usr/src/dpdk-16.04 > > +export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET > > +make install T=$DPDK_TARGET DESTDIR=install > > +``` > > + > > + * Build, Install and Setup OVS. > > + > > + Export the DPDK shared library location and setup OVS as listed in > > + section 3.3 of INSTALL.DPDK. > > + > > + `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib` > > + > > +## 3. System Configuration > > + > > +To achieve optimal OVS performance, the system can be configured and > > +that includes BIOS tweaks, Grub cmdline additions, better > > +understanding of NUMA nodes and apt selection of PCIe slots for NIC > placement. > > + > > +### 3.1 Recommended BIOS settings > > + > > + ``` > > + | Settings | values| comments > > + |---|---|--- > > + | C3 power state| Disabled | - > > + | C6 power state| Disabled | - > > + | MLC Streamer | Enabled | - > > + | MLC Spacial prefetcher| Enabled | - > > + | DCU Data prefetcher | Enabled | - > > + | DCA | Enabled | - > > + | CPU power and performance | Performance - > > + | Memory RAS and perf | | - > > +config-> NUMA optimized | Enabled | - > > + ``` > > + > > +### 3.2 PCIe Slot Selection > > + > > +The fastpath performance also depends on factors like the NIC > > +placement, Channel speeds between PCIe slot and CPU, proximity of > > +PCIe slot to the CPU cores running DPDK application. Listed below are > > +the steps to identify right PCIe slot. 
> > + > > +- Retrieve host details using cmd `dmidecode -t baseboard | grep > > +"Product Name"`
Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
Thanks Aaron for reviewing the install guide. Please see my reply inline. > -Original Message- > From: Aaron Conole [mailto:acon...@redhat.com] > Sent: Friday, May 13, 2016 4:55 PM > To: Bodireddy, Bhanuprakash > Cc: dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation > > Bhanuprakash Bodireddy writes: > > > Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and > > INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate > the > > novice user in setting up the OVS DPDK and running it out of box, the > > ADVANCED document is targeted at expert users looking for the optimum > > performance running dpdk datapath. > > > > This commit updates INSTALL.DPDK.md document. > > > > Signed-off-by: Bhanuprakash Bodireddy > > > --- > > INSTALL.DPDK.md | 1193 +++ > > > 1 file changed, 331 insertions(+), 862 deletions(-) > > > > diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md > > index 93f92e4..bf646bf 100644 > > --- a/INSTALL.DPDK.md > > +++ b/INSTALL.DPDK.md > > @@ -1,1001 +1,470 @@ > > -Using Open vSwitch with DPDK > > - > > +OVS DPDK INSTALL GUIDE > > + > > > > -Open vSwitch can use Intel(R) DPDK lib to operate entirely in > > -userspace. This file explains how to install and use Open vSwitch in > > -such a mode. > > +## Contents > > > > -The DPDK support of Open vSwitch is considered experimental. > > -It has not been thoroughly tested. > > +1. [Overview](#overview) > > +2. [Building and Installation](#build) > > +3. [Setup OVS DPDK datapath](#ovssetup) > > +4. [DPDK in the VM](#builddpdk) > > +5. [OVS Testcases](#ovstc) > > +6. [Limitations ](#ovslimits) > > > > -This version of Open vSwitch should be built manually with `configure` > > -and `make`. > > +## 1. Overview > > > > -OVS needs a system with 1GB hugepages support. > > +Open vSwitch can use DPDK lib to operate entirely in userspace. > > +This file provides information on installation and use of Open vSwitch > > +using DPDK datapath. 
This version of Open vSwitch should be built > manually > > +with `configure` and `make`. > > > > -Building and Installing: > > - > > +The DPDK support of Open vSwitch is considered 'experimental'. > > > > -Required: DPDK 16.04 > > -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev` > > -on Debian/Ubuntu) > > +### Prerequisites > > > > -1. Configure build & install DPDK: > > - 1. Set `$DPDK_DIR` > > +* Required: DPDK 16.04 > > +* Hardware: [DPDK Supported NICs] when physical ports in use > > > > - ``` > > - export DPDK_DIR=/usr/src/dpdk-16.04 > > - cd $DPDK_DIR > > - ``` > > - > > - 2. Then run `make install` to build and install the library. > > - For default install without IVSHMEM: > > - > > - `make install T=x86_64-native-linuxapp-gcc DESTDIR=install` > > - > > - To include IVSHMEM (shared memory): > > - > > - `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install` > > - > > - For further details refer to http://dpdk.org/ > > - > > -2. Configure & build the Linux kernel: > > - > > - Refer to intel-dpdk-getting-started-guide.pdf for understanding > > - DPDK kernel requirement. > > - > > -3. Configure & build OVS: > > - > > - * Non IVSHMEM: > > - > > - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` > > - > > - * IVSHMEM: > > - > > - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` > > - > > - ``` > > - cd $(OVS_DIR)/ > > - ./boot.sh > > - ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast- > align"] > > - make > > - ``` > > - > > - Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress > DPDK cast-align warnings. > > - > > -To have better performance one can enable aggressive compiler > optimizations and > > -use the special instructions(popcnt, crc32) that may not be available on > > all > > -machines. Instead of typing `make`, type: > > - > > -`make CFLAGS='-O3 -march=native'` > > - > > -Refer to [INSTALL.userspace.md] for general requirements of building > userspace OVS. 
> > - > > -Using the
Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
Thanks Aaron for reviewing the Advanced install guide in detail. Please see my replies inline.

> -----Original Message-----
> From: Aaron Conole [mailto:acon...@redhat.com]
> Sent: Friday, May 13, 2016 4:47 PM
> To: Bodireddy, Bhanuprakash
> Cc: dev@openvswitch.org
> Subject: Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add
> ADVANCED doc
>
> Hi Bhanuprakash,
>
> Bhanuprakash Bodireddy writes:
>
> > Add INSTALL.DPDK-ADVANCED document that is forked off from original
> > INSTALL.DPDK guide. This document is targeted at users looking for
> > optimum performance on OVS using dpdk datapath.
> >
> > Signed-off-by: Bhanuprakash Bodireddy
> > ---
> > INSTALL.DPDK-ADVANCED.md | 809 +++
> > 1 file changed, 809 insertions(+)
> > create mode 100644 INSTALL.DPDK-ADVANCED.md
> >
> > diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
> > new file mode 100644
> > index 000..dd09d36
> > --- /dev/null
> > +++ b/INSTALL.DPDK-ADVANCED.md
> > @@ -0,0 +1,809 @@
> > +OVS DPDK ADVANCED INSTALL GUIDE
> > +=
> > +
> > +## Contents
> > +
> > +1. [Overview](#overview)
> > +2. [Building Shared Library](#build)
> > +3. [System configuration](#sysconf)
> > +4. [Performance Tuning](#perftune)
> > +5. [OVS Testcases](#ovstc)
> > +6. [Vhost Walkthrough](#vhost)
>
> I actually think the vhostuser part of this should be in the INSTALL.DPDK.md;
> I think the simplest start of using dpdk and qemu is through vhost-user
> sockets.

That is a good point and I completely agree with you. The vhostuser ports,
socket configuration and other qemu cmdline details such as hugepages and
multiqueue were moved to the Advanced doc to spare the beginner the clutter:
they can simply copy-paste the qemu cmdline and bring the VM up with minimal
effort, without going through the nitty-gritty details of vhost. Having said
that, I will add the VM libvirt configuration to the Beginner document while
moving the explanation to the Advanced guide.
I would add a line to the 5.2 P-V-P(vhost loopback) test case & 4. DPDK in the VM section that the details on vhost user can be found in INSTALL.DPDK-ADVANCE.md with hyperlink enabled. Hope you agree to this. > > > +7. [QOS](#qos) > > +8. [Static Code Analysis](#staticanalyzer) 9. [Vsperf](#vsperf) > > + > > +## 1. Overview > > + > > +The Advanced Install Guide explains how to improve OVS performance > > +using DPDK datapath. This guide also provides information on tuning, > > +system > > configuration, > > +troubleshooting, static code analysis and testcases. > > + > > +## 2. Building Shared Library > > + > > +DPDK can be built as static or shared library and shall be linked by > > +applications using DPDK datapath. The section lists steps to build > > +shared library > > and dynamically > > +link DPDK against OVS. > > + > > +Note: Minor performance loss is seen with OVS when using shared DPDK > > +library as compared to static library. > > + > > +Check section 2.2, 2.3 of INSTALL.DPDK on download instructions for > > +DPDK and OVS. > > + > > + * Configure the DPDK library > > + > > + Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base` to > > + generate shared DPDK library > > + > > + > > + * Build and install DPDK > > + > > + For Default install (without IVSHMEM), set `export > > DPDK_TARGET=x86_64-native-linuxapp-gcc` > > +For IVSHMEM case, set `export > > + DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc` > > + > > +``` > > +export DPDK_DIR=/usr/src/dpdk-16.04 > > +export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET > > +make install T=$DPDK_TARGET DESTDIR=install > > +``` > > + > > + * Build, Install and Setup OVS. > > + > > + Export the DPDK shared library location and setup OVS as listed in > > + section 3.3 of INSTALL.DPDK. > > + > > + `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib` > > + > > +## 3. 
System Configuration > > + > > +To achieve optimal OVS performance, the system can be configured and > > +that includes BIOS tweaks, Grub cmdline additions, better > > +understanding of NUMA nodes and apt selection of PCIe slots for NIC > placement. > > + > > +### 3.1 Recommended BIOS settings > > + > > + ``` > > + | Settings | values| comments > > + |---|---|--- > > + | C3 pow
Re: [ovs-dev] [PATCH v4 0/2] doc: Refactor DPDK install guide
Hi,

Apologies for top posting. I would like to know if there are any more
comments on the v4 install guide. Please let me know so that I can work on
them.

Regards,
Bhanu Prakash.

> -----Original Message-----
> From: Bodireddy, Bhanuprakash
> Sent: Tuesday, May 17, 2016 3:49 PM
> To: dev@openvswitch.org
> Cc: thomasfherb...@gmail.com; acon...@redhat.com; Bodireddy, Bhanuprakash
> Subject: [PATCH v4 0/2] doc: Refactor DPDK install guide
>
> This patchset refactors the present INSTALL.DPDK.md guide.
>
> The INSTALL guide is split in to two documents named INSTALL.DPDK and
> INSTALL.DPDK-ADVANCED. The former document is simplified with emphasis
> on installation, basic testcases and targets novice users. Sections on system
> configuration, performance tuning, vhost walkthrough and IVSHMEM are
> moved to DPDK-ADVANCED guide.
>
> Reviewers can review the rendered form here
> https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK.md
> https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK-ADVANCED.md
>
> v3->v4:
> * Refactor hugepage section in Beginner and Advanced guides
> * Added Guest libvirt configuration for vhostuser ports
> * General formatting changes, enable hyperlinks and wording changes
>
> v2->v3:
> * Rebased
>
> v1->v2:
> * Rebased
> * Update DPDK version to 16.04
> * Add vsperf section in ADVANCED Guide
>
> Bhanuprakash Bodireddy (2):
>   doc: Refactor DPDK install documentation
>   doc: Refactor DPDK install guide, add ADVANCED doc
>
>  INSTALL.DPDK-ADVANCED.md | 840 ++
>  INSTALL.DPDK.md | 1277 +++---
>  2 files changed, 1265 insertions(+), 852 deletions(-)
>  create mode 100644 INSTALL.DPDK-ADVANCED.md
>
> --
> 2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
Thanks Mauricio for the review, my comments inline.

From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it]
Sent: Monday, May 23, 2016 9:50 PM
To: Bodireddy, Bhanuprakash
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation

Hi Bhanuprakash,

Some comments inline.

On Tue, May 17, 2016 at 4:49 PM, Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> wrote:

Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and
INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate the
novice user in setting up the OVS DPDK and running it out of box, the
ADVANCED document is targeted at expert users looking for the optimum
performance running dpdk datapath.

This commit updates INSTALL.DPDK.md document.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
---
 INSTALL.DPDK.md | 1277 ++-
 1 file changed, 425 insertions(+), 852 deletions(-)

diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
index 93f92e4..f939652 100644
--- a/INSTALL.DPDK.md
+++ b/INSTALL.DPDK.md
@@ -1,1001 +1,574 @@
-Using Open vSwitch with DPDK
-
+OVS DPDK INSTALL GUIDE
+

-Open vSwitch can use Intel(R) DPDK lib to operate entirely in
-userspace. This file explains how to install and use Open vSwitch in
-such a mode.
+## Contents

-The DPDK support of Open vSwitch is considered experimental.
-It has not been thoroughly tested.
+1. [Overview](#overview)
+2. [Building and Installation](#build)
+3. [Setup OVS DPDK datapath](#ovssetup)
+4. [DPDK in the VM](#builddpdk)
+5. [OVS Testcases](#ovstc)
+6. [Limitations ](#ovslimits)

-This version of Open vSwitch should be built manually with `configure`
-and `make`.
+## 1. Overview

-OVS needs a system with 1GB hugepages support.
+Open vSwitch can use DPDK lib to operate entirely in userspace.
+This file provides information on installation and use of Open vSwitch +using DPDK datapath. This version of Open vSwitch should be built manually +with `configure` and `make`. -Building and Installing: - +The DPDK support of Open vSwitch is considered 'experimental'. -Required: DPDK 16.04 -Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev` -on Debian/Ubuntu) +### Prerequisites -1. Configure build & install DPDK: - 1. Set `$DPDK_DIR` +* Required: DPDK 16.04 +* Hardware: [DPDK Supported NICs] when physical ports in use - ``` - export DPDK_DIR=/usr/src/dpdk-16.04 - cd $DPDK_DIR - ``` - - 2. Then run `make install` to build and install the library. - For default install without IVSHMEM: - - `make install T=x86_64-native-linuxapp-gcc DESTDIR=install` - - To include IVSHMEM (shared memory): - - `make install T=x86_64-ivshmem-linuxapp-gcc DESTDIR=install` - - For further details refer to http://dpdk.org/ - -2. Configure & build the Linux kernel: - - Refer to intel-dpdk-getting-started-guide.pdf for understanding - DPDK kernel requirement. - -3. Configure & build OVS: - - * Non IVSHMEM: - - `export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/` - - * IVSHMEM: - - `export DPDK_BUILD=$DPDK_DIR/x86_64-ivshmem-linuxapp-gcc/` - - ``` - cd $(OVS_DIR)/ - ./boot.sh - ./configure --with-dpdk=$DPDK_BUILD [CFLAGS="-g -O2 -Wno-cast-align"] - make - ``` - - Note: 'clang' users may specify the '-Wno-cast-align' flag to suppress DPDK cast-align warnings. - -To have better performance one can enable aggressive compiler optimizations and -use the special instructions(popcnt, crc32) that may not be available on all -machines. Instead of typing `make`, type: - -`make CFLAGS='-O3 -march=native'` - -Refer to [INSTALL.userspace.md<http://INSTALL.userspace.md>] for general requirements of building userspace OVS. - -Using the DPDK with ovs-vswitchd: -- - -1. 
Setup system boot - Add the following options to the kernel bootline: - - `default_hugepagesz=1GB hugepagesz=1G hugepages=1` - -2. Setup DPDK devices: - - DPDK devices can be setup using either the VFIO (for DPDK 1.7+) or UIO - modules. UIO requires inserting an out of tree driver igb_uio.ko that is - available in DPDK. Setup for both methods are described below. - - * UIO: - 1. insert uio.ko: `modprobe uio` - 2. insert igb_uio.ko: `insmod $DPDK_BUILD/kmod/igb_uio.ko` - 3. Bind network device to igb_uio: - `$DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth1` - - * VFIO: - - VFIO needs to be supported in the kernel and the BIOS. More information - can be found in the [DPDK Linux GSG]. - - 1. Insert v
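
As a quick check of the hugepage bootline quoted above (`default_hugepagesz=1GB hugepagesz=1G hugepages=1`), the following is a minimal sketch that reads the `HugePages_Total` and `Hugepagesize` fields from /proc/meminfo. It is not part of the guide or of OVS; the helper names meminfo_field() and hugepages_ready() are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: if 'line' starts with "<key>:", parse the number
 * that follows into '*value' and return 1; otherwise return 0. */
int meminfo_field(const char *line, const char *key, unsigned long *value)
{
    size_t len;

    assert(line != NULL && key != NULL && value != NULL);
    len = strlen(key);
    if (strncmp(line, key, len) != 0 || line[len] != ':') {
        return 0;
    }
    return sscanf(line + len + 1, "%lu", value) == 1;
}

/* Returns 1 if the meminfo file at 'path' reports at least one default
 * hugepage of 'want_kb' kB, 0 if it does not, -1 if it cannot be read. */
int hugepages_ready(const char *path, unsigned long want_kb)
{
    FILE *f = fopen(path, "r");
    char line[256];
    unsigned long total = 0, size_kb = 0;

    if (!f) {
        return -1;
    }
    while (fgets(line, sizeof line, f)) {
        meminfo_field(line, "HugePages_Total", &total);
        meminfo_field(line, "Hugepagesize", &size_kb);
    }
    fclose(f);
    return total > 0 && size_kb == want_kb;
}
```

On a host booted with the 1G bootline, hugepages_ready("/proc/meminfo", 1048576) would be expected to return 1, since `Hugepagesize` reports the default hugepage size in kB.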
Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
Thanks Mauricio for the review, my comments inline.

From: Mauricio Vásquez [mailto:mauricio.vasquezber...@studenti.polito.it]
Sent: Monday, May 23, 2016 10:19 PM
To: Bodireddy, Bhanuprakash
Cc: dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc

On Tue, May 17, 2016 at 4:49 PM, Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com> wrote:

Add INSTALL.DPDK-ADVANCED document that is forked off from original
INSTALL.DPDK guide. This document is targeted at users looking for
optimum performance on OVS using dpdk datapath.

Signed-off-by: Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>
---
 INSTALL.DPDK-ADVANCED.md | 840 +++
 1 file changed, 840 insertions(+)
 create mode 100644 INSTALL.DPDK-ADVANCED.md

diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md
new file mode 100644
index 000..c87ce78
--- /dev/null
+++ b/INSTALL.DPDK-ADVANCED.md
@@ -0,0 +1,840 @@
+OVS DPDK ADVANCED INSTALL GUIDE
+=
+
+## Contents
+
+1. [Overview](#overview)
+2. [Building Shared Library](#build)
+3. [System configuration](#sysconf)
+4. [Performance Tuning](#perftune)
+5. [OVS Testcases](#ovstc)
+6. [Vhost Walkthrough](#vhost)
+7. [QOS](#qos)
+8. [Static Code Analysis](#staticanalyzer)
+9. [Vsperf](#vsperf)
+
+## 1. Overview
+
+The Advanced Install Guide explains how to improve OVS performance using
+DPDK datapath. This guide also provides information on tuning, system configuration,
+troubleshooting, static code analysis and testcases.
+
+## 2. Building Shared Library
+
+DPDK can be built as static or shared library and shall be linked by applications
+using DPDK datapath. The section lists steps to build shared library and dynamically
+link DPDK against OVS.
+ +Note: Minor performance loss is seen with OVS when using shared DPDK library as +compared to static library. + +Check section 2.2, 2.3 of INSTALL.DPDK on download instructions +for DPDK and OVS. + What about using a reference? [OK] + * Configure the DPDK library + + Set `CONFIG_RTE_BUILD_SHARED_LIB=y` in `config/common_base` + to generate shared DPDK library + + + * Build and install DPDK + +For Default install (without IVSHMEM), set `export DPDK_TARGET=x86_64-native-linuxapp-gcc` +For IVSHMEM case, set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc` + +``` +export DPDK_DIR=/usr/src/dpdk-16.04 +export DPDK_BUILD=$DPDK_DIR/$DPDK_TARGET +make install T=$DPDK_TARGET DESTDIR=install +``` + + * Build, Install and Setup OVS. + + Export the DPDK shared library location and setup OVS as listed in + section 3.3 of INSTALL.DPDK. + + `export LD_LIBRARY_PATH=$DPDK_DIR/x86_64-native-linuxapp-gcc/lib` + +## 3. System Configuration + +To achieve optimal OVS performance, the system can be configured and that includes +BIOS tweaks, Grub cmdline additions, better understanding of NUMA nodes and +apt selection of PCIe slots for NIC placement. + +### 3.1 Recommended BIOS settings + + ``` + | Settings | values| comments + |---|---|--- + | C3 power state| Disabled | - + | C6 power state| Disabled | - + | MLC Streamer | Enabled | - + | MLC Spacial prefetcher| Enabled | - + | DCU Data prefetcher | Enabled | - + | DCA | Enabled | - + | CPU power and performance | Performance - + | Memory RAS and perf | | - +config-> NUMA optimized | Enabled | - + ``` + +### 3.2 PCIe Slot Selection + +The fastpath performance also depends on factors like the NIC placement, +Channel speeds between PCIe slot and CPU, proximity of PCIe slot to the CPU +cores running DPDK application. Listed below are the steps to identify +right PCIe slot. + +- Retrieve host details using cmd `dmidecode -t baseboard | grep "Product Name"` +- Download the technical specification for Product listed eg: S2600WT2. 
+- Check the Product Architecture Overview on the Riser slot placement, + CPU sharing info and also PCIe channel speeds. + + example: On S2600WT, CPU1 and CPU2 share Riser Slot 1 with Channel speed between + CPU1 and Riser Slot1 at 32GB/s, CPU2 and Riser Slot1 at 16GB/s. Running DPDK app + on CPU1 cores and NIC inserted in to Riser card Slots will optimize OVS performance + in this case. + +- Check the Riser Card #1 - Root Port mapping information, on the available slots + and individual bus speeds. In S2600WT slot 1, slot 2 has high bus speeds and are + potential slots for NIC placement. + +### 3.3 Advanced Hugepage setup + + Allocate and mount 1G Huge pages: + + - For persistent allocation of huge p
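
For the PCIe slot selection steps in section 3.2 of the quoted guide, one complementary check is to ask the kernel which NUMA node a NIC is attached to, through the sysfs `numa_node` attribute of the PCI device. The following is a minimal sketch under that assumption; it does not report channel speeds, the PCI address used in the usage note is only an example, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: build "/sys/bus/pci/devices/<bdf>/numa_node" in
 * 'buf'. Returns 0 on success, -1 if 'bdf' is empty or the path does
 * not fit in 'size' bytes. */
int pci_numa_node_path(const char *bdf, char *buf, size_t size)
{
    int n;

    assert(bdf != NULL && buf != NULL);
    if (strlen(bdf) == 0) {
        return -1;
    }
    n = snprintf(buf, size, "/sys/bus/pci/devices/%s/numa_node", bdf);
    return (n < 0 || (size_t) n >= size) ? -1 : 0;
}

/* Read a single integer from 'path'. Returns 0 on success, -1 otherwise. */
int read_int_file(const char *path, int *value)
{
    FILE *f = fopen(path, "r");
    int ok;

    if (!f) {
        return -1;
    }
    ok = fscanf(f, "%d", value) == 1;
    fclose(f);
    return ok ? 0 : -1;
}

/* Look up the NUMA node of the PCI device 'bdf'. On success returns 0
 * and stores the node in '*node'; the kernel reports -1 there when the
 * node is unknown. */
int pci_numa_node(const char *bdf, int *node)
{
    char path[128];

    if (pci_numa_node_path(bdf, path, sizeof path) != 0) {
        return -1;
    }
    return read_int_file(path, node);
}
```

A call such as pci_numa_node("0000:05:00.0", &node) then tells whether the NIC sits on the same node as the cores chosen for the pmd threads.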
Re: [ovs-dev] [RFC PATCH] netdev-dpdk: Remove vhost send retries when no packets have been sent.
>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Kevin Traynor
>Sent: Monday, May 23, 2016 3:46 PM
>To: dev@openvswitch.org
>Cc: i.maxim...@samsung.com; Traynor, Kevin
>Subject: [ovs-dev] [RFC PATCH] netdev-dpdk: Remove vhost send retries
>when no packets have been sent.
>
>If the guest is connected but not servicing the virt queue, this leads to
>vhost send retries until timeout. This is fine in isolation but if there are
>other high rate queues also being serviced by the same PMD it can lead to a
>performance hit on those queues. Change to only retry when at least some
>packets have been successfully sent on the previous attempt.

Thanks for the patch, Kevin. I verified it and it seems to fix the problem I reported.

The retry logic below (now removed) looks fair given its 100 microsecond timeout and causes no issues in isolation, but it leads to serious problems as the number of VMs scales. On a multi-VM setup with explicit flows set, I see the aggregate throughput nearly collapse to a few hundred thousand packets when a few of my guests sit idle and do not service their queues. Profiling shows that significant cycles are spent in __netdev_dpdk_vhost_send() enqueuing packets on vhost ports that are not being drained. This triggers the retry logic and wastes CPU cycles, which significantly reduces the aggregate throughput.

I am not sure whether there are corner cases where the retry logic is still needed; otherwise you can treat this as Acked.
Acked-by: Bhanuprakash Bodireddy
>
>Reported-by: Bhanuprakash Bodireddy
>Signed-off-by: Kevin Traynor
>---
>
>Sending for discussion in this mailing list thread
>http://openvswitch.org/pipermail/dev/2016-May/071115.html
>
> lib/netdev-dpdk.c | 28 +---
> 1 files changed, 1 insertions(+), 27 deletions(-)
>
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index c7217ea..c39ce6c 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -110,11 +110,6 @@ BUILD_ASSERT_DECL((MAX_NB_MBUF / ROUND_DOWN_POW2(MAX_NB_MBUF/MIN_NB_MBUF))
> static char *cuse_dev_name = NULL;    /* Character device cuse_dev_name. */
> static char *vhost_sock_dir = NULL;   /* Location of vhost-user sockets */
>
>-/*
>- * Maximum amount of time in micro seconds to try and enqueue to vhost.
>- */
>-#define VHOST_ENQ_RETRY_USECS 100
>-
> static const struct rte_eth_conf port_conf = {
>     .rxmode = {
>         .mq_mode = ETH_MQ_RX_RSS,
>@@ -1261,7 +1256,6 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>     struct rte_mbuf **cur_pkts = (struct rte_mbuf **) pkts;
>     unsigned int total_pkts = cnt;
>     unsigned int qos_pkts = cnt;
>-    uint64_t start = 0;
>
>     qid = dev->tx_q[qid % dev->real_n_txq].map;
>
>@@ -1290,27 +1284,7 @@ __netdev_dpdk_vhost_send(struct netdev *netdev, int qid,
>             /* Prepare for possible next iteration.*/
>             cur_pkts = &cur_pkts[tx_pkts];
>         } else {
>-            uint64_t timeout = VHOST_ENQ_RETRY_USECS * rte_get_timer_hz() / 1E6;
>-            unsigned int expired = 0;
>-
>-            if (!start) {
>-                start = rte_get_timer_cycles();
>-            }
>-
>-            /*
>-             * Unable to enqueue packets to vhost interface.
>-             * Check available entries before retrying.
>-             */
>-            while (!rte_vring_available_entries(virtio_dev, vhost_qid)) {
>-                if (OVS_UNLIKELY((rte_get_timer_cycles() - start) > timeout)) {
>-                    expired = 1;
>-                    break;
>-                }
>-            }
>-            if (expired) {
>-                /* break out of main loop. */
>-                break;
>-            }
>+            break;
>         }
>     } while (cnt);
>
>--
>1.7.4.1
>
>___
>dev mailing list
>dev@openvswitch.org
>http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] netdev-dpdk: NUMA Aware vHost User
>-Original Message- >From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ciara Loftus >Sent: Tuesday, May 24, 2016 2:15 PM >To: dev@openvswitch.org >Subject: [ovs-dev] [PATCH] netdev-dpdk: NUMA Aware vHost User > >This commit allows for vHost User memory from QEMU, DPDK and OVS, as >well as the servicing PMD, to all come from the same socket. > >The socket id of a vhost-user port used to be set to that of the master lcore. >Now it is possible to update the socket id if it is detected (during VM boot) >that the vhost device memory is not on this node. If this is the case, a new >mempool is created from the new node, and the PMD thread currently >servicing the port will no longer, in favour of a thread from the new node (if >enabled in the pmd-cpu-mask). > >To avail of this functionality, one must enable the >CONFIG_RTE_LIBRTE_VHOST_NUMA DPDK configuration option. > >Signed-off-by: Ciara Loftus >--- > .travis.yml | 3 +++ > INSTALL.DPDK.md | 8 ++-- > NEWS| 3 +++ > acinclude.m4| 2 +- > lib/netdev-dpdk.c | 37 ++--- > rhel/openvswitch-fedora.spec.in | 1 + > 6 files changed, 48 insertions(+), 6 deletions(-) > >diff --git a/.travis.yml b/.travis.yml >index ee2cf21..faba325 100644 >--- a/.travis.yml >+++ b/.travis.yml >@@ -11,10 +11,13 @@ addons: > packages: > - bc > - gcc-multilib >+ - libnuma1 >+ - libnuma-dev > - libssl-dev > - llvm-dev > - libjemalloc1 > - libjemalloc-dev >+ - numactl > > before_install: ./.travis/${TRAVIS_OS_NAME}-prepare.sh > >diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 93f92e4..bbe0234 >100644 >--- a/INSTALL.DPDK.md >+++ b/INSTALL.DPDK.md >@@ -16,7 +16,7 @@ OVS needs a system with 1GB hugepages support. > Building and Installing: > > >-Required: DPDK 16.04 >+Required: DPDK 16.04, libnuma The change above makes libnuma mandatory to build OVS with DPDK datapath. 
The config option CONFIG_RTE_LIBRTE_VHOST_NUMA is disabled by default in DPDK-16.04 and hence steps to enable this option and build DPDK may have to be captured in "Configure build & Install DPDK" section of the install guide. > Optional (if building with vhost-cuse): `fuse`, `fuse-devel` (`libfuse-dev` > on >Debian/Ubuntu) > >@@ -443,7 +443,11 @@ Performance Tuning: > > It is good practice to ensure that threads that are in the datapath are > pinned to cores in the same NUMA area. e.g. pmd threads and QEMU >vCPUs >- responsible for forwarding. >+ responsible for forwarding. If DPDK is built with >+ CONFIG_RTE_LIBRTE_VHOST_NUMA=y, vHost User ports >automatically >+ detect the NUMA socket of the QEMU vCPUs and will be serviced by a >PMD >+ from the same node provided a core on this node is enabled in the >+ pmd-cpu-mask. > > 9. Rx Mergeable buffers > >diff --git a/NEWS b/NEWS >index 4e81cad..24ca39f 100644 >--- a/NEWS >+++ b/NEWS >@@ -32,6 +32,9 @@ Post-v2.5.0 > * DB entries have been added for many of the DPDK EAL command line >arguments. Additional arguments can be passed via the dpdk-extra >entry. >+ * PMD threads servicing vHost User ports can now come from the NUMA >+ node that device memory is located on if >CONFIG_RTE_LIBRTE_VHOST_NUMA >+ is enabled in DPDK. >- ovs-benchmark: This utility has been removed due to lack of use and > bitrot. >- ovs-appctl: >diff --git a/acinclude.m4 b/acinclude.m4 index f3de855..99ddf04 100644 >--- a/acinclude.m4 >+++ b/acinclude.m4 >@@ -218,7 +218,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [ > DPDKLIB_FOUND=false > save_LIBS=$LIBS > for extras in "" "-ldl"; do >-LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB" >+LIBS="$DPDK_LIB $extras $save_LIBS $DPDK_EXTRA_LIB -lnuma" The above change makes libnuma mandatory for configuring OVS using DPDK datapath while ' CONFIG_RTE_LIBRTE_VHOST_NUMA' is disabled by default. IMHO, can we check if LIBRTE_VHOST_NUMA is enabled(from rte_config.h) and append "lnuma" only when it is true. 
This is inline with how we handle VHOST CUSE case. > AC_LINK_IFELSE( >[AC_LANG_PROGRAM([#include > #include ], diff --git > a/lib/netdev-dpdk.c >b/lib/netdev-dpdk.c index 0d1b8c9..ad6c4bb 100644 >--- a/lib/netdev-dpdk.c >+++ b/lib/netdev-dpdk.c >@@ -30,6 +30,7 @@ > #include > #include > #include >+#include > > #include "dirs.h" > #include "dp-packet.h" >@@ -378,6 +379,9 @@ struct netdev_dpdk { > * netdev_dpdk*_reconfigure() is called */ > int requested_n_txq; > int requested_n_rxq; >+ >+/* Socket ID detected when vHost device is brought up */ >+int requested_socket_id; > }; > > struct netdev_rxq_dpdk { >@@ -747,6 +751,7 @@ netdev_dpdk_init(struct netdev *netdev, unsigned int >port_no, > } > > dev->socket_id = sid < 0 ? SOCKET0 : sid; >+
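The suggestion above (append -lnuma only when vhost NUMA support is enabled in DPDK) could be sketched roughly as follows. This is only an illustration, not the acinclude.m4 change itself. It assumes that the DPDK build turns CONFIG_RTE_LIBRTE_VHOST_NUMA=y into a "#define RTE_LIBRTE_VHOST_NUMA 1" line in rte_config.h, and vhost_numa_libs is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch only: decide whether libnuma needs to be linked by inspecting
# rte_config.h.
# Assumption: CONFIG_RTE_LIBRTE_VHOST_NUMA=y shows up in rte_config.h as
# "#define RTE_LIBRTE_VHOST_NUMA 1".

# vhost_numa_libs (hypothetical helper): prints "-lnuma" when the given
# rte_config.h enables vhost NUMA support, and prints nothing otherwise
# (including when the file is missing or unreadable).
vhost_numa_libs() {
    cfg=$1
    if [ -r "$cfg" ] &&
       grep -Eq '^[[:space:]]*#[[:space:]]*define[[:space:]]+RTE_LIBRTE_VHOST_NUMA[[:space:]]+1' "$cfg"; then
        printf '%s\n' "-lnuma"
    fi
}
```

A configure-time check could then do something like LIBS="$LIBS $(vhost_numa_libs "$DPDK_BUILD/include/rte_config.h")", so that libnuma stays optional for builds where the option is left at its default.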
Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of dpdkvhostuser OVS socket interface, it slows down everything!
I could reproduce the issue, and it can be fixed as described below.

Firstly, the throughput issues observed with other VMs when a new VM is started can be fixed using the patch in the thread http://openvswitch.org/pipermail/dev/2016-May/071615.html. I have posted an explanation in that thread of the cause of the issue, especially on a multi-VM setup with OVS DPDK.

On a multi-VM setup, even with the above patch applied, one might see a difference in aggregate throughput when the vNIC is bound to igb_uio vs virtio-pci. This is because the interrupt overhead is significantly higher when virtio-pci is in use.

More importantly, if you have set up explicit flows matching the VMs' MAC/IP, disabling the flows to the VMs that are idle would improve the aggregate throughput and lessen the burden on the pmd thread. 'watch -d ./utilities/ovs-appctl dpctl/show -s' will show the packet statistics.

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Christian Ehrhardt
>Sent: Wednesday, May 25, 2016 7:08 AM
>To: Martinx - ジェームズ
>Cc: ; dev ; qemu-sta...@nongnu.org
>Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of
>dpdkvhostuser OVS socket interface, it slows down everything!
>
>Hi again,
>another forgotten case.
>
>I currently lack the HW to fully reproduce this, but the video summary is
>pretty good and shows the issue in an impressive way.
>
>Also the description is good and here as well I wonder if anybody else could
>reproduce this.
>Any hints / insights are welcome.
>
>P.S. and also again - two list cross posting, but here as well it is yet
>unclear which it belongs to so I'll keep it as well
>
>Christian Ehrhardt
>Software Engineer, Ubuntu Server
>Canonical Ltd
>
>On Sun, May 22, 2016 at 6:35 PM, Martinx - ジェームズ wrote:
>
>> Guys,
>>
>> I'm seeing a strange problem here, in my OVS+DPDK deployment, on top
>> of Ubuntu 16.04 (DPDK 2.2 and OVS 2.5).
>> >> Here is what I'm trying to do: run OVS with DPDK at the host, for KVM >> Guests that also, will be running more DPDK Apps. >> >> The host have 2 x 10G NICs, for OVS+DPDK and each KVM Guest receives >> its own VLAN tagged traffic (or all tags). >> >> There is an IXIA Traffic Generator sending 10G of traffic on both >> directions (20G total). >> >> Exemplifying, the problem is, lets say that I already have 2 VMs (or >> 10) running DPDK Apps (on top of dpdkvhostuser), everything is working >> as expected, then, if I boot the 3rd (or 11) KVM Guest, the OVS+DPDK >> bridge at the host, slows down, a lot! The 3rd (or 11) VM affects not >> only the host, but also, all the other neighbors VMs!!! >> >> NOTE: This problem appear since the boot of VM 1. >> >> Soon as you, inside of the 3rd VM, bind the VirtIO NIC to the >> DPDK-Compative Drivers, the speed comes back to normal. If you bind it >> back to "virtio-pci", boom! The OVS+DPDK at the host and all VMs loses >> too much speed. >> >> This problem is detailed at the following bug report: >> >> -- >> The OVS+DPDK dpdkvhostuser socket bridge, only works as expected, if >> the KVM Guest also have DPDK drivers loaded: >> >> https://bugs.launchpad.net/ubuntu/+source/openvswitch/+bug/1577256 >> -- >> >> Also, I've recorded a ~15 min screen cast video about this problem, >> so, you guys can see exactly what is happening here. 
>> >> >https://www.youtube.com/v/yHnaSikd9XY?version=3&vq=hd720&autoplay= >1 >> >> * At 5:25, I'm starting a VM that will boot up and load a DPDK App; >> >> * At 5:33, OVS+DPDK is messed up, it loses speed; >>The KVM running with virtio-pci drivers breaks OVS+DPDK at the >> host; >> >> * At 6:50, DPDK inside of the KVM guest loads up its drivers, kicking >> "virtio-pci", speed back to normal at the host; >> >> * At 7:43, started another KVM Guest, now, while virtio-pci driver is >> running, the OVS+DPDK at the host and the other VM, are very, very >> slow; >> >> * At 8:52, the second VM loads up DPDK Drivers, kicking virtio-pci, >> the speed is back to normal at the host, and on the other VM too; >> >> * At 10:00, the Ubuntu VM loads up virtio-pci drivers on its boot, >> the speed dropped at the hosts and on the other VMs; >> >> * 11:57, I'm starting "service dpdk start" inside of the Ubuntu >> guest, to kick up virtio-pci, and bang! Speed is back to normal >> everywhere; >> >> * 12:51, I'm trying to unbind the DPDK Drivers and return the >> virtio-pci, I forgot the syntax while recording the video, which is: >> "dpdk_nic_bind -b virtio-pci", so, I just rebooted it. But both >> "reboot" or "rebind to virtio-pci" triggers the bug. >> >> >> NOTE: I tried to subscriber to qemu-devel but, it is not working, I'm >> not receiving the confirmation e-mail, while qemu-stable worked. I >> don't know if it worth sending it to Linux Kernel too... >> >> >> Regards, >> Thiago >> >___ >dev mailing list >dev@openvswitch.org >http://openvswitch.org/mailman/listinfo/dev ___
Re: [ovs-dev] [PATCH v5 0/2] doc: Refactor DPDK install guide
Adding the reviewers of the INSTALL guide here, mistakenly had '--suppress-cc=all'. Bhanuprakash. >-Original Message- >From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of >Bhanuprakash Bodireddy >Sent: Thursday, May 26, 2016 1:47 PM >To: dev@openvswitch.org >Cc: Traynor, Kevin >Subject: [ovs-dev] [PATCH v5 0/2] doc: Refactor DPDK install guide > >This patchset refactors the present INSTALL.DPDK.md guide. > >The INSTALL guide is split in to two documents named INSTALL.DPDK and >INSTALL.DPDK-ADVANCED. The former document is simplified with emphasis >on installation, basic testcases and targets novice users. Sections on system >configuration, performance tuning, vhost walkthrough and IVSHMEM are >moved to DPDK-ADVANCED guide. > >Reviewers can review the rendered form here >https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK.md >https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK- >ADVANCED.md > >v4->v5: >* Rebased >* Add Ingress Policing Example in Rate Limiting section, Advanced Guide >* Minor fixes in Install DPDK, INSTALL OVS sections of INSTALL.DPDK.md > >V3->v4: >* Refactor hugepage section in Beginner and Advanced guides >* Added Guest libvirt configuration for vhostuser ports >* General formatting changes, enable hyperlinks and wording changes > >v2->v3: >* Rebased > >v1->v2: >* Rebased >* Update DPDK version to 16.04 >* Add vsperf section in ADVANCED Guide > >Bhanuprakash Bodireddy (2): > doc: Refactor DPDK install documentation > doc: Refactor DPDK install guide, add ADVANCED doc > > INSTALL.DPDK-ADVANCED.md | 863 ++ > INSTALL.DPDK.md | 1299 +++--- > 2 files changed, 1292 insertions(+), 870 deletions(-) create mode 100644 >INSTALL.DPDK-ADVANCED.md > >-- >2.4.11 > >___ >dev mailing list >dev@openvswitch.org >http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of dpdkvhostuser OVS socket interface, it slows down everything!
From: Martinx - ジェームズ [mailto:thiagocmarti...@gmail.com]
Sent: Monday, May 30, 2016 5:01 PM
To: Bodireddy, Bhanuprakash
Cc: Christian Ehrhardt ; ; dev ; qemu-sta...@nongnu.org
Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of dpdkvhostuser OVS socket interface, it slows down everything!

Hello Bhanu,

I'm a little bit confused, you said that the problem can be fixed but, later, you also said that:

"On a Multi VM setup even with the above patch applied, one might see aggregate throughput difference when vNIC is bind to igb_uio vs virtio-pci"...

My idea is to use OVS with DPDK in a multi-vm environment but, based on your answer, this is not possible, because the VM A, can interfere with VM B... Is that true even with that patch applied, can you confirm this?

[BHANU] With the patch applied, the issue should be fixed. Without the patch, VM A can interfere with VM B when VM A isn't processing its queues, which triggers vhost send retries until timeout (100 microseconds in this case). This causes the pmd thread to slow down, and that affects the other virtual machines (VM B) on the host, as happened in your case. The patch that I pointed to removes the retry logic completely.

Also, in your case packets were sent to the idle VM (no packets were being drained from the virt queues inside it), which triggered the issue and affected the neighboring VMs. Consider sending traffic to a newly booted VM only after forwarding is enabled inside the guest.

I don't think that diverting the traffic from a VM that loaded virtio-pci drivers is a doable solution (since you can't predict what the owners of the VMs will be doing), also, specially because in my env, the DPDK App is a L2 bridge, so, it receives traffic that is not destined to it (might be even harder to try to do this)...

I have all the required hardware to keep testing this, so, let me know when you guys (Intel / Canonical) have newer versions, I'll test it with pleasure!
:-)

[BHANU] Apply the patch from the thread and this should resolve the issue reported.

Thanks!
Thiago

On 25 May 2016 at 11:00, Bodireddy, Bhanuprakash <bhanuprakash.bodire...@intel.com> wrote:

I could reproduce the issue and this can be fixed as below

Firstly, the throughput issues observed with other VMs when a new VM is started can be fixed using the patch in the thread http://openvswitch.org/pipermail/dev/2016-May/071615.html. I have put up an explanation in this thread for the cause of issue especially with multi VM setup on OVS DPDK.

On a Multi VM setup even with the above patch applied, one might see aggregate throughput difference when vNIC is bind to igb_uio vs virtio-pci, this is for the fact that the interrupt overhead is significantly higher when virtio-pci is in use.

More importantly if you have setup explicit flows matching VM's MAC/IP, disabling the flows to the VM that are idle would improve the aggregate throughput and lessen the burden on the pmd thread. 'watch -d ./utilities/ovs-appctl dpctl/show -s' will show no. of packet stats.

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Christian Ehrhardt
>Sent: Wednesday, May 25, 2016 7:08 AM
>To: Martinx - ジェームズ <thiagocmarti...@gmail.com>
>Cc: <dev@openvswitch.org>; dev <d...@dpdk.org>; qemu-sta...@nongnu.org
>Subject: Re: [ovs-dev] If 1 KVM Guest loads the virtio-pci, on top of
>dpdkvhostuser OVS socket interface, it slows down everything!
>
>Hi again,
>another forgotten case.
>
>I currently I lack the HW to fully reproduce this, but the video summary is
>pretty good and shows the issue in an impressive way.
>
>Also the description is good and here as well I wonder if anybody else could
>reproduce this.
>Any hints / insights are welcome.
>
>P.S.
and also again - two list cross posting, but here as well it is yet >unclear >which it belongs to so I'll keep it as well > >Christian Ehrhardt >Software Engineer, Ubuntu Server >Canonical Ltd > >On Sun, May 22, 2016 at 6:35 PM, Martinx - ジェームズ >mailto:thiagocmarti...@gmail.com>> >wrote: > >> Guys, >> >> I'm seeing a strange problem here, in my OVS+DPDK deployment, on top >> of Ubuntu 16.04 (DPDK 2.2 and OVS 2.5). >> >> Here is what I'm trying to do: run OVS with DPDK at the host, for KVM >> Guests that also, will be running more DPDK Apps. >> >> The host have 2 x 10G NICs, for OVS+DPDK and each KVM Guest receives >> its own VLAN tagged traffic (or all tags). >> >> There is an IXIA Traffic Generator sending 10G of traffic on both >>
Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
Hello Mauricio, Comments inline. >+ >+IVSHMEM will not work with 2MB hugepages. It will work only with 1GB huge >pages. >Is this true? >I have used ivshmem with 2MB for a while without facing any problem. >AFAIK one can’t use the dpdk rings using 2MB pages when sharing with >VM. This is due to known limitation where sharing multiple file descriptors to >guest isn’t supported. For the same reason 1GB hugepages needs to be >allocated and exposed to the guest through ivshmem device as a single 1GB >page. >Do you mean, you could do the same with 2MB rings ? I would be interested >in the steps. > > >Yes, we use 2MB huge pages. What we did was to create a command line >generator for qemu, (https://github.com/netgroup-polito/un- >orchestrator/blob/master/orchestrator/compute_controller/plugins/kvm- >libvirt/cmdline_generator/cmdline_generator.c), it uses the IVHSMEM library >from DPDK to expose the rte_rings and the mempools to the guests. >It has to be used with a modified version of QEMU. There is a patch that >applies on qemu v2.2.1: (https://github.com/netgroup-polito/un- >orchestrator/blob/master/orchestrator/compute_controller/plugins/kvm- >libvirt/patches/ivshmem-qemu-2.2.1.patch) [BHANU]: well, I was not quite successful doing this with 2MB hugepages, but the same works with 1GB hugepages on the host. The cmdline_generator utility generated the below args, which I passed to qemu. Also note that I have patched Qemu v2.2.1 with 'ivshmem-qemu-2.2.1.patch'. " -device ivshmem,size=4M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x20:/dev/zero:0x0:0x1fc000:/var/run/.dpdk_ivshmem_metadata_cmdline:0x0:0x4000" Can you list the steps here? > >+ >+ The steps (1-5) in 3.3 section of INSTALL.DPDK guide will create & >initialize >DB, >+ start vswitchd and add dpdk devices to bridge br0. >+ >+ 1. Add DPDK ring port to the bridge >+ >+ ``` >+ ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr >+ ``` >+ >+ 2. 
Copy runtime configuration to VM, To achieve this copy the files to a >temporary >+ directory, say /tmp/rte_config and export the directory to the VM >+ >+ ``` >+ mkdir /tmp/rte_config >+ chmod 644 /tmp/rte_config >+ cp -a /run/.rte_config /run/.rte_hugepage_info /tmp/rte_config >+ ``` >+ >+ 3. Build modified Qemu >+ >+ ``` >+ cd /usr/src/ >+ wget https://github.com/01org/dpdk-ovs/archive/development.zip >+ unzip development.zip >+ cd dpdk-ovs-development/qemu >+ ./configure --target-list=x86_64-softmmu --enable-debug --extra- >cflags='-g' >+ make -j 4 >+ ``` >+ >+ 4. start Guest VM >+ >+ ``` >+ export VM_NAME=ivshmem-vm >+ export QCOW2_IMAGE=CentOS7_x86_64.qcow2 >+ export QEMU_BIN=/usr/src/dpdk-ovs-development/qemu/x86_64- >softmmu/qemu-system-x86_64 >+ >+ taskset 0x20 $QEMU_BIN -cpu host -smp 2,cores=2 -hda >$QCOW2_IMAGE -drive file=fat:rw:/tmp/rte_config,snapshot=off -m 4096M - >-enable-kvm -name $VM_NAME -nographic -vnc :2 -pidfile /tmp/vm1.pid - >mem-path /dev/hugepages -mem-prealloc -device >ivshmem,size=1024M,shm=fd:/dev/hugepages/rtemap_0:0x0:0x4000 >+ ``` >+ >I am very curious about this way of sharing memory with a VM using ivshmem. >I haven't seen any similar thing before. >Why not to use the ivshmem library >(http://dpdk.org/doc/guides/prog_guide/ivshmem_lib.html) to generate the >qemu command line? >Yes, this can also be used to generate qemu command line. > >I think the best way to expose dpdkr ports to the guests is using the >IVSHMEM library, in this way many of the steps described on this guide are >not necessary. I could try to document it, the only problem I'm facing now if >the lack of time, then it would take some days. [BHANU] Any help in simplifying setting up the ivshmem would be highly appreciated. Can you share the steps so that I can incorporate in the next version and send it across (or) we can do that as an incremental patch once the refactored install guide is up streamed. Regards, Bhanu Prakash. 
Re: [ovs-dev] --with-dpdk configure option issue
>-----Original Message-----
>From: Mauricio Vásquez [mailto:mauriciovasquezber...@gmail.com]
>Sent: Wednesday, June 1, 2016 11:17 AM
>To: ovs dev
>Cc: Bodireddy, Bhanuprakash
>Subject: --with-dpdk configure option issue
>
>Dear All,
>I noticed that when I run the command
>./configure --with-dpdk=$SOME_NON_EXISTING_ENV_VAR
>it does not give me an error, somewhere it says:
>"checking whether dpdk datapath is enabled... no" but there is not an
>explicit error.
>I think this behavior should be avoided, an explicit error should be printed
>to avoid any possible confusion, as for example when DPDK_BUILD is not set.

Thanks for reporting this issue. I would treat this more as a misconfiguration than a bug; please see below.

The configure script was modified to auto-discover the DPDK library, if it is present in the standard search paths, when invoked as './configure --with-dpdk'. It also handles the other valid invocations listed below as cases (a), (b), (c) and (e). All of the options below set the string 'with_dpdk' in the OVS_CHECK_DPDK function in acinclude.m4.

(a) ./configure --with-dpdk=$DPDK_BUILD          [$with_dpdk will be a valid $DPDK_BUILD dir]
(b) ./configure --with-dpdk=$DPDK_BUILD/install  [$with_dpdk will be a valid $DPDK_BUILD/install dir]
(c) ./configure --without-dpdk                   [$with_dpdk will be 'no']
(d) ./configure --with-dpdk=""                   [$with_dpdk will be an empty string]
(e) ./configure                                  [$with_dpdk will be an empty string]

An unset environment variable expands to an empty string, so your command falls under case (d). When an empty string is passed to --with-dpdk, the script cannot tell whether the user invoked case (d) or case (e). Hence configure reports that the dpdk datapath isn't enabled, i.e. "checking whether dpdk datapath is enabled... no".

>
>Bhanuprakash, I CC'ed you because you are author of 40b5ea86319f
>("acinclude: Autodetect DPDK location when configuring OVS"), then I think
>you know how to fix it.
>Mauricio V,
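For illustration, plain shell can tell "unset" from "set but empty" with the `${var+set}` expansion, which is one way cases (d) and (e) above could be separated. The sketch below is not the acinclude.m4 code; describe_with_dpdk is a hypothetical helper, and it assumes that configure leaves with_dpdk unset when neither --with-dpdk nor --without-dpdk is given, and sets it to "yes" for a bare --with-dpdk.

```sh
#!/bin/sh
# Sketch only: classify the with_dpdk shell variable the way cases (a)-(e)
# describe it. describe_with_dpdk is a hypothetical helper name.
describe_with_dpdk() {
    if [ "${with_dpdk+set}" != set ]; then
        echo "not given"          # case (e): plain ./configure
    elif [ -z "$with_dpdk" ]; then
        echo "given but empty"    # case (d): --with-dpdk=""
    elif [ "$with_dpdk" = no ]; then
        echo "disabled"           # case (c): --without-dpdk
    elif [ "$with_dpdk" = yes ]; then
        echo "auto-detect"        # bare --with-dpdk
    elif [ -d "$with_dpdk" ]; then
        echo "directory"          # cases (a) and (b)
    else
        echo "missing directory"  # a path was given but does not exist
    fi
}
```

With a check of this shape, `--with-dpdk=$SOME_NON_EXISTING_ENV_VAR` lands in the "given but empty" branch, where an explicit error could be printed instead of silently disabling the datapath.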
Re: [ovs-dev] OVS is failing while starting with DPDK
>-Original Message- >From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Sheroo >Pratap >Sent: Wednesday, June 1, 2016 9:48 AM >To: ovs dev >Subject: [ovs-dev] OVS is failing while starting with DPDK > >Hi All, > > I am trying to start OVS with DPDK but it is failing with below error. > >root@osboxes:/home/osboxes/ovs# ovs-vswitchd unix:$DB_SOCK --pidfile -- >detach 2016-06-01T08:25:32Z|1|ovs_numa|INFO|Discovered 1 CPU cores >on NUMA node 0 2016-06-01T08:25:32Z|2|ovs_numa|INFO|Discovered 1 >NUMA nodes and 1 CPU cores >2016-06- >01T08:25:32Z|3|reconnect|INFO|unix:/usr/local/var/run/openvswitch/d >b.sock: >connecting... >2016-06- >01T08:25:32Z|4|reconnect|INFO|unix:/usr/local/var/run/openvswitch/d >b.sock: >connected >*2016-06-01T08:25:32Z|5|dpdk|ERR|DPDK not supported in this copy of >Open vSwitch.* root@osboxes:/home/osboxes/ovs# > >Below is the steps is have followed so far. > >1) My OVS version is master ovs i.e. 2.5.90, ubuntu 15.10 and kernel version is >4.6 >2) DPDK version is 16.04 (as recommended in INSTALL.DPDK.md of master >ovs) >3) Following are the steps as have executed as mentioned in >INSTALL.DPDK.md >1) export DPDK_DIR=/usr/src/dpdk-16.04 >2) cd $DPDK_DIR >3) make install T=x86_64-native-linuxapp-gcc DESTDIR=install >4) export DPDK_BUILD=$DPDK_DIR/x86_64-native-linuxapp-gcc/ >5) cd $(OVS_DIR)/ > 6) ./boot.sh > 7) ./configure --with-dpdk=$DPDK_BUILD CFLAGS="-g -O2 -Wno-cast- >align" > 8) make CFLAGS='-O3 -march=native > 9) added below line in /etc/default/grub > default_hugepagesz=1GB hugepagesz=1G hugepages=1 > Snapshot of grub > GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian` > GRUB_CMDLINE_LINUX_DEFAULT="quiet splash >default_hugepagesz=1GB hugepagesz=1G hugepages=1" > GRUB_CMDLINE_LINUX="" > 10) modprobe uio > 11) insmod $DPDK_BUILD/kmod/igb_uio.ko > 12) $DPDK_DIR/tools/dpdk_nic_bind.py --bind=igb_uio eth0 > 13) mount -t hugetlbfs -o pagesize=2MB none /dev/hugepages > 14) mkdir -p /usr/local/etc/openvswitch > 15) mkdir -p 
/usr/local/var/run/openvswitch
>    16) rm /usr/local/etc/openvswitch/conf.db
>    17) ovsdb-tool create /usr/local/etc/openvswitch/conf.db \
>        /usr/local/share/openvswitch/vswitch.ovsschema
>
>    18) removed SSL certificates while configuring ovsdb :
>        ovsdb-server --remote=punix:/usr/local/var/run/openvswitch/db.sock
>        --remote=db:Open_vSwitch,Open_vSwitch,manager_options --pidfile --detach
>    19) ovs-vsctl --no-wait init
>    20) export DB_SOCK=/usr/local/var/run/openvswitch/db.sock
>    21) ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-init=true
>    22) failing here : ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>root@osboxes:/home/osboxes/ovs# ovs-vswitchd unix:$DB_SOCK --pidfile --detach
>2016-06-01T08:25:32Z|1|ovs_numa|INFO|Discovered 1 CPU cores on NUMA node 0
>2016-06-01T08:25:32Z|2|ovs_numa|INFO|Discovered 1 NUMA nodes and 1 CPU cores
>2016-06-01T08:25:32Z|3|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connecting...
>2016-06-01T08:25:32Z|4|reconnect|INFO|unix:/usr/local/var/run/openvswitch/db.sock: connected
>2016-06-01T08:25:32Z|5|dpdk|ERR|DPDK not supported in this copy of Open vSwitch.

This is most likely an OVS configuration issue. Check whether the 'DPDK_BUILD' environment variable was set appropriately. See this thread for more information: http://openvswitch.org/pipermail/dev/2016-June/071955.html

>
>Please anyone can help, after executing step 22 getting error "DPDK not
>supported in this copy of Open vSwitch"
>
>Thanks and Regards
>   S Pratap
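A quick sanity check of DPDK_BUILD before running configure might look like the sketch below. check_dpdk_build is a hypothetical helper, not something shipped with OVS or DPDK. The two rte_config.h locations are the ones configure reports probing in the --with-dpdk thread; treating them as the only valid layouts is an assumption.

```sh
#!/bin/sh
# Sketch only: verify that a DPDK build directory looks usable before
# passing it to ./configure --with-dpdk=...
# check_dpdk_build is a hypothetical helper name.
check_dpdk_build() {
    dir=$1
    if [ -z "$dir" ]; then
        echo "DPDK_BUILD is empty or unset"
        return 1
    fi
    if [ ! -d "$dir" ]; then
        echo "$dir is not a directory"
        return 1
    fi
    # Assumption: a usable build tree has rte_config.h in one of these places.
    if [ -f "$dir/include/rte_config.h" ] ||
       [ -f "$dir/include/dpdk/rte_config.h" ]; then
        echo "ok"
        return 0
    fi
    echo "no rte_config.h found under $dir/include"
    return 1
}
```

Running check_dpdk_build "$DPDK_BUILD" in the same shell that will run configure also catches the case where the variable was exported in a different session.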
Re: [ovs-dev] --with-dpdk configure option issue
>-Original Message- >From: Mauricio Vásquez [mailto:mauriciovasquezber...@gmail.com] >Sent: Wednesday, June 1, 2016 3:37 PM >To: Bodireddy, Bhanuprakash >Cc: ovs dev >Subject: Re: --with-dpdk configure option issue > > > >On Wed, Jun 1, 2016 at 3:29 PM, Bodireddy, Bhanuprakash > wrote: >>-Original Message- >>From: Mauricio Vásquez [mailto:mauriciovasquezber...@gmail.com] >>Sent: Wednesday, June 1, 2016 11:17 AM >>To: ovs dev >>Cc: Bodireddy, Bhanuprakash >>Subject: --with-dpdk configure option issue >> >>Dear All, >>I noticed that when I run the command >>./configure --with-dpdk=$SOME_NON_EXISTING_ENV_VAR >>it does not give me an error, somewhere it says: >>"checking whether dpdk datapath is enabled... no" but there is not an >explicit >>error. >>I think this behavior should be avoided, an explicit error should be printed >>to >>avoid any possible confusion, as for example when DPDK_BUILD is not set. > >Thanks for reporting this issue. This is treated more a misconfiguration than >a >bug. Please see below. >The configure script was modified to handle auto discovery of DPDK library if >present in standard search paths with ' ./configure --with-dpdk' option. >It also handles other valid options as listed below in case (a),(b),(c), (e). >All the >below options will set string 'with_dpdk' in OVS_CHECK_DPDK function in >acinclude.m4. > >(a) ./configure --with-dpdk=$DPDK_BUILD [ $with_dpdk >will be a >valid $DPDK_BUILD dir] >(b) ./configure --with-dpdk=$DPDK_BUILD/install [ $with_dpdk will be a >valid $DPDK_BUILD/install dir] >(c) ./configure --without-dpdk. >[$with_dpdk will be >'no'] >(d) ./configure --with-dpdk="" >[$with_dpdk will be an >empty string] >(e) ./configure > [ $with_dpdk will be an >empty string] > >In case (d), when empty string is passed to --with-dpdk option, it's not known >if user has invoked case (d) or case (e). Hence I throw dpdk datapath isn't >enabled as part of configuration. >i.e "checking whether dpdk datapath is enabled... no". 
>
>I had a look at the autotools documentation and I think there is a way to
>distinguish between cases (d) and (e).
>What do you think about something like this?
>
>diff --git a/acinclude.m4 b/acinclude.m4
>index f3de855..9314d82 100644
>--- a/acinclude.m4
>+++ b/acinclude.m4
>@@ -161,10 +161,11 @@ dnl Configure DPDK source tree
> AC_DEFUN([OVS_CHECK_DPDK], [
>   AC_ARG_WITH([dpdk],
>               [AC_HELP_STRING([--with-dpdk=/path/to/dpdk],
>-                              [Specify the DPDK build directory])])
>+                              [Specify the DPDK build directory])],
>+              [have_dpdk=true])
>
>   AC_MSG_CHECKING([whether dpdk datapath is enabled])
>-  if test -z "$with_dpdk" || test "$with_dpdk" = no; then
>+  if test "$have_dpdk" != true; then
>     AC_MSG_RESULT([no])
>     DPDKLIB_FOUND=false
>   else

This looks fine, except for one minor issue with the './configure --without-dpdk' option. As shown below, configure still reports the dpdk datapath as enabled and then errors out:

checking whether dpdk datapath is enabled... yes
checking for no/include/rte_config.h... no
checking for no/include/dpdk/rte_config.h... no
configure: error: Could not find DPDK libraries in no/lib

That said, I don't think anyone would pass '--without-dpdk' explicitly just to avoid building the DPDK datapath with OVS.

Regards,
Bhanu Prakash.

>
>>
>>Bhanuprakash, I CC'ed you because you are author of 40b5ea86319f
>>("acinclude: Autodetect DPDK location when configuring OVS"), then I think
>>you know how to fix it.
>>Mauricio V,
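The failure with --without-dpdk comes from how AC_ARG_WITH works: its action-if-given branch also runs for --without-dpdk, which sets with_dpdk to "no", so have_dpdk alone cannot mean "enabled". One possible way to combine the two conditions is sketched below in plain shell. This is only an illustration of the decision, not the patch that was eventually submitted, and dpdk_datapath_enabled is a hypothetical helper.

```sh
#!/bin/sh
# Sketch only: decide whether configure should go on to look for DPDK.
# dpdk_datapath_enabled is a hypothetical helper name.
#   $1: "true" when --with-dpdk or --without-dpdk appeared on the command
#       line (the have_dpdk flag from the proposed diff)
#   $2: the value of with_dpdk
# Returns 0 (enabled) only when the option was given and is not "no".
dpdk_datapath_enabled() {
    [ "$1" = true ] && [ "$2" != no ]
}
```

Under this rule case (c) reports "no" again, while case (d), where the option is given with an empty value, still counts as enabled and therefore reaches the later library search, which is where an explicit error can be raised.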
Re: [ovs-dev] --with-dpdk configure option issue
>-----Original Message-----
>From: Mauricio Vásquez [mailto:mauriciovasquezber...@gmail.com]
>Sent: Wednesday, June 1, 2016 4:28 PM
>To: Bodireddy, Bhanuprakash
>Cc: ovs dev
>Subject: Re: --with-dpdk configure option issue
>
>On Wed, Jun 1, 2016 at 5:14 PM, Bodireddy, Bhanuprakash wrote:
>[...]
>This looks fine, but for one minor issue when using ./configure --without-dpdk
>option.
>Below it says dpdk datapath is still enabled and errors out..
>
>checking whether dpdk datapath is enabled... yes
>checking for no/include/rte_config.h... no
>checking for no/include/dpdk/rte_config.h... no
>configure: error: Could not find DPDK libraries in no/lib
>
>You are right!, what about?

Perfect! In fact, I was about to send the patch with the same tweak.
You can submit this patch, I will ack it.

>diff --git a/acinclude.m4 b/acinclude.m4
>index f3de855..a5080ef 100644
>--- a/acinclude.m4
>+++ b/acinclude.m4
>@@ -161,10 +161,11 @@ dnl Configure DPDK source tree
> AC_DEFUN([OVS_CHECK_DPDK], [
>   AC_ARG_WITH([dpdk],
>               [AC_HELP_STRING([--with-dpdk=/path/to/dpdk],
>-                              [Speci
Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
Thanks Flavio for reviewing the install guide in detail. My comments inline.

>-----Original Message-----
>From: Flavio Leitner [mailto:f...@sysclose.org]
>Sent: Tuesday, May 31, 2016 9:44 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org; Traynor, Kevin
>Subject: Re: [ovs-dev] [PATCH 1/2] doc: Refactor DPDK install documentation
>
>
>Hi,
>
>Thanks for doing this.
>I have some comments inline.
>fbl
>
>
>On Thu, May 26, 2016 at 01:46:42PM +0100, Bhanuprakash Bodireddy wrote:
>> Refactor the INSTALL.DPDK in to two documents named INSTALL.DPDK and
>> INSTALL.DPDK-ADVANCED. While INSTALL.DPDK document shall facilitate the
>> novice user in setting up the OVS DPDK and running it out of box, the
>> ADVANCED document is targeted at expert users looking for the optimum
>> performance running dpdk datapath.
>>
>> This commit updates INSTALL.DPDK.md document.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> ---
>> INSTALL.DPDK.md | 1299 ++-
>> 1 file changed, 429 insertions(+), 870 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>> index 68735cc..561631f 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -1,1020 +1,579 @@
>> -Using Open vSwitch with DPDK
>> -
>> +OVS DPDK INSTALL GUIDE
>> +
>>
>> -`./testpmd -c 0x3 -n 4 --socket-mem 512 -- --burst=64 -i --txqflags=0xf00 --disable-hw-vlan --forward-mode=io --auto-start`
>> + Note: For IVSHMEM, Set `export DPDK_TARGET=x86_64-ivshmem-linuxapp-gcc`
>>
>> -See below information on dpdkvhostcuse and dpdkvhostuser ports.
>> -See [DPDK Docs] for more information on `testpmd`.
>> +### 2.3 Install OVS
>
>It seems to me that this section could be better. We have a good INSTALL.md
>file covering all options, additional details and also have pointers to more
>specifics like how to do in Fedora or Debian.
>
>For instance, Fedora spec file in branch master allows you to build with
>DPDK support with a simple command line:
>
> $ make rpm-fedora RPMBUILD_OPT="--with dpdk"
>
>Nothing wrong documenting a generic recipe, but I missed the other
>options. Perhaps something like:
>
>2.3 Install OVS
> OVS can be installed using different methods. The only requirement to install
>with DPDK support enabled is to pass an extra argument to ./configure. You can find
>additional information in INSTALL.md or more specific instructions for a distribution
>in the other INSTALL.*.md files available in the repository. This documents focus on
>a generic recipe that should work for most cases

Good point, I will rework this section keeping your comments in mind.
Also I would add hyperlinks to INSTALL.md and would redirect users doing
distribution specific builds to respective pages.

>
>I am sure it can be reworded in a better way, but it shows my point.
>
>
>> + OVS can be downloaded in compressed format from the OVS release page (or)
>> + cloned from git repository if user intends to develop and contribute
>> + patches upstream.
>>
>> + - [Download OVS] tar ball and extract the file, for example in to /usr/src
>> + and set OVS_DIR
>>
>> + ```
>> + wget -O ovs.tar https://github.com/openvswitch/ovs/tarball/master
>> + mkdir -p /usr/src/ovs
>> + tar -xvf ovs.tar -C /usr/src/ovs --strip-components=1
>> + export OVS_DIR=/usr/src/ovs
>> + ```
>>
>> -DPDK Rings :
>> -
>> + - Clone the Git repository for OVS, for example in to /usr/src
>>
>> -Following the steps above to create a bridge, you can now add dpdk rings
>> -as a port to the vswitch. OVS will expect the DPDK ring device name to
>> -start with dpdkr and end with a portid.
>> + ```
>> + cd /usr/src/
>> + git clone https://github.com/openvswitch/ovs.git
>> + export OVS_DIR=/usr/src/ovs
>> + ```
>>
>> -`ovs-vsctl add-port br0 dpdkr0 -- set Interface dpdkr0 type=dpdkr`
>> + - Install OVS dependencies
>>
>> -DPDK rings client test application
>> + GNU make, GCC 4.x (or) Clang 3.4 (Mandatory)
>> + libssl, libcap-ng, Python 2.7 (Optional)
>> + More information can be found at [Build Requirements]
>>
>> -Included in the test directory is a sample DPDK application for testing
>> -the rings. This is from the base dpdk directory and modified to work
>> -with the ring naming used within ovs.
>> + - Configure, Install OVS
>>
Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
>-----Original Message-----
>From: Flavio Leitner [mailto:f...@sysclose.org]
>Sent: Tuesday, May 31, 2016 10:04 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org; Traynor, Kevin
>Subject: Re: [ovs-dev] [PATCH 2/2] doc: Refactor DPDK install guide, add
>ADVANCED doc
>
>
>This looks very good and I just have two minor comment below.
>
>On Thu, May 26, 2016 at 01:46:43PM +0100, Bhanuprakash Bodireddy wrote:
>> Add INSTALL.DPDK-ADVANCED document that is forked off from original
>> INSTALL.DPDK guide. This document is targeted at users looking for
>> optimum performance on OVS using dpdk datapath.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>>
>> ---
>[...]
>
>> + 5. Running sample "dpdk ring" app in VM
>> +
>> + ```
>> + umount /dev/hugepages
>> + mount -t hugetlbfs hugetlbfs /mnt/hugepages
>> + ln -s /sys/devices/pci:00/:00:04.0/resource2 /dev/hugepages/rtemap_0
>> + mount -o iocharset=utf8 /dev/sdb1 /mnt/ovs
>> + cp /mnt/ovs/.rte_config /run/.
>> + cp /mnt/ovs/.rte_hugepage_info /run/.
>> +
>> + # Build the DPDK ring application in the VM
>> + export RTE_SDK=/root/dpdk-16.04
>> + export RTE_TARGET=x86_64-ivshmem-linuxapp-gcc
>> + make
>> +
>> + # Run dpdkring application
>> + ./build/dpdkr -c 1 -n 4 --proc-type=secondary -- -n 0
>> + where "-n 0" refers to ring '0' i.e dpdkr0
>> + ```
>> +
>> +## 6. Vhost Walkthrough
>> +
>> +DPDK 16.04 supports two types of vhost:
>> +1. vhost-user - enabled default
>> +2. vhost-cuse - Legacy, disabled by default
>
>That doesn't show nicely on the web. They are all in the same line.

You are right, I will correct this.

>
>The same comment for the previous patch about the mount command in
>''Mount huge pages'' here. Just add 'if not mounted' or something like that.

Agree, Will add this. Thanks Flavio for reviewing the ADVANCED guide.

Regards,
Bhanu Prakash.

>
>Thanks,
>--
>fbl
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Set pmd thread priority
>-----Original Message-----
>From: Daniele Di Proietto [mailto:daniele.di.proie...@gmail.com]
>Sent: Friday, July 15, 2016 2:19 AM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH v2] netdev-dpdk: Set pmd thread priority
>
>Thanks for the patch.

Hello Daniele,
Thanks for looking in to this patch.

>Is there any reason why core 0 is treated specially?

It's very uncommon to see core 0 isolated and HPC threads pinned to it.
On multicore systems, IRQs get explicitly pinned to core 0 to improve
application performance and mitigate interrupts. In a few more cases, core 0
is treated more like a management/control core that is used to launch
applications on other cores. For these reasons I treat core 0 specially.

>I think we should put pmd_thread_setpriority in lib/ovs-numa.c (adding
>a ovs_numa prefix), and do nothing if dummy_numa is false.

Agree.

> Or perhaps
>integrate it the pthread_setschedparam in
>ovs_numa_thread_setaffinity_core().
>I've noticed that processes with the same affinity as a PMD thread will become
>totally unresponsive after this patch. Is this expected? Will this have a negative
>impact on the overall stability of the system?

There are 2 sides to this problem.

(i) Out of Box Deployment (Not specifying dpdk-lcore-mask, pmd-cpu-mask):
    As it is now, when OVS DPDK is run out of box, one pmd thread is created
    and pinned to core 0. In this case the pmd thread runs with the default
    scheduling policy and priority, with no impact on the stability of the
    system.

(ii) High performance Deployment with SA (Explicitly specify dpdk-lcore-mask,
    pmd-cpu-mask):
    In this case the user wants optimum fastpath performance + SA and is
    explicitly pinning the control thread and pmd threads to cores. Only in
    this case is the real time scheduling policy applied to the pmd threads,
    as any disruption to these threads would impact fastpath performance.

I have come across multi-VM deployments with HT enabled where wrongly pinning
QEMU threads to the pmd cores starved the pmd threads, which eventually
destabilized the system.

Regards,
Bhanuprakash.

>
>2016-07-05 13:05 GMT-07:00 Bhanuprakash Bodireddy:
>Set the DPDK pmd thread scheduling policy to SCHED_RR and static
>priority to highest priority value of the policy. This is to deal with
>pmd thread starvation case where another cpu hogging process can get
>scheduled/affinitized on to the same core the pmd thread is running there
>by significantly impacting the datapath performance.
>
>Setting the realtime scheduling policy to the pmd threads is one step
>towards Fastpath Service Assurance in OVS DPDK.
>
>The realtime scheduling policy is applied only when CPU mask is passed
>to 'pmd-cpu-mask'. The exception to this is 'pmd-cpu-mask=1', where the
>policy and priority shall not be applied to pmd thread spawned on core0.
>For example:
>
> * In the absence of pmd-cpu-mask or if pmd-cpu-mask=1, one pmd
>   thread shall be created and affinitized to 'core 0' with default
>   scheduling policy and priority applied.
>
> * If pmd-cpu-mask is specified with CPU mask > 1, one or more pmd
>   threads shall be spawned on the corresponding core(s) in the mask
>   and real time scheduling policy SCHED_RR and highest static
>   priority is applied to the pmd thread(s).
>
>To reproduce the issue use following commands:
>
>ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
>taskset 0x2 cat /dev/zero > /dev/null &
>
>Also OVS control threads should not be affinitized to the pmd cores.
>For example 'dpdk-lcore-mask' and 'pmd-cpu-mask' should be exclusive.
>
>v1->v2:
>* Removed #ifdef and introduced dummy function "pmd_thread_setpriority"
>  in netdev-dpdk.h
>* Rebase
>
>Signed-off-by: Bhanuprakash Bodireddy
>---
> lib/dpif-netdev.c | 8
> lib/netdev-dpdk.c | 14 ++
> lib/netdev-dpdk.h | 7 +++
> 3 files changed, 29 insertions(+)
>
>diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>index 37c2631..6ff81d6 100644
>--- a/lib/dpif-netdev.c
>+++ b/lib/dpif-netdev.c
>@@ -2849,6 +2849,14 @@ pmd_thread_main(void *f_)
>     ovs_numa_thread_setaffinity_core(pmd->core_id);
>     dpdk_set_lcore_id(pmd->core_id);
>     poll_cnt = pmd_load_queues_and_ports(pmd, &poll_list);
>+
>+    /* Set pmd thread's scheduling policy to SCHED_RR and priority to
>+     * highest priority of SCHED_RR policy, In absence of pmd-cpu-mask (or)
>+     * pmd-cpu-mask=1, default scheduling policy and priority shall
>+     * apply to pmd thread */
>+    if (pmd->core_id) {
>+        pmd_thread_setpriori
Re: [ovs-dev] [PATCH v4] Makefile.am: Add clang static analysis support
>> +Open vSwitch includes a Makefile target to trigger static code
>> +Analysis and the instructions are below.
>> +
>> +1. ./boot.sh
>> +2. ./configure CC=clang (when using clang compiler)
>> + ./configure CC=gcc CFLAGS="-std=gnu99" (when using GCC)
>> +3. make clang-analyze
>
>OK, the above works for me now. Thanks.
>
>> +4. scan-view --host=IPADDR host --port PORT
>> +$OVS_DIR>/clang-analyzer-results/-mm-dd-114251-1027-1
>> +--allow-all-hosts
>
>The above doesn't seem right. "host" is spurious and the specific digits you
>include are different (I think that they are the time of day).
>
>The last line of output from "make clang-analyze" lists a command to
>run:
>
>scan-build: Run 'scan-view /home/blp/nicira/ovs/tests/clang-analyzer-results/2016-07-14-091820-11158-1' to examine bug reports.
>
>It's probably easiest to advise the user to just run that command.
>
>> +5. Visit http://ipaddr:PORT/ for analysis report.
>
>For me, at least, when I run the command above, the URL automatically pops
>up in my web browser, so that I don't have to do anything extra.

Hello Ben,

I have sent out V5 patch with the modifications you suggested in this thread.

Regards,
Bhanu Prakash.
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Set pmd thread priority
>Thanks for the explanation.
>
>I still think it's weird to hardcode an exception for core 0.
>
>If no pmd-cpu-mask is specified other cores might be used, depending on the
>numa affinity.
>Perhaps we can call set_priority only if pmd-cpu-mask is specified? That
>seems more consistent.

I agree with this; it looks good to me. I will send out v3 as discussed here.

Regards,
Bhanu Prakash.

>Thanks,
>Daniele
>
>2016-07-15 7:52 GMT-07:00 Bodireddy, Bhanuprakash:
>[...]
Re: [ovs-dev] [PATCH] netdev-dpdk: Add Flow Control support.
Thanks for the patch, can you also add the flow control options to
INSTALL.DPDK-ADVANCED.md?

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Sugesh
>Chandran
>Sent: Friday, July 22, 2016 2:18 PM
>To: dev@openvswitch.org
>Subject: [ovs-dev] [PATCH] netdev-dpdk: Add Flow Control support.
>
>Add support for flow-control(mac control frame) to DPDK enabled physical
>port types. By default, the flow-control is OFF on both rx and tx side.
>The flow control can be enabled/disabled either when adding a port to OVS or
>at run time.
>
>For eg:
>To enable flow control support at tx side while adding a port, add the
>'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command-line as below.
>
> 'ovs-vsctl add-port br0 dpdk0 -- \
>  set Interface dpdk0 type=dpdk options:tx-flow-ctrl=on'
>
>Similarly to enable rx flow control,
> 'ovs-vsctl add-port br0 dpdk0 -- \
>  set Interface dpdk0 type=dpdk options:rx-flow-ctrl=on'
>
>And to enable the flow control auto-negotiation,
> 'ovs-vsctl add-port br0 dpdk0 -- \
>  set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=on'
>
>To turn ON the tx flow control at run time(After the port is being added to
>OVS), the command-line input will be,
> 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=on'
>
>The flow control parameters can be turned off by setting 'off' to the
>respective parameter. To turn off the flow control at tx side,
> 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=off'
>
>Signed-off-by: Sugesh Chandran
>---
> lib/netdev-dpdk.c | 68 +++
> 1 file changed, 68 insertions(+)
>
>diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
>index 85b18fd..74efd25 100644
>--- a/lib/netdev-dpdk.c
>+++ b/lib/netdev-dpdk.c
>@@ -634,6 +634,67 @@ dpdk_eth_dev_queue_setup(struct netdev_dpdk *dev, int n_rxq, int n_txq)
>     return diag;
> }
>
>+static void
>+dpdk_eth_parse_flow_ctrl(struct netdev_dpdk *dev,
>+                         const struct smap *args,
>+                         struct rte_eth_fc_conf *fc_conf)
>+    OVS_REQUIRES(dev->mutex)
>+{
>+    int ret = 0;
>+    int rx_fc_en = 0;
>+    int tx_fc_en = 0;
>+    const char *rx_flow_mode;
>+    const char *tx_flow_mode;
>+    const char *flow_autoneg;
>+    enum rte_eth_fc_mode fc_mode_set[2][2] = {{RTE_FC_NONE, RTE_FC_TX_PAUSE},
>+                                              {RTE_FC_RX_PAUSE, RTE_FC_FULL}
>+                                             };
>+
>+    ret = rte_eth_dev_flow_ctrl_get(dev->port_id, fc_conf);
>+    if (ret != 0) {
>+        VLOG_DBG("cannot get flow control parameters on port=%d, err=%s",
>+                 dev->port_id, rte_strerror(ret));
>+        return;
>+    }
>+    rx_flow_mode = smap_get(args, "rx-flow-ctrl");
>+    tx_flow_mode = smap_get(args, "tx-flow-ctrl");
>+    flow_autoneg = smap_get(args, "flow-ctrl-autoneg");
>+    if (rx_flow_mode) {
>+        if (!strcmp(rx_flow_mode, "on")) {
>+            rx_fc_en = 1;
>+        }
>+        else if (!strcmp(rx_flow_mode, "off")) {
>+            rx_fc_en = 0;
>+        }
>+    }
>+    if (tx_flow_mode) {
>+        if (!strcmp(tx_flow_mode, "on")) {
>+            tx_fc_en =1;
>+        }
>+        else if (!strcmp(tx_flow_mode, "off")) {
>+            tx_fc_en =0;
>+        }
>+    }
>+    if (flow_autoneg) {
>+        if (!strcmp(flow_autoneg, "on")) {
>+            fc_conf->autoneg = 1;
>+        }
>+        else if (!strcmp(flow_autoneg, "off")) {
>+            fc_conf->autoneg = 0;
>+        }
>+    }
>+    fc_conf->mode = fc_mode_set[tx_fc_en][rx_fc_en];
>+}
>+
>+static void
>+dpdk_eth_flow_ctrl_config(struct netdev_dpdk *dev,
>+                          struct rte_eth_fc_conf *fc_conf)
>+    OVS_REQUIRES(dev->mutex)
>+{
>+    if (rte_eth_dev_flow_ctrl_set(dev->port_id, fc_conf) != 0) {
>+        VLOG_ERR("Failed to enable flow control on device %d", dev->port_id);
>+    }
>+}
>
> static int
> dpdk_eth_dev_init(struct netdev_dpdk *dev) OVS_REQUIRES(dpdk_mutex)
>@@ -991,6 +1052,13 @@ netdev_dpdk_set_config(struct netdev *netdev, const struct smap *args)
>         dev->requested_n_rxq = new_n_rxq;
>         netdev_request_reconfigure(netdev);
>     }
>+
>+    /* Flow control configuration for DPDK Ethernet ports. */
>+    if (dev->type == DPDK_DEV_ETH) {
>+        struct rte_eth_fc_conf fc_conf = {0};
>+        dpdk_eth_parse_flow_ctrl(dev, args, &fc_conf);
>+        dpdk_eth_flow_ctrl_config(dev, &fc_conf);
>+    }
>     ovs_mutex_unlock(&dev->mutex);
>
>     return 0;
>--
>2.5.0
Re: [ovs-dev] [PATCH] INSTALL.DPDK: Remove the experimental tag for OVS DPDK
Hi (Apologies if top posting is inappropriate),

I would like to receive your feedback on removing the "experimental" tag from
OVS DPDK. The community has made significant contributions to OVS DPDK since
this post last August
(http://openvswitch.org/pipermail/dev/2015-August/058814.html).

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: Bodireddy, Bhanuprakash
>Sent: Tuesday, July 26, 2016 10:30 AM
>To: dev@openvswitch.org
>Cc: therb...@redhat.com; Gray, Mark D; f...@redhat.com; diproiet...@ovn.org;
>ke...@dev.caoimhin.net; Bodireddy, Bhanuprakash
>Subject: [PATCH] INSTALL.DPDK: Remove the experimental tag for OVS DPDK
>
>The DPDK support of OVS was considered experimental as it had few
>shortcomings that prevented it from being easily deployed and used.
>Also there were few gaps to be filled to tag it "production-ready".
>
>Community has made significant contributions to improve OVS DPDK. This
>commit removes experimental tag for OVS DPDK.
>
>Signed-off-by: Bhanuprakash Bodireddy
>---
> INSTALL.DPDK.md | 2 --
> 1 file changed, 2 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md
>index 5407794..d3f6359 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -17,8 +17,6 @@ This file provides information on installation and use of
> Open vSwitch using DPDK datapath. This version of Open vSwitch should be
> built manually with `configure` and `make`.
>
>-The DPDK support of Open vSwitch is considered 'experimental'.
>-
> ### Prerequisites
>
> * Required: DPDK 16.04, libnuma
>--
>2.4.11
Re: [ovs-dev] [PATCH v3] netdev-dpdk: Set pmd thread priority
Thanks Mark and Daniele, My comments inline.

>
>I agree with Mark's comments, other than that this looks good to me.
>If you agree with the comments would you mind sending an updates version?

I have sent out V4 with the changes suggested by Mark.

>>
>> * In the absence of pmd-cpu-mask, one pmd thread shall be created
>> and default scheduling policy and prority gets applied.
>
>Typo above - 'prority'

[OK]

>
>>
>> * If pmd-cpu-mask is specified, one ore more pmd threads shall be
>
>Typo above - 'ore'

[OK]

>
>>- A set bit in the mask means a pmd thread is created and pinned
>>- to the corresponding CPU core. e.g. to run pmd threads on core 1 and 2
>>+ A set bit in the mask means a pmd thread is created, pinned to the
>>+ corresponding CPU core and the scheduling policy SCHED_RR with highest
>>+ priority of the scheduling policy applied to pmd thread.
>>+ e.g. to run pmd threads on core 1 and 2
>There's some repetition in the last paragraph - I'm reviewing this patch in
>isolation, so the text may make sense/be required in the full document.

[BHANU] This isn't repetition, the changes are going in to 2 different
sections and this would make sense when viewing the INSTALL.DPDK-ADVANCED.md.

>
>>
>> `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6`
>>
>>@@ -246,6 +250,9 @@ needs to be affinitized accordingly.
>>
>> NIC port0 <-> OVS <-> VM <-> OVS <-> NIC port 1
>>
>>+ Note: 'dpdk-lcore-mask' and 'pmd-cpu-mask' cpu mask settings should be
>>+ non-overlapping.
>
>Although it's mentioned in the commit message, it might be worth mentioning
>here the consequences of attempting to pin non-PMD processes to a
>pmd-cpu-mask core (i.e. CPU starvation)

[OK]

>
>>+
>>+void
>>+ovs_numa_thread_setpriority(int policy)
>>+{
>>+    if (dummy_numa) {
>>+        return;
>>+    }
>>+
>>+    struct sched_param threadparam;
>>+    int err;
>>+
>>+    memset(&threadparam, 0, sizeof(threadparam));
>>+    threadparam.sched_priority = sched_get_priority_max(policy);
>>+    err = pthread_setschedparam(pthread_self(), policy, &threadparam);
>>+    if (err) {
>>+        VLOG_ERR("Thread priority error %d",err);
>The convention in this file seems to be to use ovs_strerror when reporting
>errors; suggest that you stick with same.

[AGREE]

Regards,
Bhanu Prakash.
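As a side note for readers, the relationship between the hex masks used in this thread and the cores they select can be sketched in shell. The helper names `cores_in_mask` and `masks_overlap` are hypothetical and are not part of OVS or DPDK; the sketch only shows the bit arithmetic behind the advice that 'dpdk-lcore-mask' and 'pmd-cpu-mask' should not overlap, and why `taskset 0x2` collides with `pmd-cpu-mask=6`.

```shell
#!/bin/sh
# Hypothetical helpers, for illustration only.

# Print the core ids whose bits are set in a hex CPU mask
# (with or without a leading 0x), e.g. 6 -> "1 2".
cores_in_mask() (
    mask=$(( 0x${1#0x} ))
    core=0
    out=""
    while [ "$mask" -ne 0 ]; do
        if [ $(( mask & 1 )) -eq 1 ]; then
            out="${out:+$out }$core"
        fi
        mask=$(( mask >> 1 ))
        core=$(( core + 1 ))
    done
    echo "$out"
)

# Exit status 0 when the two hex masks share at least one core.
masks_overlap() {
    [ $(( 0x${1#0x} & 0x${2#0x} )) -ne 0 ]
}

echo "pmd-cpu-mask=6 selects cores: $(cores_in_mask 6)"
echo "taskset 0x2 selects cores: $(cores_in_mask 0x2)"
if masks_overlap 6 0x2; then
    echo "masks overlap: the other process shares a core with a pmd thread"
fi
```

A check of this kind could be run before applying a new mask, under the assumption that the administrator knows which masks the other pinned processes use.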
Re: [ovs-dev] [PATCH] FAQ: Add contents section and enable internal links.
>-----Original Message-----
>From: Ben Pfaff [mailto:b...@ovn.org]
>Sent: Wednesday, July 27, 2016 8:59 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [PATCH] FAQ: Add contents section and enable internal links.
>
>On Wed, Jul 27, 2016 at 08:48:12PM +0100, Bhanuprakash Bodireddy wrote:
>> Add contents section to FAQ and enable internal links in doc for
>> pretty printing on GitHub.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>>
>> ---
>> Reviewers can review the rendered form here
>> https://github.com/bbodired/ovs/blob/master/FAQ.md
>
>This makes the FAQ less readable as a text file, because it's harder to spot
>
>## 2. Releases
>
>than
>
>Releases
>
>
>Is there a way to link to a title?

I found a better way to handle the links to titles. Sent v2 patch.

Regards,
Bhanu Prakash.
Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority
>-Original Message- >From: Daniele Di Proietto [mailto:diproiet...@ovn.org] >Sent: Wednesday, July 27, 2016 10:10 PM >To: Kavanagh, Mark B >Cc: Bodireddy, Bhanuprakash ; >dev@openvswitch.org >Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority > >Thanks for the patch, the implementation looks good to me too. >During testing I kept noticing that it's way too easy to make OVS completely >unresponsive. As you point out in the documentation by having dpdk-lcore- >mask the same as pmd-cpu-mask, OVS cannot even be killed (a kill -9 is >required). I wonder what happens if one tries to set pmd-cpu-mask to every >core in the system. >As a way to mitigate the risk perhaps we can avoid setting the main thread >affinity to the first core in dpdk-lcore-mask by _always_ restoring it in >dpdk_init__(), also if auto_determine is false. >Perhaps we should start explicitly prohibiting creating a pmd thread on the >first core in dpdk-lcore-mask (I get why previous version of this didn't do it >on >core 0. Perhaps we can generalize that to the first core in dpdk-lcore-mask). I will look in to this and get back to you sometime next week. Regards, Bhanuprakash. > >What's the behavior of other DPDK applications? >Thanks, >Daniele > >2016-07-27 5:28 GMT-07:00 Kavanagh, Mark B : >> >>Set the DPDK pmd thread scheduling policy to SCHED_RR and static >>priority to highest priority value of the policy. This is to deal with >>pmd thread starvation case where another cpu hogging process can get >>scheduled/affinitized on to the same core the pmd thread is running >>there by significantly impacting the datapath performance. >> >>Setting the realtime scheduling policy to the pmd threads is one step >>towards Fastpath Service Assurance in OVS DPDK. >> >>The realtime scheduling policy is applied only when CPU mask is passed >>to 'pmd-cpu-mask'. For example: >> >> * In the absence of pmd-cpu-mask, one pmd thread shall be created >> and default scheduling policy and priority gets applied. 
>> >> * If pmd-cpu-mask is specified, one or more pmd threads shall be >> spawned on the corresponding core(s) in the mask and real time >> scheduling policy SCHED_RR and highest priority of the policy is >> applied to the pmd thread(s). >> >>To reproduce the pmd thread starvation case: >> >>ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6 >>taskset 0x2 cat /dev/zero > /dev/null & >> >>With this commit, it is recommended that the OVS control thread and pmd >>thread shouldn't be pinned to same core ('dpdk-lcore-mask','pmd-cpu-mask' >>should be non-overlapping). Also other processes with same affinity as >>PMD thread will be unresponsive. >> >>Signed-off-by: Bhanuprakash Bodireddy > > >LGTM - Acked-by: mark.b.kavan...@intel.com > >>--- >>v4->v5: >>* Reword Note section in DPDK-ADVANCED.md >> >>v3->v4: >>* Document update >>* Use ovs_strerror for reporting errors in lib-numa.c >> >>v2->v3: >>* Move set_priority() function to lib/ovs-numa.c >>* Apply realtime scheduling policy and priority to pmd thread only if >> pmd-cpu-mask is passed. >>* Update INSTALL.DPDK-ADVANCED. >> >>v1->v2: >>* Removed #ifdef and introduced dummy function >"pmd_thread_setpriority" >> in netdev-dpdk.h >>* Rebase >> >> INSTALL.DPDK-ADVANCED.md | 17 + >> lib/dpif-netdev.c | 9 + >> lib/ovs-numa.c | 18 ++ >> lib/ovs-numa.h | 1 + >> 4 files changed, 41 insertions(+), 4 deletions(-) >> >>diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md >>index 9ae536d..d76cb4e 100644 >>--- a/INSTALL.DPDK-ADVANCED.md >>+++ b/INSTALL.DPDK-ADVANCED.md >>@@ -205,8 +205,10 @@ needs to be affinitized accordingly. >> pmd thread is CPU bound, and needs to be affinitized to isolated >> cores for optimum performance. >> >>- By setting a bit in the mask, a pmd thread is created and pinned >>- to the corresponding CPU core. e.g. 
to run a pmd thread on core 2 >>+ By setting a bit in the mask, a pmd thread is created, pinned >>+ to the corresponding CPU core and the scheduling policy SCHED_RR >>+ along with maximum priority of the policy applied to the pmd thread. >>+ e.g. to pin a pmd thread on core 2 >> >> `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4` >> >>@@ -234,8 +236,10 @@ needs to be affinitized accordingly. >&g
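As an illustration of the scheduling change described in the commit message above, here is a minimal sketch of requesting SCHED_RR at the policy's highest static priority. It is not the code from the patch: the helper name `set_rr_max_priority` is hypothetical, and it uses `sched_setscheduler()` on the calling thread as an assumption made to keep the sketch dependency-free.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <sched.h>
#include <string.h>

/* Hypothetical helper (not the patch's function): ask the kernel to run the
 * calling thread under SCHED_RR at that policy's highest static priority.
 * Returns 0 on success, or a positive errno value on failure (typically
 * EPERM when the caller lacks the privilege to use a realtime policy). */
int
set_rr_max_priority(void)
{
    struct sched_param param;
    int max = sched_get_priority_max(SCHED_RR);

    if (max == -1) {
        return errno;
    }

    memset(&param, 0, sizeof param);
    param.sched_priority = max;

    /* On Linux, a pid of 0 refers to the calling thread. */
    if (sched_setscheduler(0, SCHED_RR, &param) == -1) {
        return errno;
    }
    return 0;
}
```

A thread running this way is never preempted by ordinary SCHED_OTHER tasks on the same core, which is why the commit message recommends non-overlapping 'dpdk-lcore-mask' and 'pmd-cpu-mask'.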
Re: [ovs-dev] [PATCH v2] netdev-dpdk: Add Flow Control support.
>-----Original Message-----
>From: Chandran, Sugesh
>Sent: Thursday, July 28, 2016 4:30 PM
>To: diproiet...@ovn.org; Bodireddy, Bhanuprakash; dev@openvswitch.org
>Cc: Chandran, Sugesh
>Subject: [PATCH v2] netdev-dpdk: Add Flow Control support.
>
>Add support for flow control (MAC control frames) to DPDK-enabled physical
>port types. By default, flow control is OFF on both the rx and tx side.
>Flow control can be enabled/disabled either when adding a port to OVS or
>at run time.
>
>For example:
>To enable flow control on the tx side while adding a port, add the
>'tx-flow-ctrl' option to the 'ovs-vsctl add-port' command line as below.
>
> 'ovs-vsctl add-port br0 dpdk0 -- \
> set Interface dpdk0 type=dpdk options:tx-flow-ctrl=true'
>
>Similarly, to enable rx flow control,
> 'ovs-vsctl add-port br0 dpdk0 -- \
> set Interface dpdk0 type=dpdk options:rx-flow-ctrl=true'
>
>And to enable flow control auto-negotiation,
> 'ovs-vsctl add-port br0 dpdk0 -- \
> set Interface dpdk0 type=dpdk options:flow-ctrl-autoneg=true'
>
>To turn ON tx flow control at run time (after the port has been added to
>OVS), the command line is
> 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=true'
>
>The flow control parameters can be turned off by setting the respective
>parameter to 'false'. To disable flow control on the tx side,
> 'ovs-vsctl set Interface dpdk0 options:tx-flow-ctrl=false'
>
>Signed-off-by: Sugesh Chandran

LGTM, I tested it and can apply the rx flow control setting even when the interface is transmitting.

Acked-by: Bhanuprakash Bodireddy

Regards,
Bhanu Prakash.
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
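The two boolean options 'rx-flow-ctrl' and 'tx-flow-ctrl' described above combine into one of four pause modes. The sketch below shows one way that mapping could be expressed as a lookup table. The enum and function names are local stand-ins invented for illustration; they are not the DPDK or OVS identifiers, and auto-negotiation is left out.

```c
#include <stdbool.h>

/* Illustrative stand-in for the four flow-control modes; these names are
 * assumptions of this sketch, not the DPDK definitions. */
enum fc_mode {
    FC_NONE,       /* no pause frames in either direction */
    FC_RX_PAUSE,   /* flow control enabled on the rx side only */
    FC_TX_PAUSE,   /* flow control enabled on the tx side only */
    FC_FULL        /* flow control enabled on both sides */
};

/* Maps the 'rx-flow-ctrl' and 'tx-flow-ctrl' booleans to a single mode. */
enum fc_mode
fc_mode_from_options(bool rx_flow_ctrl, bool tx_flow_ctrl)
{
    static const enum fc_mode table[2][2] = {
        /*             tx off        tx on      */
        /* rx off */ { FC_NONE,      FC_TX_PAUSE },
        /* rx on  */ { FC_RX_PAUSE,  FC_FULL     },
    };

    return table[rx_flow_ctrl][tx_flow_ctrl];
}
```

With both options left at their default of 'false', the result is FC_NONE, matching the "OFF by default" behaviour stated in the commit message.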
Re: [ovs-dev] OVS+DPDK: pci_map_resource(): cannot mmap error
>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Kapil Adhikesavalu
>Sent: Sunday, July 31, 2016 1:30 PM
>To: dev@openvswitch.org; disc...@openvswitch.org
>Subject: [ovs-dev] OVS+DPDK: pci_map_resource(): cannot mmap error
>
>Hello,
>
>I am getting "EAL: pci_map_resource(): cannot mmap(18, 0x7f504000,
>0x8, 0x0): Invalid argument (0x)" when I start ovs-vswitchd.
>
>Setup: DL360gen8
>CPU: E5-2967
>NIC: 82599ES 10-Gigabit SFI/SFP+ (2 Port) (PCI: slot0: 04:00.0 04:00.1)
>Kernel: 4.6.4-201.fc23.x86_64
>ixgbe driver: 4.2.1-k

There seems to be an issue with 4.5+ kernels, as reported at
http://dpdk.org/ml/archives/dev/2016-July/043122.html

Commenting out "pci_request_regions(dev, "igb_uio");" in igb_uio.c should resolve your issue.

Regards,
Bhanu Prakash.
Re: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with timeout argument
>-Original Message- >From: Kavanagh, Mark B >Sent: Monday, August 8, 2016 9:15 AM >To: Bodireddy, Bhanuprakash ; >dev@openvswitch.org >Subject: RE: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with timeout >argument > >> >>ovs-appctl can crash with missing timeout argument. >> # ovs-appctl --timeout= dpif-netdev/pmd-stats-show >> >>Fix by using strtol and validating the timeout value. >> >>Signed-off-by: Bhanuprakash Bodireddy >> >>--- >> utilities/ovs-appctl.c | 9 - >> 1 file changed, 8 insertions(+), 1 deletion(-) >> >>diff --git a/utilities/ovs-appctl.c b/utilities/ovs-appctl.c index >>8f87cc4..2543ee9 100644 >>--- a/utilities/ovs-appctl.c >>+++ b/utilities/ovs-appctl.c >>@@ -127,6 +127,7 @@ parse_command_line(int argc, char *argv[]) >> char *short_options_ = >ovs_cmdl_long_options_to_short_options(long_options); >> char *short_options = xasprintf("+%s", short_options_); >> const char *target; >>+int timeout; >> int e_options; >> >> target = NULL; >>@@ -165,7 +166,13 @@ parse_command_line(int argc, char *argv[]) >> exit(EXIT_SUCCESS); >> >> case 'T': >>-time_alarm(atoi(optarg)); >>+timeout = strtol(optarg, NULL, 10); > >Hi Bhanu, > >To ensure that the user has supplied a valid numeric timeout value, you >should provide a non-NULL 'endptr' parameter, and perform the usual checks >on it, as described in the strtol man page: > " If endptr is not NULL, strtol() stores the address of the first > invalid character in *endptr. If there were no digits at all, > strtol() stores the original value of nptr in *endptr (and returns > 0). In particular, if *nptr is not '\0' but **endptr is '\0' on > return, the entire string is valid." > Thanks for reviewing the patch Mark, you have a point and I am sending out another version of the patch. I have verified all the below conditions with the v2 and found to be working good. 
  ovs-appctl --timeout= dpif-netdev/pmd-stats-show (no timeout specified)
  ovs-appctl --timeout=0 dpif-netdev/pmd-stats-show (timeout 0, invalid)
  ovs-appctl --timeout=12345675345212151252543523524524 dpif-netdev/pmd-stats-show (overflow case)
  ovs-appctl --timeout=123abc dpif-netdev/pmd-stats-show (invalid input)
  ovs-appctl --timeout=1 dpif-netdev/pmd-stats-show (valid input)

Regards,
Bhanu Prakash.

>
>
>>+if (timeout <= 0) {
>>+ovs_fatal(0, "timeout value %s on -t or --timeout is invalid",
>>+ optarg);
>>+} else {
>>+time_alarm(timeout);
>>+}
>> break;
>>
>> case 'V':
>>--
>>2.4.11
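The review comment about 'endptr' can be made concrete with a small sketch. The helper below is hypothetical (the name `parse_timeout` and its return convention are assumptions, and it is not the v2 patch); it only shows the usual strtol() checks from the man page applied to the five cases listed above.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical helper: parses a strictly positive decimal timeout.
 * Returns 0 and stores the value in '*out' on success; returns -1 and
 * leaves '*out' untouched if 'arg' is empty, has trailing characters,
 * overflows, or is not positive. */
int
parse_timeout(const char *arg, int *out)
{
    char *end;
    long value;

    assert(arg != NULL && out != NULL);

    errno = 0;
    value = strtol(arg, &end, 10);

    if (end == arg) {
        return -1;              /* No digits at all, e.g. "". */
    }
    if (*end != '\0') {
        return -1;              /* Trailing characters, e.g. "123abc". */
    }
    if (errno == ERANGE) {
        return -1;              /* Out of range for long. */
    }
    if (value <= 0 || value > INT_MAX) {
        return -1;              /* Zero, negative, or too large for int. */
    }

    *out = (int) value;
    return 0;
}
```

A caller in the option-parsing switch could then report the invalid value and exit when the helper returns -1, and pass the parsed value on otherwise.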
Re: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with timeout argument
>-Original Message- >From: nickcooper-zhangtonghao [mailto:nickcooper- >zhangtong...@opencloud.tech] >Sent: Monday, August 8, 2016 1:45 PM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with >timeout argument > > >On Aug 8, 2016, at 3:45 PM, dev-requ...@openvswitch.org wrote: > >Date: Sun, 7 Aug 2016 22:06:05 +0100 >From: Bhanuprakash Bodireddy >To: dev@openvswitch.org >Subject: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with > timeout argument >Message-ID: > <1470603965-73273-2-git-send-email- >bhanuprakash.bodire...@intel.com> > >ovs-appctl can crash with missing timeout argument. > # ovs-appctl --timeout= dpif-netdev/pmd-stats-show > >Fix by using strtol and validating the timeout value. > >Signed-off-by: Bhanuprakash Bodireddy > >--- >utilities/ovs-appctl.c | 9 - >1 file changed, 8 insertions(+), 1 deletion(-) > >diff --git a/utilities/ovs-appctl.c b/utilities/ovs-appctl.c >index 8f87cc4..2543ee9 100644 >--- a/utilities/ovs-appctl.c >+++ b/utilities/ovs-appctl.c >@@ -127,6 +127,7 @@ parse_command_line(int argc, char *argv[]) >char *short_options_ = >ovs_cmdl_long_options_to_short_options(long_options); >char *short_options = xasprintf("+%s", short_options_); >const char *target; >+ int timeout; >int e_options; > >target = NULL; >@@ -165,7 +166,13 @@ parse_command_line(int argc, char *argv[]) >exit(EXIT_SUCCESS); > >case 'T': >- time_alarm(atoi(optarg)); >+ timeout = strtol(optarg, NULL, 10); >+ if (timeout <= 0) { >+ ovs_fatal(0, "timeout value %s on -t or --timeout is invalid", >+ optarg); >+ } else { >+ time_alarm(timeout); >+ } >break; > >case 'V': >-- >2.4.11 > >It seems to me that it’s unnecessary to change the codes. If the “timeout” is >empty, the “atoi” function will convert it to 0, Got a signal when I passed empty timeout accidentally during my test runs and so I sent out this patch. Don’t you see this issue on your target? 
# ovs-appctl --timeout= dpif-netdev/pmd-stats-show
2016-08-08T15:20:01Z|1|fatal_signal|WARN|terminating with signal 14 (Alarm clock)
Alarm clock

Regards,
Bhanu Prakash.

>then time_alarm will disable the “timeout” feature, and return directly. If the
>“timeout” is not a positive number (e.g. -100), the check will be done in
>time_alarm.
Re: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with timeout argument
>-----Original Message-----
>From: Ben Pfaff [mailto:b...@ovn.org]
>Sent: Monday, August 8, 2016 5:24 PM
>To: Bodireddy, Bhanuprakash
>Cc: nickcooper-zhangtonghao; dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 2/2] ovs-appctl: Fix potential crash with
>timeout argument
>
>On Mon, Aug 08, 2016 at 03:29:11PM +, Bodireddy, Bhanuprakash wrote:
>> >It seems to me that it’s unnecessary to change the codes. If the
>> >“timeout” is empty, the “atoi” function will convert it to 0,
>>
>> Got a signal when I passed empty timeout accidentally during my test runs
>> and so I sent out this patch. Don’t you see this issue on your target?
>> # ovs-appctl --timeout= dpif-netdev/pmd-stats-show
>> 2016-08-08T15:20:01Z|1|fatal_signal|WARN|terminating with signal
>> 14 (Alarm clock) Alarm clock
>
>The signal is the timeout expiring! SIGALRM seems like a reasonable way to
>terminate a program that is timing out. Not a bug.

Thanks Ben for the clarification.
Re: [ovs-dev] [ovs-discuss] OVS DPDK VFIO error
>-Original Message- >From: discuss [mailto:discuss-boun...@openvswitch.org] On Behalf Of Kapil >Adhikesavalu >Sent: Tuesday, August 9, 2016 10:46 AM >To: dev@openvswitch.org; disc...@openvswitch.org >Subject: [ovs-discuss] OVS DPDK VFIO error > >Hi, > >On a Intel xeon E5-2697 chip with iommu turned on with Intel NIC 82599, i am >getting the following error while doing the NIC binding using VFIO. >kernel: 4.23 fedora 23, i haven't tried the latest kernel yet. > >E5-2697 supports IOMMU VT-d I hope you have already enabled VT-d in BIOS, can you check 'dmesg | grep -e DMAR -e IOMMU'. > >VFIO NIC binding steps, >modprobe vfio-pci >sudo /usr/bin/chmod a+x /dev/vfio >sudo /usr/bin/chmod 0666 /dev/vfio/* >$DPDK_DIR/tools/dpdk_nic_bind.py --bind=vfio-pci :04:00.0 >$DPDK_DIR/tools/dpdk_nic_bind.py --status >Error >= >EAL: Detected 48 lcore(s) >EAL: Probing VFIO support... >EAL: IOMMU type 1 (Type 1) is supported >EAL: IOMMU type 8 (No-IOMMU) is not supported >EAL: VFIO support initialized > >EAL: Master lcore 1 is ready (tid=83504bc0;cpuset=[1]) >EAL: PCI device :04:00.0 on NUMA socket 0 >EAL: probe driver: 8086:154d rte_ixgbe_pmd >EAL: set IOMMU type 1 (Type 1) failed, error 1 (Operation not permitted) >EAL: set IOMMU type 8 (No-IOMMU) failed, error 19 (No such device) >EAL: :04:00.0 failed to select IOMMU type >EAL: Error - exiting with code: 1 > Cause: Requested device :04:00.0 cannot be used > >dmesg: >== >[ 0.997461] DMAR: Ignoring identity map for HW passthrough device >:00:1f.0 [0x0 - 0xff] >[ 0.997465] DMAR: Intel(R) Virtualization Technology for Directed I/O >[ 1.351801] DMAR: 32bit :00:1a.0 uses non-identity mapping >[ 1.362623] DMAR: 32bit :00:1d.0 uses non-identity mapping >[ 1.373601] DMAR: 32bit :01:00.4 uses non-identity mapping >[ 297.035504] vfio-pci :04:00.0: Device is ineligible for IOMMU domain >attach due to platform RMRR requirement. Contact your platform vendor. 
>
>
>[root@localhost bin]# cat /proc/cmdline
>BOOT_IMAGE=/vmlinuz-4.2.3-300.fc23.x86_64 root=/dev/mapper/fedora-
>root ro rd.lvm.lv=fedora/root rd.lvm.lv=fedora/swap rhgb quiet
>default_hugepagesz=1G hugepagesz=1G hugepages=16 hugepagesz=2M
>hugepages=2048 iommu=pt intel_iommu=on

I don’t see any problem with your cmdline, as both iommu=pt and intel_iommu=on are set.

Regards,
Bhanu Prakash.

>
>dmesg | grep 10G - 82599 controller
>04:00.0 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
>(rev 01)
>04:00.1 Ethernet controller: Intel Corporation Ethernet 10G 2P X520 Adapter
>(rev 01)
>
>Regards
>Kapil.
Re: [ovs-dev] [PATCH v6 1/2] doc: Refactor DPDK install documentation
Thanks Mauricio for your comments, My comments inline. >+ it has to be configured with DPDK support and is done by './configure -- >with-dpdk'. >+ This section focus on generic recipe that suits most cases and for >distribution >+ specific instructions, refer [INSTALL.Fedora.md], [INSTALL.RHEL.md] and >+ [INSTALL.Debian.md]. > >-9. Rx Mergeable buffers >+ OVS can be downloaded in compressed format from the OVS release page >(or) >+ cloned from git repository if user intends to develop and contribute >+ patches upstream. > > I think it is better just to have one download method, it keeps things simple. [BHANU] This is done for a reason. Most of the users would like to work on a stable release and few would be working on the master branch. One section covers downloading the stable release as compressed file. This section would be updated once 2.6 is released. Other section comes handy for users working on master. > > >- Rx Mergeable buffers is a virtio feature that allows chaining of multiple >- virtio descriptors to handle large packet sizes. As such, large packets >- are handled by reserving and chaining multiple free descriptors >- together. Mergeable buffer support is negotiated between the virtio >- driver and virtio device and is supported by the DPDK vhost library. >- This behavior is typically supported and enabled by default, however >- in the case where the user knows that rx mergeable buffers are not >needed > >-DPDK vhost: > >+### 3.2 Setup DPDK devices using VFIO > >-DPDK 16.04 supports two types of vhost: >+ - Supported with DPDK release >= 1.7 and kernel version >= 3.6 > >It is already mentioned that DPDK 16.04 is required, then the comment about >the DPDK version is not necessary. [OK] > > >+ - VFIO needs support from BIOS and kernel. >+ - BIOS changes: > >-1. vhost-user >-2. 
vhost-cuse >+ Enable VT-d, can be verified from `dmesg | grep -e DMAR -e IOMMU` >output > >-Whatever type of vhost is enabled in the DPDK build specified, is the type >-that will be enabled in OVS. By default, vhost-user is enabled in DPDK. >-Therefore, unless vhost-cuse has been enabled in DPDK, vhost-user ports >-will be enabled in OVS. >-Please note that support for vhost-cuse is intended to be deprecated in OVS >-in a future release. > >-1. DPDK 16.04 with vhost support enabled as documented in the "Building >and >- Installing section" >+ Note: If using older DPDK release (or) running kernels < 3.6 UIO drivers to >be used, > >Same here for DPDK version [OK] > >+ please check section 4 (DPDK devices using UIO) for the steps. > >-2. QEMU version v2.1.0+ >+### 3.3 Setup OVS > >- QEMU v2.1.0 will suffice, but it is recommended to use v2.2.0 if providing >- your VM with memory greater than 1GB due to potential issues with >memory >- mapping larger areas. >+ 1. DB creation (One time step) > > >-DPDK vhost-cuse VM configuration: >-- >+DPDK 'testpmd' application can be run in the Guest VM for high speed >+packet forwarding between vhostuser ports. This needs DPDK, testpmd to >be >+compiled along with kernel modules. > >I think that sentence is not clear. What do you mean by "testpmd to be >compiled along with kernel modules" ? [BHANU] This has to reworded. I meant DPDK should be compiled and UIO module to be loaded to support Userspace IO for DPDK. >Below are the steps for setting up >+the testpmd application in the VM. More information on the vhostuser ports >+can be found in [Vhost Walkthrough]. > >- vhost-cuse ports use a Linux* character device to communicate with QEMU. >- By default it is set to `/dev/vhost-net`. It is possible to reuse this >- standard device for DPDK vhost, which makes setup a little simpler but it >- is better practice to specify an alternative character device in order to >- avoid any conflicts if kernel vhost is to be used in parallel. 
>+ * Instantiate the Guest > >-1. This step is only needed if using an alternative character device. >+ ``` >+ Qemu version >= 2.2.0 > >- The new character device filename must be specified in the ovsdb: >+ export VM_NAME=Centos-vm >+ export GUEST_MEM=3072M >+ export QCOW2_IMAGE=/root/CentOS7_x86_64.qcow2 >+ export VHOST_SOCK_DIR=/usr/local/var/run/openvswitch > >- `./utilities/ovs-vsctl --no-wait set Open_vSwitch . \ >- other_config:cuse-dev-name=my-vhost-net` >+ qemu-system-x86_64 -name $VM_NAME -cpu host -enable-kvm -m >$GUEST_MEM -object memory-backend- >file,id=mem,size=$GUEST_MEM,mem-path=/dev/hugepages,share=on - >numa node,memdev=mem -mem-prealloc -smp sockets=1,cores=2 -drive >file=$QCOW2_IMAGE -chardev >socket,id=char0,path=$VHOST_SOCK_DIR/dpdkvhostuser0 -netdev >type=vhost-user,id=mynet1,chardev=char0,vhostforce -device virtio-net- >pci,mac=00:00:00:00:00:01,netdev=mynet1,mrg_rxbuf=off -chardev >socket,id=char1,path=$VHOST_SOCK_DIR/dpdkvhostuser1 -netdev >type=vhost-user,id=mynet2,chardev=char1,vhostforce -dev
Re: [ovs-dev] [PATCH v6 2/2] doc: Refactor DPDK install guide, add ADVANCED doc
Thanks Mauricio for your comments. Comments inline. >+### 3.7 Compiler Optimizations >+ >+ The default compiler optimization level is '-O2'. Changing this to >+ more aggressive compiler optimization such as '-O3 -march=native' >+ with gcc(verified on 5.3.1) can produce performance gains though not >+ siginificant. '-march=native' will produce optimized code on local machine >+ and should be used when SW compilation is done on Testbed. >+ >+## 4. Performance Tuning >+ > >Reading section 3 I feel all the setting are oriented to the performance, then >what is the difference between section 3 and 4? Section 3 talks about BIOS settings, PCIe slot selection, core isolation, NUMA. I have recommended best known configuration but this can vary between server platforms and can be ignored in few cases. Section 4 talks about pmd, qemu threads affinity, MQ, EMC, mrg_rxbuf which are quite generic ones and have to be tuned by users looking for extra performance. As you pointed both sections3, 4 are all about achieving optimum performance with OVS DPDK. >+### 4.1 Affinity >+ >+For superior performance, DPDK pmd threads and Qemu vCPU threads >+needs to be affinitized accordingly. >+ >+ * PMD thread Affinity >+ >+ A poll mode driver (pmd) thread handles the I/O of all DPDK >+ interfaces assigned to it. A pmd thread shall poll the ports >+ for incoming packets, switch the packets and send to tx port. >+ pmd thread is CPU bound, and needs to be affinitized to isolated >+ cores for optimum performance. >+ >+ By setting a bit in the mask, a pmd thread is created and pinned >+ to the corresponding CPU core. e.g. to run a pmd thread on core 2 >+ >+ `ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=4` >+ >+ Note: pmd thread on a NUMA node is only created if there is >+ at least one DPDK interface from that NUMA node added to OVS. 
>+ >+ * Qemu vCPU thread Affinity >+ >+ A VM performing simple packet forwarding or running complex packet >+ pipelines has to ensure that the vCPU threads performing the work has >+ as much CPU occupancy as possible. >+ >+ Example: On a multicore VM, multiple QEMU vCPU threads shall be >spawned. >+ when the DPDK 'testpmd' application that does packet forwarding >+ is invoked, 'taskset' cmd should be used to affinitize the vCPU threads >+ to the dedicated isolated cores on the host system. >+ >+### 4.2 Multiple poll mode driver threads >+ Regards, Bhanu Prakash. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
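To make the relationship between mask bits and cores in section 4.1 explicit (mask 4 selects core 2, mask 6 selects cores 1 and 2), here is a minimal sketch that decodes a mask into core ids. The helper name `cores_from_mask` is hypothetical, and the sketch assumes the mask fits in 64 bits, which the real option does not require.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: writes the ids of the cores whose bits are set in
 * 'mask' into 'cores' (lowest core first) and returns how many were
 * written, never more than 'max'.  Bit N of the mask stands for core N. */
size_t
cores_from_mask(uint64_t mask, unsigned int *cores, size_t max)
{
    size_t n = 0;
    unsigned int core;

    assert(cores != NULL);

    for (core = 0; core < 64 && n < max; core++) {
        if (mask & (UINT64_C(1) << core)) {
            cores[n++] = core;
        }
    }
    return n;
}
```

For example, decoding 0x4 yields the single core 2, which matches the `pmd-cpu-mask=4` example in the quoted documentation.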
Re: [ovs-dev] [RFC Patch] dpif-netdev: Sorted subtable vectors per in_port in dpcls
>-Original Message- >From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Jan >Scheurich >Sent: Thursday, June 16, 2016 2:56 PM >To: dev@openvswitch.org >Subject: [ovs-dev] [RFC Patch] dpif-netdev: Sorted subtable vectors per >in_port in dpcls > >The user-space datapath (dpif-netdev) consists of a first level "exact match >cache" (EMC) matching on 5-tuples and the normal megaflow classifier. With >many parallel packet flows (e.g. TCP connections) the EMC becomes >inefficient and the OVS forwarding performance is determined by the >megaflow classifier. > >The megaflow classifier (dpcls) consists of a variable number of hash tables >(aka subtables), each containing megaflow entries with the same mask of >packet header and metadata fields to match upon. A dpcls lookup matches a >given packet against all subtables in sequence until it hits a match. As >megaflow cache entries are by construction non-overlapping, the first match >is the only match. > >Today the order of the subtables in the dpcls is essentially random so that on >average a dpcsl lookup has to visit N/2 subtables for a hit, when N is the >total >number of subtables. Even though every single hash-table lookup is fast, the >performance of the current dpcls degrades when there are many subtables. > >How does the patch address this issue: > >In reality there is often a strong correlation between the ingress port and a >small subset of subtables that have hits. The entire megaflow cache typically >decomposes nicely into partitions that are hit only by packets entering from a >range of similar ports (e.g. traffic from Phy -> VM vs. traffic from VM -> >Phy). > >Therefore, keeping a separate list of subtables per ingress port, sorted by >frequency of hits, reduces the average number of subtables lookups in the >dpcls to a minimum, even if the total number of subtables gets large. I like the proposed approach of subtable prioritization for each ingress port there by reducing the lookup time. 
+1 on this approach.

>
>The patch introduces 32 subtable vectors per dpcls and hashes the ingress
>port to select the subtable vector. The patch also counts matches per 32 slots
>in each vector (hashing the subtable pointer to obtain the slot) and sorts the
>vectors according to match frequency every second.
>
>To monitor the effectiveness of the patch we have enhanced the ovs-appctl
>dpif-netdev/pmd-stats-show command with an extra line "avg. subtable
>lookups per hit" to report the average number of subtable lookup needed for
>a megaflow match. Ideally, this should be close to 1 and much smaller than
>N/2.
>
>I have benchmarked a cloud L3 overlay pipeline with a VXLAN overlay mesh.
>With pure L3 tenant traffic between VMs on different nodes the resulting
>netdev dpcls contains N=4 subtables.
>
>Disabling the EMC, I have measured a baseline performance (in+out) of ~1.32
>Mpps (64 bytes, 1000 L4 flows). The average number of subtable lookups per
>dpcls match is 2.5.
>
>With the patch the average number of subtable lookups per dpcls match goes
>down to 1.25 (apparently there are still two ports of different nature hashed
>to the same vector, otherwise it should be exactly one). Even so the
>forwarding performance grows by ~30% to 1.72 Mpps.

I ran some benchmarks and observed that the patch also improves performance when there are multiple subtables. With the EMC disabled and 5 VMs doing packet forwarding, the flow rules were set up so that 8 subtables were created, and a performance improvement of 16% was observed in this case. I would like to try some more complex test scenarios when I get time.

Regards,
Bhanu Prakash.

>
>As the number of subtables will often be higher in reality, we can assume that
>this is at the lower end of the speed-up one can expect from this
>optimization.
Just running a parallel ping between the VXLAN tunnel >endpoints increases the number of subtables and hence the average number >of subtable lookups from 2.5 to 3.5 with a corresponding decrease of >throughput to 1.14 Mpps. With the patch the parallel ping has no impact on >average number of subtable lookups and performance. The performance gain >is then ~50%. > >Signed-off-by: Jan Scheurich > ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
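The idea in the RFC (keep a list of subtables per ingress port, ordered by how often each one hits) can be sketched in a few lines. This is not the patch's data structure: the type and function names are hypothetical, an integer id stands in for a real subtable and its hash lookup, and resetting the counters after each sort is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MAX_SUBTABLES 8

/* One entry in a per-port vector; 'id' stands in for a subtable pointer. */
struct subtable_slot {
    int id;
    unsigned long hits;     /* Matches counted since the last re-sort. */
};

/* The ordered list of subtables consulted for one ingress port. */
struct port_vector {
    struct subtable_slot slots[MAX_SUBTABLES];
    size_t n;
};

/* Walks the vector in order.  'matching_id' stands in for "the subtable
 * whose hash lookup would hit for this packet".  Returns the number of
 * subtables visited until the hit, or 0 if nothing matched. */
size_t
vector_lookup(struct port_vector *v, int matching_id)
{
    size_t i;

    assert(v != NULL);

    for (i = 0; i < v->n; i++) {
        if (v->slots[i].id == matching_id) {
            v->slots[i].hits++;
            return i + 1;
        }
    }
    return 0;
}

/* qsort() comparator: more hits first. */
static int
by_hits_desc(const void *a_, const void *b_)
{
    const struct subtable_slot *a = a_;
    const struct subtable_slot *b = b_;

    return (a->hits < b->hits) - (a->hits > b->hits);
}

/* Periodic re-sort by hit frequency (the RFC sorts once per second).
 * Clearing the counters afterwards is an assumption of this sketch. */
void
vector_resort(struct port_vector *v)
{
    size_t i;

    assert(v != NULL);

    qsort(v->slots, v->n, sizeof v->slots[0], by_hits_desc);
    for (i = 0; i < v->n; i++) {
        v->slots[i].hits = 0;
    }
}
```

After a re-sort, the subtable that took most of the hits for that port sits at the front of the vector, so the common case costs a single subtable visit, which is the effect the "avg. subtable lookups per hit" counter in the RFC is meant to show.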
Re: [ovs-dev] [PATCH] Makefile.am: Add clang static analysis support
>-Original Message- >From: William Tu [mailto:u9012...@gmail.com] >Sent: Monday, June 27, 2016 6:57 PM >To: Bodireddy, Bhanuprakash >Cc: >Subject: Re: [ovs-dev] [PATCH] Makefile.am: Add clang static analysis support > >This is pretty cool. I tested it and have some comments. > >On Mon, Jun 27, 2016 at 9:11 AM, Bhanuprakash Bodireddy > wrote: >> Clang Static Analyzer is a source code analysis tool to find bugs. >> This patch adds make target to trigger static analysis using below commands. >> >> ./boot.sh >> ./configure --with-dpdk(for configuring DPDK datapath) make >> clang-analyze scan-view --host= --port >> $OVS_DIR>/clang-analyzer-results/-mm-dd-114251-1027-1> >> --allow-all-hosts >> >> Results can be viewed on browser: http://:/ >> >> Signed-off-by: Bhanuprakash Bodireddy >> >> --- >> Makefile.am | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/Makefile.am b/Makefile.am index 8cb8523..ac96be6 100644 >> --- a/Makefile.am >> +++ b/Makefile.am >> @@ -400,6 +400,16 @@ ovsext_clean: datapath-windows/ovsext.sln endif >> .PHONY: ovsext >> >> +clang-analyze: clean >> + @if which clang scan-build > /dev/null 2>&1; then \ >> + $(MKDIR_P) "$(srcdir)/clang-analyzer-results" || exit 1; \ >> + scan-build -o $(srcdir)/clang-analyzer-results >> +--use-analyzer=/usr/bin/clang \ > >Since we have valgrind/helgrind results under tests dir, maybe output to >''$(srcdir)/tests/clang-analyzer-results". I agree to this. I will do this in v2 patch. > >> + make -j || exit 1; \ > >"make -j" creates lots of jobs and hangs my system. Maybe just use 'make' >and let people to optimize if they want. Thanks for testing and pointing this out. I had intentionally used 'make -j' to speed up the analysis and tested it on my target with 28 cores. I shall update the patch to remove '-j' flag to prevent potential hangs. > >Regards, >William ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH] Makefile.am: Add clang static analysis support
>-Original Message- >From: Lance Richardson [mailto:lrich...@redhat.com] >Sent: Monday, June 27, 2016 7:34 PM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH] Makefile.am: Add clang static analysis support > > > >- Original Message - >> From: "Bhanuprakash Bodireddy" >> To: dev@openvswitch.org >> Sent: Monday, June 27, 2016 12:11:40 PM >> Subject: [ovs-dev] [PATCH] Makefile.am: Add clang static analysis >> support >> >> Clang Static Analyzer is a source code analysis tool to find bugs. >> This patch adds make target to trigger static analysis using below commands. >> >> ./boot.sh >> ./configure --with-dpdk(for configuring DPDK datapath) make >> clang-analyze scan-view --host= --port >> $OVS_DIR>/clang-analyzer-results/-mm-dd-114251-1027-1> >> --allow-all-hosts >> >> Results can be viewed on browser: http://:/ >> >> Signed-off-by: Bhanuprakash Bodireddy >> >> --- >> Makefile.am | 10 ++ >> 1 file changed, 10 insertions(+) >> >> diff --git a/Makefile.am b/Makefile.am index 8cb8523..ac96be6 100644 >> --- a/Makefile.am >> +++ b/Makefile.am >> @@ -400,6 +400,16 @@ ovsext_clean: datapath-windows/ovsext.sln endif >> .PHONY: ovsext >> >> +clang-analyze: clean >> +@if which clang scan-build > /dev/null 2>&1; then \ >> + $(MKDIR_P) "$(srcdir)/clang-analyzer-results" || exit 1; \ >> + scan-build -o $(srcdir)/clang-analyzer-results >> --use-analyzer=/usr/bin/clang \ >> +make -j || exit 1; \ >> +else \ >> + echo -e "Unable to find clang/scan-build, Install >> +clang,clang-analyzer >> packages"; \ >> +fi >> +.PHONY: clang-analyze >> + >> dist-hook: $(DIST_HOOKS) >> all-local: $(ALL_LOCAL) >> clean-local: $(CLEAN_LOCAL) >> -- >> 2.4.11 >> >> ___ >> dev mailing list >> dev@openvswitch.org >> http://openvswitch.org/mailman/listinfo/dev >> > >LGTM, I tried it out with no issues found. Thanks for testing the patch. 
> a couple of small suggestions:
>   - It would be good to add some text to the "Build Requirements" section of
>     INSTALL.md, mentioning clang-analyzer where clang is already listed.
>   - It might also be nice to have a sentence or two somewhere in INSTALL.md about
>     how to use this feature.

I agree with your suggestions. I worked on refactoring the install guide into Beginner and Advanced guides and submitted v7 recently. Section 9 of the ADVANCED install guide covers static analysis. Please check the rendered form here:
https://github.com/bbodired/ovs/blob/master/INSTALL.DPDK-ADVANCED.md

v7 patch: http://openvswitch.org/pipermail/dev/2016-June/thread.html

Regards,
Bhanu Prakash.
Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static analysis support
>-Original Message- >From: Ben Pfaff [mailto:b...@ovn.org] >Sent: Saturday, July 2, 2016 6:14 PM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static analysis >support > >On Tue, Jun 28, 2016 at 05:09:13PM +0100, Bhanuprakash Bodireddy wrote: >> Clang Static Analyzer is a source code analysis tool to find bugs. >> This patch adds make target to trigger static analysis using below commands. >> >> ./boot.sh >> ./configure --with-dpdk(in case of DPDK datapath) make clang-analyze >> scan-view --host= --port >> $OVS_DIR>/clang-analyzer-results/-mm-dd-114251-1027-1> >> --allow-all-hosts >> >> Results can be viewed on browser: http://:/ >> >> v1->v2: >> * Change the output directory to tests/clang-analyzer-results >> * Remove '-j' make option, This might potentially hang some system >> while spawning infinite jobs. >> >> Signed-off-by: Bhanuprakash Bodireddy >> > >I'd tend to write this a little differently, maybe like this: > >clang-analyze: clean > @which clang scan-build >/dev/null 2>&1 || \ > (echo "Unable to find clang/scan-build, Install clang,clang-analyzer >packages"; exit 1) > @$(MKDIR_P) "$(srcdir)/tests/clang-analyzer-results" > @scan-build -o $(srcdir)/tests/clang-analyzer-results --use- >analyzer=/usr/bin/clang $(MAKE) >.PHONY: clang-analyze This is fine for me. > >But it doesn't work for me anyway. 
When I run it from a build tree configured >to use clang, I get the following: > >make all-recursive >make[2]: Entering directory '/home/blp/nicira/ovs/_clang' >Making all in datapath >make[3]: Entering directory '/home/blp/nicira/ovs/_clang/datapath' >make[4]: Entering directory '/home/blp/nicira/ovs/_clang/datapath' >make[4]: Leaving directory '/home/blp/nicira/ovs/_clang/datapath' >make[3]: Leaving directory '/home/blp/nicira/ovs/_clang/datapath' >make[3]: Entering directory '/home/blp/nicira/ovs/_clang' > CC lib/aes128.lo >gcc: error: unrecognized command line option '-Wthread-safety' >gcc: error: unrecognized command line option '-Qunused-arguments' >gcc: error: unrecognized command line option '-fno-caret-diagnostics' >Makefile:4179: recipe for target 'lib/aes128.lo' failed >make[3]: *** [lib/aes128.lo] Error 1 >make[3]: Leaving directory '/home/blp/nicira/ovs/_clang' >Makefile:4831: recipe for target 'all-recursive' failed >make[2]: *** [all-recursive] Error 1 >make[2]: Leaving directory '/home/blp/nicira/ovs/_clang' >Makefile:2749: recipe for target 'all' failed >make[1]: *** [all] Error 2 >make[1]: Leaving directory '/home/blp/nicira/ovs/_clang' >scan-build: Removing directory '/home/blp/nicira/ovs/tests/clang-analyzer- >results/2016-07-02-101251-14653-1' because it contains no reports. >scan-build: No bugs found. 
>Makefile:5845: recipe for target 'clang-analyze' failed >make: *** [clang-analyze] Error 1 > >Alternatively, if I run it from a build tree configured to use GCC, I get the >following: > >make all-recursive >make[2]: Entering directory '/home/blp/nicira/ovs/_build' >Making all in datapath >make[3]: Entering directory '/home/blp/nicira/ovs/_build/datapath' >make[4]: Entering directory '/home/blp/nicira/ovs/_build/datapath' >make[4]: Leaving directory '/home/blp/nicira/ovs/_build/datapath' >make[3]: Leaving directory '/home/blp/nicira/ovs/_build/datapath' >make[3]: Entering directory '/home/blp/nicira/ovs/_build' > CC lib/aes128.lo > CC lib/backtrace.lo > CC lib/bfd.lo >In file included from ../lib/bfd.h:24:0, > from ../lib/bfd.c:16: >../lib/packets.h: In function 'eth_addr_invert': >../lib/packets.h:237:5: error: 'for' loop initial declarations are only > allowed in >C99 or C11 mode >../lib/packets.h:237:5: note: use option -std=c99, -std=gnu99, -std=c11 or > - >std=gnu11 to compile your code I have tested this on F22 and didn't see this issue. But when I tested it now on Ubuntu 14.04 LTS I could see the issue you reported here. Adding CFLAGS="-std=gnu99" to make should fix this issue and I tested it now on Ubuntu 14.04 Can you try the below patch and see if you can generate clang analysis report properly this time? +clang-analyze: clean + @which clang scan-build >/dev/null 2>&1 || \ + (echo "Unable to find clang/scan-build, Install clang,clang-analyzer packages"; exit 1) + @$(MKDIR_P) "$(srcdir)/tests/clang-analyzer-results" + @scan-build -o $(srcdir)/tests/clang-analyzer-results --use-analyzer=/usr/bin/clang $(MAKE) CFLAGS="-std=gnu99" +.PHONY: clang-analyze Regards, Bhanu Prakash. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static analysis support
>-Original Message- >From: Ben Pfaff [mailto:b...@ovn.org] >Sent: Saturday, July 2, 2016 9:31 PM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static analysis >support > >On Sat, Jul 02, 2016 at 08:14:02PM +0000, Bodireddy, Bhanuprakash wrote: >> >-Original Message- >> >From: Ben Pfaff [mailto:b...@ovn.org] >> >Sent: Saturday, July 2, 2016 6:14 PM >> >To: Bodireddy, Bhanuprakash >> >Cc: dev@openvswitch.org >> >Subject: Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static >> >analysis support >> > >> >On Tue, Jun 28, 2016 at 05:09:13PM +0100, Bhanuprakash Bodireddy wrote: >> >> Clang Static Analyzer is a source code analysis tool to find bugs. >> >> This patch adds make target to trigger static analysis using below >commands. >> >> >> >> ./boot.sh >> >> ./configure --with-dpdk(in case of DPDK datapath) make >> >> clang-analyze scan-view --host= --port >> >> $OVS_DIR>/clang-analyzer-results/-mm-dd-114251-1027-1> >> >> --allow-all-hosts >> >> >> >> Results can be viewed on browser: http://:/ >> >> >> >> v1->v2: >> >> * Change the output directory to tests/clang-analyzer-results >> >> * Remove '-j' make option, This might potentially hang some system >> >> while spawning infinite jobs. >> >> >> >> Signed-off-by: Bhanuprakash Bodireddy >> >> >> > >> >I'd tend to write this a little differently, maybe like this: >> > >> >clang-analyze: clean >> >@which clang scan-build >/dev/null 2>&1 || \ >> > (echo "Unable to find clang/scan-build, Install >> >clang,clang-analyzer packages"; exit 1) >> >@$(MKDIR_P) "$(srcdir)/tests/clang-analyzer-results" >> >@scan-build -o $(srcdir)/tests/clang-analyzer-results --use- >> >analyzer=/usr/bin/clang $(MAKE) >> >.PHONY: clang-analyze >> >> This is fine for me. >> >> > >> >But it doesn't work for me anyway. 
When I run it from a build tree >> >configured to use clang, I get the following: >> > >> >make all-recursive >> >make[2]: Entering directory '/home/blp/nicira/ovs/_clang' >> >Making all in datapath >> >make[3]: Entering directory '/home/blp/nicira/ovs/_clang/datapath' >> >make[4]: Entering directory '/home/blp/nicira/ovs/_clang/datapath' >> >make[4]: Leaving directory '/home/blp/nicira/ovs/_clang/datapath' >> >make[3]: Leaving directory '/home/blp/nicira/ovs/_clang/datapath' >> >make[3]: Entering directory '/home/blp/nicira/ovs/_clang' >> > CC lib/aes128.lo >> >gcc: error: unrecognized command line option '-Wthread-safety' >> >gcc: error: unrecognized command line option '-Qunused-arguments' >> >gcc: error: unrecognized command line option '-fno-caret-diagnostics' >> >Makefile:4179: recipe for target 'lib/aes128.lo' failed >> >make[3]: *** [lib/aes128.lo] Error 1 >> >make[3]: Leaving directory '/home/blp/nicira/ovs/_clang' >> >Makefile:4831: recipe for target 'all-recursive' failed >> >make[2]: *** [all-recursive] Error 1 >> >make[2]: Leaving directory '/home/blp/nicira/ovs/_clang' >> >Makefile:2749: recipe for target 'all' failed >> >make[1]: *** [all] Error 2 >> >make[1]: Leaving directory '/home/blp/nicira/ovs/_clang' >> >scan-build: Removing directory >> >'/home/blp/nicira/ovs/tests/clang-analyzer- >> >results/2016-07-02-101251-14653-1' because it contains no reports. >> >scan-build: No bugs found. >> >Makefile:5845: recipe for target 'clang-analyze' failed >> >make: *** [clang-analyze] Error 1 >> > >> >Alternatively, if I run it from a build tree configured to use GCC, I >> >get the >> >following: >> > >> >make all-recursive >> >make[2]: Entering directory '/home/blp/nicira/ovs/_build' >> >Making all in datapath >> >make[3]: Entering directory '/home/blp/nicira/ovs/_build/datapath' >> >make[4]: Entering directory '/home/blp/nicira/ovs/_build/datapath' >> >make[4]:
Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static analysis support
>-Original Message- >From: Ben Pfaff [mailto:b...@ovn.org] >Sent: Sunday, July 3, 2016 1:12 AM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static analysis >support > >On Sat, Jul 02, 2016 at 10:02:37PM +0000, Bodireddy, Bhanuprakash wrote: >> >> >-Original Message- >> >From: Ben Pfaff [mailto:b...@ovn.org] >> >Sent: Saturday, July 2, 2016 9:31 PM >> >To: Bodireddy, Bhanuprakash >> >Cc: dev@openvswitch.org >> >Subject: Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static >> >analysis support >> > >> >On Sat, Jul 02, 2016 at 08:14:02PM +, Bodireddy, Bhanuprakash wrote: >> >> >-Original Message- >> >> >From: Ben Pfaff [mailto:b...@ovn.org] >> >> >Sent: Saturday, July 2, 2016 6:14 PM >> >> >To: Bodireddy, Bhanuprakash >> >> >Cc: dev@openvswitch.org >> >> >Subject: Re: [ovs-dev] [PATCH v2] Makefile.am: Add clang static >> >> >analysis support >> >> > >> >> >On Tue, Jun 28, 2016 at 05:09:13PM +0100, Bhanuprakash Bodireddy >wrote: >> >> >> Clang Static Analyzer is a source code analysis tool to find bugs. >> >> >> This patch adds make target to trigger static analysis using >> >> >> below >> >commands. >> >> >> >> >> >> ./boot.sh >> >> >> ./configure --with-dpdk(in case of DPDK datapath) make >> >> >> clang-analyze scan-view --host= --port >> >> >> $OVS_DIR>/clang-analyzer-results/-mm-dd-114251-1027-1> >> >> >> --allow-all-hosts >> >> >> >> >> >> Results can be viewed on browser: http://:/ >> >> >> >> >> >> v1->v2: >> >> >> * Change the output directory to tests/clang-analyzer-results >> >> >> * Remove '-j' make option, This might potentially hang some system >> >> >> while spawning infinite jobs. 
>> >> >> >> >> >> Signed-off-by: Bhanuprakash Bodireddy >> >> >> >> >> > >> >> >I'd tend to write this a little differently, maybe like this: >> >> > >> >> >clang-analyze: clean >> >> > @which clang scan-build >/dev/null 2>&1 || \ >> >> > (echo "Unable to find clang/scan-build, Install >> >> >clang,clang-analyzer packages"; exit 1) >> >> > @$(MKDIR_P) "$(srcdir)/tests/clang-analyzer-results" >> >> > @scan-build -o $(srcdir)/tests/clang-analyzer-results --use- >> >> >analyzer=/usr/bin/clang $(MAKE) >> >> >.PHONY: clang-analyze >> >> >> >> This is fine for me. >> >> >> >> > >> >> >But it doesn't work for me anyway. When I run it from a build >> >> >tree configured to use clang, I get the following: >> >> > >> >> >make all-recursive >> >> >make[2]: Entering directory '/home/blp/nicira/ovs/_clang' >> >> >Making all in datapath >> >> >make[3]: Entering directory '/home/blp/nicira/ovs/_clang/datapath' >> >> >make[4]: Entering directory '/home/blp/nicira/ovs/_clang/datapath' >> >> >make[4]: Leaving directory '/home/blp/nicira/ovs/_clang/datapath' >> >> >make[3]: Leaving directory '/home/blp/nicira/ovs/_clang/datapath' >> >> >make[3]: Entering directory '/home/blp/nicira/ovs/_clang' >> >> > CC lib/aes128.lo >> >> >gcc: error: unrecognized command line option '-Wthread-safety' >> >> >gcc: error: unrecognized command line option '-Qunused-arguments' >> >> >gcc: error: unrecognized command line option '-fno-caret-diagnostics' >> >> >Makefile:4179: recipe for target 'lib/aes128.lo' failed >> >> >make[3]: *** [lib/aes128.lo] Error 1 >> >> >make[3]: Leaving directory '/home/blp/nicira/ovs/_clang' >> >> >Makefile:4831: recipe for target 'all-recursive' failed >> >> >make[2]: *** [all-recursive] Error 1 >> >> >make[2]: Leaving directory '/home/blp/nicira/ovs/_clang' >> >> >Makefile:2749: recipe for target
Re: [ovs-dev] [PATCH v1] INSTALL.DPDK: Update vhost multiqueue instructions.
>-----Original Message-----
>From: Stokes, Ian
>Sent: Wednesday, June 29, 2016 2:32 PM
>To: dev@openvswitch.org
>Cc: Bodireddy, Bhanuprakash
>Subject: RE: [PATCH v1] INSTALL.DPDK: Update vhost multiqueue instructions.
>
>Hi All,
>
>Just a gentle reminder as regards this patch.
>
>Bhanu has a new documentation patch for INSTALL.DPDK, maybe it should be
>added as part of that?

I have captured the vhost multiqueue instructions below in the ADVANCED install guide of the v8 patch:
http://openvswitch.org/pipermail/dev/2016-July/074375.html

Regards,
Bhanu Prakash.

>
>Is there value in back porting this patch to OVS 2.5 though?
>
>Thanks
>Ian
>
>> -----Original Message-----
>> From: Stokes, Ian
>> Sent: Tuesday, June 14, 2016 4:02 PM
>> To: dev@openvswitch.org
>> Cc: Stokes, Ian
>> Subject: [PATCH v1] INSTALL.DPDK: Update vhost multiqueue instructions.
>>
>> Add details regarding PMD and rxq configuration to the vhost-user
>> multiqueue section to better enable packet enqueueing across multiple
>> vhost queues.
>>
>> Signed-off-by: Ian Stokes
>> ---
>> INSTALL.DPDK.md | 11 +++
>> 1 files changed, 11 insertions(+), 0 deletions(-)
>>
>> diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index c2e32bf..afab7d7 100644
>> --- a/INSTALL.DPDK.md
>> +++ b/INSTALL.DPDK.md
>> @@ -674,6 +674,17 @@ Follow the steps below to attach vhost-user port(s) to a VM.
>> -device virtio-net-pci,mac=00:00:00:00:00:02,netdev=mynet2,mq=on,vectors=$v
>> ```
>>
>> + A least 2 PMDs should be configured for the vswitch when using multiqueue.
>> + Using a single PMD will cause traffic to be enqueued to the same vhost
>> + queue rather than being distributed among different vhost queues for a
>> + vhost-user interface.
>> +
>> + If traffic destined for a VM configured with multiqueue arrives to the
>> + vswitch via a physical DPDK port, then the number of rxqs should also be
>> + set to at least 2 for that physical DPDK port. This is required to increase
>> + the probability that a different PMD will handle the multiqueue
>> + transmission to the guest using a different vhost queue.
>> +
>> If one wishes to use multiple queues for an interface in the guest, the
>> driver in the guest operating system must be configured to do so. It is
>> recommended that the number of queues configured be equal to '$q'.
>> --
>> 1.7.4.1
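The "at least 2 PMDs" requirement in the quoted patch comes down to setting at least two bits in pmd-cpu-mask. A minimal sketch of deriving the mask from a list of core IDs is below; `cores_to_mask` is a hypothetical helper used only for illustration, and the ovs-vsctl line is printed rather than executed.

```shell
#!/bin/sh
# Hypothetical helper: OR together one bit per core ID and print the
# result as a hex mask (no 0x prefix).
cores_to_mask() {
    mask=0
    for core in "$@"; do
        mask=$(( mask | (1 << core) ))
    done
    printf '%x\n' "$mask"
}

# Cores 1 and 2 give mask 6, i.e. two PMD threads.
echo "ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=$(cores_to_mask 1 2)"
```

Which cores to pick depends on the NUMA layout and on what else runs on the host, so the core list here is purely an example.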
Re: [ovs-dev] [PATCH v1 1/1] INSTALL.DPDK: Flag DPDK firmware requirements.
Please note that the firmware requirements are also captured in the refactored install guide v8:
http://openvswitch.org/pipermail/dev/2016-July/074373.html
http://openvswitch.org/pipermail/dev/2016-July/074374.html

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Ian Stokes
>Sent: Thursday, June 30, 2016 3:46 PM
>To: dev@openvswitch.org
>Subject: [ovs-dev] [PATCH v1 1/1] INSTALL.DPDK: Flag DPDK firmware requirements.
>
>Add a note regarding required firmware versions for network interfaces used
>with DPDK as well as a link to the list of validated versions for DPDK 16.04.
>
>Signed-off-by: Ian Stokes
>---
> INSTALL.DPDK.md | 12
> 1 files changed, 12 insertions(+), 0 deletions(-)
>
>diff --git a/INSTALL.DPDK.md b/INSTALL.DPDK.md index 00e75bd..3b93351 100644
>--- a/INSTALL.DPDK.md
>+++ b/INSTALL.DPDK.md
>@@ -1017,6 +1017,18 @@ Restrictions:
> increased to the desired number of queues. Both DPDK and OVS must be
> recompiled for this change to take effect.
>
>+ Network Interface Firmware requirements:
>+ - Each release of DPDK is validated against a specific firmware version for
>+a supported Network Interface. New firmware versions introduce bug fixes,
>+performance improvements and new functionality that DPDK leverages. The
>+validated firmware versions are available as part of the release notes for
>+DPDK. It is recommended that users update Network Interface firmware to
>+match what has been validated for the DPDK release.
>+
>+For DPDK 16.04, the list of validated firmware versions can be found at:
>+
>+http://dpdk.org/doc/guides/rel_notes/release_16_04.html
>+
> Bug Reporting:
> --
>
>--
>1.7.4.1
Re: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install guide, add ADVANCED doc
>-----Original Message-----
>From: Lance Richardson [mailto:lrich...@redhat.com]
>Sent: Sunday, July 3, 2016 4:17 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org; daniele di proietto
>Subject: Re: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install
>guide, add ADVANCED doc
>
>----- Original Message -----
>> From: "Bhanuprakash Bodireddy"
>> To: dev@openvswitch.org
>> Cc: "daniele di proietto"
>> Sent: Sunday, July 3, 2016 10:48:25 AM
>> Subject: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install
>> guide,add ADVANCED doc
>>
>> Add INSTALL.DPDK-ADVANCED document that is forked off from original
>> INSTALL.DPDK guide. This document is targeted at users looking for
>> optimum performance on OVS using dpdk datapath.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> Acked-by: Flavio Leitner
>> ---
>
>> +## 9. Static Code Analysis
>> +
>> +Static Analysis is method of debugging SW by examining the code
>> +rather than actually executing it. Many third party Software is
>> +available to carry Static analysis, few being open source and rest commercial.
>> +
>> +Below are the steps to run clang static analyzer on OVS codebase.
>> +
>> + ```
>> + apt-get install clang [ On Ubuntu]
>> + dnf install clang clang-analyzer -y [ On fedora]
>> +
>> + cd $OVS_DIR
>> + ./boot.sh
>> + ./configure --with-dpdk
>> + make clean
>> + scan-build make CFLAGS="-std=gnu99"
>> + scan-view --host= --port 8183
>> /tmp/scan-build--mm-dd-114251-1027-1 --allow-all-hosts
>> + ```
>> +
>> + The results can be viewed on the browser using ip address and port no.
>> +
>> + `http://:8183/`
>> +
>
>Static analysis support (which is a very nice addition by the way) really should
>be documented elsewhere. It is just as useful (and usable) for non-DPDK
>builds. I think INSTALL.md would be a better place to document it.

I agree with you, Lance. I am going to send out another version of the static analysis patch in which INSTALL.md is updated with the necessary instructions.

Regards,
Bhanu Prakash.

>
>Regards,
>
>   Lance
Re: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install guide, add ADVANCED doc
>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Wednesday, July 6, 2016 7:05 AM
>To: Bodireddy, Bhanuprakash
>Cc: Lance Richardson; dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install
>guide, add ADVANCED doc
>
>I agree, maybe the section about static analysis could be added to INSTALL.md
>(doesn't need to be part of this series).

I agree with your suggestion. I have removed the static analysis section from the ADVANCED guide in v9, which I sent out today. I have also added a static code analysis section to INSTALL.md and, the other day, sent out a v4 patch that triggers clang static analysis with 'make clang-analyze'.

Regards,
Bhanu Prakash.

>Other than that the patch looks good to me
>Thanks,
>Daniele
>
>
>2016-07-04 1:19 GMT-07:00 Bodireddy, Bhanuprakash:
>>-----Original Message-----
>>From: Lance Richardson [mailto:lrich...@redhat.com]
>>Sent: Sunday, July 3, 2016 4:17 PM
>>To: Bodireddy, Bhanuprakash
>>Cc: dev@openvswitch.org; daniele di proietto
>>Subject: Re: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install
>>guide, add ADVANCED doc
>>
>>----- Original Message -----
>>> From: "Bhanuprakash Bodireddy"
>>> To: dev@openvswitch.org
>>> Cc: "daniele di proietto"
>>> Sent: Sunday, July 3, 2016 10:48:25 AM
>>> Subject: [ovs-dev] [PATCH v8 2/2] INSTALL.DPDK: Refactor DPDK install
>>> guide,add ADVANCED doc
>>>
>>> Add INSTALL.DPDK-ADVANCED document that is forked off from original
>>> INSTALL.DPDK guide. This document is targeted at users looking for
>>> optimum performance on OVS using dpdk datapath.
>>>
>>> Signed-off-by: Bhanuprakash Bodireddy
>>> Acked-by: Flavio Leitner
>>> ---
>>
>>> +## 9. Static Code Analysis
>>> +
>>> +Static Analysis is method of debugging SW by examining the code
>>> +rather than actually executing it. Many third party Software is
>>> +available to carry Static analysis, few being open source and rest commercial.
>>> + >>> +Below are the steps to run clang static analyzer on OVS codebase. >>> + >>> + ``` >>> + apt-get install clang [ On Ubuntu] >>> + dnf install clang clang-analyzer -y [ On fedora] >>> + >>> + cd $OVS_DIR >>> + ./boot.sh >>> + ./configure --with-dpdk >>> + make clean >>> + scan-build make CFLAGS="-std=gnu99" >>> + scan-view --host= --port 8183 >>> /tmp/scan-build--mm-dd-114251-1027-1 --allow-all-hosts >>> + ``` >>> + >>> + The results can be viewed on the browser using ip address and port no. >>> + >>> + `http://:8183/` >>> + >> >>Static analysis support (which is a very nice addition by the way) really >>should >>be documented elsewhere. It is just as useful (and usable) for non-DPDK >>builds. I think INSTALL.md would be a better place to document it. > >I agree with you Lance. I am going to send out another version of static >analysis patch where in INSTALL.md >shall be updated with necessary instructions. > >Regards, >Bhanu Prakash. > >> >>Regards, >> >> Lance ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v8 1/2] INSTALL.DPDK: Refactor DPDK install documentation
>- With SMT enabled, one physical core appears as two logical cores
>- which can improve performance.
>+ - OVS current development can be clone using 'git' tool
>
>- SMT can be utilized to add additional pmd threads without consuming
>- additional physical cores. Additional pmd threads may be added in the
>- same manner as described in section 2. If trying to minimize the use
>- of physical cores for pmd threads, care must be taken to set the
>- correct bits in the pmd-cpu-mask to ensure that the pmd threads are
>- pinned to SMT siblings.
>+ ```
>+ cd /usr/src/
>+ git clone https://github.com/openvswitch/ovs.git
>+ export OVS_DIR=/usr/src/ovs
>+ ```
>
>Not a big deal, but I would be less verbose about downloading OVS sources.
>The reader probably already has already downloaded the sources and is
>reading the document offline.
>I'd definitely remove instruction on how to get a tarball from github.

Thanks, Daniele, for reviewing the install guides. I have removed the tarball download instructions and sent out the v9 patch.

>
>
>- For example, when using 2x 10 core processors in a dual socket system
>- with HT enabled, /proc/cpuinfo will report 40 logical cores. To use
>- two logical cores which share the same physical core for pmd threads,
>- the following command can be used to identify a pair of logical cores.
>+ - Install OVS dependencies
>
>- `cat /sys/devices/system/cpu/cpuN/topology/thread_siblings_list`
>+ GNU make, GCC 4.x (or) Clang 3.4 (Mandatory)
>+ libssl, libcap-ng, Python 2.7 (Optional)
>
>Would you mind adding libnuma again here?

I have added libnuma to the mandatory list in v9.

Regards,
Bhanu Prakash.
Re: [ovs-dev] [PATCH] acinclude: Autodetect DPDK location when configuring OVS
Hello Panu,

Thanks for the comments. I have a follow-up question on the auto discovery of the DPDK install location.

As the DPDK install location can vary between distros, should the OVS configure script search for DPDK libraries in /usr/local/share first and then fall back to /usr/share, or should it search only /usr/local/share and report an error if the libraries are missing?

I shall send out the updated patch based on your feedback.

Regards,
Bhanu Prakash.

-----Original Message-----
From: Panu Matilainen [mailto:pmati...@redhat.com]
Sent: Friday, March 18, 2016 11:23 AM
To: Bodireddy, Bhanuprakash; dev@openvswitch.org
Subject: Re: [ovs-dev] [PATCH] acinclude: Autodetect DPDK location when configuring OVS

On 03/18/2016 01:11 PM, Bhanuprakash Bodireddy wrote:
> When using DPDK datapath, the OVS configure script requires the DPDK
> build directory passed on --with-dpdk. This can be avoided if the DPDK
> is installed in standard location i.e /usr/src.
>
> This patch fixes the problem by searching for DPDK libraries in
> standard location and configures OVS sources for dpdk datapath.
> > Signed-off-by: Bhanuprakash Bodireddy > > --- > acinclude.m4 | 36 ++-- > 1 files changed, 34 insertions(+), 2 deletions(-) > > diff --git a/acinclude.m4 b/acinclude.m4 index 74f0494..c1036e4 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -163,9 +163,41 @@ AC_DEFUN([OVS_CHECK_DPDK], [ > [AC_HELP_STRING([--with-dpdk=/path/to/dpdk], > [Specify the DPDK build directory])]) > > - if test X"$with_dpdk" != X; then > -RTE_SDK=$with_dpdk > + RTE_SDK="" > + AC_MSG_CHECKING([whether dpdk datapath is enabled]) case > + "$with_dpdk" in > +yes) > + AC_MSG_RESULT([$with_dpdk]) > + DEFAULT_RTE_SDK="/usr/src/dpdk*" > + DEFAULT_RTE_TARGET="x86_64-native-linuxapp-gcc" > + dpdk_build=`find $DEFAULT_RTE_SDK -name $DEFAULT_RTE_TARGET > 2>/dev/null | head -1` > + if test -d "$dpdk_build"; then > +AC_CHECK_FILE("$dpdk_build/lib/libdpdk.a", dpdk_lib=1, > [AC_CHECK_FILE("$dpdk_build/lib/libdpdk.so", dpdk_lib=1, dpdk_lib=0)]) > +if test "$dpdk_lib" = 1; then > + RTE_SDK="$dpdk_build" > +fi > + else > +AC_MSG_ERROR([Unable to find dpdk in /usr/src, if installed in a > non-standard location specify the target location using '--with-dpdk' option]) > + fi Um, /usr/src is in no way a standard location for dpdk. "make install" in dpdk >= 2.2.0 installs the sdk environment to /share/dpdk/ where by upstream default is /usr/local. Distros are likely to use /usr as the prfix. I'd suggest using the dpdk upstream installation layout as a starting point for automatic discovery. - Panu - -- Intel Research and Development Ireland Limited Registered in Ireland Registered Office: Collinstown Industrial Park, Leixlip, County Kildare Registered Number: 308263 This e-mail and any attachments may contain confidential material for the sole use of the intended recipient(s). Any review or distribution by others is strictly prohibited. If you are not the intended recipient, please contact the sender and delete all copies. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v2] acinclude: Autodetect DPDK location when configuring OVS
Hello Panu, My answers inline. -Original Message- From: Panu Matilainen [mailto:pmati...@redhat.com] Sent: Wednesday, March 23, 2016 1:20 PM To: Bodireddy, Bhanuprakash ; dev@openvswitch.org Subject: Re: [PATCH v2] acinclude: Autodetect DPDK location when configuring OVS On 03/21/2016 04:12 PM, Bhanuprakash Bodireddy wrote: > When using DPDK datapath, the OVS configure script requires the DPDK > build directory passed on --with-dpdk. This can be avoided if the DPDK > is installed in standard location i.e /usr/local/share/dpdk (or) > /usr/share/dpdk > > This patch fixes the problem by searching for DPDK libraries in > standard location and configures OVS sources for dpdk datapath. > > If the install location is manually specified in "--with-dpdk" > autodiscovery shall be skipped. > > Signed-off-by: Bhanuprakash Bodireddy > > --- > acinclude.m4 | 41 +++-- > 1 file changed, 39 insertions(+), 2 deletions(-) > > diff --git a/acinclude.m4 b/acinclude.m4 index 74f0494..d780759 100644 > --- a/acinclude.m4 > +++ b/acinclude.m4 > @@ -163,9 +163,46 @@ AC_DEFUN([OVS_CHECK_DPDK], [ > [AC_HELP_STRING([--with-dpdk=/path/to/dpdk], > [Specify the DPDK build directory])]) > > - if test X"$with_dpdk" != X; then > -RTE_SDK=$with_dpdk > + RTE_SDK="" > + AC_MSG_CHECKING([whether dpdk datapath is enabled]) case > + "$with_dpdk" in > +yes) > + AC_MSG_RESULT([$with_dpdk]) > + INSTALL_PREFIX="/usr/local /usr" > + for i in $INSTALL_PREFIX; do > + DEFAULT_RTE_SDK="$i/share/dpdk" > + DEFAULT_RTE_TARGET="x86_64-native-linuxapp-gcc" Limiting autodetection to x86_64-native-linuxapp-gcc seems ... quite limited. That would not, for example, find DPDK on Fedora or RHEL since the target name is x86_64-default-linuxapp-gcc on x86_64, never mind non-x86_64 architectures. I'd suggest figuring the target name from $DEFAULT_RTE_SDK/*/.config matches, that is what rhel/openvswitch-fedora.spec does to solve this problem. [BHANU] Good point. Will handle this case. 
Sorry for not noticing this on the first round.

> + DPDK_BUILD=$DEFAULT_RTE_SDK/$DEFAULT_RTE_TARGET
> + if test -d "$DPDK_BUILD"; then
> +AC_CHECK_FILE("$DPDK_BUILD/lib/libdpdk.a", dpdk_lib=1,
> [AC_CHECK_FILE("$DPDK_BUILD/lib/libdpdk.so", dpdk_lib=1, dpdk_lib=0)])
> +if test "$dpdk_lib" = 1; then
> + RTE_SDK="$DPDK_BUILD"
> + break
> +fi
> + fi
> + done
> + if test -z "$RTE_SDK"; then
> +AC_MSG_ERROR([Could not find DPDK libraries in $INSTALL_PREFIX
> directories, Use '--with-dpdk' to specify the path to DPDK libraries
> installed in non-standard location])
> + fi
> + ;;
> +no)
> + AC_MSG_RESULT([$with_dpdk])
> + ;;
> +"")
> + AC_MSG_RESULT([no])
> + ;;
> +*)
> + AC_MSG_RESULT([yes])
> + AC_CHECK_FILE("$with_dpdk/lib/libdpdk.a", dpdk_lib=1,
> [AC_CHECK_FILE("$with_dpdk/lib/libdpdk.so", dpdk_lib=1, dpdk_lib=0)])
> + if test "$dpdk_lib" = 1; then
> +RTE_SDK="$with_dpdk"
> + else
> +AC_MSG_ERROR([Could not find DPDK libraries in $with_dpdk/lib])
> + fi
> + ;;
> + esac
>
> + if test X"$RTE_SDK" != X; then
> DPDK_INCLUDE=$RTE_SDK/include
> DPDK_LIB_DIR=$RTE_SDK/lib
> DPDK_LIB="-ldpdk"
>

OTOH... there's another way of looking at it all: with DPDK >= 2.2 standard installation, the library and includes should actually be in the regular compiler etc search paths and all this poking around be unnecessary, you could just try to link to it. That said, I wouldn't be surprised if there are some further gotchas to sort in that direction.

[BHANU] I quickly verified this by installing the latest DPDK on Fedora 22. In this case the DPDK library was installed in /usr/local/lib and the header files in /usr/local/include/dpdk/. I presume that on RHEL these may be /usr/lib and /usr/include/dpdk.

I will rework the patch to handle the cases below.
 - With auto discovery, find and link against the DPDK library found in the default library search path.
 - Otherwise, link against the DPDK library present in the DPDK build location passed via "--with-dpdk".

Regards,
Bhanu Prakash.
- Panu -
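The suggestion to derive the target name from `$DEFAULT_RTE_SDK/*/.config` matches, instead of hard-coding x86_64-native-linuxapp-gcc, can be sketched as follows. This is a rough illustration, not the code from any version of the patch: `dpdk_targets` is a hypothetical helper, and the `/usr/local` and `/usr` prefixes are simply the ones discussed in this thread.

```shell
#!/bin/sh
# Hypothetical helper: list DPDK target names found under an SDK
# directory by looking for per-target .config files, e.g.
# <sdk>/x86_64-default-linuxapp-gcc/.config.
dpdk_targets() {
    sdk=$1
    for cfg in "$sdk"/*/.config; do
        # An unmatched glob stays literal, so skip anything that is
        # not an existing file.
        [ -f "$cfg" ] || continue
        basename "$(dirname "$cfg")"
    done
}

for prefix in /usr/local /usr; do
    for target in $(dpdk_targets "$prefix/share/dpdk"); do
        echo "found DPDK target $target under $prefix/share/dpdk"
    done
done
```

On a machine without DPDK installed the loop prints nothing, which is the case where configure would report an error.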
Re: [ovs-dev] [PATCH 1/2] acinclude.m4: Allow building against a DPDK installation
Hello Markos,

While this patch allows OVS to build against DPDK even when the install directory is passed (i.e. --with-dpdk=$DPDK_BUILD/install), I am working on another patch that handles auto discovery of the DPDK library and fixes some issues in the OVS configuration.
http://openvswitch.org/pipermail/dev/2016-March/068356.html

With the updated patch, the options "--with-dpdk", "--with-dpdk=$DPDK_BUILD" and "--without-dpdk" shall be supported. Currently the --with-dpdk and --without-dpdk options are not supported. I will also add support for --with-dpdk=$DPDK_BUILD/install in my next patch update. Does this sound good?

Regards,
Bhanu Prakash.

> It's possible for a system to not have the compiled DPDK sources around but
> it might have DPDK installed so use that instead if possible.
>
> Signed-off-by: Markos Chandras
> ---
> acinclude.m4 | 11 ---
> 1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/acinclude.m4 b/acinclude.m4 index f345c31..0e6517b 100644
> --- a/acinclude.m4
> +++ b/acinclude.m4
> @@ -161,19 +161,24 @@ dnl Configure DPDK source tree
> AC_DEFUN([OVS_CHECK_DPDK], [
>AC_ARG_WITH([dpdk],
>[AC_HELP_STRING([--with-dpdk=/path/to/dpdk],
> - [Specify the DPDK build directory])])
> + [Specify the DPDK build or install
> + directory])])
>
>if test X"$with_dpdk" != X; then
> RTE_SDK=$with_dpdk
>
> DPDK_INCLUDE=$RTE_SDK/include
> +# Maybe RTE_SDK points to an installed DPDK?
> +# DPDK installs headers in $DESTDIR/$prefix/include/dpdk
> +if test ! -e $DPDK_INCLUDE/rte_config.h; then
> +DPDK_INCLUDE=$DPDK_INCLUDE/dpdk
> +fi
> DPDK_LIB_DIR=$RTE_SDK/lib
> DPDK_LIB="-ldpdk"
> DPDK_EXTRA_LIB=""
> -RTE_SDK_FULL=`readlink -f $RTE_SDK`
> +DPDK_INCLUDE_FULL=`readlink -f $DPDK_INCLUDE`
>
> AC_COMPILE_IFELSE(
> - [AC_LANG_PROGRAM([#include <$RTE_SDK_FULL/include/rte_config.h>
> + [AC_LANG_PROGRAM([#include <$DPDK_INCLUDE_FULL/rte_config.h>
> #if !RTE_LIBRTE_VHOST_USER
> #error
> #endif], [])],
> --
> 2.7.3
Re: [ovs-dev] [PATCH v3] acinclude: Autodetect DPDK when configuring OVS
> -Original Message- > From: Panu Matilainen [mailto:pmati...@redhat.com] > Sent: Wednesday, March 30, 2016 1:06 PM > To: Bodireddy, Bhanuprakash ; > dev@openvswitch.org > Cc: mchand...@suse.de > Subject: Re: [PATCH v3] acinclude: Autodetect DPDK when configuring OVS > > On 03/25/2016 05:31 PM, Bhanuprakash Bodireddy wrote: > > When using DPDK datapath, the OVS configure script requires the DPDK > > build directory passed on --with-dpdk. This can be avoided if DPDK > > library, headers are in standard compiler search paths. > > > > This patch fixes the problem by searching for DPDK libraries in > > standard locations and configures OVS sources for dpdk datapath. > > > > If the install location is manually specified in "--with-dpdk" > > autodiscovery shall be skipped > > > > Signed-off-by: Bhanuprakash Bodireddy > > > > --- > > acinclude.m4 | 61 --- > - > > 1 file changed, 37 insertions(+), 24 deletions(-) > > > > diff --git a/acinclude.m4 b/acinclude.m4 index f345c31..edb9563 100644 > > --- a/acinclude.m4 > > +++ b/acinclude.m4 > > @@ -163,22 +163,32 @@ AC_DEFUN([OVS_CHECK_DPDK], [ > > [AC_HELP_STRING([--with-dpdk=/path/to/dpdk], > > [Specify the DPDK build directory])]) > > > > - if test X"$with_dpdk" != X; then > > -RTE_SDK=$with_dpdk > > + AC_MSG_CHECKING([whether dpdk datapath is enabled]) if test -z > > + "$with_dpdk" || test "$with_dpdk" == no; then > > +AC_MSG_RESULT([no]) > > +DPDKLIB_FOUND=false > > + elif test -n "$with_dpdk"; then > > +AC_MSG_RESULT([yes]) > > +case "$with_dpdk" in > > + yes) > > +DPDK_AUTO_DISCOVER="true" > > +;; > > + *) > > +DPDK_AUTO_DISCOVER="false" > > +;; > > +esac > > > > -DPDK_INCLUDE=$RTE_SDK/include > > -DPDK_LIB_DIR=$RTE_SDK/lib > > +if $DPDK_AUTO_DISCOVER; then > > + DPDK_INCLUDE="/usr/local/include/dpdk -I/usr/include/dpdk" > > + DPDK_LIB_DIR="/usr/local/lib -L/usr/lib64 -L/usr/lib" > > This raises questions like why /usr/lib64 but not /usr/local/lib64? 
> However, the bigger issue there is that lib and lib64 contents cannot be
> mixed because on a system where lib64 paths are valid, lib paths are 32bit
> libraries. The linker already knows all that, so instead of trying to guess paths
> in vain, just try to link to -ldpdk. If that fails then its either outside standard
> library paths or does not exist on the system.

OK. For the autodiscovery mechanism I will let the linker do the job, and will set DPDK_LIB_DIR only when the user explicitly passes the build location via "--with-dpdk".

> The include path does need to be determined since it needs to be added to -I,
> those two locations should be enough though..

Fine.

> > +else
> > + DPDK_INCLUDE="$with_dpdk/include"
> > + # If 'with_dpdk' is passed install directory, point to headers
> > + # installed in $DESTDIR/$prefix/include/dpdk
> > + AC_CHECK_FILE([$DPDK_INCLUDE/rte_config.h],,[AC_CHECK_FILE([$DPDK_INCLUDE/dpdk/rte_config.h],[DPDK_INCLUDE=$DPDK_INCLUDE/dpdk],[])])
> > + DPDK_LIB_DIR="$with_dpdk/lib"
> > +fi
> > DPDK_LIB="-ldpdk"
> > -DPDK_EXTRA_LIB=""
> > -RTE_SDK_FULL=`readlink -f $RTE_SDK`
> > -
> > -AC_COMPILE_IFELSE(
> > - [AC_LANG_PROGRAM([#include <$RTE_SDK_FULL/include/rte_config.h>
> > -#if !RTE_LIBRTE_VHOST_USER
> > -#error
> > -#endif], [])],
> > -[], [AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse support enabled, vhost-user disabled.])
> > - DPDK_EXTRA_LIB="-lfuse"])
>
> This is silently removing vhost-cuse/vhost-user detection. I personally
> wouldn't mind support for vhost-cuse dropped but this is not the way to do
> it, it belongs to a separate patch along with appropriate explanation.

Point taken. I will handle this separately in a different patch, with updates to INSTALL.DPDK.md.

- Bhanu Prakash.

>
>
> - Panu -
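The "just try to link to -ldpdk" idea can be sketched in plain shell as below. This is an illustration only, not what the reworked patch does: `can_link` is a hypothetical helper, and the sketch assumes a C compiler reachable as `cc` (or through `$CC`) and a `mktemp` that supports `-d`. In the real configure script an Autoconf link test would be the natural way to express the same check.

```shell
#!/bin/sh
# Hypothetical helper: succeed if a trivial program links against the
# named library using only the toolchain's default search paths.
can_link() {
    lib=$1
    tmp=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$tmp/t.c"
    if ${CC:-cc} "$tmp/t.c" -o "$tmp/t" "-l$lib" >/dev/null 2>&1; then
        status=0
    else
        status=1
    fi
    rm -rf "$tmp"
    return $status
}

if can_link dpdk; then
    echo "libdpdk found in the default library search path"
else
    echo "libdpdk not found; pass --with-dpdk=/path/to/dpdk instead"
fi
```

Because the linker resolves lib versus lib64 itself, the sketch never has to name either directory.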
Re: [ovs-dev] [PATCH 0/2] doc: Refactor DPDK install guide
Hello,

A gentle reminder. Any comments on the updated OVS DPDK install guides?

Regards,
Bhanu Prakash.

> -Original Message-
> From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of
> Bhanuprakash Bodireddy
> Sent: Wednesday, March 16, 2016 4:18 PM
> To: dev@openvswitch.org
> Subject: [ovs-dev] [PATCH 0/2] doc: Refactor DPDK install guide
>
> This patchset refactors the present INSTALL.DPDK.md guide.
>
> The INSTALL guide is split in to two documents named INSTALL.DPDK and
> INSTALL.DPDK-ADVANCED. The former document is simplified with emphasis
> on installation, basic testcases and targets novice users.
> Sections related to system configuration, performance tuning, vhost
> walkthrough are moved to DPDK-ADVANCED guide.
>
> Bhanuprakash Bodireddy (2):
>   doc: Refactor DPDK install documentation
>   doc: Refactor DPDK install guide, add ADVANCED doc
>
>  INSTALL.DPDK-ADVANCED.md | 650 ++
>  INSTALL.DPDK.md | 1125 --
>  2 files changed, 941 insertions(+), 834 deletions(-)
>  create mode 100644 INSTALL.DPDK-ADVANCED.md
>
> --
> 1.7.4.1
Re: [ovs-dev] [PATCH v4] acinclude: Autodetect DPDK location when configuring OVS
Thanks for the review Ben. I have sent out Patch V5 based on your comments. - Bhanu Prakash. > -Original Message- > From: Ben Pfaff [mailto:b...@ovn.org] > Sent: Tuesday, April 12, 2016 3:58 AM > To: Bodireddy, Bhanuprakash > Cc: dev@openvswitch.org > Subject: Re: [ovs-dev] [PATCH v4] acinclude: Autodetect DPDK location when > configuring OVS > > On Thu, Mar 31, 2016 at 08:03:05PM +0100, Bhanuprakash Bodireddy wrote: > > When using DPDK datapath, the OVS configure script requires the DPDK > > build directory passed on --with-dpdk. This can be avoided if DPDK > > library, headers are in standard compiler search paths. > > > > This patch fixes the problem by searching for DPDK libraries in > > standard locations and configures OVS sources for dpdk datapath. > > > > If the install location is manually specified in "--with-dpdk" > > autodiscovery shall be skipped. > > > > Signed-off-by: Bhanuprakash Bodireddy > > > > ... > > > - if test X"$with_dpdk" != X; then > > -RTE_SDK=$with_dpdk > > + AC_MSG_CHECKING([whether dpdk datapath is enabled]) > > == is not portable for "test", use = instead: > > + if test -z "$with_dpdk" || test "$with_dpdk" == no; then > > +AC_MSG_RESULT([no]) > > +DPDKLIB_FOUND=false > > Isn't the following "elif" test always true? That is, can't it just be an > "else"? > > > + elif test -n "$with_dpdk"; then > > +AC_MSG_RESULT([yes]) > > +case "$with_dpdk" in > > + yes) > > +DPDK_AUTO_DISCOVER="true" > > +;; > > + *) > > +DPDK_AUTO_DISCOVER="false" > > +;; > > +esac > > > > -DPDK_INCLUDE=$RTE_SDK/include > > -DPDK_LIB_DIR=$RTE_SDK/lib > > Isn't the following "if" a little redundant given the previous "case" > command, that is, why not integrate these into the cases? 
> > > +if $DPDK_AUTO_DISCOVER; then > > + DPDK_INCLUDE="/usr/local/include/dpdk -I/usr/include/dpdk" > > +else > > + DPDK_INCLUDE="$with_dpdk/include" > > + # If 'with_dpdk' is passed install directory, point to headers > > + # installed in $DESTDIR/$prefix/include/dpdk > > This is really crunched together, why not add some whitespace to match the > rest of the style: > > + > AC_CHECK_FILE([$DPDK_INCLUDE/rte_config.h],,[AC_CHECK_FILE([$DPDK_I > NCLUDE/dpdk/rte_config.h],[DPDK_INCLUDE=$DPDK_INCLUDE/dpdk],[])]) > > + DPDK_LIB_DIR="$with_dpdk/lib" > > +fi > > DPDK_LIB="-ldpdk" > > DPDK_EXTRA_LIB="" > > -RTE_SDK_FULL=`readlink -f $RTE_SDK` > > + > > +ovs_save_CFLAGS="$CFLAGS" > > +ovs_save_LDFLAGS="$LDFLAGS" > > +CFLAGS="$CFLAGS -I$DPDK_INCLUDE" > > +if test "$DPDK_AUTO_DISCOVER" = "false"; then > > + LDFLAGS="$LDFLAGS -L${DPDK_LIB_DIR}" > > +fi > > > > AC_COMPILE_IFELSE( > > - [AC_LANG_PROGRAM([#include > <$RTE_SDK_FULL/include/rte_config.h> > > + [AC_LANG_PROGRAM([#include > > #if !RTE_LIBRTE_VHOST_USER > > #error > > #endif], [])], > > The following line looks pretty randomly indented, I'd suggest that it should > be two spaces farther in than the AC_LANG_PROGRAM keyword (and > probably should start the [#include... on a separate line also indented the > same amount): > > > [], [AC_DEFINE([VHOST_CUSE], [1], [DPDK vhost-cuse > > support > enabled, vhost-user disabled.]) > > DPDK_EXTRA_LIB="-lfuse"]) > > > > -ovs_save_CFLAGS="$CFLAGS" > > -ovs_save_LDFLAGS="$LDFLAGS" > > -LDFLAGS="$LDFLAGS -L$DPDK_LIB_DIR" > > -CFLAGS="$CFLAGS -I$DPDK_INCLUDE" > > - > > # On some systems we have to add -ldl to link with dpdk > > # > > # This code, at first, tries to link without -ldl (""), > > Thanks, > > Ben. ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 3/3] acinclude: Use SSE4.2 instruction set
Thanks for looking in to this, Daniele. My comments are inline.

>-Original Message-
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Sunday, August 14, 2016 9:13 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 3/3] acinclude: Use SSE4.2 instruction set
>
>By default we do not want to build Open vSwitch for a particular CPU, we
>want to create a build that can be distributed to every CPU of the architecture
>(e.g. amd64).
>Unfortunately, this not entirely possible for with DPDK, because rte_memcpy
>(included by the headers in netdev-dpdk) requires SSSE3. The 'default'
>machine in DPDK (which, if I'm not mistaken, should produce the most
>distributable build) does not work on every amd64, but requires
>-march=core2, i.e. -mssse3.

I understood the reason for using -mssse3 from the previous discussions on the mailing list.

>We shouldn't change this unless OvS+DPDK cannot work without sse4.2. I
>don't think this is the case, right?

Without SSE4.2, OVS DPDK uses murmur hash for hash computation. I found that performance is better with the CRC32 intrinsics than with murmurhash3. OVS DPDK is configured with SSSE3 by default, and I intended to override that with this patch. But I agree with your suggestion. In any case, we have a note in INSTALL.md on -msse4.2, which should help users looking to leverage the built-in intrinsics.

Regards,
Bhanu Prakash.

>Thanks,
>Daniele
>
>2016-08-14 11:35 GMT-07:00 Bhanuprakash Bodireddy:
>On processors with SSE4.2 instruction set support, CRC32 intrinsics can
>be used for efficient hash computation.
>
>Update the m4_translit to convert '.' to '_' that otherwise cause 'bad
>substitution' error when configuring OVS DPDK with msse4.2 support.
> > ./configure: line 21027: ${ovs_cv__msse4.2+:}: bad substitution > >Signed-off-by: Bhanuprakash Bodireddy > >--- > acinclude.m4 | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > >diff --git a/acinclude.m4 b/acinclude.m4 >index aa57b47..93c7a5a 100644 >--- a/acinclude.m4 >+++ b/acinclude.m4 >@@ -280,7 +280,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [ > OVS_LDFLAGS="$OVS_LDFLAGS -L$DPDK_LIB_DIR" > fi > OVS_CFLAGS="$OVS_CFLAGS -I$DPDK_INCLUDE" >- OVS_ENABLE_OPTION([-mssse3]) >+ OVS_ENABLE_OPTION([-msse4.2]) > > # DPDK pmd drivers are not linked unless --whole-archive is used. > # >@@ -776,7 +776,7 @@ dnl gives unlimited permission to copy and/or >distribute it, > dnl with or without modifications, as long as this notice is preserved. > > AC_DEFUN([_OVS_CHECK_CC_OPTION], [dnl >- m4_define([ovs_cv_name], [ovs_cv_[]m4_translit([$1], [-=], [__])])dnl >+ m4_define([ovs_cv_name], [ovs_cv_[]m4_translit([$1], [-=.], [___])])dnl > AC_CACHE_CHECK([whether $CC accepts $1], [ovs_cv_name], > [ovs_save_CFLAGS="$CFLAGS" > dnl Include -Werror in the compiler options, because without -Werror >-- >2.4.11 > >___ >dev mailing list >dev@openvswitch.org >http://openvswitch.org/mailman/listinfo/dev ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority
>-Original Message-
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Wednesday, July 27, 2016 10:10 PM
>To: Kavanagh, Mark B
>Cc: Bodireddy, Bhanuprakash; dev@openvswitch.org
>Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority
>
>Thanks for the patch, the implementation looks good to me too.
>During testing I kept noticing that it's way too easy to make OVS completely
>unresponsive. As you point out in the documentation by having
>dpdk-lcore-mask the same as pmd-cpu-mask, OVS cannot even be killed (a
>kill -9 is required). I wonder what happens if one tries to set
>pmd-cpu-mask to every core in the system.
>As a way to mitigate the risk perhaps we can avoid setting the main thread
>affinity to the first core in dpdk-lcore-mask by _always_ restoring it in
>dpdk_init__(), also if auto_determine is false.

I followed this approach, but during testing I found that the revalidator thread sometimes gets blocked when the pmd threads are scheduled. The likely reason is that once the pmd threads (with the SCHED_RR policy applied) kick in, the other threads never get scheduled and may eventually block.

>Perhaps we should start explicitly prohibiting creating a pmd thread on the
>first core in dpdk-lcore-mask (I get why previous version of this didn't do it
>on core 0. Perhaps we can generalize that to the first core in
>dpdk-lcore-mask).

I thought this was the better approach and implemented it in V6 of the patch. I tested various cases and found it to work as expected.
http://openvswitch.org/pipermail/dev/2016-August/078001.html

>
>What's the behavior of other DPDK applications?

I spoke to the DPDK team here and they confirmed our observations. Given an affinity mask, the control threads always get pinned to the lowest core of the mask, and this can be reproduced with any DPDK application. I am following up with the DPDK team to see if this behavior can be changed.

Regards,
Bhanu Prakash.
>Thanks, >Daniele > >2016-07-27 5:28 GMT-07:00 Kavanagh, Mark B : >> >>Set the DPDK pmd thread scheduling policy to SCHED_RR and static >>priority to highest priority value of the policy. This is to deal with >>pmd thread starvation case where another cpu hogging process can get >>scheduled/affinitized on to the same core the pmd thread is running >>there by significantly impacting the datapath performance. >> >>Setting the realtime scheduling policy to the pmd threads is one step >>towards Fastpath Service Assurance in OVS DPDK. >> >>The realtime scheduling policy is applied only when CPU mask is passed >>to 'pmd-cpu-mask'. For example: >> >> * In the absence of pmd-cpu-mask, one pmd thread shall be created >> and default scheduling policy and priority gets applied. >> >> * If pmd-cpu-mask is specified, one or more pmd threads shall be >> spawned on the corresponding core(s) in the mask and real time >> scheduling policy SCHED_RR and highest priority of the policy is >> applied to the pmd thread(s). >> >>To reproduce the pmd thread starvation case: >> >>ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6 >>taskset 0x2 cat /dev/zero > /dev/null & >> >>With this commit, it is recommended that the OVS control thread and pmd >>thread shouldn't be pinned to same core ('dpdk-lcore-mask','pmd-cpu-mask' >>should be non-overlapping). Also other processes with same affinity as >>PMD thread will be unresponsive. >> >>Signed-off-by: Bhanuprakash Bodireddy > > >LGTM - Acked-by: mark.b.kavan...@intel.com > >>--- >>v4->v5: >>* Reword Note section in DPDK-ADVANCED.md >> >>v3->v4: >>* Document update >>* Use ovs_strerror for reporting errors in lib-numa.c >> >>v2->v3: >>* Move set_priority() function to lib/ovs-numa.c >>* Apply realtime scheduling policy and priority to pmd thread only if >> pmd-cpu-mask is passed. >>* Update INSTALL.DPDK-ADVANCED. 
>> >>v1->v2: >>* Removed #ifdef and introduced dummy function >"pmd_thread_setpriority" >> in netdev-dpdk.h >>* Rebase >> >> INSTALL.DPDK-ADVANCED.md | 17 + >> lib/dpif-netdev.c | 9 + >> lib/ovs-numa.c | 18 ++ >> lib/ovs-numa.h | 1 + >> 4 files changed, 41 insertions(+), 4 deletions(-) >> >>diff --git a/INSTALL.DPDK-ADVANCED.md b/INSTALL.DPDK-ADVANCED.md >>index 9ae536d..d76cb4e 100644 >>--- a/INSTALL.DPDK-ADVANCED.md >>+++ b/INSTALL.DPDK-ADVANCED.md >>@@ -205,8 +205,10 @@ needs to be affinitized accordi
Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority
Hello Flavio, Thanks for your feedback, unfortunately I missed this mail due to my outlook filter settings. Please see my comments inline. >-Original Message- >From: Flavio Leitner [mailto:f...@sysclose.org] >Sent: Thursday, July 28, 2016 8:27 PM >To: Bodireddy, Bhanuprakash >Cc: Daniele Di Proietto ; Kavanagh, Mark B >; dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH v5] netdev-dpdk: Set pmd thread priority > >On Thu, Jul 28, 2016 at 03:39:58PM +0000, Bodireddy, Bhanuprakash wrote: >> >-Original Message- >> >From: Daniele Di Proietto [mailto:diproiet...@ovn.org] >> >Sent: Wednesday, July 27, 2016 10:10 PM >> >To: Kavanagh, Mark B >> >Cc: Bodireddy, Bhanuprakash ; >> >dev@openvswitch.org >> >Subject: Re: [PATCH v5] netdev-dpdk: Set pmd thread priority >> > >> >Thanks for the patch, the implementation looks good to me too. >> >During testing I kept noticing that it's way too easy to make OVS >> >completely unresponsive. As you point out in the documentation by >> >having dpdk-lcore- mask the same as pmd-cpu-mask, OVS cannot even be >> >killed (a kill -9 is required). I wonder what happens if one tries >> >to set pmd-cpu-mask to every core in the system. >> >As a way to mitigate the risk perhaps we can avoid setting the main >> >thread affinity to the first core in dpdk-lcore-mask by _always_ >> >restoring it in dpdk_init__(), also if auto_determine is false. >> >Perhaps we should start explicitly prohibiting creating a pmd thread >> >on the first core in dpdk-lcore-mask (I get why previous version of >> >this didn't do it on core 0. Perhaps we can generalize that to the first >> >core >in dpdk-lcore-mask). >> I will look in to this and get back to you sometime next week. > >Isn't enough to just increase priority to the max with setpriority(2)? >I know it is not the same as SCHED_RR but for those users that don't know >how to tune it properly, this seems to be less dangerous while still providing >a >good share of CPU to the PMD thread. 
I agree with your suggestion here. Although my initial patch was meant to address the pmd thread starvation case, I see now that I have put a restriction on the pmd-cpu-mask affinity setting by not spawning the pmd thread on the first core of the dpdk-lcore-mask, due to the way DPDK handles thread pinning. You can check the patch here:
http://openvswitch.org/pipermail/dev/2016-August/078001.html

At this point in time I would bump up the pmd thread priority instead of changing the policy to SCHED_RR, as this is less dangerous and would allow users with fewer cores to still set pmd-cpu-mask.

>
>For instance, I recall to have seen OVS revalidation threads running with PMD
>on the same CPU, but that might have been 2.5 only.

This is seen even with the latest OVS master. Here is how you can reproduce it. From the output below you can see that the ovs-vswitchd and revalidator threads are running on the same core as the pmd thread.

$ ovs-vsctl --no-wait set Open_vSwitch . other_config:dpdk-lcore-mask=F0
$ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=10
$ ps -eLo tid,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e pmd
89966 4 ovs-vswitchd
89969 5 lcore-slave-5
89970 6 lcore-slave-6
89971 7 lcore-slave-7
90038 4 revalidator37
90039 4 revalidator52
90040 4 revalidator42
90041 4 revalidator38
90042 4 revalidator39
90043 4 revalidator45
90044 4 revalidator53
90045 4 revalidator54
90047 4 pmd61

>
>Thanks,
>--
>fbl
Re: [ovs-dev] [PATCH V6] netdev-dpdk: Set pmd thread priority
>-Original Message-
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Tuesday, August 16, 2016 1:44 AM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org; Flavio Leitner
>Subject: Re: [PATCH V6] netdev-dpdk: Set pmd thread priority
>
>I found a crash if apply this patch, "dpdk-lcore-mask" is not set and "-c 0x1"
>is passed to "dpdk-extra".

My bad, I didn't test with dpdk-extra options. I see that the crash was due to strtol.

>Also, I believe Flavio had a comment on the previous version of this
>patch. Would it be enough to use setpriority(2)?

I sent out my comments in another mail. I agree with Flavio's suggestion, as it seems less dangerous and is guaranteed to work even in case of misconfiguration. I tested this approach and have a concern with setpriority().

To apply the new nice value to the thread, the thread id is needed, and in the absence of a glibc wrapper for gettid, I have to use syscall(SYS_gettid). I would like to know whether this is acceptable in OVS, or whether there is a better way to handle this.
void ovs_numa_thread_setpriority(int nice OVS_UNUSED)
{
#if defined(__linux__) && defined(SYS_gettid)
    pid_t tid = syscall(SYS_gettid);
    int err = setpriority(PRIO_PROCESS, tid, nice);
#endif
}

Without priority patch:

$ ps -eLo tid,pri,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e pmd
22509 19 4 ovs-vswitchd
22512 19 5 lcore-slave-5
22513 19 6 lcore-slave-6
22514 19 7 lcore-slave-7
22589 19 4 revalidator37
22590 19 4 revalidator52
22591 19 4 revalidator42
22592 19 4 revalidator38
22593 19 4 revalidator39
22594 19 4 revalidator45
22595 19 4 revalidator53
22596 19 4 revalidator54
22598 19 4 pmd61 [Default priority]

With priority patch:

$ ps -eLo tid,pri,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e pmd
24879 19 4 ovs-vswitchd
24881 19 5 lcore-slave-5
24882 19 6 lcore-slave-6
24883 19 7 lcore-slave-7
24951 19 4 revalidator55
24952 19 4 revalidator37
24953 19 4 revalidator52
24954 19 4 revalidator42
24955 19 4 revalidator38
24956 19 4 revalidator39
24957 19 4 revalidator45
24958 19 4 revalidator53
24964 39 4 pmd61 [Higher priority set]

Regards,
Bhanu Prakash.

>Thanks,
>Daniele
>
>2016-08-15 8:19 GMT-07:00 Bhanuprakash Bodireddy:
>Set the DPDK pmd thread scheduling policy to SCHED_RR and static
>priority to highest priority value of the policy. This is to deal with
>pmd thread starvation case where another cpu hogging process can get
>scheduled/affinitized on to the same core the pmd thread is running
>there by significantly impacting the datapath performance. To reproduce
>the pmd thread starvation case:
>
> $ ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=6
> $ taskset 0x2 cat /dev/zero > /dev/null &
>
>Setting the realtime scheduling policy to the pmd threads is one step
>towards Fastpath Service Assurance in OVS DPDK.
>
>The realtime scheduling policy is applied only when CPU mask is passed
>to 'pmd-cpu-mask'.
For example: > > * In the absence of pmd-cpu-mask, one pmd thread shall be created > and default scheduling policy and priority gets applied. > > * If pmd-cpu-mask is specified, one or more pmd threads shall be > spawned on the corresponding core(s) in the mask and real time > scheduling policy SCHED_RR and highest priority of the policy is > applied to the pmd thread(s). > >With this commit, it is recommended that the OVS control thread and pmd >thread shouldn't be pinned to same core ('dpdk-lcore-mask','pmd-cpu-mask' >should be non-overlapping). If dpdk-lcore-mask is set same as pmd-cpu-mask >the pmd thread is not spawned on lowest core of the dpdk-lcore-mask. >Also other processes with same affinity as PMD thread will be unresponsive. > >Signed-off-by: Bhanuprakash Bodireddy > >--- >v5->v6: >* Prohibit spawning pmd thread on the lowest core in dpdk-lcore-mask if > lcore-mask and pmd-mask affinity are identical. >* Updated Note section in INSTALL.DPDK-ADVANCED doc. >* Tested below cases to verify system stability with pmd priority patch > > dpdk-lcore-mask | pmd-cpu-mask | Comment >1. Not set | Not set | control threads affinity: 0-27 > pmd thread: core 0 >2. 1 | 1 | pmd thread isn't spawned and warning > logged in logfile. >3. 1 | c | >4. F0 | F0 | control threads pinned to core 4. > 3 pmd threads created on core 5,6,7 but 4. > >v4->v5: >* Reword Note section in D
[ovs-dev] OVS DPDK performance drop with multiple flows
Hello All,

I found a significant performance drop using OVS DPDK when testing with multiple IXIA streams and matching flow rules.

Example: for a packet stream with src ip 2.2.2.1 and dst ip 3.3.3.1, the corresponding flow rule is set up as below.

$ ovs-ofctl add-flow br0 dl_type=0x0800,nw_src=2.2.2.1,actions=output:2

From the implementation, I see that after emc_lookup() the packets are grouped by matching flow and processed in 'batches' with packet_batch_execute(). In OVS 2.6, during testing I observed that netdev_send() gets called with only a few packets in a batch. It internally invokes rte_eth_tx_burst(), which incurs an expensive MMIO write.

I was told that OVS 2.5 has an intermediate queue feature that queues packets and bursts as many as it can, to amortize the cost of the MMIO write. When tested on OVS 2.5, the performance drop is still seen in spite of the intermediate queue implementation, for the following reason. With a single queue in use, txq_needs_locking is 'false' and flush_tx is always 'true'. With flush_tx always 'true', the intermediate queue flushes packets for each batch using dpdk_queue_flush__() instead of queueing them, so the behavior is the same as in OVS 2.6. This may not be the idea behind the initial implementation of the intermediate queue logic with dpdk_queue_pkts().

I would appreciate your comments on this.

Regards,
Bhanu Prakash.
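The effect described above can be illustrated with a small stand-alone sketch. It is not the OVS code: the names txq_sketch, txq_enqueue, txq_flush and TXQ_BURST_SKETCH are hypothetical, and the "flush" only counts how often the expensive device call would be issued. It assumes each flush stands in for one rte_eth_tx_burst() call.

```c
#include <stddef.h>

#define TXQ_BURST_SKETCH 32   /* hypothetical intermediate queue depth */

/* Hypothetical intermediate tx queue; not the OVS data structure. */
struct txq_sketch {
    void *pkts[TXQ_BURST_SKETCH];
    size_t count;            /* packets currently queued */
    unsigned long flushes;   /* how many expensive burst calls were made */
    unsigned long sent;      /* total packets handed to the device */
};

/* Stand-in for the device burst call (one MMIO write per call). */
static void txq_flush(struct txq_sketch *q)
{
    if (q->count) {
        q->sent += q->count;
        q->flushes++;
        q->count = 0;
    }
}

/* Queue 'n' packets; flush when the queue is full, or right away when
 * 'flush_now' is set (the "flush_tx is always true" case). */
static void txq_enqueue(struct txq_sketch *q, void **pkts, size_t n,
                        int flush_now)
{
    size_t i;

    for (i = 0; i < n; i++) {
        q->pkts[q->count++] = pkts[i];
        if (q->count == TXQ_BURST_SKETCH) {
            txq_flush(q);
        }
    }
    if (flush_now) {
        txq_flush(q);
    }
}

/* Send 'n_batches' batches of 'batch_len' packets and return the number
 * of burst calls that were needed. */
static unsigned long txq_demo_flushes(size_t n_batches, size_t batch_len,
                                      int flush_each)
{
    struct txq_sketch q = { .count = 0 };
    void *dummy[TXQ_BURST_SKETCH] = { 0 };
    size_t i;

    if (batch_len > TXQ_BURST_SKETCH) {
        batch_len = TXQ_BURST_SKETCH;
    }
    for (i = 0; i < n_batches; i++) {
        txq_enqueue(&q, dummy, batch_len, flush_each);
    }
    txq_flush(&q);   /* drain whatever is left */
    return q.flushes;
}
```

With 16 batches of 4 packets, flushing after every batch costs one burst call per batch, while letting the queue fill to its depth needs only one call per 32 packets.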
Re: [ovs-dev] [PATCH V6] netdev-dpdk: Set pmd thread priority
>-Original Message- >From: Flavio Leitner [mailto:f...@sysclose.org] >Sent: Thursday, August 18, 2016 2:15 PM >To: Bodireddy, Bhanuprakash >Cc: Daniele Di Proietto ; dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH V6] netdev-dpdk: Set pmd thread priority > >On Tue, Aug 16, 2016 at 02:30:04PM +, Bodireddy, Bhanuprakash wrote: >> >-Original Message- >> >From: Daniele Di Proietto [mailto:diproiet...@ovn.org] >> >Sent: Tuesday, August 16, 2016 1:44 AM >> >To: Bodireddy, Bhanuprakash >> >Cc: dev@openvswitch.org; Flavio Leitner >> >Subject: Re: [PATCH V6] netdev-dpdk: Set pmd thread priority >> > >> >I found a crash if apply this patch, "dpdk-lcore-mask" is not set and >> >"-c 0x1" is passed to "dpdk-extra". >> My bad, I didn't test with dpdk-extra options. I see that the crash was due >> to >strtol. >> >> >Also, I believe Flavio had a comment on the previous version of this >> >patch. Would it be enough to use setpriority(2)? >> I sent out my comments in another mail. I agree to Flavio's suggestion >> as this seems less dangerous and is guaranteed to work even in case of >> misconfiguration. I tested this approach and have a concern with >setpriority(). >> >> To apply the new nice value to the thread, thread id is needed and due >> to absence of glibc wrapper for gettid, I have to use syscall(SYS_gettid). I >want to know if this is acceptable in OVS or better way to handle this? >> >> Void ovs_numa_thread_setpriority(int nice OVS_UNUSED) { >> #if defined(__linux__) && defined(SYS_gettid) >> pid_t tid = syscall(SYS_gettid); >> err = setpriority(PRIO_PROCESS, tid, nice); >> >> #endif >> } > >I don't know a better way to implement this and it seems ovs-numa.c already >has some ifdefs specific to linux. > >Do you know if this problem happen on BSD? I don't know if this is a problem on BSD. I searched a bit and found BSD code using "syscall(SYS_thr_self, &tid)" to retrieve the tid. 
> > >> Without priority patch: >> >> $ ps -eLo tid,pri,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e >pmd >> 22509 19 4 ovs-vswitchd >> 22512 19 5 lcore-slave-5 >> 22513 19 6 lcore-slave-6 >> 22514 19 7 lcore-slave-7 >> 22589 19 4 revalidator37 >> 22590 19 4 revalidator52 >> 22591 19 4 revalidator42 >> 22592 19 4 revalidator38 >> 22593 19 4 revalidator39 >> 22594 19 4 revalidator45 >> 22595 19 4 revalidator53 >> 22596 19 4 revalidator54 >> 22598 19 4 pmd61[Default priority] >> >> With priority patch: >> >> $ ps -eLo tid,pri,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e >pmd >> 24879 19 4 ovs-vswitchd >> 24881 19 5 lcore-slave-5 >> 24882 19 6 lcore-slave-6 >> 24883 19 7 lcore-slave-7 >> 24951 19 4 revalidator55 >> 24952 19 4 revalidator37 >> 24953 19 4 revalidator52 >> 24954 19 4 revalidator42 >> 24955 19 4 revalidator38 >> 24956 19 4 revalidator39 >> 24957 19 4 revalidator45 >> 24958 19 4 revalidator53 >> 24964 39 4 pmd61 [Higher priority set] > >Looks good, so if you affinity your bash to the CPU running >pmd61 thread, are you able to use it? Yes, I tested this case and there seems to be no problem here. Regards, Bhanu Prakash. > >Thanks, >-- >fbl ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH V6] netdev-dpdk: Set pmd thread priority
> >2016-08-18 14:20 GMT-07:00 Bodireddy, Bhanuprakash >: >>-Original Message- >>From: Flavio Leitner [mailto:f...@sysclose.org] >>Sent: Thursday, August 18, 2016 2:15 PM >>To: Bodireddy, Bhanuprakash >>Cc: Daniele Di Proietto ; dev@openvswitch.org >>Subject: Re: [ovs-dev] [PATCH V6] netdev-dpdk: Set pmd thread priority >> >>On Tue, Aug 16, 2016 at 02:30:04PM +, Bodireddy, Bhanuprakash wrote: >>> >-Original Message- >>> >From: Daniele Di Proietto [mailto:diproiet...@ovn.org] >>> >Sent: Tuesday, August 16, 2016 1:44 AM >>> >To: Bodireddy, Bhanuprakash >>> >Cc: dev@openvswitch.org; Flavio Leitner >>> >Subject: Re: [PATCH V6] netdev-dpdk: Set pmd thread priority >>> > >>> >I found a crash if apply this patch, "dpdk-lcore-mask" is not set and >>> >"-c 0x1" is passed to "dpdk-extra". >>> My bad, I didn't test with dpdk-extra options. I see that the crash was due >to >>strtol. >>> >>> >Also, I believe Flavio had a comment on the previous version of this >>> >patch. Would it be enough to use setpriority(2)? >>> I sent out my comments in another mail. I agree to Flavio's suggestion >>> as this seems less dangerous and is guaranteed to work even in case of >>> misconfiguration. I tested this approach and have a concern with >>setpriority(). >>> >>> To apply the new nice value to the thread, thread id is needed and due >>> to absence of glibc wrapper for gettid, I have to use syscall(SYS_gettid). I >>want to know if this is acceptable in OVS or better way to handle this? >>> >>> Void ovs_numa_thread_setpriority(int nice OVS_UNUSED) { >>> #if defined(__linux__) && defined(SYS_gettid) >>> pid_t tid = syscall(SYS_gettid); >>> err = setpriority(PRIO_PROCESS, tid, nice); >>> >>> #endif >>> } >> >>I don't know a better way to implement this and it seems ovs-numa.c >already >>has some ifdefs specific to linux. >> >>Do you know if this problem happen on BSD? >I don't know if this is a problem on BSD. 
I searched a bit and found BSD code >using "syscall(SYS_thr_self, &tid)" >to retrieve the tid. > > >The dummy ovs-numa works (and thus compile) everywhere, for pmd tests, >but the module only works on linux. >I think it's fine to implement something only for linux and return >EOPNOTSUPP (only if dummy is not enabled) for everything else. > >I think passing 0 as thread_id to setpriority() changes the current thread >priority, so there's probably no need for SYS_gettid. This is helpful, passing 0 for thread_id worked. > >> >> >>> Without priority patch: >>> >>> $ ps -eLo tid,pri,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e >>pmd >>> 22509 19 4 ovs-vswitchd >>> 22512 19 5 lcore-slave-5 >>> 22513 19 6 lcore-slave-6 >>> 22514 19 7 lcore-slave-7 >>> 22589 19 4 revalidator37 >>> 22590 19 4 revalidator52 >>> 22591 19 4 revalidator42 >>> 22592 19 4 revalidator38 >>> 22593 19 4 revalidator39 >>> 22594 19 4 revalidator45 >>> 22595 19 4 revalidator53 >>> 22596 19 4 revalidator54 >>> 22598 19 4 pmd61 [Default priority] >>> >>> With priority patch: >>> >>> $ ps -eLo tid,pri,psr,comm | grep -e lcore -e revalidator -e ovs-vswitchd -e >>pmd >>> 24879 19 4 ovs-vswitchd >>> 24881 19 5 lcore-slave-5 >>> 24882 19 6 lcore-slave-6 >>> 24883 19 7 lcore-slave-7 >>> 24951 19 4 revalidator55 >>> 24952 19 4 revalidator37 >>> 24953 19 4 revalidator52 >>> 24954 19 4 revalidator42 >>> 24955 19 4 revalidator38 >>> 24956 19 4 revalidator39 >>> 24957 19 4 revalidator45 >>> 24958 19 4 revalidator53 >>> 24964 39 4 pmd61 [Higher priority set] >> >>Looks good, so if you affinity your bash to the CPU running >>pmd61 thread, are you able to use it? >Yes, I tested this case and there seems to be no problem here. > >So there's no need to make it exclusive with dpdk-lcore-mask anymore, right? Yes, I have sent out another version of patch which doesn't change any of the existing functionality but lower the nice Value of the pmd thread to -20. Regards, Bhanu Prakash. 
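For reference, the variant discussed above (passing 0 instead of a thread id, and returning EOPNOTSUPP outside Linux) could look roughly like the sketch below. The function name thread_set_nice_sketch is hypothetical and this is not the code from the patch; it relies on the Linux behaviour that the nice value is a per-thread attribute, so PRIO_PROCESS with who == 0 affects the calling thread.

```c
#include <errno.h>
#include <sys/resource.h>

/* Hypothetical helper: set the nice value of the calling thread.
 * Returns 0 on success or a positive errno value on failure. */
static int thread_set_nice_sketch(int nice_value)
{
#ifdef __linux__
    /* who == 0 means "the caller", so no gettid() call is needed. */
    if (setpriority(PRIO_PROCESS, 0, nice_value) == -1) {
        return errno;
    }
    return 0;
#else
    (void) nice_value;
    return EOPNOTSUPP;   /* only implemented for Linux in this sketch */
#endif
}
```

Lowering the nice value (for example to -20) normally needs CAP_SYS_NICE or a suitable RLIMIT_NICE, so a caller should expect EACCES or EPERM when running unprivileged.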
Re: [ovs-dev] Hugepages allocation
>-Original Message-
>From: dev [mailto:dev-boun...@openvswitch.org] On Behalf Of Kapil
>Adhikesavalu
>Sent: Monday, October 3, 2016 12:07 PM
>To: dev@openvswitch.org; disc...@openvswitch.org
>Subject: [ovs-dev] Hugepages allocation
>
>Hi,
>
>1. in INSTALL.DPDK.md, huge page size is recommended as 2MB pages. Earlier
>i remember seeing some reference to use 1GB huge pages. is there any
>noticeable performance improvement in using 1G hugepages vs 2M pages ?

We did some benchmarks and found no significant performance difference between 2MB and 1GB hugepages.

>2. is there a way to allocate 1G hugepages using sysctl. Default hugepage size
>is 2M and i didn't find any way to modify it using sysctl. is grub config the
>only way to allocate 1GB pages ?

Yes, for this one has to build the kernel with CONFIG_CMA. If the kernel is built with it, one can allocate hugepages at run time as below:

echo N > /sys/devices/system/node/nodeX/hugepages/hugepages-1048576kB/nr_hugepages

>3. Is there any difference in allocating huge pages run time vs
>persistent(/etc/sysctl.d/hugepages.conf) ? will the hugepages allocated at
>runtime will be fragmented or something?

The only advantage of allocating hugepages at boot time is that you are sure to get the requested pages, because memory isn't fragmented yet. If memory is too fragmented at run time, chances are that the kernel will allocate fewer pages than requested.

>
>Regards
>Kapil.
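Because a run-time request may be only partly satisfied, it can help to read the counter back after writing it. The following is a minimal sketch of that idea, assuming the per-node sysfs layout shown in the reply; the function names hugepage_sysfs_path and hugepage_request are hypothetical.

```c
#include <stdio.h>

/* Build the per-node sysfs path for a given hugepage size (in kB).
 * Returns what snprintf() returns. */
static int hugepage_sysfs_path(char *buf, size_t len, int node,
                               unsigned long page_kb)
{
    return snprintf(buf, len,
                    "/sys/devices/system/node/node%d/hugepages/"
                    "hugepages-%lukB/nr_hugepages", node, page_kb);
}

/* Ask for 'want' pages on 'node', then read back how many the kernel
 * actually reserved. Returns that count, or -1 on any error (no such
 * node, page size not supported, insufficient privileges). */
static long hugepage_request(int node, unsigned long page_kb, long want)
{
    char path[128];
    long got = -1;
    FILE *f;

    if (hugepage_sysfs_path(path, sizeof path, node, page_kb)
        >= (int) sizeof path) {
        return -1;
    }
    f = fopen(path, "w");
    if (!f) {
        return -1;
    }
    fprintf(f, "%ld\n", want);
    fclose(f);

    f = fopen(path, "r");
    if (!f) {
        return -1;
    }
    if (fscanf(f, "%ld", &got) != 1) {
        got = -1;
    }
    fclose(f);
    return got;
}
```

A caller would compare the return value with the requested count; a smaller number suggests that memory was too fragmented to satisfy the whole request.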
Re: [ovs-dev] [PATCH 08/12] dpif-netdev: Reorder elements in dp_netdev_port structure.
>-Original Message-
>From: Jarno Rajahalme [mailto:ja...@ovn.org]
>Sent: Friday, October 7, 2016 10:11 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 08/12] dpif-netdev: Reorder elements in
>dp_netdev_port structure.
>
>Would equivalent packing be achieved by moving the line down before the
>bool instead? If yes, it would be preferable.

Absolutely, yes. I will do this in v2.

Regards,
Bhanu Prakash.

>
>Acked-by: Jarno Rajahalme
>
>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy wrote:
>>
>> By reordering the data elements in dp_netdev_port structure, pad bytes
>> can be reduced and there by saving a cache line.
>>
>> Before: structure size:136, holes:3, sum padbytes:15, cachelines:3
>> After: structure size:128, holes:1, sum padbytes:7, cachelines:2
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> Signed-off-by: Antonio Fischetti
>> ---
>> lib/dpif-netdev.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index dfc9cbd..262f4de 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -284,10 +284,10 @@ struct dp_netdev_rxq {
>> /* A port in a netdev-based datapath. */
>> struct dp_netdev_port {
>>     odp_port_t port_no;
>> +    unsigned n_rxq; /* Number of elements in 'rxq' */
>>     struct netdev *netdev;
>>     struct hmap_node node; /* Node in dp_netdev's 'ports'. */
>>     struct netdev_saved_flags *sf;
>> -    unsigned n_rxq; /* Number of elements in 'rxq' */
>>     struct dp_netdev_rxq *rxqs;
>>     bool dynamic_txqs; /* If true XPS will be used. */
>>     unsigned *txq_used; /* Number of threads that uses each tx queue. */
>> --
>> 2.4.11
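How member order changes padding can be seen with a much smaller, made-up structure. The two structs below are hypothetical and unrelated to dp_netdev_port; they only show the mechanism, and the sizes in the comments assume a typical LP64 ABI (8-byte pointers, 4-byte uint32_t).

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* 4-byte members interleaved with pointers: each one is followed by a
 * 4-byte hole on LP64 (40 bytes in total there). */
struct port_padded_sketch {
    uint32_t port_no;
    void *netdev;
    uint32_t n_rxq;
    void *rxqs;
    bool dynamic_txqs;
};

/* Same members, with the two 4-byte fields adjacent so they share one
 * 8-byte slot (32 bytes in total on LP64). */
struct port_packed_sketch {
    uint32_t port_no;
    uint32_t n_rxq;
    void *netdev;
    void *rxqs;
    bool dynamic_txqs;
};

/* Bytes saved by the reordering on the current target. */
static size_t reorder_savings_sketch(void)
{
    return sizeof(struct port_padded_sketch)
           - sizeof(struct port_packed_sketch);
}
```

On real code, a tool such as pahole reports the holes and cache lines directly, which is where figures like the "Before"/"After" lines in the commit message usually come from.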
Re: [ovs-dev] [PATCH 07/12] dpif-netdev: Cache align netdev_flow_keys.
>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Friday, October 7, 2016 11:46 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org; Jarno Rajahalme
>Subject: Re: [ovs-dev] [PATCH 07/12] dpif-netdev: Cache align
>netdev_flow_keys.
>
>2016-10-07 14:10 GMT-07:00 Jarno Rajahalme:
>
>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy wrote:
>>
>> Aligning the 'keys' array seems to positively impact performance.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> Signed-off-by: Antonio Fischetti
>> ---
>> lib/dpif-netdev.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
>> index d0bb191..dfc9cbd 100644
>> --- a/lib/dpif-netdev.c
>> +++ b/lib/dpif-netdev.c
>> @@ -4157,7 +4157,7 @@ dp_netdev_input__(struct dp_netdev_pmd_thread *pmd,
>>     /* Sparse or MSVC doesn't like variable length array. */
>>     enum { PKT_ARRAY_SIZE = NETDEV_MAX_BURST };
>> #endif
>> -    struct netdev_flow_key keys[PKT_ARRAY_SIZE];
>> +    struct netdev_flow_key keys[PKT_ARRAY_SIZE] __attribute__((aligned(64)));
>
>Due to compiler compatibility you must use OVS_ALIGNED_VAR(64) instead.
>
>I would also use the CACHE_LINE_SIZE define, instead of 64

Agreed. I will change this to OVS_ALIGNED_VAR(CACHE_LINE_SIZE) when I send v2.

Regards,
Bhanu Prakash.

>Thanks,
>Daniele
>
>>     struct packet_batch_per_flow batches[PKT_ARRAY_SIZE];
>>     long long now = time_msec();
>>     size_t newcnt, n_batches, i;
>> --
>> 2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
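For readers unfamiliar with over-aligned local arrays, the sketch below shows the idea in portable C11. It uses `alignas` as a stand-in for the OVS_ALIGNED_VAR macro discussed above; the 64-byte line size, the burst length, and the key structure are all assumptions made for illustration.

```c
#include <stdalign.h>
#include <stdint.h>

#define SKETCH_CACHE_LINE 64   /* Assumed cache line size in bytes. */
#define SKETCH_BURST 32        /* Hypothetical number of keys per batch. */

/* Hypothetical key type; only its presence in an array matters here. */
struct sketch_key {
    uint32_t hash;
    uint32_t len;
    uint64_t buf[6];
};

/* Returns 1 if 'p' starts exactly on a cache line boundary. */
static inline int is_cache_aligned(const void *p)
{
    return ((uintptr_t) p % SKETCH_CACHE_LINE) == 0;
}

/* Declares a cache-line-aligned array on the stack, the way the patch
 * does for 'keys', and reports whether the request was honoured. */
static inline int aligned_local_array_check(void)
{
    alignas(SKETCH_CACHE_LINE) struct sketch_key keys[SKETCH_BURST];

    keys[0].hash = 0;   /* Touch the array so that it really exists. */
    return is_cache_aligned(keys);
}
```

Without the alignment request the array only gets the natural alignment of its element type, so its first element may straddle two cache lines.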
Re: [ovs-dev] [PATCH 04/12] hash: Skip invoking mhash_add__() with zero input.
>-----Original Message-----
>From: Jarno Rajahalme [mailto:ja...@ovn.org]
>Sent: Friday, October 7, 2016 10:09 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 04/12] hash: Skip invoking mhash_add__() with
>zero input.
>
>
>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy wrote:
>>
>> mhash_add__() is expensive and should be only called with valid input.
>> This patch will validate the input before invoking the mhash_add__ and
>> there by saving some cpu cycles.
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> Signed-off-by: Antonio Fischetti
>> ---
>> lib/hash.h | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/lib/hash.h b/lib/hash.h
>> index 114a419..9bfebdb 100644
>> --- a/lib/hash.h
>> +++ b/lib/hash.h
>> @@ -70,7 +70,7 @@ static inline uint32_t mhash_add__(uint32_t hash, uint32_t data)
>>
>> static inline uint32_t mhash_add(uint32_t hash, uint32_t data)
>> {
>> -    hash = mhash_add__(hash, data);
>> +    hash = data ? mhash_add__(hash, data): hash;
>
>IMO the zero check is best placed in the function mhash_add__(), where it is
>also more evident that zero-valued data would not change the hash anyway.
>Maybe a comment to that effect would be also nice?

Agreed, I will do this in v2.

Bhanu Prakash.

>
>>     hash = hash_rot(hash, 13);
>>     return hash * 5 + 0xe6546b64;
>> }
>> --
>> 2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
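The reviewer's point that zero-valued data cannot change the hash can be checked with a small sketch. The mixing step below is modelled on the 32-bit MurmurHash3 block mix (multiply, rotate, multiply, XOR); the function names are hypothetical and this is not a copy of lib/hash.h.

```c
#include <stdint.h>

/* Rotate a 32-bit value left by 'k' bits, 0 < k < 32. */
static inline uint32_t rotl32(uint32_t x, int k)
{
    return (x << k) | (x >> (32 - k));
}

/* Mixing step without any shortcut. */
static inline uint32_t mix_step_unchecked(uint32_t hash, uint32_t data)
{
    data *= 0xcc9e2d51;
    data = rotl32(data, 15);
    data *= 0x1b873593;
    return hash ^ data;
}

/* Mixing step with the zero check placed inside, as suggested in the
 * review.  Zero stays zero through the multiplies and the rotate, and
 * XOR with zero leaves 'hash' unchanged, so the shortcut is exact. */
static inline uint32_t mix_step(uint32_t hash, uint32_t data)
{
    if (!data) {
        return hash;
    }
    return mix_step_unchecked(hash, data);
}
```

Because the shortcut returns exactly what the full computation would, it changes only the cost of the call, never the resulting hash value.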
Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall structure.
Hello Jarno,

Thanks for the feedback. While reordering the members of dpif_upcall, I
had to deviate from the coding standard for the following reason: the
dp_packet member has an mbuf as its first member, which starts on a new
cache line and so creates a 60-byte hole between dpif_upcall_type and
dp_packet, as shown below.

struct dpif_upcall {
    enum dpif_upcall_type type;

    --> 60 bytes hole

    /* --- cacheline 1 boundary (64 bytes) --- */
    struct dp_packet {
        struct rte_mbuf {
            /* typedef MARKER */ void * cacheline0[0]; /* 64 0 */
        }
        struct nlattr *key;
        .
        .
}

I tried to pack this hole by moving other members into this space.

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: Jarno Rajahalme [mailto:ja...@ovn.org]
>Sent: Friday, October 7, 2016 10:11 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall
>structure.
>
>CodingStyle.md instructs to group struct members into related groups. Also,
>changing the relative order of pointers should not make any difference. Could
>you achieve the same by reordering just the members above the
>‘DPIF_UC_ACTION only.’ comment?
>
> Jarno
>
>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy wrote:
>>
>> By reordering the data elements in dpif_upcall structure, pad bytes
>> can be reduced and also a cache line.
>>
>> Before: structure size:768, holes:1, sum padbytes:60, cachelines:12
>> After: structure size:656, holes:1, sum padbytes:4, cachelines:11
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> Signed-off-by: Antonio Fischetti
>> ---
>> lib/dpif.h | 17 +
>> 1 file changed, 9 insertions(+), 8 deletions(-)
>>
>> diff --git a/lib/dpif.h b/lib/dpif.h
>> index a7c5097..4a4bb3d 100644
>> --- a/lib/dpif.h
>> +++ b/lib/dpif.h
>> @@ -779,17 +779,18 @@ const char *dpif_upcall_type_to_string(enum dpif_upcall_type);
>> struct dpif_upcall {
>>     /* All types. */
>>     enum dpif_upcall_type type;
>> -    struct dp_packet packet; /* Packet data. */
>> -    struct nlattr *key; /* Flow key. */
>> -    size_t key_len; /* Length of 'key' in bytes. */
>> -    ovs_u128 ufid; /* Unique flow identifier for 'key'. */
>> -    struct nlattr *mru; /* Maximum receive unit. */
>> -    struct nlattr *cutlen; /* Number of bytes shrink from the end. */
>>
>>     /* DPIF_UC_ACTION only. */
>> -    struct nlattr *userdata; /* Argument to OVS_ACTION_ATTR_USERSPACE. */
>> -    struct nlattr *out_tun_key; /* Output tunnel key. */
>>     struct nlattr *actions; /* Argument to OVS_ACTION_ATTR_USERSPACE. */
>> +    struct nlattr *out_tun_key; /* Output tunnel key. */
>> +    struct nlattr *userdata; /* Argument to OVS_ACTION_ATTR_USERSPACE. */
>> +
>> +    struct nlattr *cutlen; /* Number of bytes shrink from the end. */
>> +    struct nlattr *mru; /* Maximum receive unit. */
>> +    ovs_u128 ufid; /* Unique flow identifier for 'key'. */
>> +    struct dp_packet packet; /* Packet data. */
>> +    struct nlattr *key; /* Flow key. */
>> +    size_t key_len; /* Length of 'key' in bytes. */
>> };
>>
>> /* A callback to notify higher layer of dpif about to be purged, so that
>> --
>> 2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 06/12] cmap: Remove prefetching in cmap_find_batch().
>-----Original Message-----
>From: Jarno Rajahalme [mailto:ja...@ovn.org]
>Sent: Friday, October 7, 2016 10:10 PM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH 06/12] cmap: Remove prefetching in
>cmap_find_batch().
>
>
>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy wrote:
>>
>> prefetching the data in to the caches isn't improving the performance
>> in cmap_find_batch(). Moreover its found that there is slight
>> improvement in performance with out prefetching.
>>
>
>I recall doing some performance testing for this earlier, but have no records
>of the system or other circumstances. Is it likely that this is at least
>somewhat system and test case dependent?

I agree with you. We are testing this on Haswell CPUs.

>Also the modified batch size may be a factor here.

I verified this patch with a batch size of 16 and still got better
performance, so the improvement may not be due to the increase of the
batch size to 32 done in the first patch of the series.

>Anyway, I don’t currently have any proof to the contrary, so I
>have no problem removing the prefetching.
>
>However, you should also update all the comments referring to prefetching.

Sure, I will do this in v2.

Regards,
Bhanu Prakash.

>
> Jarno
>
>> This patch removes prefetching from cmap_find_batch().
>>
>> Signed-off-by: Bhanuprakash Bodireddy
>> Signed-off-by: Antonio Fischetti
>> ---
>> lib/cmap.c | 4
>> 1 file changed, 4 deletions(-)
>>
>> diff --git a/lib/cmap.c b/lib/cmap.c
>> index 8c7312d..4c34bda 100644
>> --- a/lib/cmap.c
>> +++ b/lib/cmap.c
>> @@ -397,7 +397,6 @@ cmap_find_batch(const struct cmap *cmap, unsigned long map,
>>     ULLONG_FOR_EACH_1(i, map) {
>>         h1s[i] = rehash(impl, hashes[i]);
>>         b1s[i] = &impl->buckets[h1s[i] & impl->mask];
>> -        OVS_PREFETCH(b1s[i]);
>>     }
>>     /* Lookups, Round 1. Only look up at the first bucket. */
>>     ULLONG_FOR_EACH_1(i, map) {
>> @@ -413,13 +412,11 @@ cmap_find_batch(const struct cmap *cmap, unsigned long map,
>>         if (!node) {
>>             /* Not found (yet); Prefetch the 2nd bucket. */
>>             b2s[i] = &impl->buckets[other_hash(h1s[i]) & impl->mask];
>> -            OVS_PREFETCH(b2s[i]);
>>             c1s[i] = c1; /* We may need to check this after Round 2. */
>>             continue;
>>         }
>>         /* Found. */
>>         ULLONG_SET0(map, i); /* Ignore this on round 2. */
>> -        OVS_PREFETCH(node);
>>         nodes[i] = node;
>>     }
>>     /* Round 2. Look into the 2nd bucket, if needed. */
>> @@ -453,7 +450,6 @@ cmap_find_batch(const struct cmap *cmap, unsigned long map,
>>             continue;
>>         }
>>     found:
>> -        OVS_PREFETCH(node);
>>         nodes[i] = node;
>>     }
>>     return result;
>> --
>> 2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 00/12] Improve performance of OVS-DPDK classifier
Thanks Daniele, Jarno for reviewing and testing the patch series. We are
working on the suggested changes and will send out v2 soon.

Regards,
Bhanu Prakash.

From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
Sent: Friday, October 7, 2016 11:45 PM
To: Bodireddy, Bhanuprakash
Cc: dev@openvswitch.org; Jarno Rajahalme
Subject: Re: [ovs-dev] [PATCH 00/12] Improve performance of OVS-DPDK classifier

Thanks for posting this.

I quickly tried this with some simple flow tables and it seems to be
beneficial.

I agree with Jarno's comments and I posted a couple more to the single
patches.

I see two signoff, but a single author, should you add a Co-authored-by,
perhaps?

Other than these I am fine with the series. Could you maybe send another
version with the suggested changes, please?

Thanks,
Daniele

2016-10-07 9:17 GMT-07:00 Bhanuprakash Bodireddy <bhanuprakash.bodire...@intel.com>:

This patch series is aimed at improving the performance of OVS-DPDK dpcls.

With few thousands flows installed, the EMC becomes inefficient due to
thrashing and the bottleneck moves to the dpcls. In EMC disabled case,
through VTune we found that significant performance degradation is due
to LLC thrashing, memory latency, machine clears and expensive hash
computation.

This first patch-set improves the dpcls performance by 15% (~1 Mpps)
when EMC is disabled and OVS-DPDK built with CFLAGS="-O2 -g".

Bhanuprakash Bodireddy (12):
  dpcls: Use 32 packet batches for lookups.
    Comment: ~120k performance throughput improvement.

  flow: Add comments to mf_get_next_in_map()
    Comment: Add comments to function.

  flow: Skip invoking expensive count_1bits() with zero input.
    Comment: ~630k performance throughput improvement.

  hash: Skip invoking mhash_add__() with zero input.
    Comment: ~150k performance throughput improvement.

  dpif-netdev: Clear flow batches inside packet_batch_execute.
    Comment: ~50k performance throughput improvement with multiple
             batches test case.

  cmap: Remove prefetching in cmap_find_batch().
    Comment: ~39k performance throughput improvement.

  dpif-netdev: Cache align netdev_flow_keys.
    Comment: ~170k performance throughput improvement in EMC enabled case.

  dpif-netdev: Reorder elements in dp_netdev_port structure.
  dpif: Reorder elements in dpif_upcall structure.
  ovsdb: Reorder elements in ovsdb_table_schema structure.
  netlink-socket: Reorder elements in nl_dump structure.
  timeval: Reorder elements in clock structure.
    Comment: Reorder member variables of the structures to reduce
             pad bytes and there by memory footprint.

 lib/cmap.c           |   4 --
 lib/dpif-netdev.c    | 118 +--
 lib/dpif.h           |  17
 lib/flow.h           |  29 +++--
 lib/hash.h           |   2 +-
 lib/netlink-socket.h |   6 +--
 lib/timeval.c        |   4 +-
 ovsdb/table.h        |   4 +-
 8 files changed, 91 insertions(+), 93 deletions(-)

--
2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall structure.
>-Original Message- >From: Jarno Rajahalme [mailto:ja...@ovn.org] >Sent: Monday, October 10, 2016 9:01 PM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in dpif_upcall >structure. > >How about making the ‘dp_packet’ member the first member and adding a >comment that this should be first? This makes sense and by doing so we can avoid the hole, will make this change in V2. - Bhanu Prakash. > > Jarno > >> On Oct 10, 2016, at 8:42 AM, Bodireddy, Bhanuprakash > wrote: >> >> Hello jarno, >> >> Thanks for the feedback, while reordering the members of dpif_upcall, I had >to deviate from standards due to below reason. >> The dp_packet member has mbuf as first member that starts at new cache >line creating hole of size 60 bytes between dpif_upcall_type and dp_packet as >pointed below. >> >> struct dpif_upcall { >>enum dpif_upcall_type type; >> >> --> 60 bytes hole >> >>/* --- cacheline 1 boundary (64 bytes) --- */ >>struct dp_packet { >>struct rte_mbuf { >>/* typedef MARKER */ void * cacheline0[0]; /* >>64 0 */ >> >> } >> struct nlattr *key; >> . >> . >> } >> >> I tried to pack this hole by moving other members in to this space. >> >> Regards, >> Bhanu Prakash. >> >>> -Original Message- >>> From: Jarno Rajahalme [mailto:ja...@ovn.org] >>> Sent: Friday, October 7, 2016 10:11 PM >>> To: Bodireddy, Bhanuprakash >>> Cc: dev@openvswitch.org >>> Subject: Re: [ovs-dev] [PATCH 09/12] dpif: Reorder elements in >>> dpif_upcall structure. >>> >>> CodingStyle.md instructs to group struct members into related groups. >>> Also, changing the relative order of pointers should not make any >>> difference. Could you achieve the same by reordering just the members >>> above the ‘DPIF_UC_ACTION only.’ comment? >>> >>> Jarno >>> >>>> On Oct 7, 2016, at 9:17 AM, Bhanuprakash Bodireddy >>> wrote: >>>> >>>> By reordering the data elements in dpif_upcall structure, pad bytes >>>> can be reduced and also a cache line. 
>>>> >>>> Before: structure size:768, holes:1, sum padbytes:60, cachelines:12 >>>> After: structure size:656, holes:1, sum padbytes:4, cachelines:11 >>>> >>>> Signed-off-by: Bhanuprakash Bodireddy >>>> >>>> Signed-off-by: Antonio Fischetti >>>> --- >>>> lib/dpif.h | 17 + >>>> 1 file changed, 9 insertions(+), 8 deletions(-) >>>> >>>> diff --git a/lib/dpif.h b/lib/dpif.h index a7c5097..4a4bb3d 100644 >>>> --- a/lib/dpif.h >>>> +++ b/lib/dpif.h >>>> @@ -779,17 +779,18 @@ const char *dpif_upcall_type_to_string(enum >>>> dpif_upcall_type); struct dpif_upcall { >>>>/* All types. */ >>>>enum dpif_upcall_type type; >>>> -struct dp_packet packet; /* Packet data. */ >>>> -struct nlattr *key; /* Flow key. */ >>>> -size_t key_len; /* Length of 'key' in bytes. */ >>>> -ovs_u128 ufid; /* Unique flow identifier for 'key'. */ >>>> -struct nlattr *mru; /* Maximum receive unit. */ >>>> -struct nlattr *cutlen; /* Number of bytes shrink from the end. */ >>>> >>>>/* DPIF_UC_ACTION only. */ >>>> -struct nlattr *userdata;/* Argument to >>> OVS_ACTION_ATTR_USERSPACE. */ >>>> -struct nlattr *out_tun_key;/* Output tunnel key. */ >>>>struct nlattr *actions;/* Argument to >OVS_ACTION_ATTR_USERSPACE. >>> */ >>>> +struct nlattr *out_tun_key;/* Output tunnel key. */ >>>> +struct nlattr *userdata;/* Argument to >>> OVS_ACTION_ATTR_USERSPACE. */ >>>> + >>>> +struct nlattr *cutlen; /* Number of bytes shrink from the end. */ >>>> +struct nlattr *mru; /* Maximum receive unit. */ >>>> +ovs_u128 ufid; /* Unique flow identifier for 'key'. */ >>>> +struct dp_packet packet; /* Packet data. */ >>>> +struct nlattr *key; /* Flow key. */ >>>> +size_t key_len; /* Length of 'key' in bytes. */ >>>> }; >>>> >>>> /* A callback to notify higher layer of dpif about to be purged, so >>>> that >>>> -- >>>> 2.4.11 >>>> >>>> ___ >>>> dev mailing list >>>> dev@openvswitch.org >>>> http://openvswitch.org/mailman/listinfo/dev >> ___ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev
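The 60-byte hole discussed in this thread comes from placing a small member in front of a cache-line-aligned one. The sketch below reproduces the effect with hypothetical structures; the 64-byte alignment is an assumption standing in for the alignment the DPDK mbuf imposes on dp_packet, and none of the types are the real OVS ones.

```c
#include <stdalign.h>
#include <stddef.h>

/* Hypothetical packet type whose first member forces 64-byte alignment,
 * standing in for a dp_packet that embeds an mbuf. */
struct sketch_packet {
    alignas(64) unsigned char mbuf[64];
    void *data;
};

/* Small member first: the aligned packet must skip ahead to offset 64. */
struct upcall_type_first {
    int type;
    struct sketch_packet packet;
    void *key;
    size_t key_len;
};

/* Aligned packet first: 'type' packs in directly behind it. */
struct upcall_packet_first {
    struct sketch_packet packet;
    int type;
    void *key;
    size_t key_len;
};

/* Bytes wasted between the end of 'type' and the start of 'packet'. */
static inline size_t hole_before_packet(void)
{
    return offsetof(struct upcall_type_first, packet) - sizeof(int);
}
```

With a 4-byte `int`, `hole_before_packet()` evaluates to 60, and putting the aligned member first removes the hole, which is the arrangement suggested in the reply above.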
Re: [ovs-dev] [PATCH v3 01/12] dpcls: Use 32 packet batches for lookups.
>-Original Message- >From: Daniele Di Proietto [mailto:diproiet...@ovn.org] >Sent: Tuesday, October 18, 2016 4:04 AM >To: Bodireddy, Bhanuprakash >Cc: dev@openvswitch.org >Subject: Re: [ovs-dev] [PATCH v3 01/12] dpcls: Use 32 packet batches for >lookups. > > > >2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy >: >This patch increases the number of packets processed in a batch during a >lookup from 16 to 32. Processing batches of 32 packets improves >performance and also one of the internal loops can be avoided here. > >Signed-off-by: Antonio Fischetti >Co-authored-by: Bhanuprakash Bodireddy > >Signed-off-by: Bhanuprakash Bodireddy > > >I guess Co-authored-by should have Antonio? My mistake, I am sending out the remaining 4 patches separately with appropriate tags. >Also, the (already existing code) is not trivial, I'd like to take another >look at it Sure, let us know if you have more comments. Regards, Bhanuprakash. >Thanks, >Daniele > >--- > lib/dpif-netdev.c | 110 ++--- >- > 1 file changed, 45 insertions(+), 65 deletions(-) > >diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c >index eb9f764..0a4f338 100644 >--- a/lib/dpif-netdev.c >+++ b/lib/dpif-netdev.c >@@ -4985,23 +4985,21 @@ dpcls_lookup(struct dpcls *cls, const struct >netdev_flow_key keys[], > int *num_lookups_p) > { > /* The received 'cnt' miniflows are the search-keys that will be processed >- * in batches of 16 elements. N_MAPS will contain the number of these >- * 16-elements batches. i.e. for 'cnt' = 32, N_MAPS will be 2. The batch >- * size 16 was experimentally found faster than 8 or 32. */ >- typedef uint16_t map_type; >+ * to find a matching entry into the available subtables. >+ * The number of bits in map_type is equal to NETDEV_MAX_BURST. 
*/ >+ typedef uint32_t map_type; > #define MAP_BITS (sizeof(map_type) * CHAR_BIT) >+ BUILD_ASSERT_DECL(MAP_BITS >= NETDEV_MAX_BURST); > >-#if !defined(__CHECKER__) && !defined(_WIN32) >- const int N_MAPS = DIV_ROUND_UP(cnt, MAP_BITS); >-#else >- enum { N_MAPS = DIV_ROUND_UP(NETDEV_MAX_BURST, MAP_BITS) }; >-#endif >- map_type maps[N_MAPS]; > struct dpcls_subtable *subtable; > >- memset(maps, 0xff, sizeof maps); >- if (cnt % MAP_BITS) { >- maps[N_MAPS - 1] >>= MAP_BITS - cnt % MAP_BITS; /* Clear extra bits. >*/ >+ map_type keys_map = TYPE_MAXIMUM(map_type); >+ map_type found_map; >+ uint32_t hashes[MAP_BITS]; >+ const struct cmap_node *nodes[MAP_BITS]; >+ >+ if (cnt != NETDEV_MAX_BURST) { >+ keys_map >>= NETDEV_MAX_BURST - cnt; /* Clear extra bits. */ > } > memset(rules, 0, cnt * sizeof *rules); > >@@ -5015,61 +5013,43 @@ dpcls_lookup(struct dpcls *cls, const struct >netdev_flow_key keys[], > * search-key, the search for that key can stop because the rules are > * non-overlapping. */ > PVECTOR_FOR_EACH (subtable, &cls->subtables) { >- const struct netdev_flow_key *mkeys = keys; >- struct dpcls_rule **mrules = rules; >- map_type remains = 0; >- int m; >- >- BUILD_ASSERT_DECL(sizeof remains == sizeof *maps); >- >- /* Loops on each batch of 16 search-keys. */ >- for (m = 0; m < N_MAPS; m++, mkeys += MAP_BITS, mrules += >MAP_BITS) { >- uint32_t hashes[MAP_BITS]; >- const struct cmap_node *nodes[MAP_BITS]; >- unsigned long map = maps[m]; >- int i; >- >- if (!map) { >- continue; /* Skip empty maps. */ >- } >- >- /* Compute hashes for the remaining keys. Each search-key is >- * masked with the subtable's mask to avoid hashing the wildcarded >- * bits. */ >- ULLONG_FOR_EACH_1(i, map) { >- hashes[i] = netdev_flow_key_hash_in_mask(&mkeys[i], >- &subtable->mask); >- } >- /* Lookup. */ >- map = cmap_find_batch(&subtable->rules, map, hashes, nodes); >- /* Check results. 
When the i-th bit of map is set, it means that >a >- * set of nodes with a matching hash value was found for the i-th >- * search-key. Due to possible hash collisions we need to check >- * which of the found rules, if any, really matches our masked >- * search-key. */ >- ULLONG_FOR_EACH_1(i, map) { >- struct dpcls_rule *rule; >- >- CMAP_NODE_FOR_EACH (rule, cmap
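The patch above replaces the array of 16-bit maps with a single 32-bit map in which bit i stands for the i-th search key. A minimal sketch of that bookkeeping follows; the function names and the burst constant are hypothetical, and the explicit zero-count guard is an addition of this sketch (a shift by the full width of the type is undefined in C).

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_BURST 32   /* Assumed maximum number of keys per call. */

/* Builds a map with the low 'cnt' bits set, one bit per search key. */
static inline uint32_t initial_keys_map(unsigned cnt)
{
    assert(cnt <= SKETCH_MAX_BURST);
    if (cnt == 0) {
        return 0;   /* Avoid shifting a 32-bit value by 32 bits. */
    }
    return UINT32_MAX >> (SKETCH_MAX_BURST - cnt);
}

/* Writes the index of every set bit of 'map' to 'out', lowest first,
 * and returns how many were written. */
static inline unsigned visit_set_bits(uint32_t map, unsigned out[32])
{
    unsigned n = 0;

    while (map) {
        unsigned i = 0;

        while (!((map >> i) & 1u)) {
            i++;
        }
        out[n++] = i;
        map &= map - 1;   /* Clear the lowest set bit. */
    }
    return n;
}
```

Clearing a key's bit once its rule is found means later subtables only hash and look up the keys that are still unmatched.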
Re: [ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in cmap_find_batch().
>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Tuesday, October 18, 2016 4:07 AM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH v3 06/12] cmap: Remove prefetching in
>cmap_find_batch().
>
>
>
>2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy:
>prefetching the data in to the caches isn't improving the performance in
>cmap_find_batch(). Moreover its found that there is slight improvement
>in performance with out prefetching.
>
>This patch removes prefetching from cmap_find_batch().
>
>Signed-off-by: Bhanuprakash Bodireddy
>Co-authored-by: Antonio Fischetti
>Signed-off-by: Antonio Fischetti
>
>I tested this patch in isolation and on my system I didn't notice any
>improvements for a single flow (with EMC disabled), I noticed a slight drop
>instead with 128 flows in the classifier.
>Probably this is due to the fact that I didn't apply yet the first patch of the
>series (the one that increases the batch to 32), so I guess I'll defer this
>patch until we can apply the rest of the series.
>Also, if you guys see an improvement (and since you got some evidence with
>VTune), I don't think it matters that on one particular system (mine) I can't
>see any benefit.

I am testing this on Haswell, and VTune confirmed our observation.
Prefetching is done at four places in cmap_find_batch(), and at two of
them it is done just before the data is accessed. Since the prefetch
instruction has some overhead, a prefetch has to be issued far enough in
advance to give a performance gain. Prefetching too early can also have
a negative effect, because the prefetched data can be evicted by other
accesses before it is used. We played around a bit and found that
removing the prefetching does not degrade performance, hence this patch.

Regards,
Bhanu Prakash.

>
>Thanks,
>Daniele
>
>

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
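To make the point about prefetch distance concrete, here is a small sketch that issues each prefetch a fixed number of iterations before the data is used, rather than immediately before the access. It calls the GCC/Clang `__builtin_prefetch` builtin directly; the node type and the lookahead of 8 are hypothetical, and the right distance has to be found by measurement on the target system.

```c
#include <stddef.h>
#include <stdint.h>

#define PREFETCH_AHEAD 8   /* Hypothetical lookahead distance, in elements. */

/* Hypothetical node type; each node lives at an unrelated address. */
struct sketch_node {
    uint64_t value;
};

/* Sums the values behind an array of pointers.  The node needed
 * PREFETCH_AHEAD iterations from now is requested early, so the memory
 * access overlaps with the work done on the nodes in between. */
static inline uint64_t
sum_with_prefetch(struct sketch_node *const *nodes, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        if (i + PREFETCH_AHEAD < n) {
            /* Arguments: address, 0 = read access, 3 = keep in all
             * cache levels. */
            __builtin_prefetch(nodes[i + PREFETCH_AHEAD], 0, 3);
        }
        sum += nodes[i]->value;
    }
    return sum;
}
```

A prefetch placed on the line directly above the access, as in two of the sites removed by the patch, gives the memory system no time to fetch the line and only adds an instruction.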
Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK classifier.
Thanks Daniele. I will send out the remaining patches with appropriate tags.

Regards,
Bhanu Prakash.

>-----Original Message-----
>From: Daniele Di Proietto [mailto:diproiet...@ovn.org]
>Sent: Tuesday, October 18, 2016 4:04 AM
>To: Bodireddy, Bhanuprakash
>Cc: dev@openvswitch.org
>Subject: Re: [ovs-dev] [PATCH v3 00/12] Improve performance of OVS-DPDK
>classifier.
>
>Thanks for the series, I applied most of it to master.
>I sent some comments on the few remaining patches.
>Thanks again,
>Daniele
>
>2016-10-14 7:37 GMT-07:00 Bhanuprakash Bodireddy:
>This patch series is aimed at improving the performance of OVS-DPDK
>dpcls.
>
>With few thousand flows installed, the EMC becomes inefficient due
>to thrashing and the bottleneck moves to the dpcls. In EMC disabled
>case, through VTune we found that significant performance degradation
>is due to LLC thrashing, memory latency, machine clears and expensive
>hash computation.
>
>This first patch-set improves the dpcls performance by 15% (+1 Mpps)
>when EMC is disabled and OVS-DPDK built with CFLAGS="-O2 -g".
>
>Bhanuprakash Bodireddy (12):
>  dpcls: Use 32 packet batches for lookups.
>    Comment: ~120k performance throughput improvement.
>
>  flow: Add comments to mf_get_next_in_map().
>    Comment: Add comments to the function.
>
>  flow: Skip invoking expensive count_1bits() with zero input.
>    Comment: ~630k performance throughput improvement.
>
>  hash: Skip invoking mhash_add__() with zero input.
>    Comment: ~150k performance throughput improvement.
>
>  dpif-netdev: Add comments to dp_netdev_input__().
>    Comment: Add comments to the function.
>
>  cmap: Remove prefetching in cmap_find_batch().
>    Comment: ~39k performance throughput improvement.
>
>  dpif-netdev: Cache align netdev_flow_keys.
>    Comment: ~170k performance throughput improvement in EMC enabled
>             case.
>
>  dpif-netdev: Reorder elements in dp_netdev_port structure.
>  dpif: Reorder elements in dpif_upcall structure.
>  ovsdb: Reorder elements in ovsdb_table_schema structure.
>  netlink-socket: Reorder elements in nl_dump structure.
>  timeval: Reorder elements in clock structure.
>    Comment: Reorder member variables of the structures to reduce
>             pad bytes and there by the memory footprint.
>
> lib/cmap.c           |   8 +---
> lib/dpif-netdev.c    | 123 +++
> lib/dpif.h           |   5 ++-
> lib/flow.h           |  47 +++-
> lib/hash.h           |   5 +++
> lib/netlink-socket.h |   6 +--
> lib/timeval.c        |   4 +-
> ovsdb/table.h        |   4 +-
> 8 files changed, 111 insertions(+), 91 deletions(-)
>
>--
>2.4.11

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
Re: [ovs-dev] [ovs dpdk] why all the ovs threads pinned to master lcore?
>
>++ pidof ovs-vswitchd
>+ ps -To tid,pid,psr,comm -p 25932
>  TID   PID PSR COMMAND
>25932 25932   1 ovs-vswitchd
>25934 25932   0 eal-intr-thread
>25935 25932   1 dpdk_watchdog1
>25936 25932   2 vhost_thread2
>25937 25932   1 pdump-thread
>25938 25932   8 urcu3
>25939 25932   2 ovs-vswitchd
>25959 25932   1 ct_clean4
>26013 25932   3 handler18
>26014 25932   9 handler17
>26015 25932   3 handler20
>26016 25932  11 handler19
>26022 25932   4 handler21
>26028 25932   9 handler22
>26033 25932  10 handler23
>26037 25932  10 handler24
>26042 25932   9 revalidator25
>26045 25932  10 revalidator26
>26046 25932   0 revalidator27
>26054 25932   2 revalidator28
>26143 25932   5 pmd29
>26144 25932   4 pmd30

This is because 'dpdk-lcore-mask' has not been specified in the above
case. If dpdk-lcore-mask= is specified, all the threads
(revalidator/handler/ovs-vswitchd/urcu) are automatically pinned to the
lowest core of the mask.

For example, with 'ovs-vsctl --no-wait set Open_vSwitch .
other_config:dpdk-lcore-mask=f00', I would expect the threads to float
between cores 8 and 11, but all of them get pinned to the lowest core of
the mask, which is core 8. I hope this clears up the confusion.

$ ps -eLo tid,psr,comm | grep -e revalidator -e handler -e ovs -e pmd -e urc -e eal
98443  22 ovsdb-server
98454   8 ovs-vswitchd
98465   8 eal-intr-thread
98471   8 urcu3
98520   8 handler60
98521   8 handler59
98522   8 handler58
98523   8 handler57
98524   8 handler43
98525   8 handler44
98526   8 handler46
98527   8 handler47
98528   8 handler48
98529   8 handler41
98530   8 handler55
98531   8 handler49
98532   8 handler33
98533   8 handler40
98534   8 handler34
98535   8 handler35
98536   8 handler50
98537   8 handler36
98538   8 handler51
98539   8 handler56
98540   8 revalidator37
98541   8 revalidator52
98542   8 revalidator42
98543   8 revalidator38
98544   8 revalidator39
98545   8 revalidator45
98546   8 revalidator53
98547   8 revalidator54
98549   4 pmd61

Regards,
Bhanu Prakash.

>
>
>--
>Christian Ehrhardt
>Software Engineer, Ubuntu Server
>Canonical Ltd

___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
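As a quick way to see which core a given mask resolves to under the behavior described above, the sketch below finds the lowest and highest set bits of a mask. It assumes the mask is a plain bitmap in which bit N stands for core N, as the f00 example suggests; the function names are hypothetical and no DPDK or OVS API is involved.

```c
#include <stdint.h>

/* Returns the index of the lowest set bit in 'mask', or -1 if the mask
 * is empty.  With the behavior described above, this is the core the
 * non-pmd threads end up on. */
static inline int lowest_core_in_mask(uint64_t mask)
{
    int core = 0;

    if (mask == 0) {
        return -1;
    }
    while (!(mask & 1u)) {
        mask >>= 1;
        core++;
    }
    return core;
}

/* Returns the index of the highest set bit in 'mask', or -1 if the mask
 * is empty. */
static inline int highest_core_in_mask(uint64_t mask)
{
    int core = -1;

    while (mask) {
        mask >>= 1;
        core++;
    }
    return core;
}
```

For the mask f00 used in the message, the lowest core is 8 and the highest is 11, which matches the range the author expected and the single core the threads were actually observed on.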
Re: [ovs-dev] [ovs dpdk] why all the ovs threads pinned to master lcore?
>
>sorry, I am still confused.
>when I launched vswitchd with dpdk-init=false, all the threads changed their
>pinned cores as time goes by.
>The following output is from ovs version 2.6 with dpdk-init=false
>ps -To tid,pid,psr,comm -p 5922
>  TID   PID PSR COMMAND
> 5922  5922  12 ovs-vswitchd
> 5934  5922  24 urcu5 ==>does not follow main the ovs-vswitchd
> 6917  5922   0 handler98
> 6918  5922  24 handler100
> 6919  5922   0 handler99
> 6920  5922   0 handler101
> 6921  5922   0 handler102
> 6922  5922   2 handler103
> 6952  5922   0 revalidator133
> 6953  5922  10 revalidator134
> 6954  5922  38 revalidator135 ==>seems these threads can be pinned on any
>cores, just like CPU auto scheduled
> 6955  5922  26 revalidator136
> 6956  5922  40 revalidator137
> 6957  5922  22 revalidator138
>
>
>after somtime,
> 5922  5922   7 ovs-vswitchd ==>changed from core 12 to core 7
> 5934  5922  18 urcu5
> 6917  5922   0 handler98
> 6918  5922   0 handler100
> 6919  5922   0 handler99
> 6920  5922   0 handler101
> 6921  5922   0 handler102
> 6922  5922   0 handler103
> 6952  5922   3 revalidator133
> 6953  5922  35 revalidator134
> 6954  5922  27 revalidator135
> 6955  5922  21 revalidator136
> 6956  5922   9 revalidator137
> 6957  5922  33 revalidator138
>
>
>When launch vswitchd with dpdk-init as true, all the handler threads and
>revalidator threads are pinned to master lcore and can not change their
>pinned cores when time goes by.
>so here is my problem:
>1. with the same ovs code, there is different phenomenon for ovs threads
>with dpdk inited have different values?

Yes. With dpdk-init=true, DPDK initialization is performed, and it
includes a mechanism that determines the cores and pins the threads to
them. If 'dpdk-lcore-mask' is specified, the threads are pinned to the
lowest core of the mask. This is really DPDK library behavior, as DPDK
wants to use the lowest core of the mask as the control core and the
rest of the cores for packet forwarding.

If dpdk-lcore-mask is not specified, the threads are not explicitly
pinned to any core. To check this, run taskset on the respective threads
to retrieve their affinity. In the example below, the threads are free
to float on any core between 0 and 27:

$ ps -eLo tid,psr,comm | grep -e revalidator -e handler -e ovs -e pmd -e urc -e eal
99078  17 ovs-vswitchd
99137  11 handler33
99157   6 revalidator53

$ taskset -cp 99078
pid 99078's current affinity list: 0-27

$ taskset -cp 99137
pid 99137's current affinity list: 0-27

$ taskset -cp 99157
pid 99157's current affinity list: 0-27

I hope this clears up your confusion.

Regards,
Bhanu Prakash.

>2.with dpdk-init=true, if all the other threads share the same logical core,
>then for the performance meaning, is it not friendly?
>
>
>At 2016-10-25 01:39:13, "Kevin Traynor" wrote:
>>On 10/24/2016 11:55 AM, ychen wrote:
>>> hi, I am a freshman to ovs DPDK, when I tried to launch ovs with dpdk
>>> inited, I found that all the ovs threads are pinned to master lcore, but I
>>> can't find any code for setting the affinity of the specified thread.
>>
>>On older versions of OVS you you can set the affinity via the -c
>>0x vswitchd dpdk cmd line arg. It will use the lsb only. For the
>>latest versions, you can set this through OVSDB - alternatively you can
>>not set it and by default the non-pmd threads will float on the cores
>>that vswitchd runs on.
>> >>> Here is my configuration: >>> lscpu >>> Architecture: x86_64 >>> CPU op-mode(s):32-bit, 64-bit >>> Byte Order:Little Endian >>> CPU(s):48 >>> On-line CPU(s) list: 0-47 >>> Thread(s) per core:2 >>> Core(s) per socket:12 >>> Socket(s): 2 >>> NUMA node(s): 2 >>> Vendor ID: GenuineIntel >>> CPU family:6 >>> Model: 63 >>> Stepping: 2 >>> CPU MHz: 2599.988 >>> BogoMIPS: 4600.75 >>> Virtualization:VT-x >>> L1d cache: 32K >>> L1i cache: 32K >>> L2 cache: 256K >>> L3 cache: 30720K >>> NUMA node0 CPU(s): >0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46 >>> NUMA node1 CPU(s): >1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31,33,35,37,39,41,43,45,47 >>> >>> >>> ovs-vsctl list open_vswitch >>> _uuid : 61d9066b-178e-4672-b8b3-9dcb5587565d >>> bridges : [cc2605fb-fcbb-4627-8834-080c43534119] >>> cur_cfg : 68 >>> datapath_types : [netdev, system] >>> db_version : [] >>> external_ids: {} >>> iface_types : [dpdk, dpdkr, dpdkvhostuser, dpdkvhostuserclient, >geneve, gre, internal, ipsec_gre, lisp, patch, stt, system, tap, vxlan] >>> manager_options : [] >>> next_cfg: 68 >>> other_config: {dpdk-init="true", dpdk-lcore-mask="0xf", dpdk-socket- >mem="1024,1024", pmd-cpu-mask="f0"} >
Re: [ovs-dev] [ovs dpdk] why all the ovs threads pinned to master lcore?
>-Original Message-
>From: Aaron Conole [mailto:acon...@redhat.com]
>Sent: Tuesday, October 25, 2016 2:36 PM
>To: Bodireddy, Bhanuprakash
>Cc: Christian Ehrhardt; ychen; dev@openvswitch.org
>Subject: Re: [ovs-dev] [ovs dpdk] why all the ovs threads pinned to master
>lcore?
>
>"Bodireddy, Bhanuprakash" writes:
>
>>>
>>>++ pidof ovs-vswitchd
>>>+ ps -To tid,pid,psr,comm -p 25932
>>>  TID   PID PSR COMMAND
>>>25932 25932   1 ovs-vswitchd
>>>25934 25932   0 eal-intr-thread
>>>25935 25932   1 dpdk_watchdog1
>>>25936 25932   2 vhost_thread2
>>>25937 25932   1 pdump-thread
>>>25938 25932   8 urcu3
>>>25939 25932   2 ovs-vswitchd
>>>25959 25932   1 ct_clean4
>>>26013 25932   3 handler18
>>>26014 25932   9 handler17
>>>26015 25932   3 handler20
>>>26016 25932  11 handler19
>>>26022 25932   4 handler21
>>>26028 25932   9 handler22
>>>26033 25932  10 handler23
>>>26037 25932  10 handler24
>>>26042 25932   9 revalidator25
>>>26045 25932  10 revalidator26
>>>26046 25932   0 revalidator27
>>>26054 25932   2 revalidator28
>>>26143 25932   5 pmd29
>>>26144 25932   4 pmd30
>>
>> This is because 'dpdk-lcore-mask' hasn't been specified in
>> the above case.
>> If one specifies dpdk-lcore-mask=, all the
>> threads (revalidator/handler/ovs-vswitchd/urc) will automatically be
>> pinned to the lowest core of the 'mask'.
>>
>> For example with 'ovs-vsctl --no-wait set Open_vSwitch .
>> other_config:dpdk-lcore-mask=f00', I would expect the threads to float
>> between cores 8 and 11, but all the threads get pinned to the lowest core
>> of the core-mask, which is core 8. Hope this clears the confusion.
>
>I'm sorry, I don't understand this behavior. I don't see that this has
>anything to do with ovs, specifically. Meaning, when that option is
>specified, we pass it to the dpdk library directly, and take no actions as
>far as thread binding is concerned. I have to admit, I've not delved that
>deeply into the dpdk side.

Completely agree, Aaron. This isn't anything to do with OVS.
This is the current behavior of DPDK and has more to do with how it parses the coremask. For a given coremask, DPDK treats the lowest core in the mask as the 'master core/control core' and pins the control threads to that core, leaving the remaining cores for actual packet forwarding. This behavior is reproducible with a DPDK sample app.

Regards,
Bhanu Prakash.

>
>Do you have the output from the EAL logs? There are logs which state what
>dpdk thinks it should have set for an affinity. Perhaps those shed some light?
>
>-Aaron
___
dev mailing list
dev@openvswitch.org
http://openvswitch.org/mailman/listinfo/dev
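To make the coremask behavior described in this message concrete, here is a small sketch that models it. It only mirrors what the thread describes (lowest set bit of the mask becomes the master/control core); it is not DPDK's actual parsing code, and the function names `cores_in_mask` and `master_core` are hypothetical.

```python
def cores_in_mask(mask_hex):
    """Return the core ids whose bits are set in a hex coremask string.

    Accepts values with or without a "0x" prefix, e.g. "f00" or "0xf".
    """
    mask = int(mask_hex, 16)
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def master_core(mask_hex):
    """Return the lowest core in the mask, i.e. the core that, per the
    behavior described in this thread, ends up hosting the control threads."""
    cores = cores_in_mask(mask_hex)
    if not cores:
        raise ValueError("coremask selects no cores")
    return cores[0]


# dpdk-lcore-mask=f00 selects cores 8-11, with core 8 as the lowest.
print(cores_in_mask("f00"), master_core("f00"))
```

With the configuration quoted earlier in the thread, `cores_in_mask("0xf")` gives cores 0-3 for dpdk-lcore-mask and `cores_in_mask("f0")` gives cores 4-7 for pmd-cpu-mask, so under this model the non-pmd threads would all sit on core 0.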