FYI, here is the latest CSIT status on the l2fib performance regression in 17.07. I expect the fix to be included in the VPP 17.07 maintenance release.
In parallel, we owe the community an explanation that the observed performance degradation is due to adding missing mandatory L2 bridging functionality, and that this is the current cost of doing so, as no work comes for free. We also owe an explanation that it is a one-off degradation, and not a bad trend that will impact the "best-on-the-planet" network data plane performance properties of VPP. I will work with John Lo, who owns this feature, and the VPP data plane gurus on cc: to arrive at a satisfactory explanation for the community.

Hope this makes sense.

-Maciek

Begin forwarded message:

From: Maciek Konstantynowicz <mkons...@cisco.com>
Subject: Re: [vpp-dev] VPP Performance drop from 17.04 to 17.07
Date: 13 September 2017 at 15:27:52 BST
To: Billy McFall <bmcf...@redhat.com>, "csit-...@lists.fd.io" <csit-...@lists.fd.io>, vpp-dev <vpp-dev@lists.fd.io>
Cc: "Maciek Konstantynowicz (mkonstan)" <mkons...@cisco.com>

Hello,

RECOMMENDATION

After reviewing the results, the CSIT team recommends applying the l2fib MAC scale fix to vpp17.07 ASAP, as the fix greatly improves NDR and PDR performance for all tested L2BD MAC scale scenarios. However, the CSIT team wants to note that VPP performance after the fix still shows a small regression compared to vpp17.04. Detail below.

FURTHER DETAIL

Here is the final update on CSIT verifying the code fix to correct VPP frame throughput for L2 bridging with higher-scale MAC tables (bigger L2FIBs).

The following numbers of CSIT Jenkins jobs have been executed, each execution yielding one complete set of data, referred to below as a sample:

  vpp master (with fix) tests - 10 samples
  vpp 17.04 tests - 6 samples

The tests have been executed across all three physical testbeds present in the FD.io CSIT labs operated by LF IT and the CSIT project team.
Testbed selection was pseudo-random, based on testbed availability at the time of the jjb testbed allocation request.

A breakdown of the test results is included in the updated .xlsx attachments to CSIT jira ticket CSIT-794 [5]. All other references for breakdown data stay unchanged [1]..[8].

In summary, we report the following relative FPS/PPS throughput change between vpp17.04 and vpp-master after the fix:

  1,000,000 MAC entries in L2FIB: up to 5% relative throughput drop
    100,000 MAC entries in L2FIB: up to 3% relative throughput drop
     10,000 MAC entries in L2FIB: up to 5% relative throughput drop

In addition, we have performed IXIA-based soak tests over a period of more than 36 hrs (still running), with IXIA running at the NDR rate (testcase: l2bdscale1mmaclrn-ndrdisc) and reporting 0.001% frame loss over the current duration of the test.

Regards,
-Maciek

[1] CSIT-786 L2FIB scale testing [https://gerrit.fd.io/r/#/c/8145/ ge8145] [https://jira.fd.io/browse/CSIT-786 CSIT-786];
    L2FIB scale testing for 10k, 100k, 1M FIB entries
    ./l2:
    10ge2p1x520-eth-l2bdscale10kmaclrn-ndrpdrdisc.robot
    10ge2p1x520-eth-l2bdscale100kmaclrn-ndrpdrdisc.robot
    10ge2p1x520-eth-l2bdscale1mmaclrn-ndrpdrdisc.robot
    10ge2p1x520-eth-l2bdscale10kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
    10ge2p1x520-eth-l2bdscale100kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
    10ge2p1x520-eth-l2bdscale1mmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
[2] VPP master branch [https://gerrit.fd.io/r/#/c/8173/ ge8173];
[3] VPP stable/1707 [https://gerrit.fd.io/r/#/c/8167/ ge8167];
[4] VPP stable/1704 [https://gerrit.fd.io/r/#/c/8172/ ge8172];
[5] CSIT-794 VPP v17.07 L2BD yields lower NDR and PDR performance vs. v17.04, 20170825_l2fib_regression_10k_100k_1M.xlsx, [https://jira.fd.io/browse/CSIT-794 CSIT-794];
[6] TRex v2.28 Ethernet FCS mis-calculation issue [https://jira.fd.io/browse/CSIT-793 CSIT-793];
[7] commit 25ff2ea3a31e422094f6d91eab46222a29a77c4b;
[8] VPP v17.07 L2BD NDR and PDR multi-thread performance broken [https://jira.fd.io/browse/VPP-963 VPP-963];

On 28 Aug 2017, at 18:11, Maciek Konstantynowicz (mkonstan) <mkons...@cisco.com> wrote:

On 28 Aug 2017, at 17:47, Billy McFall <bmcf...@redhat.com> wrote:

On Mon, Aug 28, 2017 at 8:53 AM, Maciek Konstantynowicz (mkonstan) <mkons...@cisco.com> wrote:

+ csit-dev

Billy,

Per last week's CSIT project call, from the CSIT perspective we classified your reported issue as a test coverage escape.

Summary
=======
CSIT test coverage got fixed, see more detail below. The CSIT tests uncovered a regression for L2BD with MAC learning with a higher total number of MACs in L2FIB, >>10k MAC, for multi-threaded configurations. Single-threaded configurations seem not to be impacted.

Billy, Karl, can you confirm this aligns with your findings?

When you say "multi-threaded configuration", I assume you mean multiple worker threads?

Yes, I should have said multiple data plane threads; in VPP land that’s worker threads indeed.

Karl's tests had 4 workers, one for each NIC (physical and vhost-user). He only tested multi-threaded, so we cannot confirm that single-threaded configurations are not impacted.

Okay. Still, your results align with our tests, both CSIT and offline with IXIA.

Our numbers are a little different from yours, but we are both seeing drops between releases.

Your numbers are different most likely due to different MAC scale. You quote MAC scale per direction, we quote total MAC scale, i.e.
total number of VPP l2fib entries.

We had a bigger drop-off with 10k flows, but it seems to be similar with the million-flow tests.

Our 10k flows is the equivalent of 2 * 5k flows, defined as:

  flow-ab1 => (smac-a1,dmac-b1)
  flow-ab2 => (smac-a2,dmac-b2)
  ..
  flow-ab5000 => (smac-a5000,dmac-b5000)
  flow-ba1 => (smac-b1,dmac-a1)
  flow-ba2 => (smac-b2,dmac-a2)
  ..
  flow-ba5000 => (smac-b5000,dmac-a5000)

In your case, based on the description provided by Karl on the last CSIT call, I read that the 10k flows test has 2 * 10k flows, defined as:

  flow-ab1 => (smac-a1,dmac-b1)
  flow-ab2 => (smac-a2,dmac-b2)
  ..
  flow-ab10000 => (smac-a10000,dmac-b10000)
  flow-ba1 => (smac-b1,dmac-a1)
  flow-ba2 => (smac-b2,dmac-a2)
  ..
  flow-ba10000 => (smac-b10000,dmac-a10000)

Also, your PDR packet loss tolerance at 0.002% Drop Rate is different from the CSIT PDR (0.5% pkt loss rate tolerance) and NDR (zero pkt loss rate tolerance).

I was a little disappointed that the MAC limit change by John Lo on 8/23 didn't improve the master numbers some.

Thanks for all the hard work and for adding these additional test cases.

You are welcome. Thanks again for reporting this regression. Let’s wait for the vpp-dev fix, so that we can retest and verify the fix.

-Maciek

Billy

More detail
===========
MAC scale tests have now been added to the L2BD and L2BD+vhost CSIT suites, as a simple extension to the existing L2 testing suites. Some known issues with the TG prevented CSIT from adding those tests in the past, but now that the TG issues have been addressed, the tests could be added swiftly. The complete list of added tests is in [1] - thanks to Peter Mikus for great work there!

Results from running those tests multiple times within the FD.io CSIT lab infra can be glanced over by checking the dedicated test trigger commits [2][3][4] and the summary graphs in the linked xls [5]. The results confirm there is a regression in VPP l2fib code affecting all scaled-up MAC tests in multi-thread configurations. Single-thread configurations seem not to be impacted.
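
The two MAC-scale conventions compared earlier in this thread (total l2fib entries versus flows per direction) can be illustrated with a minimal Python sketch. The function names and the short MAC labels (a1, b1, ..) are hypothetical, and the sketch assumes that each distinct source MAC produces exactly one learned l2fib entry; it is not CSIT or TRex code.

```python
def build_flows(per_direction):
    """Return bidirectional (src_mac, dst_mac) flows.

    Mirrors the flow-abN / flow-baN pattern from the thread; the MAC
    labels themselves are illustrative placeholders.
    """
    ab = [(f"a{i}", f"b{i}") for i in range(1, per_direction + 1)]
    ba = [(f"b{i}", f"a{i}") for i in range(1, per_direction + 1)]
    return ab + ba


def learned_macs(flows):
    """Assumption: every distinct source MAC yields one learned l2fib entry."""
    return {src for src, _dst in flows}


# "10k" test defined as 2 * 5k flows
flows_2x5k = build_flows(5_000)
# "10k flows" test read as 2 * 10k flows
flows_2x10k = build_flows(10_000)

print("2 * 5k: ", len(flows_2x5k), "flows,", len(learned_macs(flows_2x5k)), "MACs")
print("2 * 10k:", len(flows_2x10k), "flows,", len(learned_macs(flows_2x10k)), "MACs")
```

Under these assumptions the same "10k" label corresponds to about twice as many l2fib entries in the second setup, which is in line with the explanation that the numbers differ because of MAC scale.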
The tests in commit [1] are not merged yet, as they're waiting for the TG/TRex team to fix a TRex issue with mis-calculating the Ethernet FCS with a large number of L2 MAC flows (>10k MAC flows). The issue is tracked by [6]; the ETA for TRex v2.29 with the fix is w/e 1-Sep, i.e. this week. The reported CSIT test results use Ethernet frames with UDP headers, which masks the TRex issue.

We have also git-bisected vpp between v17.04 (good) and v17.07 (bad) in a separate IXIA-based lab in SJC, and found the culprit vpp patch [7]. Awaiting a fix from vpp-dev; jira ticket raised [8].

Many thanks for reporting this regression and working with CSIT to plug this hole in testing.

-Maciek

[1] CSIT-786 L2FIB scale testing [https://gerrit.fd.io/r/#/c/8145/ ge8145] [https://jira.fd.io/browse/CSIT-786 CSIT-786];
    L2FIB scale testing for 10k, 100k, 1M FIB entries
    ./l2:
    10ge2p1x520-eth-l2bdscale10kmaclrn-ndrpdrdisc.robot
    10ge2p1x520-eth-l2bdscale100kmaclrn-ndrpdrdisc.robot
    10ge2p1x520-eth-l2bdscale1mmaclrn-ndrpdrdisc.robot
    10ge2p1x520-eth-l2bdscale10kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
    10ge2p1x520-eth-l2bdscale100kmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
    10ge2p1x520-eth-l2bdscale1mmaclrn-eth-2vhostvr1024-1vm-cfsrr1-ndrpdrdisc
[2] VPP master branch [https://gerrit.fd.io/r/#/c/8173/ ge8173];
[3] VPP stable/1707 [https://gerrit.fd.io/r/#/c/8167/ ge8167];
[4] VPP stable/1704 [https://gerrit.fd.io/r/#/c/8172/ ge8172];
[5] CSIT-794 VPP v17.07 L2BD yields lower NDR and PDR performance vs. v17.04, 20170825_l2fib_regression_10k_100k_1M.xlsx, [https://jira.fd.io/browse/CSIT-794 CSIT-794];
[6] TRex v2.28 Ethernet FCS mis-calculation issue [https://jira.fd.io/browse/CSIT-793 CSIT-793];
[7] commit 25ff2ea3a31e422094f6d91eab46222a29a77c4b;
[8] VPP v17.07 L2BD NDR and PDR multi-thread performance broken [https://jira.fd.io/browse/VPP-963 VPP-963];

On 14 Aug 2017, at 23:40, Billy McFall <bmcf...@redhat.com> wrote:

In the last VPP call, I reported that some internal Red Hat performance testing was showing a significant drop in performance between releases 17.04 and 17.07. This is with l2-bridge testing - PVP - 0.002% Drop Rate:

  VPP-17.04: 256 Flow 7.8 MP/s   10k Flow 7.3 MP/s   1m Flow 5.2 MP/s
  VPP-17.07: 256 Flow 7.7 MP/s   10k Flow 2.7 MP/s   1m Flow 1.8 MP/s

The performance team re-ran some of the tests for me with some additional data collected. Looks like the size of the L2 FIB table was reduced in 17.07. Below are the numbers of entries in the MAC table after the tests are run:

  17.04:
    show l2fib
    4000008 l2fib entries
  17.07:
    show l2fib
    1067053 l2fib entries with 1048576 learned (or non-static) entries

This caused more packets to be flooded (see the output of 'show node counters' below). I looked but couldn't find anything. Is the size of the L2 FIB table configurable?
Thanks,
Billy McFall

17.04:
show node counters
     Count    Node        Reason
:
 313035313    l2-input    L2 input packets
    555726    l2-flood    L2 flood packets
:
 310115490    l2-input    L2 input packets
    824859    l2-flood    L2 flood packets
:
 313508376    l2-input    L2 input packets
   1041961    l2-flood    L2 flood packets
:
 313691024    l2-input    L2 input packets
    698968    l2-flood    L2 flood packets

17.07:
show node counters
     Count    Node        Reason
:
  97810569    l2-input    L2 input packets
  72557612    l2-flood    L2 flood packets
:
  97830674    l2-input    L2 input packets
  72478802    l2-flood    L2 flood packets
:
  97714888    l2-input    L2 input packets
  71655987    l2-flood    L2 flood packets
:
  97710374    l2-input    L2 input packets
  70058006    l2-flood    L2 flood packets

--
Billy McFall
SDN Group
Office of Technology
Red Hat
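
To put the figures from the 14 Aug 2017 message side by side, here is a minimal Python sketch that computes the relative throughput drop between 17.04 and 17.07 and the share of input packets that were flooded. The numbers are copied from the message; the function names are hypothetical, and dividing l2-flood by l2-input is only an illustrative metric chosen here, not one defined in the thread. Treating the four counter groups as one aggregate is likewise an assumption.

```python
# Throughput in MP/s as (17.04, 17.07), copied from the message.
THROUGHPUT = {
    "256 flows": (7.8, 7.7),
    "10k flows": (7.3, 2.7),
    "1m flows": (5.2, 1.8),
}

# (l2-input, l2-flood) pairs, one per counter group in the message.
COUNTERS = {
    "17.04": [
        (313035313, 555726),
        (310115490, 824859),
        (313508376, 1041961),
        (313691024, 698968),
    ],
    "17.07": [
        (97810569, 72557612),
        (97830674, 72478802),
        (97714888, 71655987),
        (97710374, 70058006),
    ],
}


def relative_drop_pct(old, new):
    """Throughput drop from old to new, as a percentage of old."""
    return (old - new) / old * 100.0


def flood_share_pct(groups):
    """Flooded packets as a percentage of input packets, summed over groups."""
    total_input = sum(inp for inp, _flood in groups)
    total_flood = sum(flood for _inp, flood in groups)
    return total_flood / total_input * 100.0


for label, (v1704, v1707) in THROUGHPUT.items():
    print(f"{label}: {relative_drop_pct(v1704, v1707):.1f}% drop")

for release, groups in COUNTERS.items():
    print(f"{release}: {flood_share_pct(groups):.2f}% of input packets flooded")
```

Read this way, the counters show flooding going from a fraction of a percent of input packets in 17.04 to well over half in 17.07, which matches the observation above that more packets were being flooded.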
_______________________________________________ vpp-dev mailing list vpp-dev@lists.fd.io https://lists.fd.io/mailman/listinfo/vpp-dev