Hi Billy,

Thanks for the follow-up information.

VPP has had a MAC learn limit of around 1M for quite a while. It is controlled by two variables, global_learn_limit and global_learn_count, defined in l2_learn.h. At init time, global_learn_limit is set to about 1M in the function l2_learn_init() in l2_learn.c:

    mp->global_learn_limit = L2FIB_NUM_BUCKETS * 16;
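That works out to (64 * 1024) * 16 = 1,048,576, which matches the "1048576 learned (or non-static) entries" in the 17.07 "show l2fib" output at the bottom of this thread. The gating itself amounts to a simple counter check in the learn path; here is a minimal standalone sketch of the idea (not the actual VPP source; apart from global_learn_limit, global_learn_count and L2FIB_NUM_BUCKETS, the names are illustrative):

    #include <stdint.h>
    #include <stdio.h>

    #define L2FIB_NUM_BUCKETS (64 * 1024)

    typedef struct
    {
      uint32_t global_learn_count;  /* MACs currently learned */
      uint32_t global_learn_limit;  /* cap set at init time */
    } learn_state_t;

    /* Mirrors the init above: limit = 64K buckets * 16 = 1,048,576 */
    static void
    learn_state_init (learn_state_t *s)
    {
      s->global_learn_count = 0;
      s->global_learn_limit = L2FIB_NUM_BUCKETS * 16;
    }

    /* Called when a new source MAC is seen; returns 1 if it may be added
       to the L2FIB.  Once the count reaches the limit, new MACs are simply
       not learned, so traffic destined to them keeps getting flooded. */
    static int
    may_learn_new_mac (learn_state_t *s)
    {
      if (s->global_learn_count >= s->global_learn_limit)
        return 0;
      s->global_learn_count++;
      return 1;
    }

    int
    main (void)
    {
      learn_state_t s;
      learn_state_init (&s);
      printf ("learn limit = %u\n", (unsigned) s.global_learn_limit);
      return 0;
    }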
The variable global_learn_count is kept to track the number of currently learned MACs. For 1707, we added display of global_learn_count in the output of "show l2fib". There were minor fixes in this area between 1704 and 1707, and I suspect they make the learn limit check effective in 1707 while it was not in 1704. Unfortunately, there is no API/CLI to change this limit at runtime. We would need to change this limit to 4M or more for your 1M flow test case.

The learn limit, however, cannot fully explain the performance difference you observed for 10K flows. Perhaps we should concentrate on the 10K flow case and why 1707 performance is low there. Did you also observe similar flooding behavior for packet forwarding with 10K flows?

I see that the test is done with two bridge domains (BDs), with a PNIC and a vHost port in each BD. A few more questions on the test setup and script:

· Are there any other BDs or Ethernet ports involved in the test?
· Is MAC aging enabled on the BDs? I assume not, as it is not enabled by default.
· Is VPP running with multiple worker threads? If so, how many?
· Are all MACs learned? Is the VPP API/CLI used to add/delete MACs in the L2FIB? If so, are these MACs added as static MACs?
· Are there any calls to the VPP API/CLI to clear or flush the L2FIB?

It will be helpful to get the following information:

· The output of the CLI "show bridge"
· The output of the CLI "show bridge <bd-id> details" for each BD

It will also be helpful to get the following for 1704 and 1707 while the 100K flow test is running in its optimal forwarding MPPS state:

· Use the CLI "clear node counters", wait 2 seconds, then get the output of "show node counters"
· Use the CLI "clear run", wait 2 seconds, then get the output of "show run"

The "show node counters" output will give an indication of how much learning or flooding is ongoing. The "show run" output will tell us, for the period between clear-run and show-run, which VPP nodes were called, how many calls were made to each node, how many packets were processed per call, and the average clocks each node used to process a packet. We can then compare the difference between 1704 and 1707 to get more clues.

Regards,
John

From: Billy McFall [mailto:bmcf...@redhat.com]
Sent: Thursday, August 17, 2017 2:14 PM
To: John Lo (loj) <l...@cisco.com>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] VPP Performance drop from 17.04 to 17.07

On Tue, Aug 15, 2017 at 8:05 AM, John Lo (loj) <l...@cisco.com> wrote:

Hi Billy,

The output of "show l2fib" shows how many MAC entries currently exist in the L2FIB; it does not indicate the size of the L2FIB table. The L2FIB table size is not configurable. It is a bi-hash table whose size is set by the following #def's in l2_fib.h, which have not changed for quite a while, and definitely not between 1704, 1707 and current master:

    /*
     * The size of the hash table
     */
    #define L2FIB_NUM_BUCKETS (64 * 1024)
    #define L2FIB_MEMORY_SIZE (256<<20)

I previously looked through the git tree for recent changes and noticed that these did not change. I was hoping these were more of a MAX size, but I guess not.

It is interesting to note that at the end of the test run there is a different number of MAC entries in the L2FIB. I think this may have to do with a change in 1707 where an interface going up/down causes MACs learned on that interface to be flushed, so when the interface comes back up the MACs need to be learned again. With 1704, the stale learned MACs from an interface remain in the L2FIB even if the interface is down or deleted, unless aging is enabled to remove them at the BD aging interval.
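A note on the two #def's above: 64 * 1024 is 65,536 hash buckets and 256<<20 is 256 MB (presumably the memory reserved for the table), and neither is a hard cap on the number of entries; the 17.04 run at the bottom of this thread holds 4,000,008 entries in a table of exactly this size. The ~1M ceiling seen on 17.07 instead lines up with the global_learn_limit described at the top.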
Our test team reexamined the scripts and re-ran some tests, and they don't believe the interfaces are going up and down throughout the test. Here is their finding:

I've looked over our code and I don't see how we might be causing any up/down or device add/delete actions during the course of a test. I ran some more tests yesterday where I did more fine-grained flow scaling, using the following flow counts (N): 100, 1,000, 10,000, 100,000, 200,000, and 400,000. I used the profiler I wrote based on Billy's recommendations and compared the l2fib entries for each of the flow counts. They are equal up to 400,000 flows, where they diverge. For the smaller flow counts (<= 200,000) the number of entries in the l2fib is usually 4N (four times the flow count) plus a few more (the entries for the physical NICs and the vhost-user NICs). At 400,000 flows, VPP 17.04 has the expected 1.6M entries but VPP 17.07 has only ~1M entries.

I believe the 4N factor makes sense because our topology looks like this:

    physical nic 1 <--> bridge 1 <--> vhost-user nic 1
    physical nic 2 <--> bridge 2 <--> vhost-user nic 2

When we use N flows, that is N flows in each direction, meaning there are really 2N flows in total. Packets that go in physical nic 1 exit physical nic 2 and vice versa, so all packets traverse both bridges and therefore require 2 l2fib entries each: 2 entries * 2N flows = 4N entries for N flows, since the test is bidirectional. I can't yet explain the ~1M entry limit we are seeing in VPP 17.07, but the fact that VPP 17.04 and VPP 17.07 have the same number of entries for the smaller flow counts leads me to believe it is not a reflection of actions taken as part of running our test.

Another improvement added in 1707 was a check in the l2-fwd node: when a MAC entry is found in the L2FIB, its sequence number is checked to make sure it is not stale and subject to flushing (such as a MAC learned when this interface sw_if_index was up but has since gone down, or when this sw_if_index was used, deleted and reused). If the MAC is stale, the packet is flooded instead of being forwarded with the stale MAC entry.

The number of flooded packets is definitely the issue. I wonder if there is a false trigger making the MAC look stale.

I wonder if the performance test script creates/deletes interfaces or sets interfaces to admin up/down states, causing stale MACs to be flushed in 1707. With 1704, it may be using stale MAC entries to forward packets rather than flooding to learn the MACs again. This could explain the difference in the l2-flood to l2-input count ratio between 1704 and 1707. When measuring l2-bridge forwarding performance, are you set up to measure the forwarding rate in the steady forwarding state? If all the 10K or 1M flows are started at the same time for a particular test, there will be an initial low-PPS throughput period when all packets need to be flooded and MACs learned, before it settles down to a higher steady-state PPS forwarding rate. If there is an interface flap or another event that causes a MAC flush, the MACs will need to be learned again. I wonder whether the forwarding performance for 10K or 1M flows is measured in the steady forwarding state or not.

I think the test is constantly adding flows, so the measurements are probably not in the steady forwarding state. However, I think the issue is really the difference between 17.04 and 17.07. 17.04 was the same test case, so 17.04 was not measuring in the steady forwarding state either.
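One way to square the ~1M plateau with the learn limit described at the top: at 400,000 flows, 4N is 1,600,000 MACs, which is above the 1,048,576 limit, while at 200,000 flows 4N is 800,000, which is below it. So 17.07, where the limit check is effective, tops out at roughly 1M entries at the 400K point, while 17.04, where it apparently is not enforced, reaches the full 1.6M; at 200K flows and below, both stay under the limit and agree.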
The issue appears to be (see the numbers below, from the previous email) that an order of magnitude more packets were flooded than before.

Billy

Above are a few generic comments I can think of without knowing many details about how the tests are set up and measured. I hope they help to explain the different behavior observed between 1704 and 1707.

Regards,
John

From: vpp-dev-boun...@lists.fd.io [mailto:vpp-dev-boun...@lists.fd.io] On Behalf Of Billy McFall
Sent: Monday, August 14, 2017 6:40 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP Performance drop from 17.04 to 17.07

In the last VPP call, I reported that some internal Red Hat performance testing was showing a significant drop in performance between releases 17.04 and 17.07. This is with l2-bridge testing - PVP - 0.002% drop rate:

    VPP-17.04:
      256 Flow   7.8 MP/s
      10k Flow   7.3 MP/s
      1m Flow    5.2 MP/s

    VPP-17.07:
      256 Flow   7.7 MP/s
      10k Flow   2.7 MP/s
      1m Flow    1.8 MP/s

The performance team re-ran some of the tests for me with some additional data collected. It looks like the size of the L2 FIB table was reduced in 17.07. Below are the numbers of entries in the MAC table after the tests are run:

    17.04: show l2fib
      4000008 l2fib entries

    17.07: show l2fib
      1067053 l2fib entries with 1048576 learned (or non-static) entries

This caused more packets to be flooded (see the output of 'show node counters' below). I looked but couldn't find anything. Is the size of the L2 FIB table configurable?

Thanks,
Billy McFall

    17.04: show node counters
      Count        Node        Reason
      313035313    l2-input    L2 input packets
      555726       l2-flood    L2 flood packets
      310115490    l2-input    L2 input packets
      824859       l2-flood    L2 flood packets
      313508376    l2-input    L2 input packets
      1041961      l2-flood    L2 flood packets
      313691024    l2-input    L2 input packets
      698968       l2-flood    L2 flood packets

    17.07: show node counters
      Count        Node        Reason
      97810569     l2-input    L2 input packets
      72557612     l2-flood    L2 flood packets
      97830674     l2-input    L2 input packets
      72478802     l2-flood    L2 flood packets
      97714888     l2-input    L2 input packets
      71655987     l2-flood    L2 flood packets
      97710374     l2-input    L2 input packets
      70058006     l2-flood    L2 flood packets

--
Billy McFall
SDN Group
Office of Technology
Red Hat

--
Billy McFall
SDN Group
Office of Technology
Red Hat
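Reading the two counter dumps side by side (assuming each l2-flood count pairs with the l2-input count listed just above it), 17.04 floods well under 1% of l2-input packets (for example 555,726 / 313,035,313, about 0.2%), while 17.07 floods roughly 70-75% (for example 72,557,612 / 97,810,569, about 74%), which matches the drop from 7.3 to 2.7 MP/s at 10k flows.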