>> on all the logic hidden behind rte_eth_rx_burst call,
> won't rte_eth_rx_burst just end up using the pmd registered functions.

Yes. But I personally do not have visibility into DPDK software, not to mention NIC firmware and hardware. From the fact that small-ish traffic does not lead to 1 vector/node, I conclude that batching happens somewhere, and I do not see any batching in VPP code (except the 32-packet threshold one).
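For illustration, rte_eth_rx_burst itself is only a thin dispatcher: it invokes whatever receive function the PMD registered for the port, so any batching below this point is driver-specific. A simplified sketch of the DPDK inline of that era (debug checks and RX callbacks elided; the names come from DPDK, but this is not the exact source):

    #include <rte_ethdev.h>

    /* Sketch of what rte_eth_rx_burst boils down to: a call through
     * the rx_pkt_burst pointer that the PMD (i40e, mlx5, ...) installed
     * at probe time. */
    static inline uint16_t
    sketch_rx_burst (uint16_t port_id, uint16_t queue_id,
                     struct rte_mbuf **rx_pkts, const uint16_t nb_pkts)
    {
      struct rte_eth_dev *dev = &rte_eth_devices[port_id];
      return (*dev->rx_pkt_burst) (dev->data->rx_queues[queue_id],
                                   rx_pkts, nb_pkts);
    }

Each PMD supplies its own rx_pkt_burst implementation, which is why any driver-internal batching can differ between i40e and mlx.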
> why would the rx threshold or the number of bits in the vector matter to us?

The threshold is there to increase efficiency. If fewer than 32 packets are read, traffic is probably not very high, and VPP chooses to reduce latency by processing the packets immediately. If 32 or more packets are read, traffic is probably high enough that the risk of packet loss matters more than latency. In that case, VPP attempts immediate re-reads, to increase vectors/node even further for the processing phase. There is room for experimentation here: re-compile VPP with a different threshold value and see how latency, throughput, and vectors/node change.

> there is no copying or extra storage involved
> Is there some other memory limitation?

Once again, I personally have no clear idea. If I had to guess, I would say there is "copying" over PCI (between the NIC and the CPU cache). Packet size matters there.

Vratko.

From: csit-...@lists.fd.io <csit-...@lists.fd.io> On Behalf Of Jeremy Brown via lists.fd.io
Sent: Monday, 2020-July-06 16:56
To: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com>; Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev@lists.fd.io; Dany Gregoire <dany_grego...@affirmednetworks.com>
Cc: csit-...@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] Vectors/node and packet size

Thanks for your reply. I looked at your references, and I noticed that docs.fd.io seems to show the same behavior, which is good corroboration of what we're seeing... but I am still unclear as to why. I assumed that we are simply dealing with pointers to the packet, and there is no copying or extra storage involved... why would the rx threshold or the number of bits in the vector matter to us? Is there some other memory limitation?

Also, won't rte_eth_rx_burst just end up using the PMD registered functions? We noticed the same behavior in both i40e and mlx drivers. I'm not saying that both PMD drivers can't have a similar problem, just a data point.

From: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com>
Sent: Monday, July 6, 2020 8:10 AM
Subject: RE: [vpp-dev] Vectors/node and packet size

> I was wondering if anyone had noticed some similar behavior.

Not until you mentioned it, but the pattern is visible in CSIT results. See for example here [0] (the page is very long, and you have to scroll past the multicore results to compare packet sizes).

> dpdk-input

In your results, calls outnumber vectors roughly 100:1. As the subsequent nodes are only called when there is a non-zero number of vectors at input, VPP mostly waits for DPDK to read new packets. In the CSIT results, pps is higher, and VPP spends more time actually processing the packets.

> Using a 64 byte packet, we see a vectors/node of ~80.

One packet takes (64 + 20) * 8 = 672 bits on wire, so one node call handles ~53760 bits.

> Simply changing that packet size to 1400 we see the same vectors/node fall
> down to ~2.

2 * (1400 + 20) * 8 = 22720 bits per node call. That is ~42% of the previous value.

> there seems to be a non-linear decrease

In VPP code I see one non-linearity here [1].
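For illustration, the loop there has roughly this shape (a simplified sketch following VPP's dpdk plugin at [1]; the 32 and 256 constants match the code, but this is not the exact source):

    #include <rte_ethdev.h>   /* rte_eth_rx_burst, struct rte_mbuf */

    #define DPDK_RX_BURST_SZ 256   /* one VLIB frame: max vector per call */

    /* Sketch of the dpdk-input read loop around [1]. */
    static uint16_t
    read_one_vector (uint16_t port_id, uint16_t queue_id,
                     struct rte_mbuf **mbufs)
    {
      uint16_t n_rx_packets = 0;
      while (n_rx_packets < DPDK_RX_BURST_SZ)
        {
          uint16_t n = rte_eth_rx_burst (port_id, queue_id,
                                         mbufs + n_rx_packets,
                                         DPDK_RX_BURST_SZ - n_rx_packets);
          n_rx_packets += n;
          if (n < 32)   /* short burst: traffic is low, so stop reading */
            break;      /* and process now, trading vector size for latency */
        }
      return n_rx_packets;
    }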
The threshold value (32 packets per call) is 40% of the observed ~80 value, so that might be it, assuming DPDK reads return at a fixed frequency. But of course, the real behavior depends heavily on all the logic hidden behind the rte_eth_rx_burst call, and we know [2] that can be complicated.

Vratko.

[0] https://docs.fd.io/csit/rls2001/report/test_operational_data/vpp_performance_operational_data_2n_clx/ip4_xxv710.html#n1l-25ge2p1xxv710-ethip4-ip4base-ndrpdr
[1] https://github.com/FDio/vpp/blob/dfb19cabe20ccf1cbd1aa714f493ccd322839b91/src/plugins/dpdk/device/node.c#L316-L317
[2] https://jira.fd.io/browse/VPP-1876

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Jeremy Brown via lists.fd.io
Sent: Wednesday, 2020-July-01 18:10
To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev@lists.fd.io; Dany Gregoire <dany_grego...@affirmednetworks.com>
Subject: Re: [vpp-dev] Vectors/node and packet size

VPP is restarted before each run, and all test runs are the same duration. There is no dead air time.

From: Dave Barach (dbarach) <dbar...@cisco.com>
Sent: Wednesday, July 1, 2020 9:52 AM
To: bjerem...@yahoo.com; vpp-dev@lists.fd.io; Dany Gregoire <dany_grego...@affirmednetworks.com>
Subject: RE: [vpp-dev] Vectors/node and packet size

In order for the statistics to be accurate, please be sure to do the following: start traffic... "clear run"... wait a while to accumulate data... "show run". Otherwise, the statistics will probably include a huge amount of dead airtime, data from previous runs, etc.

HTH... Dave

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Jeremy Brown via lists.fd.io
Sent: Monday, June 29, 2020 12:22 PM
To: vpp-dev@lists.fd.io; Dany Gregoire <dany_grego...@affirmednetworks.com>
Subject: [vpp-dev] Vectors/node and packet size

Greetings,

This is my first post to the forum, so if this is not the right place for it, please let me know. I have a question on VPP performance. We are running two test cases, limited to a single thread and one core in order to reduce as many variables as we can. Between the two test cases, the only thing that changes is the size of the packet coming into VPP. Using a 64-byte packet, we see vectors/node of ~80. Simply changing the packet size to 1400, we see vectors/node fall to ~2. This is regardless of pps... there seems to be a non-linear decrease in vectors/node with increasing packet size. I was wondering if anyone had noticed similar behavior.
64-byte packets:

Thread 1 vpp_wk_0 (lcore 2)
Time 98.9, average vectors/node 80.35, last 128 main loops 0.00 per node 0.00
  vector rates in 1.2643e5, out 1.2643e5, drop 0.0000e0, punt 2.0228e-2
             Name                 State       Calls        Vectors  Suspends   Clocks  Vectors/Call
VirtualFuncEthernet88/10/4-out    active      90915        6249981         0   1.06e1         68.75
VirtualFuncEthernet88/10/4-tx     active      90915        6249981         0   4.06e1         68.75
VirtualFuncEthernet88/11/5-out    active      73270        6249981         0   9.27e0         85.30
VirtualFuncEthernet88/11/5-tx     active      73270        6249981         0   4.05e1         85.30
arp-input                         active          2              2         0   3.51e4          1.00
dpdk-input                        polling 1166129337      12499964         0   1.38e4           .01
error-punt                        active          2              2         0   5.56e3          1.00
ethernet-input                    active          2              2         0   1.47e4          1.00
gtpu4-encap                       active      90914        6249980         0   1.01e2         68.75
gtpu4-input                       active      73270        6249981         0   7.29e1         85.30
interface-output                  active          2              2         0   2.20e3          1.00
ip4-input-no-checksum             active     145570       12499962         0   2.22e1         85.87
ip4-load-balance                  active      90914        6249980         0   1.77e1         68.75
ip4-local                         active      73272        6249983         0   2.45e1         85.29
ip4-lookup                        active     218840       18749943         0   3.79e1         85.68
ip4-punt                          active          2              2         0   1.27e3          1.00
ip4-rewrite                       active     236482       18749940         0   2.75e1         79.29
ip4-udp-lookup                    active      73270        6249981         0   2.44e1         85.30

1400-byte packets:

Thread 1 vpp_wk_0 (lcore 2)
Time 102.1, average vectors/node 2.37, last 128 main loops 0.00 per node 0.00
  vector rates in 1.1841e5, out 1.1438e5, drop 4.0334e3, punt 1.9588e-2
             Name                 State       Calls        Vectors  Suspends   Clocks  Vectors/Call
VirtualFuncEthernet88/10/4-out    active    2815250        5838981         0   8.18e1          2.07
VirtualFuncEthernet88/10/4-tx     active    2815250        5838981         0   1.25e2          2.07
VirtualFuncEthernet88/11/5-out    active    2765634        5839804         0   8.42e1          2.11
VirtualFuncEthernet88/11/5-tx     active    2765634        5839804         0   2.32e2          2.11
arp-input                         active          9            825         0   2.25e3         91.67
dpdk-input                        polling 1136982388      12089787         0   1.44e4           .01
error-drop                        active     397116         411823         0   1.37e2          1.04
error-punt                        active          2              2         0   5.58e3          1.00
ethernet-input                    active          9            825         0   7.42e1         91.67
gtpu4-encap                       active    2815249        5838980         0   2.21e2          2.07
gtpu4-input                       active    3161920        6249981         0   2.10e2          1.98
interface-output                  active          2              2         0   2.42e3          1.00
ip4-glean                         active     397109         411000         0   1.58e2          1.03
ip4-input-no-checksum             active    3733176       12088962         0   1.09e2          3.24
ip4-load-balance                  active    2815249        5838980         0   1.07e2          2.07
ip4-local                         active    3161922        6249983         0   1.12e2          1.98
ip4-lookup                        active    6895096       18338943         0   1.52e2          2.66
ip4-punt                          active          2              2         0   2.03e3          1.00
ip4-rewrite                       active    6151314       17516940         0   9.56e1          2.85
ip4-udp-lookup                    active    3161920        6249981         0   8.69e1          1.98
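As a standalone check of the bits-on-wire arithmetic in Vratko's reply above (the 20 bytes of overhead are preamble + start-of-frame delimiter + inter-frame gap; the ~80 and ~2 vectors/call figures are read from the two outputs):

    #include <stdio.h>

    /* Bits handled per dpdk-input call = vectors/call * bytes on wire * 8.
     * 20 B overhead = 7 B preamble + 1 B SFD + 12 B inter-frame gap. */
    int
    main (void)
    {
      double bits_64   = 80.0 * (64 + 20) * 8;    /* ~80 vectors/call */
      double bits_1400 =  2.0 * (1400 + 20) * 8;  /*  ~2 vectors/call */
      printf ("64 B:   %.0f bits per call\n", bits_64);          /* 53760 */
      printf ("1400 B: %.0f bits per call\n", bits_1400);        /* 22720 */
      printf ("ratio:  ~%.0f%%\n", 100.0 * bits_1400 / bits_64); /* ~42   */
      return 0;
    }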