[dpdk-dev] Performance impact of "declaring" more CPU cores

Tom Barbette Thu, 24 Oct 2019 10:33:02 -0700

Hi all,

We're experiencing a very strange problem. The code of our application is modified to always use 8 cores. However, when running with "-l 0-MAXCPU", with MAXCPU varying between 7 and 15 (therefore "allocating" more *unused* cores), the performance of the application changes. The difference can be multiple Gbps at very high speed (100G) with a large number of cores, and it is not linear, in the sense that declaring more cores will not necessarily increase, nor degrade, performance.

An example can be seen here: https://kth.box.com/s/v5v1hyidd51ebd7b8ixmcw513lfqwj4a . Note that the error bars (10 runs per point) show that it is not "global" variance that creates the difference between core counts. Once you end up in a "class" of performance, you stay there even when changing some parameters (such as the number of cores you actually use). From a research perspective this is problematic, as we cannot trust that results with X cores are indeed a certain percentage better than with Y cores, because the difference could be due to this problem, not to what we "improved" in the application.
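To make the error-bar argument concrete, here is a minimal sketch of the check we have in mind. It assumes the spread of a configuration is taken as the min/max range over its runs (the plot may use a different spread measure), and the names runs_range and ranges_separated are made up for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Spread of one configuration, taken here as the min/max over its runs. */
struct range {
    double min;
    double max;
};

/* Compute the min/max range of n throughput measurements (n must be > 0). */
static struct range runs_range(const double *runs, size_t n)
{
    assert(n > 0);
    struct range r = { runs[0], runs[0] };
    for (size_t i = 1; i < n; i++) {
        if (runs[i] < r.min)
            r.min = runs[i];
        if (runs[i] > r.max)
            r.max = runs[i];
    }
    return r;
}

/* Return 1 if the two ranges do not overlap at all, 0 otherwise. */
static int ranges_separated(struct range a, struct range b)
{
    return a.max < b.min || b.max < a.min;
}
```

If two values of MAXCPU give ranges that are separated in this sense over the 10 runs, run-to-run noise alone is an unlikely explanation for the gap between them.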

That application is a NAT and a FW, but we could observe that with other VNFs, at different scales. The link is saturated at 100G with real packets (avg size 1040B).

We have Mellanox ConnectX 5, DPDK 19.02. This could be observed on both Skylake and Cascade Lake machines. The example is a single-socket machine with 8 cores and HT. But this also happened with a NUMA machine, and when using only varying subsets of the 18 cores of one socket.

The only useful observation we made is that when we are in a "bad case", the LLC has more cache misses.

We could not really reproduce with XL710, but at 40G the problem can barely be seen with MLX, so that path is not very conclusive. We do not have other 100G NICs (hurry up Intel! :p). Similarly, reproducing with testpmd is hard, as the problem really shows with something like 8 cores, and we need the load to be less than the NIC wire speed, but high enough to observe the effect...

To rule out that problem, we hardcoded the number of cores to be used everywhere inside the application. So except for DPDK internals, all resources are allocated for 8 cores.

And that is still the best lead. Could it be that we simply get lucky/unlucky buffer allocations for "something per lcore", and therefore packets evict each other because of the limited associativity of the cache? At those speeds, failure of DDIO is death... Would there be a way to fix/verify the memory allocation? base-virtaddr did not work though. ASLR is disabled (enabling it does not change anything).
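To illustrate the associativity hypothesis, here is a rough sketch of how one could count how many buffers land in the same cache set. It is only a model: it assumes a plain modulo mapping, (addr / line_size) % num_sets, whereas the real LLC selects slices with an undocumented hash, and the addresses fed in would have to be physical ones. The struct cache_geom, its values, and both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache geometry (hypothetical, to be set for the CPU under test). */
struct cache_geom {
    uint32_t line_size; /* bytes per cache line, must be a power of two */
    uint32_t num_sets;  /* number of sets, must be a power of two */
};

/* Set index under a simple modulo mapping (not the real LLC slice hash). */
static uint32_t cache_set_index(const struct cache_geom *g, uint64_t addr)
{
    assert(g->line_size != 0 && (g->line_size & (g->line_size - 1)) == 0);
    assert(g->num_sets != 0 && (g->num_sets & (g->num_sets - 1)) == 0);
    return (uint32_t)((addr / g->line_size) & (g->num_sets - 1));
}

/*
 * Count how many of the n buffer start addresses fall in each set and
 * return the largest count. hist must have room for g->num_sets entries.
 */
static uint32_t max_buffers_per_set(const struct cache_geom *g,
                                    const uint64_t *addrs, size_t n,
                                    uint32_t *hist)
{
    uint32_t worst = 0;

    for (uint32_t s = 0; s < g->num_sets; s++)
        hist[s] = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t s = cache_set_index(g, addrs[i]);
        if (++hist[s] > worst)
            worst = hist[s];
    }
    return worst;
}
```

If the worst-case count for the buffers in flight exceeds the number of ways available to them, those buffers would evict each other under this model. Comparing the histogram between a "good" and a "bad" MAXCPU value would be one way to test whether the buffer base address shifted into an unlucky alignment.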



Thanks for the help,

Tom
