Hi, I'm working on a DPDK-based packet generator [1] and I recently tried to upgrade from DPDK 1.7.1 to 2.0.0. However, I noticed that DPDK 1.7.1 is about 25% faster than 2.0.0 for my use case.
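The percentages in this mail are plain throughput ratios. The minimal sketch below reproduces them from the Mpps figures in the measurements that follow. The helper names are hypothetical and are not part of DPDK or MoonGen. The only thing it illustrates is that "X% faster" and "X% loss" use different baselines.

```python
# Hypothetical helpers (not part of DPDK or MoonGen) for comparing two
# throughput measurements given in Mpps.

def speedup_pct(fast_mpps: float, slow_mpps: float) -> float:
    """How much faster the fast version is, relative to the slow one."""
    return (fast_mpps / slow_mpps - 1.0) * 100.0


def loss_pct(fast_mpps: float, slow_mpps: float) -> float:
    """Throughput lost when moving from the fast to the slow version,
    relative to the fast (old) baseline."""
    return (1.0 - slow_mpps / fast_mpps) * 100.0


# Throughput figures quoted in this mail (Mpps): (DPDK 1.7.x, DPDK 2.0.0)
measurements = {
    "l2fwd": (18.84, 16.40),
    "packet generator": (11.77, 9.5),
}

for name, (old, new) in measurements.items():
    print(f"{name}: 1.7 is {speedup_pct(old, new):.1f}% faster, "
          f"2.0 loses {loss_pct(old, new):.1f}%")
```

For the packet generator, 1.7 comes out about 24% faster, which is the same as roughly 19% lower throughput after upgrading.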
So I ran some basic performance tests on the l2fwd example with DPDK 1.7.1, 1.8.0 and 2.0.0. I used an Intel Xeon E5-2620 v3 CPU clocked down to 1.2 GHz to ensure that the CPU, not the network bandwidth, is the bottleneck. I configured l2fwd to forward between two interfaces of an X540 NIC using only a single CPU core (-q2) and measured the following throughput under full bidirectional load:

    Version   TP [Mpps]   Cycles/Pkt
    1.7.1     18.84       84.93
    1.8.0     16.78       95.35
    2.0.0     16.40       97.56

DPDK 1.7.1 is about 15% faster in this scenario. The obvious suspect is the new mbuf structure introduced in DPDK 1.8, so I profiled L1 cache misses:

    Version   L1 miss ratio
    1.7.1     6.5%
    1.8.0     13.8%
    2.0.0     13.4%

FWIW, the results with my packet generator on the same 1.2 GHz CPU core are:

    Version   TP [Mpps]   L1 miss ratio
    1.7       11.77       4.3%
    2.0       9.5         8.4%

The discussion of the original patch [2] that introduced the new mbuf structure addresses this potential performance degradation and states that it is mitigated to some extent. It even claims a 20% *increase* in performance in a specific scenario. However, that does not seem to be the case for either l2fwd or my packet generator.

Any ideas how to fix this? A throughput loss of this size (11.77 vs. 9.5 Mpps, roughly 19%, with my packet generator) prevents me from upgrading to DPDK 2.0.0. I need the new lcore features and the 40 Gbit driver updates, so I can't stay on 1.7.1 forever.

Paul

[1] https://github.com/emmericp/MoonGen
[2] http://comments.gmane.org/gmane.comp.networking.dpdk.devel/5155