On Wed, Sep 7, 2016 at 3:42 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
> Packet rate performance testing was done with pktgen, 64B packets on the
> TX side, and TC drop action on the RX side compared to XDP fast drop.
>
> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>
> Comparison is done between:
>   1. Baseline, before this patch, with TC drop action
>   2. This patch with TC drop action
>   3. This patch with XDP RX fast drop
>
> Streams    Baseline (TC drop)    TC drop      XDP fast drop
> -------------------------------------------------------------
>  1          5.51Mpps             5.14Mpps     13.5Mpps
>  2         11.5Mpps             10.0Mpps      25.1Mpps
>  4         16.3Mpps             17.2Mpps      35.4Mpps
>  8         29.6Mpps             28.2Mpps      45.8Mpps*
> 16         34.0Mpps             30.1Mpps      45.8Mpps*

Rana, guys, congrats!!

When you say X streams, is each stream mapped by RSS to a different RX ring,
or are we on the same RX ring for all rows of the above table?

In the CX3 work, we had X sender "streams" that all mapped to the same RX
ring; I don't think we went beyond one RX ring.

Here, I guess you want to first get an initial max for N pktgen TX threads
all sending the same stream, so that you land on a single RX ring, and then
move to M * N pktgen TX threads to push that max further.

I don't see how the current Linux stack would be able to happily drive 34M
PPS (== allocate SKB, etc., you know...) on a single CPU, Jesper?

Or.
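
P.S. For anyone reading along who hasn't played with XDP yet, the "XDP RX
fast drop" case above is essentially the trivial program below: the driver
runs it on the raw RX buffer and drops the frame before any SKB is
allocated, which is where the gap vs. TC drop comes from. This is only an
illustrative sketch, the program and section names are made up and not
taken from the patch set:

/* Minimal "drop everything" XDP program, built with e.g.
 *   clang -O2 -target bpf -c xdp_drop.c -o xdp_drop.o
 */
#include <linux/bpf.h>

#define __section(NAME) __attribute__((section(NAME), used))

__section("xdp")
int xdp_drop_all(struct xdp_md *ctx)
{
	/* No parsing, no SKB allocation; drop every frame in the driver. */
	return XDP_DROP;
}

char _license[] __section("license") = "GPL";

With a recent enough iproute2 it can be attached with something like
"ip link set dev <dev> xdp obj xdp_drop.o sec xdp", though the exact attach
path for mlx5 of course depends on this patch set.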