On Wed, Sep 7, 2016 at 4:32 PM, Or Gerlitz <gerlitz...@gmail.com> wrote:
> On Wed, Sep 7, 2016 at 3:42 PM, Saeed Mahameed <sae...@mellanox.com> wrote:
>
>> Packet rate performance testing was done with pktgen 64B packets and on
>> TX side and, TC drop action on RX side compared to XDP fast drop.
>>
>> CPU: Intel(R) Xeon(R) CPU E5-2680 v3 @ 2.50GHz
>>
>> Comparison is done between:
>> 1. Baseline, Before this patch with TC drop action
>> 2. This patch with TC drop action
>> 3. This patch with XDP RX fast drop
>>
>> Streams    Baseline(TC drop)    TC drop     XDP fast Drop
>> --------------------------------------------------------------
>> 1          5.51Mpps             5.14Mpps    13.5Mpps
>> 2          11.5Mpps             10.0Mpps    25.1Mpps
>> 4          16.3Mpps             17.2Mpps    35.4Mpps
>> 8          29.6Mpps             28.2Mpps    45.8Mpps*
>> 16         34.0Mpps             30.1Mpps    45.8Mpps*
>
> Rana, Guys, congrat!!
>
> When you say X streams, does each stream mapped by RSS to different RX ring?
> or we're on the same RX ring for all rows of the above table?

Yes, I will make this clearer in the actual submission. Here we are talking
about different RSS core rings.

>
> In the CX3 work, we had X sender "streams" that all mapped to the same RX ring,
> I don't think we went beyond one RX ring.

Here we did. The first row is what you are describing; the other rows are the
same test with an increasing number of RSS receiving cores. The xmit side
sends as many streams as possible, so that they are spread as uniformly as
possible across the different RSS cores on the receiver.

>
> Here, I guess you want to 1st get an initial max for N pktgen TX threads all
> sending the same stream so you land on single RX ring, and then move to
> M * N pktgen TX threads to max that further.
>
> I don't see how the current Linux stack would be able to happily drive 34M PPS
> (== allocate SKB, etc, you know...) on a single CPU, Jesper?
>
> Or.
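
To put the quoted table in relative terms, here is a minimal sketch that
computes, per row, how XDP fast drop compares to the baseline and how much the
TC drop rate moves with the patch applied. The numbers are copied from the
table; the function and variable names are made up for illustration.

```python
# Throughput figures (Mpps) copied from the quoted table.
# Key: number of streams (RSS receiving cores).
# Value: (baseline TC drop, patched TC drop, XDP fast drop).
# The two 45.8 entries carry a '*' in the original table; its footnote is not
# part of this message, so the values are used as-is.
RESULTS_MPPS = {
    1: (5.51, 5.14, 13.5),
    2: (11.5, 10.0, 25.1),
    4: (16.3, 17.2, 35.4),
    8: (29.6, 28.2, 45.8),
    16: (34.0, 30.1, 45.8),
}


def xdp_speedup(streams):
    """Ratio of the XDP fast drop rate to the baseline TC drop rate."""
    baseline, _patched_tc, xdp = RESULTS_MPPS[streams]
    return xdp / baseline


def tc_change_percent(streams):
    """Percent change of the TC drop rate with the patch vs. the baseline."""
    baseline, patched_tc, _xdp = RESULTS_MPPS[streams]
    return (patched_tc - baseline) / baseline * 100.0


if __name__ == "__main__":
    for streams in sorted(RESULTS_MPPS):
        print(
            f"{streams:>2} streams: XDP is {xdp_speedup(streams):.2f}x baseline, "
            f"TC drop changes by {tc_change_percent(streams):+.1f}%"
        )
```

The ratio shrinks in the 8 and 16 stream rows, where the XDP column stops
growing at 45.8Mpps while the baseline keeps scaling.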