On Tue, Feb 6, 2018 at 5:51 AM, Tal Gilboa <ta...@mellanox.com> wrote:
> On 1/24/2018 5:09 PM, Eric Dumazet wrote:
>>
>> On Wed, 2018-01-24 at 16:42 +0200, Tal Gilboa wrote:
>>>
>>> Hi Eric,
>>> My choice of words in my comment was misplaced, and I apologize. It
>>> completely missed the point. I understand, of course, the importance of
>>> optimizing real-life scenarios.
>>>
>>> We are currently evaluating this patch and if/how it might affect our
>>> customers. We would also evaluate your suggestion below.
>>>
>>> We will contact you if and when we have a real concern.
>>> Thanks.
>>
>>
>> Sure, I am curious how a 50% regression can be possible as a matter of
>> fact, so please update even if this is caused by some specific synthetic
>> test conditions.
>>
>> Thanks.
>>
>
> Sorry for the delay in the response. I ran super_netperf for 64B, 128B, 256B
> and 512B and 500, 1000, 2000 and 4000 streams and compared these
> (consecutive) commits:
> Base - f331981 tcp: pass previous skb to tcp_shifted_skb()
> rb_tree - 75c119a tcp: implement rb-tree based retransmit queue
>
> I got lower BW with rb-tree for all cases.
> Example - 2000 streams results (in Gb/s):
> size | Base | rb-tree | degradation
> 64   | 25.6 | 23.3    | -9%
> 128  | 52.8 | 44.43   | -16%
> 256  | 89.8 | 66.1    | -26.5%
> 512  | 87.7 | 67.8    | -22.7%
>
> I'm currently working on improving our CPU utilization in TX flow (by better
> utilizing payload aggregation mechanisms). It somewhat improves the rb-tree
> results when applied on top of it, but not for all cases and not to the
> "base" results.

Hi,

Please give exact details.

Sending 64, 128, 256 or 512 bytes at a time on TCP_STREAM makes little sense.

We are not optimizing the stack for pathological cases, sorry.

If you are using MSG_EOR to force silly skbs in the rtx queue, then
you should not do that ...

Thanks.