From: Edward Cree <ec...@solarflare.com>
Date: Tue, 6 Aug 2019 14:52:06 +0100
> This series listifies part of GRO processing, in a manner which allows those
> packets which are not GROed (i.e. for which dev_gro_receive returns
> GRO_NORMAL) to be passed on to the listified regular receive path.
> dev_gro_receive() itself is not listified, nor the per-protocol GRO
> callback, since GRO's need to hold packets on lists under napi->gro_hash
> makes keeping the packets on other lists awkward, and since the GRO control
> block state of held skbs can refer only to one 'new' skb at a time.
> Instead, when napi_frags_finish() handles a GRO_NORMAL result, stash the skb
> onto a list in the napi struct, which is received at the end of the napi
> poll or when its length exceeds the (new) sysctl net.core.gro_normal_batch.
>
> Performance figures with this series, collected on a back-to-back pair of
> Solarflare sfn8522-r2 NICs with 120-second NetPerf tests. In the stats,
> sample size n for old and new code is 6 runs each; p is from a Welch t-test.
> Tests were run both with GRO enabled and disabled, the latter simulating
> uncoalesceable packets (e.g. due to IP or TCP options). The receive side
> (which was the device under test) had the NetPerf process pinned to one CPU,
> and the device interrupts pinned to a second CPU. CPU utilisation figures
> (used in cases of line-rate performance) are summed across all CPUs.
> net.core.gro_normal_batch was left at its default value of 8.
 ...
> The above results are fairly mixed, and in most cases not statistically
> significant. But I think we can roughly conclude that the series
> marginally improves non-GROable throughput, without hurting latency
> (except in the large-payload busy-polling case, which in any case yields
> horrid performance even on net-next (almost triple the latency without
> busy-poll)). Also, drivers which, unlike sfc, pass UDP traffic to GRO
> would expect to see a benefit from gaining access to batching.
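
For readers following along, the batching rule described above can be sketched as a small userspace model. This is only a toy in Python, not the kernel code: NapiModel, stash_normal(), flush_normal(), poll() and demo_batches() are hypothetical names, with flush_normal() standing in for the role the changelog gives to gro_normal_list(). The flush condition used here (flush once the pending count reaches the limit) is an assumption; the cover letter only says the list is received when its length "exceeds" the sysctl.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Default of net.core.gro_normal_batch according to the cover letter.
GRO_NORMAL_BATCH = 8


@dataclass
class NapiModel:
    """Toy stand-in for the per-napi batching state (not the kernel struct)."""
    deliver: Callable[[List[str]], None]  # hypothetical listified receive path
    batch: int = GRO_NORMAL_BATCH
    rx_list: List[str] = field(default_factory=list)

    def stash_normal(self, skb: str) -> None:
        """Queue one GRO_NORMAL packet; flush when the batch limit is reached.

        Assumption: ">=" comparison; the exact threshold is not in the email.
        """
        self.rx_list.append(skb)
        if len(self.rx_list) >= self.batch:
            self.flush_normal()

    def flush_normal(self) -> None:
        """Hand all pending packets to the receive path as a single list."""
        if not self.rx_list:
            return
        pending, self.rx_list = self.rx_list, []
        self.deliver(pending)


def poll(napi: NapiModel, packets: List[str]) -> None:
    """One poll cycle in which every packet comes back as GRO_NORMAL."""
    for skb in packets:
        napi.stash_normal(skb)
    napi.flush_normal()  # end of poll: receive whatever is still pending


def demo_batches(n_packets: int, batch: int = GRO_NORMAL_BATCH) -> List[int]:
    """Return the size of each list handed to the receive path."""
    sizes: List[int] = []
    napi = NapiModel(deliver=lambda pkts: sizes.append(len(pkts)), batch=batch)
    poll(napi, [f"skb{i}" for i in range(n_packets)])
    return sizes
```

With the default limit of 8, demo_batches(20) hands over lists of 8, 8 and 4 packets: two flushes triggered by the limit, and the remainder flushed at the end of the poll.
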
>
> Changed in v3:
> * gro_normal_batch sysctl now uses SYSCTL_ONE instead of &one
> * removed RFC tags (no comments after a week means no-one objects, right?)
>
> Changed in v2:
> * During busy poll, call gro_normal_list() to receive batched packets
>   after each cycle of the napi busy loop. See comments in Patch #3 for
>   complications of doing the same in busy_poll_stop().
>
> [1]: Cohen 1959, doi: 10.1080/00401706.1959.10489859

Series applied, thanks Edward.