This series listifies part of GRO processing, in a manner which allows those packets which are not GROed (i.e. for which dev_gro_receive returns GRO_NORMAL) to be passed on to the listified regular receive path.

I have not listified dev_gro_receive() itself, nor the per-protocol GRO callbacks, since GRO's need to hold packets on lists under napi->gro_hash makes keeping the packets on other lists awkward, and since the GRO control block state of held skbs can refer only to one 'new' skb at a time.  Nonetheless the batching of the calling code yields some performance gains in the GRO case as well.
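To give a rough idea of the shape of this, below is a minimal sketch (not the actual patch code) of what a list entry point for GRO could look like.  It assumes the context of net/core/dev.c, where dev_gro_receive() and skb_gro_reset_offset() are in scope, and it omits the napi_skb_finish() handling of the other GRO verdicts as well as stats and flush handling:

/* Sketch only, not the code from this series.  Walk a list of received
 * skbs, offer each to GRO, and collect those GRO declines to merge
 * (GRO_NORMAL) so they can be handed to the listified receive path in
 * one batch instead of one at a time.
 */
static void napi_gro_receive_list(struct napi_struct *napi,
				  struct list_head *head)
{
	struct sk_buff *skb, *next;
	LIST_HEAD(normal);	/* skbs that GRO did not consume */

	list_for_each_entry_safe(skb, next, head, list) {
		skb_list_del_init(skb);
		skb_gro_reset_offset(skb);
		if (dev_gro_receive(napi, skb) == GRO_NORMAL)
			/* Not merged; batch it for the regular path */
			list_add_tail(&skb->list, &normal);
		/* Merged/held/freed verdicts would need the usual
		 * napi_skb_finish() treatment; omitted in this sketch.
		 */
	}

	/* Hand the whole batch of unmerged packets to listified RX */
	if (!list_empty(&normal))
		netif_receive_skb_list(&normal);
}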
Herewith the performance figures obtained in a NetPerf TCP stream test (with four streams, and irqs bound to a single core):

net-next:  7.166 Gbit/s (sigma 0.435)
after #2:  7.715 Gbit/s (sigma 0.145) = datum + 7.7%
after #4:  7.890 Gbit/s (sigma 0.217) = datum + 10.1%

(Note that the 'net-next' results were distinctly bimodal, with two results of about 8 Gbit/s and the remaining ten around 7 Gbit/s.  I don't have a good explanation for this.)

And with GRO disabled through ethtool -K (thus simulating traffic which is not GRO-able but, being TCP, is still passed to the GRO entry point):

net-next:  4.756 Gbit/s (sigma 0.240)
after #4:  5.355 Gbit/s (sigma 0.232) = datum + 12.6%

Edward Cree (4):
  net: introduce list entry point for GRO
  sfc: use batched receive for GRO
  net: make listified RX functions return number of good packets
  net/core: handle GRO_NORMAL skbs as a list in napi_gro_receive_list

 drivers/net/ethernet/sfc/efx.c        |  11 +++-
 drivers/net/ethernet/sfc/net_driver.h |   1 +
 drivers/net/ethernet/sfc/rx.c         |  16 +++++-
 include/linux/netdevice.h             |   6 +-
 include/net/ip.h                      |   4 +-
 include/net/ipv6.h                    |   4 +-
 net/core/dev.c                        | 104 ++++++++++++++++++++++++++--------
 net/ipv4/ip_input.c                   |  39 ++++++++-----
 net/ipv6/ip6_input.c                  |  37 +++++++-----
 9 files changed, 157 insertions(+), 65 deletions(-)