On Wed, Jul 11, 2018 at 11:06 PM, Jesper Dangaard Brouer <bro...@redhat.com> wrote:
> Well, I would prefer you to implement those. I just did a quick
> implementation (its trivially easy) so I have something to benchmark
> with. The performance boost is quite impressive!

sounds good, but wait

> One reason I didn't "just" send a patch, is that Edward so-fare only
> implemented netif_receive_skb_list() and not napi_gro_receive_list().

sfc doesn't support GRO?! That doesn't make sense.. Edward?

> And your driver uses napi_gro_receive(). This sort-of disables GRO for
> your driver, which is not a choice I can make. Interestingly I get
> around the same netperf TCP_STREAM performance.

Getting the same TCP performance with GRO and no rx-batching as without
GRO but with rx-batching is far from an intuitive result to me. It only
makes sense if both techniques mostly serve to eliminate lots of
instruction cache misses, and the TCP stack is so well optimized that,
once the code is in the cache, going through it once with a 64K byte
GRO-ed packet costs about the same as going through it ~40 (64K/1500)
times with non GRO-ed packets.

What's the baseline (with GRO and no rx-batching) number on your setup?

> I assume we can get even better perf if we "listify" napi_gro_receive.

yeah, that would be very interesting to get there
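
As a rough sanity check on the ~40 (64K/1500) figure above, here is a minimal C sketch of the arithmetic. The helper name segs_per_gro_packet() is made up for illustration and is not a kernel function. It also ignores IP/TCP header overhead and simply treats 1500 bytes as the per-packet size, the same simplification the estimate makes.

```c
/*
 * Hypothetical helper (not a kernel API): how many wire-size packets
 * are needed to carry the bytes of one GRO-aggregated packet.
 * Assumes pkt_bytes > 0 and ignores header overhead.
 */
static unsigned int segs_per_gro_packet(unsigned int gro_bytes,
                                        unsigned int pkt_bytes)
{
        /* Round up: a partially filled last packet still costs a full pass. */
        return (gro_bytes + pkt_bytes - 1) / pkt_bytes;
}
```

With gro_bytes = 64 * 1024 and pkt_bytes = 1500 this returns 44, i.e. roughly 40 stack traversals without GRO for each traversal with GRO.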