On 12/4/20 6:46 AM, Sieng Piaw Liew wrote:
> We can increase the efficiency of the rx path by using buffers to
> receive packets, then building SKBs around them just before passing
> them into the network stack. In contrast, preallocating SKBs too early
> reduces CPU cache efficiency.
>
> Check if we're in NAPI context when refilling RX; normally we are
> almost always running in NAPI context. Dispatch to napi_alloc_frag
> directly instead of relying on netdev_alloc_frag, which still runs
> local_bh_disable/enable.
>
> Tested on a 320 MHz BCM6328 with iperf3 -M 512 to measure packets/sec
> performance. The netif_receive_skb_list and NET_IP_ALIGN optimizations
> are included.
>
> Before:
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-10.00 sec 49.9 MBytes 41.9 Mbits/sec 197 sender
> [ 4] 0.00-10.00 sec 49.3 MBytes 41.3 Mbits/sec receiver
>
> After:
> [ ID] Interval Transfer Bandwidth Retr
> [ 4] 0.00-30.00 sec 171 MBytes 47.8 Mbits/sec 272 sender
> [ 4] 0.00-30.00 sec 170 MBytes 47.6 Mbits/sec receiver
Please test this again after GRO has been added to this driver.
The problem with build_skb() is that the overall skb truesize after GRO
might increase a lot, since we add sizeof(struct skb_shared_info) of
overhead per MSS, and this can double the truesize depending on the
device MTU.
This matters on long-RTT flows, because inflating skb->truesize
significantly reduces the TCP receive window.
Ideally, if you want the best performance, this driver should use
napi_gro_frags(), so that the skb->truesize overhead relative to
skb->len is as small as possible.
In order to test your change, you need to set up a testbed with a
10 ms or 50 ms delay between the hosts, unless this driver is only
used by hosts on the same LAN (which I doubt).
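For reference, a delay of that order can be injected with netem on the
sending host; the interface name and delay value below are examples,
not taken from this thread.

```shell
# Add 50ms of artificial egress delay (run as root; eth0 is an example).
tc qdisc add dev eth0 root netem delay 50ms

# Check the qdisc is installed, then remove it after testing.
tc qdisc show dev eth0
tc qdisc del dev eth0 root
```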