On 12/4/20 6:46 AM, Sieng Piaw Liew wrote:
> We can increase the efficiency of rx path by using buffers to receive
> packets then build SKBs around them just before passing into the network
> stack. In contrast, preallocating SKBs too early reduces CPU cache
> efficiency.
> 
> Check if we're in NAPI context when refilling RX. Normally we're almost
> always running in NAPI context. Dispatch to napi_alloc_frag directly
> instead of relying on netdev_alloc_frag which still runs
> local_bh_disable/enable.
> 
> Tested on BCM6328 320 MHz and iperf3 -M 512 to measure packet/sec
> performance. Included netif_receive_skb_list and NET_IP_ALIGN
> optimizations.
> 
> Before:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-10.00  sec  49.9 MBytes  41.9 Mbits/sec  197         sender
> [  4]   0.00-10.00  sec  49.3 MBytes  41.3 Mbits/sec            receiver
> 
> After:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-30.00  sec   171 MBytes  47.8 Mbits/sec  272         sender
> [  4]   0.00-30.00  sec   170 MBytes  47.6 Mbits/sec            receiver

Please test this again after GRO has been added to this driver.

The problem with build_skb() is that the overall skb truesize after GRO
might increase a lot, since we add sizeof(struct skb_shared_info) of
overhead per MSS, and this can double the truesize depending on device MTU.
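
To put rough numbers on this, here is a back-of-the-envelope model, not the
kernel's exact accounting. The 320-byte skb_shared_info size, the 64-byte
headroom and the 64-byte cache line are assumptions for illustration (the
real values depend on architecture and kernel version), and buffer_truesize()
is a hypothetical helper, not a kernel function.

```python
def align(n, a):
    """Round n up to the next multiple of a (same idea as SKB_DATA_ALIGN)."""
    return (n + a - 1) // a * a


def buffer_truesize(frame_len, headroom, shinfo_size, cache_line=64):
    """Approximate memory charged for one RX buffer.

    With build_skb() every buffer carries its own skb_shared_info at the
    tail, so shinfo_size is paid once per received segment. Pass
    shinfo_size=0 to model a buffer attached as a plain page fragment.
    """
    data = align(headroom + frame_len, cache_line)
    return data + align(shinfo_size, cache_line)


SHINFO = 320    # assumed sizeof(struct skb_shared_info); arch dependent
HEADROOM = 64   # assumed headroom reserved in front of the frame
HEADERS = 54    # 14 (Ethernet) + 20 (IPv4) + 20 (TCP, no options)

for mss in (512, 1460):
    frame = mss + HEADERS
    with_shinfo = buffer_truesize(frame, HEADROOM, SHINFO)
    frag_only = buffer_truesize(frame, HEADROOM, 0)
    print(f"MSS {mss}: {with_shinfo} bytes per segment with shinfo, "
          f"{frag_only} without, ratio {with_shinfo / frag_only:.2f}")
```

The smaller the frames, the larger the relative overhead, which is why the
inflation depends on the MTU.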

This matters on long RTT flows, because inflating skb->truesize reduces
the TCP receive window quite a lot.
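
A simplified way to see the RTT dependence: memory is charged against the
receive buffer by truesize, while the window the sender can use is counted
in payload bytes, and throughput is bounded by window / RTT. The sketch
below assumes a fixed 128 KiB receive buffer and ignores autotuning and the
kernel's actual window computation, so treat the numbers as illustrative
only. max_throughput_mbps() is a hypothetical helper.

```python
def max_throughput_mbps(rcvbuf_bytes, payload, truesize, rtt_s):
    """Upper bound on throughput for one flow, in Mbit/s.

    Only payload/truesize of the receive buffer is usable as window,
    and at most one window can be in flight per round trip.
    """
    window = rcvbuf_bytes * payload / truesize
    return window * 8 / rtt_s / 1e6


RCVBUF = 128 * 1024   # assumed receive buffer size, no autotuning
PAYLOAD = 512         # matches iperf3 -M 512

for rtt_ms in (1, 10, 50):
    lean = max_throughput_mbps(RCVBUF, PAYLOAD, 640, rtt_ms / 1000)
    inflated = max_throughput_mbps(RCVBUF, PAYLOAD, 960, rtt_ms / 1000)
    print(f"RTT {rtt_ms} ms: bound {lean:.1f} Mbit/s at truesize 640, "
          f"{inflated:.1f} Mbit/s at truesize 960")
```

On a LAN the bound is far above what the link can carry, so the inflation
stays invisible; with tens of milliseconds of delay it can become the
limiting factor.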

Ideally, if you want the best performance, this driver should use
napi_gro_frags(), so that skb->truesize stays as small as possible
relative to skb->len.

In order to test your change, you need to set up a testbed with a
10ms or 50ms delay between the hosts, unless this driver is only used
by hosts on the same LAN (which I doubt).

