Hi Sven,

> Subject: [PATCH net-next v2 1/5] lan743x: boost performance on cpu archs
> w/o dma cache snooping
>
> From: Sven Van Asbroeck <thesve...@gmail.com>
>
> The buffers in the lan743x driver's receive ring are always 9K, even when
> the largest packet that can be received (the mtu) is much smaller. This
> performs particularly badly on cpu archs without dma cache snooping (such
> as ARM): each received packet results in a 9K dma_{map|unmap} operation,
> which is very expensive because cpu caches need to be invalidated.
>
> Careful measurement of the driver rx path on armv7 reveals that the cpu
> spends the majority of its time waiting for cache invalidation.
>
> Optimize by keeping the rx ring buffer size as close as possible to the
> mtu. This limits the amount of cache that requires invalidation.
>
> This optimization would normally force us to re-allocate all ring buffers
> when the mtu is changed - a disruptive event, because it can only happen
> when the network interface is down.
>
> Remove the need to re-allocate all ring buffers by adding support for
> multi-buffer frames. Now any combination of mtu and ring buffer size will
> work. When the mtu changes from mtu1 to mtu2, consumed buffers of size
> mtu1 are lazily replaced by newly allocated buffers of size mtu2.
>
> These optimizations double the rx performance on armv7.
> Third parties report a 3x rx speedup on armv8.
>
> Tested with iperf3 on a freescale imx6qp + lan7430, both sides set to mtu
> 1500 bytes, measuring rx performance:
>
> Before:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-20.00  sec   550 MBytes   231 Mbits/sec    0
> After:
> [ ID] Interval           Transfer     Bandwidth       Retr
> [  4]   0.00-20.00  sec  1.33 GBytes   570 Mbits/sec    0
>
> Signed-off-by: Sven Van Asbroeck <thesve...@gmail.com>
Looks good.

Reviewed-by: Bryan Whitehead <bryan.whiteh...@microchip.com>
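
For anyone following the thread who wants a concrete picture of the
lazy-replacement idea, here is a rough sketch. This is not the actual
lan743x code: the struct, field and function names (rx_buf_info,
rx_buf_len_for_mtu, rx_refill_slot) are made up for illustration; only the
standard kernel helpers (netdev_alloc_skb_ip_align, dma_map_single,
dma_mapping_error, dev_kfree_skb_any) are real. It shows rx buffers being
sized from the current mtu, with the size recomputed on every refill so
that buffers allocated under an old mtu are replaced naturally as slots
are consumed.

#include <linux/kernel.h>
#include <linux/if_ether.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>
#include <linux/dma-mapping.h>

/* Hypothetical per-slot bookkeeping; names are illustrative only. */
struct rx_buf_info {
	struct sk_buff *skb;
	dma_addr_t dma_addr;
	unsigned int buf_len;	/* length this buffer was mapped with */
};

/* Size an rx buffer for the current mtu instead of a fixed 9K. */
static unsigned int rx_buf_len_for_mtu(const struct net_device *netdev)
{
	return ALIGN(netdev->mtu + ETH_HLEN + ETH_FCS_LEN, 8);
}

/*
 * Refill one consumed ring slot. The length is recomputed on every
 * refill, so a buffer allocated under the old mtu is lazily replaced by
 * one sized for the new mtu the next time the slot is reused; no full
 * ring re-allocation is needed.
 */
static int rx_refill_slot(struct net_device *netdev, struct device *dev,
			  struct rx_buf_info *info)
{
	unsigned int len = rx_buf_len_for_mtu(netdev);
	struct sk_buff *skb;
	dma_addr_t dma;

	skb = netdev_alloc_skb_ip_align(netdev, len);
	if (!skb)
		return -ENOMEM;

	dma = dma_map_single(dev, skb->data, len, DMA_FROM_DEVICE);
	if (dma_mapping_error(dev, dma)) {
		dev_kfree_skb_any(skb);
		return -ENOMEM;
	}

	info->skb = skb;
	info->dma_addr = dma;
	info->buf_len = len;	/* unmap later with this smaller length */
	return 0;
}

The matching unmap path would use the stored buf_len with
dma_unmap_single(), so on ARM the cache invalidation covers only an
mtu-sized region rather than 9K, which is where the speedup described in
the commit message comes from.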