Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-16 Thread Sven Van Asbroeck
On Wed, Dec 16, 2020 at 8:01 PM Florian Fainelli wrote:
>
> x86 is a fully cache and device coherent memory architecture and there
> are smarts like DDIO to bring freshly DMA'd data into the L3 cache
> directly. For ARMv7, it depends on the hardware you have, most ARMv7
> SoCs do not have hardware

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-16 Thread Florian Fainelli
On 12/16/20 4:57 PM, Sven Van Asbroeck wrote:
> Hi Andrew,
>
> On Wed, Dec 9, 2020 at 9:10 AM Andrew Lunn wrote:
>>
>> 9K is not a nice number, since for each allocation it probably has to
>> find 4 contiguous pages. See what the performance difference is with
>> 2K, 4K and 8K. If there is a big

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-16 Thread Sven Van Asbroeck
Hi Andrew,

On Wed, Dec 9, 2020 at 9:10 AM Andrew Lunn wrote:
>
> 9K is not a nice number, since for each allocation it probably has to
> find 4 contiguous pages. See what the performance difference is with
> 2K, 4K and 8K. If there is a big difference, you might want to special
> case when the MT
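
The "4 contiguous pages" remark can be checked with a little arithmetic. The sketch below is only an illustration and is not taken from the driver: it assumes 4 KiB pages and an allocator that hands out physically contiguous pages in power-of-two blocks (as the Linux page allocator does with its "order" parameter). The helper name `alloc_order` is hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages, typical for ARMv7 and x86


def alloc_order(size, page_size=PAGE_SIZE):
    """Return the smallest order such that (page_size << order) >= size.

    Hypothetical helper: models a power-of-two page allocator,
    not any specific kernel function.
    """
    pages = -(-size // page_size)  # ceiling division: pages actually needed
    order = 0
    while (1 << order) < pages:    # round up to the next power of two
        order += 1
    return order


if __name__ == "__main__":
    for kib in (2, 4, 8, 9):
        order = alloc_order(kib * 1024)
        print(f"{kib}K buffer: order {order}, "
              f"{1 << order} contiguous page(s)")
```

Under these assumptions a 9K buffer needs 3 pages, which rounds up to an order-2 block of 4 contiguous pages, while 8K fits in 2 pages and 2K or 4K fit in a single page.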

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-09 Thread Andrew Lunn
On Tue, Dec 08, 2020 at 10:49:16PM -0500, Sven Van Asbroeck wrote:
> On Tue, Dec 8, 2020 at 6:36 PM Florian Fainelli wrote:
> >
> > dma_sync_single_for_{cpu,device} is what you would need in order to make
> > a partial cache line invalidation. You would still need to unmap the
> > same address+len

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Sven Van Asbroeck
On Tue, Dec 8, 2020 at 6:36 PM Florian Fainelli wrote:
>
> dma_sync_single_for_{cpu,device} is what you would need in order to make
> a partial cache line invalidation. You would still need to unmap the
> same address+length pair that was used for the initial mapping otherwise
> the DMA-API debugg

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Andrew Lunn
> dma_sync_single_for_{cpu,device} is what you would need in order to make
> a partial cache line invalidation. You would still need to unmap the
> same address+length pair that was used for the initial mapping otherwise
> the DMA-API debugging will rightfully complain.

But often you don't unmap i

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Florian Fainelli
On 12/8/20 3:02 PM, Sven Van Asbroeck wrote:
> Hi Andrew,
>
> On Tue, Dec 8, 2020 at 5:51 PM Andrew Lunn wrote:
>>
>>>
>>> So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
>>> I didn't do any PCIe bandwidth measurements.
>>
>> Sometimes it is actually cache operations whi

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Jakub Kicinski
On Tue, 8 Dec 2020 16:54:33 -0500 Sven Van Asbroeck wrote:
> > > Tested with iperf3 on a freescale imx6 + lan7430, both sides
> > > set to mtu 1500 bytes.
> > >
> > > Before:
> > > [ ID] Interval           Transfer     Bandwidth       Retr
> > > [  4]   0.00-20.00  sec   483 MBytes   203 Mbits/sec

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Jakub Kicinski
On Tue, 8 Dec 2020 18:02:30 -0500 Sven Van Asbroeck wrote:
> On Tue, Dec 8, 2020 at 5:51 PM Andrew Lunn wrote:
> > > So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> > > I didn't do any PCIe bandwidth measurements.
> >
> > Sometimes it is actually cache operations whic

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Sven Van Asbroeck
Hi Andrew,

On Tue, Dec 8, 2020 at 5:51 PM Andrew Lunn wrote:
>
> >
> > So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> > I didn't do any PCIe bandwidth measurements.
>
> Sometimes it is actually cache operations which take all the
> time. This needs to invalidate the c

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Andrew Lunn
> That's a good question. I used perf to create a flame graph of what
> the cpu was doing when receiving data at high speed. It showed that
> __dma_page_dev_to_cpu took up most of the cpu time. Which is triggered
> by dma_unmap_single(9K, DMA_FROM_DEVICE).
>
> So I assumed that it's a PCIe dma ban

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Sven Van Asbroeck
Hi Jakub, thank you so much for reviewing this patchset !

On Tue, Dec 8, 2020 at 2:43 PM Jakub Kicinski wrote:
>
> > When the chip is working with the default 1500 byte MTU, a 9K
> > dma buffer goes from chip -> cpu per 1500 byte frame. This means
> > that to get 1G/s ethernet bandwidth, we need

Re: [PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-08 Thread Jakub Kicinski
On Sat, 5 Dec 2020 22:44:08 -0500 Sven Van Asbroeck wrote:
> From: Sven Van Asbroeck
>
> To support jumbo frames, each rx ring dma buffer is 9K in size.
> But the chip only stores a single frame per dma buffer.
>
> When the chip is working with the default 1500 byte MTU, a 9K
> dma buffer goes

[PATCH net v1 2/2] lan743x: boost performance: limit PCIe bandwidth requirement

2020-12-05 Thread Sven Van Asbroeck
From: Sven Van Asbroeck

To support jumbo frames, each rx ring dma buffer is 9K in size.
But the chip only stores a single frame per dma buffer.

When the chip is working with the default 1500 byte MTU, a 9K
dma buffer goes from chip -> cpu per 1500 byte frame. This means
that to get 1G/s ethernet
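
The premise in the patch description (one full 9K buffer handled per 1500 byte frame) implies a fixed overhead factor, which the sketch below works out. It is a back-of-the-envelope model rather than a measurement: it assumes "9K" means 9 KiB, that the whole buffer is processed for every frame, and it ignores PCIe and Ethernet framing overhead. The function name `buffer_bandwidth_needed` is hypothetical. Replies elsewhere in the thread suggest the cost may lie in cache operations rather than PCIe transfers, but the same factor applies to however many bytes are touched per frame.

```python
BUFFER_BYTES = 9 * 1024  # assumption: "9K" rx buffer means 9 KiB
FRAME_BYTES = 1500       # default MTU-sized frame, as in the description


def buffer_bandwidth_needed(wire_gbps, buffer_bytes=BUFFER_BYTES,
                            frame_bytes=FRAME_BYTES):
    """Bandwidth consumed if a full buffer is handled per received frame.

    Hypothetical helper: scales the wire rate by buffer size / frame size.
    """
    return wire_gbps * buffer_bytes / frame_bytes


if __name__ == "__main__":
    factor = BUFFER_BYTES / FRAME_BYTES
    print(f"bytes handled per payload byte: {factor:.3f}")
    print(f"needed for 1 Gbit/s on the wire: "
          f"{buffer_bandwidth_needed(1.0):.3f} Gbit/s")
```

With these numbers the factor is a little over 6, so sizing the buffer to the configured MTU instead of the jumbo-frame maximum would remove most of that overhead under the stated assumptions.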