On Wed, Dec 16, 2020 at 8:01 PM Florian Fainelli wrote:
>
> x86 is a fully cache and device coherent memory architecture and there
> are smarts like DDIO to bring freshly DMA'd data into the L3 cache
> directly. For ARMv7, it depends on the hardware you have; most ARMv7
> SoCs do not have hardware-maintained coherency between DMA and the CPU caches.
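The practical consequence for a driver is which part of the DMA API it leans on. Below is a minimal sketch (not lan743x code; the device pointer, buffer and length are placeholders) contrasting a coherent allocation, which needs no per-buffer cache maintenance, with a streaming mapping, whose map/unmap implies cache maintenance over the whole mapped length on a non-coherent ARMv7 SoC:

#include <linux/device.h>
#include <linux/dma-mapping.h>
#include <linux/errno.h>
#include <linux/gfp.h>

/* Illustrative only: the two mapping styles discussed above. */
static int dma_style_demo(struct device *dev, void *rx_buf, size_t len)
{
        void *cpu_addr;
        dma_addr_t coherent_dma, streaming_dma;

        /* Coherent memory: CPU and device always agree, no per-buffer
         * cache maintenance, but it may be mapped uncached on
         * non-coherent SoCs, making CPU accesses slow.
         */
        cpu_addr = dma_alloc_coherent(dev, len, &coherent_dma, GFP_KERNEL);
        if (!cpu_addr)
                return -ENOMEM;
        dma_free_coherent(dev, len, cpu_addr, coherent_dma);

        /* Streaming mapping: CPU accesses stay cached, but on a
         * non-coherent SoC each map/unmap pair costs a cache
         * clean/invalidate over the whole mapped length.
         */
        streaming_dma = dma_map_single(dev, rx_buf, len, DMA_FROM_DEVICE);
        if (dma_mapping_error(dev, streaming_dma))
                return -ENOMEM;
        /* ... device DMAs a received frame into rx_buf here ... */
        dma_unmap_single(dev, streaming_dma, len, DMA_FROM_DEVICE);

        return 0;
}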
On 12/16/20 4:57 PM, Sven Van Asbroeck wrote:
> Hi Andrew,
>
> On Wed, Dec 9, 2020 at 9:10 AM Andrew Lunn wrote:
>>
>> 9K is not a nice number, since for each allocation it probably has to
>> find 4 contiguous pages. See what the performance difference is with
>> 2K, 4K and 8K. If there is a big
Hi Andrew,
On Wed, Dec 9, 2020 at 9:10 AM Andrew Lunn wrote:
>
> 9K is not a nice number, since for each allocation it probably has to
> find 4 contiguous pages. See what the performance difference is with
> 2K, 4K and 8K. If there is a big difference, you might want to special
> case when the MTU is 1500.
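One way to act on that suggestion is to derive the rx buffer length from the configured MTU instead of always using the 9K jumbo size, so a default 1500 byte MTU fits in roughly 2K. A minimal sketch; the helper name and the 8-byte alignment are assumptions for illustration, not the driver's actual code:

#include <linux/etherdevice.h>
#include <linux/if_vlan.h>
#include <linux/netdevice.h>

/* Hypothetical helper: size each rx dma buffer for the current MTU,
 * so a default-MTU frame needs a single page instead of 4 contiguous
 * pages (9K).
 */
static unsigned int example_rx_buf_len(const struct net_device *netdev)
{
        unsigned int len = netdev->mtu + ETH_HLEN + ETH_FCS_LEN + VLAN_HLEN;

        return ALIGN(len, 8);           /* assumed alignment requirement */
}

A jumbo MTU would still yield the larger buffer size automatically.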
On Tue, Dec 08, 2020 at 10:49:16PM -0500, Sven Van Asbroeck wrote:
> On Tue, Dec 8, 2020 at 6:36 PM Florian Fainelli wrote:
> >
> > dma_sync_single_for_{cpu,device} is what you would need in order to make
> > a partial cache line invalidation. You would still need to unmap the
> > same address+length pair that was used for the initial mapping otherwise
> > the DMA-API debugging will rightfully complain.
On Tue, Dec 8, 2020 at 6:36 PM Florian Fainelli wrote:
>
> dma_sync_single_for_{cpu,device} is what you would need in order to make
> a partial cache line invalidation. You would still need to unmap the
> same address+length pair that was used for the initial mapping otherwise
> the DMA-API debugging will rightfully complain.
> dma_sync_single_for_{cpu,device} is what you would need in order to make
> a partial cache line invalidation. You would still need to unmap the
> same address+length pair that was used for the initial mapping otherwise
> the DMA-API debugging will rightfully complain.
But often you don't unmap i
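A minimal sketch of the pattern described in that quote: sync only the bytes the device actually wrote, and when the buffer is eventually unmapped, use the same length that was passed to dma_map_single(). The names and the 9K mapping length are illustrative, not lan743x code:

#include <linux/device.h>
#include <linux/dma-mapping.h>

#define EXAMPLE_RX_BUF_LEN      (9 * 1024)      /* length used at map time */

/* Hand a received frame to the CPU: invalidate only frame_len bytes
 * instead of the whole 9K mapping.
 */
static void example_rx_sync_frame(struct device *dev, dma_addr_t dma,
                                  unsigned int frame_len)
{
        dma_sync_single_for_cpu(dev, dma, frame_len, DMA_FROM_DEVICE);
}

/* When the buffer is finally retired, unmap with the address and
 * length used for the original mapping, or DMA-API debugging will
 * rightfully complain.
 */
static void example_rx_retire_buf(struct device *dev, dma_addr_t dma)
{
        dma_unmap_single(dev, dma, EXAMPLE_RX_BUF_LEN, DMA_FROM_DEVICE);
}

If the buffer is handed straight back to the hardware instead of being unmapped, a matching dma_sync_single_for_device() over the region the CPU touched takes the place of the unmap.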
On 12/8/20 3:02 PM, Sven Van Asbroeck wrote:
> Hi Andrew,
>
> On Tue, Dec 8, 2020 at 5:51 PM Andrew Lunn wrote:
>>
>>>
>>> So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
>>> I didn't do any PCIe bandwidth measurements.
>>
>> Sometimes it is actually cache operations which take all the time.
On Tue, 8 Dec 2020 16:54:33 -0500 Sven Van Asbroeck wrote:
> > > Tested with iperf3 on a freescale imx6 + lan7430, both sides
> > > set to mtu 1500 bytes.
> > >
> > > Before:
> > > [ ID] Interval           Transfer     Bandwidth       Retr
> > > [  4]   0.00-20.00  sec   483 MBytes   203 Mbits/sec
On Tue, 8 Dec 2020 18:02:30 -0500 Sven Van Asbroeck wrote:
> On Tue, Dec 8, 2020 at 5:51 PM Andrew Lunn wrote:
> > > So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> > > I didn't do any PCIe bandwidth measurements.
> >
> > Sometimes it is actually cache operations which take all the time.
Hi Andrew,
On Tue, Dec 8, 2020 at 5:51 PM Andrew Lunn wrote:
>
> >
> > So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> > I didn't do any PCIe bandwidth measurements.
>
> Sometimes it is actually cache operations which take all the
> time. This needs to invalidate the cache.
> That's a good question. I used perf to create a flame graph of what
> the cpu was doing when receiving data at high speed. It showed that
> __dma_page_dev_to_cpu took up most of the cpu time. Which is triggered
> by dma_unmap_single(9K, DMA_FROM_DEVICE).
>
> So I assumed that it's a PCIe dma bandwidth issue, but I could be wrong -
> I didn't do any PCIe bandwidth measurements.
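For context, a simplified sketch of the receive step that produces that profile: the whole 9K mapping is torn down for every roughly 1500 byte frame, and on a non-coherent ARMv7 system the unmap invalidates the full 9K, which is where __dma_page_dev_to_cpu shows up. This is illustrative, not the actual lan743x receive path:

#include <linux/dma-mapping.h>
#include <linux/etherdevice.h>
#include <linux/netdevice.h>
#include <linux/skbuff.h>

#define EXAMPLE_RX_BUF_LEN      (9 * 1024)

static void example_rx_one_frame(struct device *dev, struct napi_struct *napi,
                                 struct sk_buff *skb, dma_addr_t dma,
                                 unsigned int frame_len)
{
        /* Tear down the whole 9K mapping for this single frame; on a
         * non-coherent arch this invalidates ~9K worth of cache lines ...
         */
        dma_unmap_single(dev, dma, EXAMPLE_RX_BUF_LEN, DMA_FROM_DEVICE);

        /* ... even though only frame_len (~1500) bytes were written. */
        skb_put(skb, frame_len);
        skb->protocol = eth_type_trans(skb, napi->dev);
        napi_gro_receive(napi, skb);
}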
Hi Jakub, thank you so much for reviewing this patchset!
On Tue, Dec 8, 2020 at 2:43 PM Jakub Kicinski wrote:
>
> > When the chip is working with the default 1500 byte MTU, a 9K
> > dma buffer goes from chip -> cpu per 1500 byte frame. This means
> > that to get 1G/s ethernet bandwidth, we need
On Sat, 5 Dec 2020 22:44:08 -0500 Sven Van Asbroeck wrote:
> From: Sven Van Asbroeck
>
> To support jumbo frames, each rx ring dma buffer is 9K in size.
> But the chip only stores a single frame per dma buffer.
>
> When the chip is working with the default 1500 byte MTU, a 9K
> dma buffer goes from chip -> cpu per 1500 byte frame.
From: Sven Van Asbroeck
To support jumbo frames, each rx ring dma buffer is 9K in size.
But the chip only stores a single frame per dma buffer.
When the chip is working with the default 1500 byte MTU, a 9K
dma buffer goes from chip -> cpu per 1500 byte frame. This means
that to get 1G/s ethernet
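For reference, the arithmetic implied by those numbers (a back-of-the-envelope check, not a quote from the posting): a 9K buffer crosses the bus for every 1500 byte frame, i.e. roughly 6x the payload, so 1 Gbit/s of ethernet traffic implies on the order of 6 Gbit/s of DMA traffic.

#include <stdio.h>

int main(void)
{
        const double buf_bytes = 9 * 1024;      /* rx dma buffer size  */
        const double frame_bytes = 1500;        /* default MTU payload */
        const double overhead = buf_bytes / frame_bytes;

        /* ~6.1: bytes moved per byte of ethernet payload */
        printf("overhead factor: %.1f\n", overhead);
        /* ~6.1 Gbit/s of DMA traffic to sustain 1 Gbit/s line rate */
        printf("DMA rate at 1 Gbit/s line rate: %.1f Gbit/s\n", overhead);
        return 0;
}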