On Wed, 2020-10-21 at 14:11 +0200, Arnd Bergmann wrote:
> (replying to my own mail from a different address to deal with the
> regular one being blacklisted somewhere, sorry for any duplicates)
> 
> On Wed, Oct 21, 2020 at 9:16 AM Arnd Bergmann <a...@arndb.de> wrote:
> > 
> > On Wed, Oct 21, 2020 at 12:10 AM Benjamin Herrenschmidt
> > <b...@kernel.crashing.org> wrote:
> > > On Tue, 2020-10-20 at 21:49 +0200, Arnd Bergmann wrote:
> > > > On Tue, Oct 20, 2020 at 11:37 AM Dylan Hung <dylan_h...@aspeedtech.com> 
> > > > wrote:
> > > > > > +1 @first is system memory from dma_alloc_coherent(), right?
> > > > > > 
> > > > > > You shouldn't have to do this. Is coherent DMA memory broken on your
> > > > > > platform?
> > > > > 
> > > > > It is about the arbitration on the DRAM controller.  There are two
> > > > > queues in the DRAM controller, one for CPU accesses and the other
> > > > > for the HW engines.
> > > > > When the CPU issues a store, the DRAM controller just acknowledges
> > > > > the CPU's request and pushes it into the queue.  Then the CPU
> > > > > triggers the HW MAC engine, and the HW engine starts to fetch the
> > > > > DMA memory.
> > > > > But since the CPU's store may still be sitting in the queue, the
> > > > > HW engine may fetch stale data.
> > > 
> > > Actually, I take back what I said earlier, the above seems to imply
> > > this is more generic.
> > > 
> > > Dylan, please confirm: does this affect *all* DMA-capable devices? If
> > > yes, then it's a really, really bad design bug in your chips,
> > > unfortunately, and the proper fix is indeed to make dma_wmb() do a
> > > dummy read of some sort (what address though? would any dummy
> > > non-cacheable page do?) to force the data out, as *all* drivers will
> > > potentially be affected.
> > > 
> > > I was under the impression that it was a specific timing issue in the
> > > vhub and ethernet parts, but if it's more generic then it needs to be
> > > fixed globally.
> > 
> > We have CONFIG_ARM_HEAVY_MB for SoCs with similar problems;
> > it turns mb() and wmb() into a platform-specific function call, though it
> > doesn't do that for dma_wmb() and smp_wmb(), which should not
> > be affected if the problem is only missing serialization between DMA
> > and CPU writes.

Right. I got mixed up by David's mention of dma_wmb; it's not the issue
here. It's indeed the ordering of writes to "coherent" memory vs.
iowrite.
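
To be explicit about the pattern we're talking about (a minimal sketch
of the driver side, not the actual ftgmac100 code; the descriptor
fields and the doorbell register name below are made up for
illustration):

	/* descriptor ring allocated with dma_alloc_coherent() */
	struct tx_desc *d = &priv->txring[idx];

	d->buf_addr = cpu_to_le32(dma_handle);
	d->ctrl     = cpu_to_le32(TXDESC_OWNED_BY_HW | len);

	/*
	 * Proposed workaround: read the descriptor back so the CPU's
	 * store has really left the DRAM controller's CPU queue...
	 */
	(void)READ_ONCE(d->ctrl);

	/* ...before the MMIO write that tells the MAC to go fetch it */
	iowrite32(1, priv->base + TX_POLL_DOORBELL);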

> > > > If either of the two is the case, then the READ_ONCE() would just
> > > > introduce a long delay before the iowrite32() that makes it more likely
> > > > that the data is there, but the inconsistent state would still be 
> > > > observable
> > > > by the device if it is still working on previous frames.
> > > 
> > > I think it just gets stuck until we try another packet, i.e., it
> > > doesn't see the new descriptor valid bit. But Dylan can elaborate.
> > 
> > Ok, that would point to an insufficient barrier in iowrite32() as well,
> > not in dma_wmb().

Correct.
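
For reference, on 32-bit ARM the barrier that writel()/iowrite32()
relies on is roughly this (paraphrased from arch/arm/include/asm/io.h
and asm/barrier.h, so take the details as approximate):

	#define writel(v, c)	({ __iowmb(); writel_relaxed(v, c); })
	#define __iowmb()	wmb()

	/* with CONFIG_ARM_HEAVY_MB, mb()/wmb() grow a SoC hook on top of dsb */
	#ifdef CONFIG_ARM_HEAVY_MB
	#define __arm_heavy_mb(x...)	do { dsb(x); arm_heavy_mb(); } while (0)
	#else
	#define __arm_heavy_mb(x...)	dsb(x)
	#endif
	#define wmb()	__arm_heavy_mb(st)

so a heavy barrier hooked in there would indeed strengthen every
writel(), which is where the performance question further down comes
from.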

> > At the moment, the only chips that need the heavy barrier are
> > omap4 and mstar_v7, and early L2 cache controllers (not the one
> > on Cortex-A7) have another synchronization callback that IIRC
> > is used for streaming mappings.
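
(For anyone following along: that heavy barrier lives in
arch/arm/mm/flush.c and is roughly the below, i.e. an outer-cache sync
plus an optional SoC-provided hook.)

	void (*soc_mb)(void);

	void arm_heavy_mb(void)
	{
	#ifdef CONFIG_OUTER_CACHE_SYNC
		if (outer_cache.sync)
			outer_cache.sync();
	#endif
		if (soc_mb)
			soc_mb();
	}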

 .../...

> > Obviously, adding one of these for ast2600 would slow down every
> > mb() and writel() a lot, but if it is a chip-wide problem rather than
> > one isolated to the network device, it would be the correct solution,
> > provided that a correct code sequence can be found.

I'm surprised that the problem doesn't already exist on the ast2400 and
2500, and I thus worry about the performance impact of such a workaround
applied generally to every MMIO write...
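
If we do go the chip-wide route, I imagine the hook would end up being
something along these lines (purely hypothetical, names invented, and
whether a single dummy non-cacheable read is actually enough to drain
the queue is exactly the question for Aspeed):

	/* a page of DRAM mapped non-cacheable at boot, kept just for this */
	static void __iomem *ast2600_dummy_page;

	static void ast2600_soc_mb(void)
	{
		/*
		 * Dummy non-cacheable DRAM read, on the theory that it cannot
		 * complete until the DRAM controller has drained the CPU write
		 * queue, so earlier stores to coherent memory become visible
		 * to the HW engines before any subsequent doorbell write.
		 */
		(void)readl_relaxed(ast2600_dummy_page);
	}

	static void __init ast2600_init_late(void)
	{
		/* hook into mb()/wmb() via CONFIG_ARM_HEAVY_MB */
		soc_mb = ast2600_soc_mb;
	}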

But we did kill mmiowb so ... ;-)

Cheers,
Ben.

