On Wed, 2020-10-21 at 14:11 +0200, Arnd Bergmann wrote:
> (replying to my own mail from a different address to deal with the
> regular one being blacklisted somewhere, sorry for any duplicates)
>
> On Wed, Oct 21, 2020 at 9:16 AM Arnd Bergmann <a...@arndb.de> wrote:
> >
> > On Wed, Oct 21, 2020 at 12:10 AM Benjamin Herrenschmidt
> > <b...@kernel.crashing.org> wrote:
> > > On Tue, 2020-10-20 at 21:49 +0200, Arnd Bergmann wrote:
> > > > On Tue, Oct 20, 2020 at 11:37 AM Dylan Hung <dylan_h...@aspeedtech.com>
> > > > wrote:
> > > > > > +1 @first is system memory from dma_alloc_coherent(), right?
> > > > > >
> > > > > > You shouldn't have to do this. Is coherent DMA memory broken on your
> > > > > > platform?
> > > > > >
> > > > > It is about the arbitration on the DRAM controller. There are two
> > > > > queues in the dram controller, one is for the CPU access and the
> > > > > other is for the HW engines.
> > > > > When CPU issues a store command, the dram controller just
> > > > > acknowledges cpu's request and pushes the request into the queue.
> > > > > Then CPU triggers the HW MAC engine, the HW engine starts to fetch
> > > > > the DMA memory.
> > > > > But since the cpu's request may still stay in the queue, the HW
> > > > > engine may fetch the wrong data.
> > > Actually, I take back what I said earlier, the above seems to imply
> > > this is more generic.
> > >
> > > Dylan, please confirm, does this affect *all* DMA capable devices ? If
> > > yes, then it's a really really bad design bug in your chips
> > > unfortunately and the proper fix is indeed to make dma_wmb() do a dummy
> > > read of some sort (what address though ? would any dummy non-cachable
> > > page do ?) to force the data out as *all* drivers will potentially be
> > > affected.
> > >
> > > I was under the impression that it was a specific timing issue in the
> > > vhub and ethernet parts, but if it's more generic then it needs to be
> > > fixed globally.
> >
> > We have CONFIG_ARM_HEAVY_MB for SoCs with similar problems,
> > it turns mb() and wmb() into a platform specific function call, though it
> > doesn't do that for dma_wmb() and smp_wmb(), which should not
> > be affected if the problem is only missing serialization between DMA
> > and CPU writes.

Right. I got mixed up by David's mention of dma_wmb; that's not the
issue here. The problem is indeed the ordering of writes to "coherent"
memory vs iowrite, i.e. roughly the pattern sketched below.
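To make that concrete, what gets broken is the usual "publish a
descriptor, then kick the device" sequence. The sketch below is
illustrative only: the descriptor layout, function name and doorbell
offset are all made up, this is not the actual ftgmac100 code:

/* Illustrative only -- made-up descriptor layout and register offset. */
struct my_desc {
        __le32 addr;
        __le32 len;
        __le32 flags;                   /* bit 0 = "valid", owned by HW */
};

static void my_tx_submit(void __iomem *base, struct my_desc *d,
                         dma_addr_t buf, unsigned int len)
{
        /* @d lives in dma_alloc_coherent() memory shared with the MAC */
        d->addr = cpu_to_le32(buf);
        d->len = cpu_to_le32(len);
        dma_wmb();                      /* payload visible before the valid bit */
        d->flags = cpu_to_le32(BIT(0));

        /* writel() already implies a wmb() on arm before the MMIO store */
        writel(1, base + 0x40);         /* made-up doorbell offset */
}

On a well behaved interconnect the barriers above are supposed to
guarantee that by the time the MAC sees the doorbell write, it also
sees the descriptor. What Dylan describes is the DRAM controller still
holding the descriptor stores in its CPU queue when the MAC's fetch
comes in, so the device can read stale data even though the CPU-side
ordering was correct.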
> > > > If either of the two is the case, then the READ_ONCE() would just
> > > > introduce a long delay before the iowrite32() that makes it more likely
> > > > that the data is there, but the inconsistent state would still be
> > > > observable
> > > > by the device if it is still working on previous frames.
> > > I think it just get stuck until we try another packet, ie, it doesn't
> > > see the new descriptor valid bit. But Dylan can elaborate.
> > Ok, that would point to an insufficient barrier in iowrite32() as well,
> > not in dma_wmb().

Correct.

> > At the moment, the only chips that need the heavy barrier are
> > omap4 and mstar_v7, and early l2 cache controllers (not the one
> > on Cortex-A7) have another synchronization callback that IIRC
> > is used for streaming mappings.

 .../...

> > Obviously, adding one of these for ast2600 would slow down every
> > mb() and writel() a lot, but if it is a chip-wide problem rather than
> > one isolated to the network device, it would be the correct solution,
> > provided that a correct code sequence can be found.

I'm surprised that problem doesn't already exist on the ast2400 and
2500, and I thus worry about the performance impact of such a
workaround applied generally to every MMIO write... But we did kill
mmiowb, so ... ;-)

Cheers,
Ben.
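P.S. For anyone who hasn't looked at it, the CONFIG_ARM_HEAVY_MB hook
Arnd refers to is just a function pointer the platform fills in: soc_mb,
called from arm_heavy_mb() in arch/arm/mm/flush.c, which is what mb()
and the barrier in writel() end up invoking (from memory, so check the
tree). An ast2600 version would look something like the sketch below.
The function name, register address and init glue are entirely made up,
and whether a single uncached read-back is really enough to drain the
memory controller's CPU write queue is exactly the question that needs
answering:

/* Hypothetical sketch, not existing code. */
#include <linux/errno.h>
#include <linux/init.h>
#include <linux/io.h>
#include <asm/barrier.h>                /* declares soc_mb under CONFIG_ARM_HEAVY_MB */

static void __iomem *aspeed_drain_reg;  /* some harmless uncached register */

static void aspeed_soc_mb(void)
{
        /*
         * Dummy read meant to push earlier CPU stores out of the DRAM
         * controller's queue before a device is kicked via writel().
         */
        readl_relaxed(aspeed_drain_reg);
}

static int __init aspeed_barrier_init(void)
{
        /* 0x1e6e2000 is the SCU base, used here purely as an example */
        aspeed_drain_reg = ioremap(0x1e6e2000, 0x100);
        if (!aspeed_drain_reg)
                return -ENOMEM;
        soc_mb = aspeed_soc_mb;
        return 0;
}
early_initcall(aspeed_barrier_init);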