Hi Stefan, On 3 April 2017 at 08:26, Brüns, Stefan <stefan.bru...@rwth-aachen.de> wrote: > On Montag, 3. April 2017 01:23:17 CEST you wrote: >> Hi Stefan, >> >> On 2 April 2017 at 15:34, Stefan Bruens <stefan.bru...@rwth-aachen.de> > wrote: >> > On Sonntag, 2. April 2017 17:43:38 CEST Simon Glass wrote: >> >> Hi Stefan, >> >> >> >> On 2 April 2017 at 07:10, Stefan Bruens <stefan.bru...@rwth-aachen.de> >> > >> > wrote: >> >> > On Sonntag, 2. April 2017 05:01:41 CEST Marek Vasut wrote: >> >> >> On 04/02/2017 01:40 AM, Simon Glass wrote: >> >> >> > Hi Marek, >> >> >> > >> >> >> > On 1 April 2017 at 14:15, Marek Vasut <ma...@denx.de> wrote: >> >> >> >> On 04/01/2017 08:05 PM, Simon Glass wrote: >> >> >> >>> On Raspberry Pi 2 and 3 a problem was noticed when enabling driver >> >> >> >>> model >> >> >> >>> for USB: the cache invalidate after an incoming transfer does not >> >> >> >>> seem >> >> >> >>> to >> >> >> >>> work correctly. >> >> >> >>> >> >> >> >>> This may be a problem with the underlying caching implementation >> >> >> >>> on >> >> >> >>> armv7 >> >> >> >>> and armv8 but this seems very unlikely. As a work-around, use >> >> >> >>> separate >> >> >> >>> buffers for input and output. This ensures that the input buffer >> >> >> >>> will >> >> >> >>> not >> >> >> >>> hold dirty cache data. >> >> >> >> >> >> >> >> What do you think of this patch: >> >> >> >> [U-Boot] usb: dwc2: invalidate the dcache before starting the DMA >> >> >> > >> >> >> > Yes that matches what I did as a hack. I didn't realise that the DMA >> >> >> > would go through the cache. Thanks for the pointer. >> >> >> >> >> >> DMA should not go through the cache. I have yet to review that patch, >> >> >> but IMO it's relevant to this problem you observe. >> >> > >> >> > DMA transfers not going through the cache is probably the problem here: >> >> > >> >> > Assume we have the aligned_buffer at address 0xdead0000 >> >> > >> >> > 1. The cpu writes to address 0xdead0002. This is fine, as it is the >> >> > current >> >> > owner of the address. The cacheline is marked dirty. >> >> > 2. The cpu no longer needs the corresponding address range, and it is >> >> > reallocated (i.e. freed and then allocated from dwc2) or reused (i.e. >> >> > formerly out buffer, now in buffer). >> >> > 3. The CPU starts the DMA transfer >> >> > 4. The DMA transfer writes to e.g. 0xdead0000-0xdead0200 in memory. >> >> > 5. The CPU fetches an address aliasing with 0xdead0000. The dirty cache >> >> > line is evicted, and the 0xdead0000-0xdead0040 memory contents are >> >> > overwritten. >> >> >> >> This is the part I don't understand. This should be an invalidate, not >> >> a clean and invalidate, so there should be not memory write. >> >> >> >> Also if the CPU fetches from cached 0xdead0000 without an invalidate, >> >> it will not cause a cash clean. It will simple read the data from the >> >> cache and ignore what the DMA wrote. >> > >> > The CPU does not fetch 0xdead0000, but from an address *aliasing* with >> > 0xdead000. As 0xdead0000 is *dirty* (we have neither flushed (clears dirty >> > bit) or invalidated (implicitly clears dirty for the address)), the cache >> > controller has to write out the 0xdead0000 cache line to memory. >> >> That doesn't make any sense to me. Can you explain it a bit more? >> >> If the CPU fetches from a cache-alias of 0xdead0000, say 0xa11a5000 >> then I expect the cache line would contain the data for that alias, >> not for 0xdead0000. > > The important part is the dirty flag in the 0xdead0000 cacheline. By reading > an aliasing address, you are causing eviction of the current cache line > contents, and writing back its contents to memory. Reading of an address may > cause write of a different address. You can see it as an dcache_flush_range > done by the cache controller.
OK so I think you are saying that reading from 0xa11a5000 causes dirty data to be flushed from the cache to 0xdead0000. Makes perfect sense. But why is there dirty data at 0xdead0000? - If we did a write last time, then it would have been dirty until we flushed the cache line, which we did before telling the DMA to start up. - If we did a read last time, then it is clean. We have read the data, but not changed it. What am I missing? > > I think you are assuming a write-through cache here, which leads to your > confusion. No that's a separate issue. > >> So a later invalidate of 0xdead0000 should at most >> clean the cache line and write to memory at 0xa11a5000. If it were to >> write cached data intended for 0xa11a5000 to memory at 0xdead0000, >> then surely this would be a bug? > > After the cache line for 0xdead0000 has been evicted, any flush/invalidate > operations are noops for that address. OK good, so that's not the problem. > >> Therefore I cannot see the situation where the CPU should write to >> 0xdead0000 when that address is invalidated. > > It is not the invalidation which causes the write, but eviction from the > cache. > > > >> On armv8 we appear not to suppose invalidate in the code, so it makes >> >> sense for rpi_3. >> >> >> >> But for rpi_2 which seems to do a proper invalidate, I still don't see >> >> the problem. >> > >> > Which part of the code is different between rpi2 and rpi3? The dwc2 code >> > is >> > identical, is the memory invalidated in some other place? >> >> It is the invalidate_dcache_range() function. > > Thats for pointing that out, for anyone not having read the code: > > ARMv7 has different operations for flush_dcache_range and > invalidate_dcache_range, the former does a writeback of dirty cache lines and > sets the invalid tags for the corresponding cache lines, the latter only does > the invalidate part. > > ARMv8 does a writeback for both operations. I assume thats what you call > "improper". Yes, I believe it is wrong. Linux has a proper invalidate, why not U-Boot? > > The important part is, calling flush multiple times in a row is *exactly* the > same thing as calling flush once. Calling flush instead of invalidate is the > same thing *if* the dirty flag is not set, as the writeback part is skipped in > that case. Yes indeed. > >> >> > Obviously, the dirty cache line from (1.) has to be cleared at the >> >> > beginning of (3.), as Eddys patch does. >> >> >> >> But I still don't understand why we have to clean instead of just >> >> invalidate? >> > >> > The patch by Eddie Cai just does an invalidate_dcache_range on the >> > transfer >> > buffer, nothing else. Where do you see a "clean" (whatever that refers >> > to)? >> >> In the armv8 invalidate_dcache_range() function. > > The writeback does not happen, as the cacheline is not dirty. It should not > even be in the cache after invalidate has been called once. > > We have to make sure the buffers address range is not in the cache prior to > starting the DMA. We can either use invalidate_dcache_range or > flush_dcache_range to guarantee this. Which one we use does not matter here, > although invalidate only is typically a little bit more lightweight. Yes. Just to restate my assertion. It should be possible to: - have some dirty data in the cache - start up DMA - invalidate that data - read it in that order. It should not be necessary to move the invalidate to before the DMA start-up, right? Regards, Simon _______________________________________________ U-Boot mailing list U-Boot@lists.denx.de https://lists.denx.de/listinfo/u-boot