On 09/06/2020 19:47, Michael Nazzareno Trimarchi wrote:

Hi,

that looks much better now, thanks! Some suggestion below.

> On Tue, Jun 9, 2020 at 6:13 AM Bin Meng <bmeng...@gmail.com> wrote:
>>
>> Hi André,
>>
>> On Mon, Jun 8, 2020 at 9:52 PM André Przywara <andre.przyw...@arm.com> wrote:
>>>
>>> On 07/06/2020 12:22, Jagan Teki wrote:
>>>
>>> Hi,
>>>
>>> (CC:ing Mark)
>>>
>>> Without looking too deep, I think invalidating the cache might be the
>>> right thing to do, but the rationale or at least the wording of it seems
>>> somehow flawed:
>>>
>>>> Some architecture like ARM Cortex A53, A72 would need
>>>
>>> Please don't mix the term "architecture" with actual implementations.
>>> And the problem is more general, I would say. This *might* be a case
>>> here where this is due to speculation, and then I would expect it to
>>> only show on an A72, for instance. I guess this is about NVMe on RK3399?
>>> Does U-Boot run on an A53 or an A72 core?
>>>
>>>> to invalidate dcache to sync the cache with the memory
>>>> contents before flushing the cache to memory.
>>>
>>> That is somewhat confusing. What do "sync" and "flush" mean here?
>>> AFAIK only "clean" and "invalidate" are clearly defined terms.
>>> The command buffer is cleaned to DRAM in nvme_submit_cmd(), so this is
>>> purely about the buffer for the *return* data, in which case I would
>>> expect this to be a pure invalidation problem.
>>>
>>> This issue looks like the case described on slides 30 and 31 here:
>>> https://elinux.org/images/6/6d/Rutland2.pdf
>>
>> Thanks for the slides. This is really helpful.
>>
>>>
>>>> The NVME here submitting the admin command using dma_addr
>>>> to the memory without prior cache invalidation. This causing
>>>> dma_addr is pointing to an invalid location in the memory
>>>
>>> This does not sound right: dma_addr is always pointing to the right
>>> location in memory.
>>> The actual reason this fixes something might be that (some) cache lines
>>> of the DMA buffer were dirty and were evicted *before* the existing
>>> invalidation, but *after* the DMA transfer. That sounds unlikely in a
>>> U-Boot context, though, but would match the case described in Mark's slides.
>>>
>>> So there might be more to this. Are we missing a barrier here? Should we
>>> use coherent buffers (memory mapped as un-cached) for the return DMA in
>>> the first place (but I don't think U-Boot provides those)?
>>>
>>>> and found the sane nvme_ctrl result.
>>>>
>>>> Below example shows the nvme disk scan result improper result
>>>>
>>>> => nvme scan
>>>> nvme_get_info_from_identify: nn = 544502629, vwc = 100,
>>>> sn = dev_0T, mn = `�\�, fr = t_part, mdts = 105
>>>>
>>>> So, invalidating the cache before submitting the admin command
>>>> makes the dma_addr points to a valid location in the memory.
>>>
>>> If this shows up already, I think we should address all the comments in
>>> the driver where it says we should do cache maintenance. And it makes me
>>> wonder how this actually works today ...
>>>
>>
>> I believe the driver was only validated on x86 and NXP's PowerPC
>> series where cache coherence is ensured between the DMA master and the
>> system memory. I suspect it was never validated on ARM, hence this
>> patch.
>>
>> I agree we should address all the comments in the driver about cache
>> maintenance.
>
> What about this description:
>
> nvme: Invalidate dcache before submitting admin cmd
>
> This patch tries to avoid eviction of dirty lines during DMA
> transfer.
> The code right now executes the following steps:
>
> - allocate the buffer
> - start a DMA operation using the non-coherent DMA buffer
> - invalidate cache lines associated with the buffer
> - read the buffer
>
> This can lead to reading back not valid information. CPU can still read
> stale data. In order to avoid the eviction of dirty lines, a new
> invalidation is required before starting the DMA operation. The patch
> just adds an invalidation before submitting the DMA command.

As I think this case is not obvious, I would make it more clear what
actually happens here, for instance (replace your paragraph above):

======================
This can lead to reading back invalid information, because the cache
controller could evict dirty cache lines belonging to the buffer *after*
the DMA operation has started to fill the DRAM. In order to avoid this,
a new invalidation is required *before* starting the DMA operation. The
patch just adds an invalidation before submitting the DMA command.
======================

And your point about NXP PCIe and x86 being cache coherent is a good one:
Actually the ARM SBSA specification demands the same, so systems complying
with this (servers, typically) would probably also work already. But those
smartphone-class SoCs are probably not coherent.

It seems like drivers for PCI devices in U-Boot assume some properties
based on the host controllers and systems they have been tested on, as I
found other issues lately. Some drivers do cache maintenance, others
don't at all.

I will try to test NVMe on some other ARM platforms as soon as I can get
my hands on one.

Many thanks!
Andre

> The example below shows the nvme disk scan result without the following
> patch
>
> => nvme scan
> nvme_get_info_from_identify: nn = 544502629, vwc = 100,
> sn = dev_0T, mn = `�\�, fr = t_part, mdts = 105
>
> So, invalidating the cache before submitting the admin command,
> fixes the CPU read.
>
> Michael
>>
>> Regards,
>> Bin
>
>
> --
> | Michael Nazzareno Trimarchi     Amarula Solutions BV |
> | COO - Founder                   Cruquiuskade 47      |
> | +31(0)851119172                 Amsterdam 1018 AM NL |
> |        [`as] http://www.amarulasolutions.com         |
>
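For reference, a minimal sketch of the invalidate-before-DMA ordering
discussed in the thread. Only invalidate_dcache_range(), ALIGN() and
ARCH_DMA_MINALIGN are existing U-Boot facilities; the function itself and
submit_identify_cmd() are hypothetical stand-ins, not the actual
drivers/nvme/nvme.c code.

    /*
     * Sketch only: illustrates the two invalidations around a DMA read
     * into a non-coherent, cached buffer.
     */
    #include <cpu_func.h>           /* invalidate_dcache_range() */
    #include <asm/cache.h>          /* ARCH_DMA_MINALIGN */
    #include <linux/errno.h>
    #include <linux/kernel.h>       /* ALIGN() */
    #include <linux/types.h>

    /* Hypothetical helper that submits the admin command and waits. */
    int submit_identify_cmd(void *buf, size_t len);

    static int identify_into_buffer(void *buf, size_t len)
    {
            unsigned long start = (unsigned long)buf;
            unsigned long stop = ALIGN(start + len, ARCH_DMA_MINALIGN);

            /*
             * Invalidate *before* the device starts writing: any dirty
             * line covering the buffer is discarded now, so it cannot be
             * evicted into DRAM while the DMA transfer is in flight.
             */
            invalidate_dcache_range(start, stop);

            if (submit_identify_cmd(buf, len))
                    return -EIO;

            /*
             * Invalidate again before the CPU reads the result, dropping
             * any lines speculatively fetched while the device wrote DRAM.
             */
            invalidate_dcache_range(start, stop);

            return 0;
    }

Note that this pattern also assumes the buffer starts and ends on
ARCH_DMA_MINALIGN boundaries; otherwise the invalidation could discard
unrelated dirty data sharing a cache line with the buffer.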