On 2/2/21 10:12 AM, Bin Meng wrote:
[...]
cache aligned in memory, however the cache operations are called on
the structure sizes, which themselves might not be cache aligned. Add
the necessary rounding to fix this, which permits the nvme to work on
arm64.
+ARM guys
Which ARM64 SoC did you test this with?
RCar3, although that's irrelevant, the problem will happen on any arm or
arm64, and possibly any other system which needs cache management.
There was a recent change to nvme.c that fixed a cache issue on ARMv8
so I thought this might be platform related.
I used master, so unlikely.
It's strange this issue was not exposed last time when this driver was
tested on ARMv8.
Which ARMv8 platform ?
The round down in this patch should be unnecessary.
Can you explain why ?
I just took a further look and most of the start address should be
cache line aligned (4KiB aligned) except the
nvme_read_completion_status(). It's only 16 bytes aligned which might
not be cache line aligned.
Right, there are various arm chips with 32B/64B alignment requirements.
But it's better to
figure out which call to dcache_xxx() with an unaligned end address.
If you look at the code, most of them can (and do) trigger this,
therefore they need such alignment, as explained in the commit message.
Now I wonder what's the correct implementation of the
invalidate_dcache_range() and flush_dcache_range() in U-Boot?
Shouldn't the round down/up happen in these APIs instead of doing such
in drivers?
Definitely not, because then the rounding might flush/invalidate cache
over areas where this could cause a problem (e.g. neighboring DMA
descriptors). The driver must do the cache management correctly.
Well we can implement in these APIs and document its expected usage.
That would be harmful, because you cannot really flush/invalidate half
of cache line. Consider you have two 16B DMA descriptors next to each
other, which is often the case, then the driver has to implement proper
cache handling.
That this particular driver already has structures reasonably well
padded that they can be flushed by adding rounding is an exception.
Either way a driver has to do the cache management correctly. Not
doing it in the driver eliminates some duplications of rounding
up/down.
Indeed, and rounding up/down is the correct way to do it in this driver,
see the commit message.
For this case, I believe we just need to take care of
nvme_read_completion_status().
Can you validate this ?