On Mon, 8 Feb 2021 16:49:58 +0100 Marek Vasut <marek.va...@gmail.com> wrote:
Hi, > On 2/8/21 2:32 PM, Andre Przywara wrote: > [...] > >>>>>>>>>>> +static void nvme_flush_dcache_range(void *start, unsigned long > >>>>>>>>>>> size) > >>>>>>>>>>> +{ > >>>>>>>>>>> + unsigned long s, e; > >>>>>>>>>>> + nvme_align_dcache_range(start, size, &s, &e); > >>>>>>>>>>> + flush_dcache_range(s, e); > >>>>>>>>> > >>>>>>>>> There is no good reason for alignment restrictions when it comes to > >>>>>>>>> clean (& invalidate), so there is no need for this wrapper. > >>>>>>>> > >>>>>>>> Is that on ARM64-specific or is that applicable in general ? The > >>>>>>>> driver > >>>>>>>> is expected to work on any CPU. > >>>>>>> > >>>>>>> Cache clean (actually: cache clean&invalidate) is what happens on > >>>>>>> evictions > >>>>>>> all of the time, at the cache controller's discretion. So there is no > >>>>>>> real harm in that operation per se. When an eviction happens on a > >>>>>>> *clean* cache line, this is basically just an invalidate, which is > >>>>>>> also not > >>>>>>> harmful. > >>>>>>> > >>>>>>> There are harmful cases when buffers sharing a cache line are both > >>>>>>> "just invalidated" > >>>>>>> and "cleaned" at different points in time. > >>>>>> > >>>>>> Is that on ARM64-specific or is that applicable in general ? (the above > >>>>>> does not answer that question) > >>>>> > >>>>> I would say that's a property of *every* write-back cache > >>>>> implementation: > >>>>> https://en.wikipedia.org/wiki/Cache_(computing)#/media/File:Write-back_with_write-allocation.svg > >>>>> > >>>> > >>>> I've been reading and digesting the thread as it goes, and the only > >>>> thing I do want to chime in on here right now is that yes, U-Boot > >>>> does and will continue to support every CPU that someone wants to run it > >>>> on, and one of the takeaways I see from this thread is we need some > >>>> better documented abstractions around cache, as it's very tricky to get > >>>> right all the time. > >>> > >>> Documenting the u-boot cache function behavior precisely is fine by me, > >>> but > >>> that is somewhat separate topic from this bugfix. > >>> > >>> So I will ask a simple question -- is there anything blocking this bugfix > >>> from being applied ? > >> > >> While I can fix the typo, I'm waiting for an Ack/Reviewed-by from Bin > >> (as he's spent the most time on this driver of late) and I'd really like > >> one from Andre, or at least him agreeing this patch is a step in the > >> right direction. > > > > As I said: I don't see how this patch changes anything on arm64, which > > the commit messages claims to be the reason for this post. > > If someone please can confirm, but invalidate_dcache_range() always > > works on arm64, in fact does the very rounding already that this patch > > introduces here. So what is the actual fix here? > > You can NOT depend on the behavior of cache functions on specific > architecture, U-Boot is universal and this must work on every single > architecture, not just aarch64. I totally understand and support that! See my patch to address the problem. What I am wondering is how your patch fixes anything on *arm64*, when there is no actual change in the argument to the "dc ivac" instruction? > The entire point of this patch is to call flush/invalidate only with > cacheline-aligned addresses, which is what they expect, otherwise > these functions do no operation at all. First, "doing no operation at all" is some debatable behaviour, I'd rather see it panic or at least always print an error. Secondly: this is not the case in this particular case (arm64) that you mention in the commit message: The invalidate goes through anyway. > > Plus, I consider this blanket rounding more harmful than helpful, since > > it actually just silences the check_cache_range() check. > > Can you back this claim with something ? Because I spent my fair share > of time analyzing the driver and the rounding is exactly sufficient to > make the cache ops work correctly. So here is my point (again): When we want to invalidate a buffer, it must be cache line aligned to work correctly - not (only) because of U-Boot's check, but because you run into problems with invalidating *other* (potentially dirty) data sharing the same cache line otherwise. That is the whole point of check_cache_range() in the first place. And I think we agree on this. Now: How do you prevent this problem by just rounding the *addresses* you pass to the cache maintenance instruction? If a buffer address is not aligned, you want to know about it! Normally all buffer addresses should be *always* aligned, the driver needs to be designed this way. So there is no need for rounding. When now an unaligned address sneaks in - either due to a bug or an inherent driver design problem - I actually want to see fireworks! At least the check_cache_range() check should trigger. With blanket rounding you deliberately disable this check, and invalidate more than you want to invalidate - without any warnings. > > If we include armv7 here, which in fact would ignore(!) an unaligned > > invalidate > > Yes, on armv7 and older, the cache ops behave as they were designed to > behave. Fair enough, we can discuss enabling those checks for armv8 as well, but at the moment they are not in place yet, so play no role in this particular problem. > >, this is my analysis (just for invalidate): > > nvme_read_completion_status(): NEEDS ALIGNMENT > > size smaller than cache line, cqes[1] base address unaligned. > > fix needed, rounding *could* be a temporary fix with comments > > as of why this is legitimate in this case. > > Better alternative sketched in a previous email. > > nvme_identify(): OK > > struct nvme_id_ctrl is 4K, if I counted correctly, so is fine. > > buffer comes the callers, the in-file users pass an aligned > > address, the only external user I see is in nvme_show.c, which > > also explicitly aligns the buffer. > > nvme_blk_rw(): OK > > total_len seems to be a multiple of block size, so is hopefully > > already cache line aligned. > > buffer comes from the upper layers, I guess it's page > > aligned already (at least 512b sector aligned)? > > > > I turned my sketch for the cqes[] fix into a proper patch and will send > > this in a minute > Please add check_cache_range() to armv8 cache ops and test whether there > are no warnings from it, thanks. I don't have a working setup at the moment, I need to first find my NVMe adaptor and stuff that into a Juno - which in the *middle* of a pile of boxes :-( And did *you* do this? Can you report what is the particular problem? Which of the functions trigger the check? From how I understand the code, enabling the check (just the check!) would indeed trigger the warning at the moment, from nvme_read_completion_status(), but wouldn't change anything otherwise: cache.S: __asm_invalidate_dcache_range already rounds the address passed in, so you end up invalidating the same area, before and after. Cheers, Andre