On 10/19/2016 04:32 PM, york sun wrote:
On 10/19/2016 12:18 PM, Stephen Warren wrote:
On 10/19/2016 09:25 AM, Stephen Warren wrote:
On 10/14/2016 02:17 PM, York Sun wrote:
Current code turns off the d-cache first, then flushes all levels of
cache. This results in data loss: as soon as the d-cache is off, the
dirty cache contents are discarded, according to testing on LS2080A.
This issue was not seen as long as the external L3 cache was flushed,
pushing the data to main memory. However, the external L3 cache is not
guaranteed to have the data. To fix this, flush the d-cache by set/way
first, to make sure the cache is clean before turning it off.

diff --git a/arch/arm/cpu/armv8/cache_v8.c b/arch/arm/cpu/armv8/cache_v8.c

@@ -478,9 +478,9 @@ void dcache_disable(void)

+    flush_dcache_all();
     set_sctlr(sctlr & ~(CR_C|CR_M));

-    flush_dcache_all();
     __asm_invalidate_tlb_all();

I talked to Mark Rutland at ARM, and I believe the current code is
correct. Here's my interpretation of what he said:

The dcache must be disabled first. This prevents allocation of new
entries in the cache during the flush operation, which prevents the race
conditions that were mentioned in the other thread.

Then, the flush operation must be invoked. Since the cache is now
disabled, this can fully flush the cache without worrying about racing
with things being added to the cache.

This all implies that the implementation of dcache_disable(),
set_sctlr(), flush_dcache_all(), and any code they call must not access
data in DRAM at all, since with the dcache off, any DRAM access will[1]
read potentially stale data from DRAM, rather than any dirty data that
might be in the cache.

[1] I'm not sure if that's "will" or "may", i.e. whether this is
architecturally guaranteed in ARMv8 or is implementation defined. At
least the Cortex-A72 TRM says "will" for that CPU; not sure about others.

Perhaps the most obvious upshot of this is that the stack can't be used.
This implies to me that we need to recode all those functions purely in
assembly, or just possibly play some tricks to 100% force gcc not to
touch memory anywhere inside dcache_disable() or the functions it calls.
We're just getting lucky here right now since everything happens to be
inlined, but I don't think we're doing anything to 100% guarantee this.
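
To illustrate, roughly what I have in mind is something like this (a
sketch only: dcache_disable_noram is a made-up name, it assumes we're
running at EL2 and U-Boot's CR_C/CR_M definitions from <asm/system.h>,
and it still relies on the compiler not spilling anything to the stack
between the msr and the flush; a pure .S routine is the only hard
guarantee):

void __asm_flush_dcache_all(void);  /* existing U-Boot asm helper; registers only */

void dcache_disable_noram(void)     /* hypothetical name; assumes EL2 */
{
        unsigned long sctlr;

        asm volatile("mrs %0, sctlr_el2" : "=r" (sctlr));
        sctlr &= ~(CR_C | CR_M);                /* dcache and MMU off */
        asm volatile("msr sctlr_el2, %0\n\tisb" : : "r" (sctlr));
        __asm_flush_dcache_all();   /* set/way flush; must not touch DRAM */
        asm volatile("tlbi alle2\n\tdsb sy\n\tisb" : : : "memory");
}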

What worries me here is that at least on Tegra, a "flush the entire
dcache" operation requires an SMC call to the secure monitor. That will
certainly access DRAM when the secure monitor runs, but perhaps this
doesn't matter since that's at a different exception level, and we know
the secure monitor accesses DRAM regions that are separate from U-Boot's
DRAM? I suspect life isn't that convenient. I'm wondering if this all
implies that, like patch 2 in this series, we need to get 100% away from
flush-by-set/way, even with SoC-specific hooks to make that work
reliably, and just flush everything by VA, which IIRC is architecturally
guaranteed to work without SoC-specific logic. That way, we can
encapsulate everything into an assembly function without worrying about
calling SMCs or SoC-specific hook functions without using DRAM. Of
course, how that assembly function knows which VAs to flush without
decoding the page tables or other data structures is another matter :-(
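
FWIW, the by-VA flush itself is easy to keep in registers; something
like this sketch (it mirrors U-Boot's existing __asm_flush_dcache_range;
DminLine in CTR_EL0[19:16] is log2 of the smallest dcache line in
words, and DC CIVAC cleans+invalidates to the point of coherency):

static void flush_dcache_range_by_va(unsigned long start, unsigned long end)
{
        unsigned long ctr, line;

        asm volatile("mrs %0, ctr_el0" : "=r" (ctr));
        line = 4UL << ((ctr >> 16) & 0xf);      /* line size in bytes */
        for (start &= ~(line - 1); start < end; start += line)
                asm volatile("dc civac, %0" : : "r" (start) : "memory");
        asm volatile("dsb sy" : : : "memory");
}

The hard part, as noted above, is knowing which VAs to pass in.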

Re: the last paragraph there:

After reading the ARMv8 ARM, I see that EL1, EL2, and EL3 all have
separate cache enable bits in SCTLR_ELx. I believe U-Boot only needs to
flush/... its own RAM out of the dcache, since that's all that's
relevant at the EL U-Boot is running at, and doesn't care about anything
EL3 might have mapped cached. So, it's safe to invoke SMCs from the
cache flush code in U-Boot even if the EL3 code touches its own DRAM.
There might be a corner case where this isn't true if EL3 has some
EL1/EL2-owned RAM mapped, but I don't expect that to be the case here.
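
For reference, this is also why the cache-control code has to key off
the current EL; a minimal sketch of reading the right SCTLR, mirroring
what U-Boot's existing get_sctlr() does:

static unsigned long read_sctlr_current_el(void)
{
        unsigned long el, sctlr;

        asm volatile("mrs %0, CurrentEL" : "=r" (el));
        switch ((el >> 2) & 3) {        /* CurrentEL[3:2] holds the EL */
        case 1:
                asm volatile("mrs %0, sctlr_el1" : "=r" (sctlr));
                break;
        case 2:
                asm volatile("mrs %0, sctlr_el2" : "=r" (sctlr));
                break;
        default:
                asm volatile("mrs %0, sctlr_el3" : "=r" (sctlr));
                break;
        }
        return sctlr;
}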

Re: my other series to add more cache hooks: I'll re-implement the Tegra
hook in assembly so it's guaranteed not to touch RAM, retest, and resubmit.

If it turns out that dcache_disable() ever starts touching DRAM at the
wrong time, we can deal with that then; it doesn't now at least.


Stephen,

I think one of our differences is whether the data is returned correctly
while a dirty dcache is being disabled. With the current code flow, the
dcache is disabled, followed by flushing/invalidating by set/way, then
the L3 cache is flushed. I see a correct stack after these steps. During
my recent attempt to remove the L3 flush, I added code to flush the
stack by VA (and I have already confirmed that data flushed this way
ends up in main memory) to replace flushing L3. I found that as soon as
the dcache is disabled, the dirty cache data is lost when accessed from
the core (debugged with a JTAG tool). This makes me think the order may
be wrong. By reversing the order, i.e. flushing the dcache first and
then disabling it, I can see correct data in main memory. I understand
the proposed new code requires not touching the stack or any data after
the first flush. I would prefer not to make this change if we have an
explanation of what I saw.

I think the confusion is: what does "lost" mean?

On ARMv8 at least:

Starting off with dcache enabled and containing some dirty data:

CPU reads will read the dirty data from the cache.

Now if dcache is disabled:

CPU reads will be of the uncached/uncacheable type since the dcache is off. The dcache will not respond, so the CPU will receive (potentially) stale data directly from RAM.

In this state, it is not possible to access the dirty data in the caches. However, it's not (permanently) lost...

Now if the dcache is fully flushed:

All dirty data that was in the dcache will be written to RAM. After this process is complete, any reads from RAM will return the most up-to-date possible data; the temporarily "lost" dirty data is now no longer lost.
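
As a concrete sequence (a sketch with a hypothetical helper:
dcache_off_noflush is assumed to clear only SCTLR.C, and addr is
assumed to point at cacheable DRAM):

extern void flush_dcache_all(void);     /* existing U-Boot API */
extern void dcache_off_noflush(void);   /* hypothetical: clears SCTLR.C, no flush */

void demo_lost_then_recovered(volatile unsigned int *addr)
{
        *addr = 0xA5A5A5A5;     /* allocates a dirty line in the dcache */
        dcache_off_noflush();   /* reads now bypass the dcache... */
        (void)*addr;            /* ...so this may return stale DRAM data: "lost" */
        flush_dcache_all();     /* dirty line written back to DRAM */
        (void)*addr;            /* reads 0xA5A5A5A5: no longer lost */
}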


The cache disable process must proceed as above; we can't flush the dcache and then disable it. Attempting to do things in that order would introduce race conditions. At all times before the dcache is disabled, any CPU write to memory can allocate a new line in the dcache. Also, the dcache is fully responsive to the cache coherency protocol, so some other agent on the bus (another CPU, cluster, DMA agent, ...) could cause a (clean or dirty) line to be brought into the cache. This might happen after that part of the cache has already been flushed. In other words, it's impossible to fully flush the cache (by set/way) while it's enabled.


I'm not sure that flushing by VA instead of by set/way solves anything much here. I believe flushing by VA runs less risk of interference due to the coherency protocol bringing lines back into the cache, since flush by VA will affect other caches? However this still doesn't allow use of the CPU's own stack or other DRAM writes; if we:

a) Flush by VA
(this step is assumed to perform some function calls/returns, or other DRAM writes)

b) Disable cache

... then irrespective of how step (a) was performed, the DRAM writes performed in that step are likely still in the cache as dirty data, so step (b) will cause them to be "lost" until a full flush operation is performed. Thus we must always flush after disabling the cache, so there's no point also flushing before.
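
To make that concrete (same hypothetical dcache_off_noflush helper as
above; note that the compiler's own stack traffic counts as DRAM writes
here too):

void broken_flush_then_disable(volatile unsigned int *marker)
{
        flush_dcache_all();     /* cache is clean... only for a moment */
        *marker = 0xDEAD;       /* a new dirty line is allocated here */
        dcache_off_noflush();   /* SCTLR.C cleared: that line is stranded */
        /* An uncached read of *marker may now miss the 0xDEAD value
         * until a full set/way flush pushes the dirty line to DRAM. */
}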
