x86-64 MTTCG?

Emilio G. Cota Mon, 24 Jul 2017 14:25:59 -0700

(Adding some Cc's)

On Mon, Jul 24, 2017 at 19:05:33 +0000, Andrew Baumann via Qemu-devel wrote:
> Hi all,
>
> I'm trying to track down what appears to be a translation bug in either
> the aarch64 target or x86_64 TCG (in multithreaded mode). The symptoms
> are entirely consistent with a torn read/write -- that is, a 64-bit
> load or store that was translated to two 32-bit loads and stores --
> but that's obviously not what happens in the common path through the
> translation for this code, so I'm wondering: are there any cases in
> which qemu will split a 64-bit memory access into two 32-bit accesses?


That would be a bug in MTTCG.

> The code: Guest CPU A writes a 64-bit value to an aligned memory
> location that was previously 0, using a regular store; e.g.:
>       f9000034 str         x20,[x1]
>
> Guest CPU B (who is busy-waiting) reads a value from the same location:
>       f9400280 ldr         x0,[x20]
>
> The symptom: CPU B loads a value that is neither NULL nor the value
> written. Instead, x0 gets only the low 32-bits of the value written
> (high bits are all zero). By the time this value is dereferenced (a
> few instructions later) and the exception handlers run, the memory
> location from which it was loaded has the correct 64-bit value with
> a non-zero upper half.
>
> Obviously on a real ARM memory barriers are critical, and indeed
> the code has such barriers in it, but I'm assuming that any possible
> mistranslation of the barriers is irrelevant because for a 64-bit load
> and a 64-bit store you should get all or nothing. Other clues that may
> be relevant: the code is _near_ a LDREX/STREX pair (the busy-waiting
> is used to resolve a race when updating another variable), and the
> busy-wait loop has a yield instruction in it (although those appear
> to be no-ops with MTTCG).

This might have to do with how ldrex/strex is emulated; are you relying
on the exclusive pair detecting ABA? If so, your code won't work in
QEMU since it uses cmpxchg to emulate ldrex/strex.

> The bug repros more easily with more guest VCPUs, and more load on the
> host (i.e. more context switching to expose the race). It doesn't repro
> for the single-threaded TCG. Unfortunately it's hard to get detailed
> trace information, because the bug only repros roughly every one in 40
> attempts, and it's a long way into the guest OS boot before it arises.
>
> I'm not yet 100% convinced this is a qemu bug -- the obvious path
> through the translator for those instructions does 64-bit memory
> accesses on the host -- but at the same time, it has never been seen
> outside qemu, and after staring long and hard at the guest code, we're
> pretty sure it's correct. It's also extremely unlikely to be a wild
> write, given that it occurs on a wide variety of guest call-stacks,
> and the memory is later inconsistent with what was loaded.
>
> Any clues or debugging suggestions appreciated!

- Pin the QEMU-MTTCG process to a single CPU. Can you repro then?

- Force the emulation of cmpxchg via EXCP_ATOMIC with:

diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 87f673e..771effe5 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -2856,7 +2856,7 @@ void tcg_gen_atomic_cmpxchg_i64(TCGv_i64 retv, TCGv addr, 
TCGv_i64 cmpv,
         }
         tcg_temp_free_i64(t1);
     } else if ((memop & MO_SIZE) == MO_64) {
-#ifdef CONFIG_ATOMIC64
+#if 0
         gen_atomic_cx_i64 gen;

         gen = table_cmpxchg[memop & (MO_SIZE | MO_BSWAP)];

This will halt all other vCPUs before the calling vCPU performs
the cmpxchg. Can you reproduce then?

                Emilio

Re: [Qemu-devel] Torn read/write possible on aarch64/x86-64 MTTCG?

Reply via email to