Hi all, I'm trying to track down what appears to be a translation bug in either the aarch64 target or x86_64 TCG (in multithreaded mode). The symptoms are entirely consistent with a torn read/write -- that is, a 64-bit load or store that was translated to two 32-bit loads and stores -- but that's obviously not what happens in the common path through the translation for this code, so I'm wondering: are there any cases in which qemu will split a 64-bit memory access into two 32-bit accesses?
The code:

Guest CPU A writes a 64-bit value to an aligned memory location that was previously 0, using a regular store; e.g.:

    f9000034 str x20,[x1]

Guest CPU B (which is busy-waiting) reads a value from the same location:

    f9400280 ldr x0,[x20]

The symptom: CPU B loads a value that is neither NULL nor the value written. Instead, x0 gets only the low 32 bits of the value written (the high bits are all zero). By the time this value is dereferenced (a few instructions later) and the exception handlers run, the memory location from which it was loaded has the correct 64-bit value with a non-zero upper half.

Obviously on real ARM hardware memory barriers are critical, and indeed the code has such barriers in it, but I'm assuming that any possible mistranslation of the barriers is irrelevant, because for an aligned 64-bit load and a 64-bit store you should get all or nothing.

Other clues that may be relevant: the code is _near_ a LDREX/STREX pair (the busy-waiting is used to resolve a race when updating another variable), and the busy-wait loop has a yield instruction in it (although those appear to be no-ops with MTTCG). The bug repros more easily with more guest VCPUs and more load on the host (i.e. more context switching to expose the race). It doesn't repro with single-threaded TCG.

Unfortunately it's hard to get detailed trace information, because the bug only repros roughly once in 40 attempts, and it's a long way into the guest OS boot before it arises.

I'm not yet 100% convinced this is a qemu bug -- the obvious path through the translator for those instructions does 64-bit memory accesses on the host -- but at the same time, it has never been seen outside qemu, and after staring long and hard at the guest code, we're pretty sure it's correct. It's also extremely unlikely to be a wild write, given that it occurs on a wide variety of guest call-stacks, and the memory is later inconsistent with what was loaded.

Any clues or debugging suggestions appreciated!

Thanks,
Andrew
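
P.S. The access pattern described above can be sketched as a standalone litmus test. This is only an illustration, not the actual guest code: the names (slot, PUBLISHED, is_torn, run_litmus) and the test value are hypothetical, and it assumes the compiler lowers a relaxed 64-bit atomic load/store to a single ldr/str, which is worth confirming in the disassembly. It may well not hit the same race as the real guest.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical test value; both 32-bit halves are non-zero. */
#define PUBLISHED 0x0000ffff12345678ULL

/* If 64-bit atomics needed a lock, the test would not exercise plain
 * 64-bit loads and stores at all. */
static_assert(ATOMIC_LLONG_LOCK_FREE == 2, "64-bit atomics must be lock-free");

/* Stands in for the aligned 64-bit location shared by the two CPUs. */
static _Atomic uint64_t slot;

/* True if `seen` is exactly one 32-bit half of PUBLISHED with the other
 * half still zero, i.e. what a split 64-bit access could expose. */
static bool is_torn(uint64_t seen)
{
    uint64_t lo_only = PUBLISHED & 0xffffffffULL;
    uint64_t hi_only = PUBLISHED & ~0xffffffffULL;
    return seen == lo_only || seen == hi_only;
}

/* "CPU A": a single regular 64-bit store, no barrier. */
static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&slot, PUBLISHED, memory_order_relaxed);
    return NULL;
}

/* "CPU B": busy-wait until the location is non-zero, record what was seen. */
static void *reader(void *arg)
{
    uint64_t *out = arg;
    uint64_t v;
    while ((v = atomic_load_explicit(&slot, memory_order_relaxed)) == 0)
        ; /* spin */
    *out = v;
    return NULL;
}

/* Runs the race `iterations` times; returns the number of torn values
 * observed, or -1 if a thread could not be created. */
static long run_litmus(long iterations)
{
    long torn = 0;
    for (long i = 0; i < iterations; i++) {
        pthread_t r, w;
        uint64_t seen = 0;

        atomic_store(&slot, 0);
        if (pthread_create(&r, NULL, reader, &seen) != 0)
            return -1;
        if (pthread_create(&w, NULL, writer, NULL) != 0) {
            atomic_store(&slot, PUBLISHED); /* release the spinning reader */
            pthread_join(r, NULL);
            return -1;
        }
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        if (is_torn(seen))
            torn++;
    }
    return torn;
}
```

Calling run_litmus() with a large iteration count from main() inside an aarch64 guest with several VCPUs, and comparing MTTCG against single-threaded TCG, would show whether tearing is visible for this simple pattern; any non-zero return value means a half-written value was observed.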