On 5/16/23 06:56, Peter Maydell wrote:
On Tue, 16 May 2023 at 14:51, Richard Henderson
<richard.hender...@linaro.org> wrote:

On 5/16/23 06:29, Peter Maydell wrote:
On Mon, 15 May 2023 at 15:38, Richard Henderson
<richard.hender...@linaro.org> wrote:

We have code in atomic128.h noting that through GCC 8, there
was no support for atomic operations on __uint128.  This has
been fixed in GCC 10.  But we can still improve over any
basic compare-and-swap loop using the ldxp/stxp instructions.

Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
---
   accel/tcg/ldst_atomicity.c.inc | 60 ++++++++++++++++++++++++++++++++--
   1 file changed, 57 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 69c1c61997..c3b2b35823 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -263,7 +263,22 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, 
uintptr_t ra, void *pv)
        * In system mode all guest pages are writable, and for user-only
        * we have just checked writability.  Try cmpxchg.
        */
-#if defined(CONFIG_CMPXCHG128)
+#if defined(__aarch64__)
+    /* We can do better than cmpxchg for AArch64.  */
+    {
+        uint64_t l, h;
+        uint32_t fail;
+
+        /* The load must be paired with the store to guarantee not tearing. */
+        asm("0: ldxp %0, %1, %3\n\t"
+            "stxp %w2, %0, %1, %3\n\t"
+            "cbnz %w2, 0b"
+            : "=&r"(l), "=&r"(h), "=&r"(fail) : "Q"(*p));
+
+        qemu_build_assert(!HOST_BIG_ENDIAN);
+        return int128_make128(l, h);
+    }

The compiler (well, clang 11, anyway) seems able to generate equivalent
code to this inline asm:

See above, where GCC 8 can do nothing, and that is still a supported compiler.

Yeah, but it'll work fine even without the explicit inline
asm, right?

No, GCC < 10 does not support __sync_* or __atomic_* on __uint128_t at all.

Is the performance difference critical enough to justify
an inline asm implementation that's only needed for older
compilers ?

Yes.  It's the difference between stop-the-world and not.


r~


Reply via email to