On 5/16/23 06:56, Peter Maydell wrote:
On Tue, 16 May 2023 at 14:51, Richard Henderson
<richard.hender...@linaro.org> wrote:
On 5/16/23 06:29, Peter Maydell wrote:
On Mon, 15 May 2023 at 15:38, Richard Henderson
<richard.hender...@linaro.org> wrote:
We have code in atomic128.h noting that through GCC 8, there
was no support for atomic operations on __uint128. This has
been fixed in GCC 10. But we can still improve over any
basic compare-and-swap loop using the ldxp/stxp instructions.
Signed-off-by: Richard Henderson <richard.hender...@linaro.org>
---
accel/tcg/ldst_atomicity.c.inc | 60 ++++++++++++++++++++++++++++++++--
1 file changed, 57 insertions(+), 3 deletions(-)
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 69c1c61997..c3b2b35823 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -263,7 +263,22 @@ static Int128 load_atomic16_or_exit(CPUArchState *env,
uintptr_t ra, void *pv)
* In system mode all guest pages are writable, and for user-only
* we have just checked writability. Try cmpxchg.
*/
-#if defined(CONFIG_CMPXCHG128)
+#if defined(__aarch64__)
+ /* We can do better than cmpxchg for AArch64. */
+ {
+ uint64_t l, h;
+ uint32_t fail;
+
+ /* The load must be paired with the store to guarantee not tearing. */
+ asm("0: ldxp %0, %1, %3\n\t"
+ "stxp %w2, %0, %1, %3\n\t"
+ "cbnz %w2, 0b"
+ : "=&r"(l), "=&r"(h), "=&r"(fail) : "Q"(*p));
+
+ qemu_build_assert(!HOST_BIG_ENDIAN);
+ return int128_make128(l, h);
+ }
The compiler (well, clang 11, anyway) seems able to generate equivalent
code to this inline asm:
See above, where GCC 8 can do nothing, and that is still a supported compiler.
Yeah, but it'll work fine even without the explicit inline
asm, right?
No, GCC < 10 does not support __sync_* or __atomic_* on __uint128_t at all.
Is the performance difference critical enough to justify
an inline asm implementation that's only needed for older
compilers ?
Yes. It's the difference between stop-the-world and not.
r~