On Mon, Aug 15, 2016 at 11:46:32 +0100, Alex Bennée wrote:
> As far as I'm aware the following work is still ongoing:
>
> Emilio: cmpxchg atomics
> Alvise: LL/SC modelling
I've been tinkering with an experimental patch to do proper LL/SC.
The idea is to rely on hardware transactional memory, so that stores
don't have to be tracked. The trickiest thing is the fallback path,
for which I'm trying to (ab)use EXCP_ATOMIC to execute exclusively
from the ldrex all the way to the strex.

To test it, I'm using aarch64-linux-user running qht-bench compiled
on an aarch64 machine. I'm running on an Intel Skylake host (Skylake
has no known TSX bugs).

However, I'm finding issues that might not have to do with the patch
itself:

- On the latest MTTCG+cmpxchg tree (45c11751ed7, a.k.a.
  bennee/mttcg/base-patches-v4-with-cmpxchg-atomics-v2), QEMU loops
  forever without making progress in the instruction stream, even
  with taskset -c 0.

- On the cmpxchg tree (rth's atomic-2 branch [1]), it works more
  reliably, although tb_lock is held around tb_find_fast so
  parallelism isn't very high. Still, it sometimes triggers the
  assert below.

- Applying the "remove tb_lock around hot path" patch makes it
  easier to trigger this assert in cpu-exec.c:650 (approx.):

    /* Assert that the compiler does not smash local variables. */
    g_assert(cpu == current_cpu);

I've also seen the assert immediately after that one trigger, as
well as the rcu_read_unlock depth assert.

The asserts are usually triggered when all threads exit (by
returning NULL) at roughly the same time. However, they cannot be
triggered with taskset -c 0, which makes me suspect that somehow
start_exclusive isn't working as intended.

Any tips would be appreciated!

I'll reply with a patch that uses RTM; the one below takes the
fallback path all the way, and is the best one with which to
reproduce the issues above.

Thanks,

		Emilio

[1] https://github.com/rth7680/qemu/commits/atomic-2
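In case it helps to see the intended fast path in one place, here is
a minimal, self-contained sketch of the RTM idea. This is my own
illustration, not code from the patch below: htm_ll_sc, compute and
atomic_inc are made-up names, but _xbegin()/_xend()/_XBEGIN_STARTED
are the real Intel RTM intrinsics from <immintrin.h> (build with
-mrtm, TSX-capable host required). The transaction spans the whole
ldrex..strex window, so conflicting stores from other threads abort
it instead of having to be tracked by QEMU:

#include <immintrin.h>  /* _xbegin()/_xend(); build with -mrtm */
#include <stdbool.h>
#include <stdint.h>

/* Run an emulated ldrex/strex sequence as one hardware transaction.
 * 'compute' stands for whatever the guest does between the exclusive
 * load and the exclusive store. Returns false if the transaction
 * aborted, in which case the caller would raise EXCP_ATOMIC and
 * re-execute the sequence under start_exclusive()/end_exclusive(). */
static bool htm_ll_sc(uint64_t *addr, uint64_t (*compute)(uint64_t))
{
    if (_xbegin() == _XBEGIN_STARTED) {
        uint64_t val = *addr;    /* ldrex: load inside the tx  */
        *addr = compute(val);    /* strex: store inside the tx */
        _xend();                 /* commits iff no conflict    */
        return true;
    }
    return false;  /* aborted: conflict, capacity, interrupt, ... */
}

/* Usage example: a guest-style atomic increment. */
static uint64_t inc(uint64_t v) { return v + 1; }

static void atomic_inc(uint64_t *counter)
{
    while (!htm_ll_sc(counter, inc)) {
        /* Fallback: in QEMU this is where cpu_loop_exit_atomic()
         * would kick in; here we simply retry for brevity. */
    }
}

The patch below implements only the fallback half of this:
helper_xbegin() raises EXCP_ATOMIC when parallel_cpus is set, so the
ldrex..strex window is re-executed exclusively; helper_xend() then
flips parallel_cpus back on, which also terminates the
while (!parallel_cpus) loop in step_atomic().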
>From ed6af6eb364e5a36e81d7cc8143c0e9783c50587 Mon Sep 17 00:00:00 2001
From: "Emilio G. Cota" <c...@braap.org>
Date: Mon, 15 Aug 2016 00:27:42 +0200
Subject: [PATCH] aarch64: use TSX for ldrex/strex (fallback path only)

Signed-off-by: Emilio G. Cota <c...@braap.org>
---
 linux-user/main.c          |  5 +++--
 target-arm/helper-a64.c    | 23 +++++++++++++++++++++++
 target-arm/helper-a64.h    |  4 ++++
 target-arm/translate-a64.c | 15 +++++++++------
 4 files changed, 39 insertions(+), 8 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 9880505..6922faa 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -192,8 +192,9 @@ static void step_atomic(CPUState *cpu)
 
     /* Since we got here, we know that parallel_cpus must be true.  */
     parallel_cpus = false;
-    cpu_exec_step(cpu);
-    parallel_cpus = true;
+    while (!parallel_cpus) {
+        cpu_exec_step(cpu);
+    }
 
     end_exclusive();
 }
diff --git a/target-arm/helper-a64.c b/target-arm/helper-a64.c
index 8ce518b..a97b631 100644
--- a/target-arm/helper-a64.c
+++ b/target-arm/helper-a64.c
@@ -579,3 +579,26 @@ uint64_t HELPER(paired_cmpxchg64_be)(CPUARMState *env, uint64_t addr,
 
     return !success;
 }
+
+void HELPER(xbegin)(CPUARMState *env)
+{
+    uintptr_t ra = GETPC();
+
+    if (parallel_cpus) {
+        cpu_loop_exit_atomic(ENV_GET_CPU(env), ra);
+    }
+}
+
+void HELPER(xend)(void)
+{
+    assert(!parallel_cpus);
+    parallel_cpus = true;
+}
+
+uint64_t HELPER(x_ok)(void)
+{
+    if (!parallel_cpus) {
+        return 1;
+    }
+    return 0;
+}
diff --git a/target-arm/helper-a64.h b/target-arm/helper-a64.h
index dd32000..e7ede43 100644
--- a/target-arm/helper-a64.h
+++ b/target-arm/helper-a64.h
@@ -48,3 +48,7 @@ DEF_HELPER_FLAGS_3(crc32_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_3(crc32c_64, TCG_CALL_NO_RWG_SE, i64, i64, i64, i32)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_le, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
 DEF_HELPER_FLAGS_4(paired_cmpxchg64_be, TCG_CALL_NO_WG, i64, env, i64, i64, i64)
+
+DEF_HELPER_1(xbegin, void, env)
+DEF_HELPER_0(x_ok, i64)
+DEF_HELPER_0(xend, void)
diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index 450c359..cfcf440 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1760,6 +1760,8 @@ static void gen_load_exclusive(DisasContext *s, int rt, int rt2,
     TCGv_i64 tmp = tcg_temp_new_i64();
     TCGMemOp be = s->be_data;
 
+    gen_helper_xbegin(cpu_env);
+
     g_assert(size <= 3);
     if (is_pair) {
         TCGv_i64 hitmp = tcg_temp_new_i64();
@@ -1825,6 +1827,9 @@ static void gen_store_exclusive(DisasContext *s, int rd, int rt, int rt2,
     tcg_gen_brcond_i64(TCG_COND_NE, addr, cpu_exclusive_addr, fail_label);
 
     tmp = tcg_temp_new_i64();
+    gen_helper_x_ok(tmp);
+    tcg_gen_brcondi_i64(TCG_COND_EQ, tmp, 0, fail_label);
+
     if (is_pair) {
         if (size == 2) {
             TCGv_i64 val = tcg_temp_new_i64();
@@ -1844,16 +1849,14 @@
         }
     } else {
         TCGv_i64 val = cpu_reg(s, rt);
-        tcg_gen_atomic_cmpxchg_i64(tmp, addr, cpu_exclusive_val, val,
-                                   get_mem_index(s),
-                                   size | MO_ALIGN | s->be_data);
-        tcg_gen_setcond_i64(TCG_COND_NE, tmp, tmp, cpu_exclusive_val);
+        tcg_gen_qemu_st_i64(val, addr, get_mem_index(s), s->be_data + size);
     }
 
     tcg_temp_free_i64(addr);
-
-    tcg_gen_mov_i64(cpu_reg(s, rd), tmp);
     tcg_temp_free_i64(tmp);
+
+    tcg_gen_movi_i64(cpu_reg(s, rd), 0);
+    gen_helper_xend();
 
     tcg_gen_br(done_label);
     gen_set_label(fail_label);
-- 
2.7.4