On 28/08/2015 10:08, Pavel Dovgaluk wrote:
>> From: Aurelien Jarno [mailto:aurel...@aurel32.net]
>> On 2015-08-13 14:12, Leon Alrae wrote:
>>> On 10/07/2015 10:57, Pavel Dovgalyuk wrote:
>>>> @@ -2364,14 +2363,12 @@ static void gen_st_cond (DisasContext *ctx,
>>>> uint32_t opc, int rt,
>>>>  #if defined(TARGET_MIPS64)
>>>>      case OPC_SCD:
>>>>      case R6_OPC_SCD:
>>>> -        save_cpu_state(ctx, 1);
>>>>          op_st_scd(t1, t0, rt, ctx);
>>>>          opn = "scd";
>>>>          break;
>>>>  #endif
>>>>      case OPC_SC:
>>>>      case R6_OPC_SC:
>>>> -        save_cpu_state(ctx, 1);
>>>>          op_st_sc(t1, t0, rt, ctx);
>>>>          opn = "sc";
>>>>          break;
>>>
>>> Wouldn't we be better off assuming that conditional stores in linux-user
>>> always take an exception (we generate fake EXCP_SC exception) and avoid
>>> retranslation? After applying these changes I observed significant impact on
>>> performance in linux-user multithreaded apps, for instance c11-atomic-exec
>>> test before the change took just 2 seconds to finish, whereas now more than
>>> 30...
>>
>> This really show the impact of retranslation and why we should avoid
>> it when not necessary. Coming back to the issue here, the fact that we
>> go through retranslation is actually due to the fact that
>> helper_raise_exception has been changed to go through retranslation.
>>
>> Given the code path between user-mode and softmmu is quite different,
>> we definitely need a different code path wrt exception and retranslation
>> for the two cases. That said if we want deterministic code execution
>> (the original purpose of this patch), I don't see how we can do without
>> forcing retranslation. Pavel, do you have an idea for that?
>
> There is only one case when we can execute without retranslation -
> when the instruction is the last instruction in translation block.
> Then we can setup PC and flags before this last instruction.
> If the exception happens, we can just break the execution.
> The drawback of this method is breaking translation blocks into
> the smaller parts.
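
To make the trade-off described above concrete, here is a minimal sketch in
plain C. It is a toy model and not QEMU code: ToyCPU, INSN_SIZE,
fault_with_retranslation() and fault_at_block_end() are hypothetical names
made up for illustration, and the "retrans" counter only stands in for the
cost of retranslation, which in reality is much higher than one step per
instruction.

```c
#include <assert.h>

/* Toy model only: none of these names exist in QEMU. */
enum { INSN_SIZE = 4 };          /* assumed fixed instruction width */

typedef struct {
    unsigned long pc;            /* guest PC as seen by the exception handler */
    unsigned long retrans;       /* instructions re-walked to recover state */
} ToyCPU;

/*
 * Strategy A: nothing is saved before the faulting instruction, so the
 * block is walked again from its start to find out which guest PC the
 * fault belongs to. The cost grows with the position inside the block.
 */
static void fault_with_retranslation(ToyCPU *cpu, unsigned long tb_start,
                                     int fault_index)
{
    unsigned long pc = tb_start;

    assert(fault_index >= 0);
    for (int i = 0; i < fault_index; i++) {
        pc += INSN_SIZE;
        cpu->retrans++;
    }
    cpu->pc = pc;
}

/*
 * Strategy B: the faulting instruction is the last one in its block and
 * the PC was stored right before it (a constant known at translation
 * time), so the fault only has to leave the block.
 */
static void fault_at_block_end(ToyCPU *cpu, unsigned long tb_start,
                               int fault_index)
{
    assert(fault_index >= 0);
    cpu->pc = tb_start + (unsigned long)fault_index * INSN_SIZE;
}
```

Both functions leave the same guest PC behind; only strategy A pays a
per-fault cost. The price of strategy B is the one already named above:
every such instruction has to end its translation block.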

c11-atomic-exec.4 test execution time in linux-user:

* no changes:
real    0m3.039s
user    0m2.976s
sys     0m1.908s

* tb_lock + patch:
real    1m1.167s
user    0m57.240s
sys     0m36.678s

* tb_lock + patch + SC-without-retranslation:
real    0m3.016s
user    0m2.988s
sys     0m1.848s

I had to add tb_lock() to cpu_restore_state() in the first place, otherwise
all of my multithreaded user mode tests crash QEMU with this patch.

SC-without-retranslation (the diff below) seems to improve the situation, and
if I understand correctly we retain deterministic code execution. Therefore if
there are no objections I'll apply this patch + SC correction to mips-next.

Thanks,
Leon

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 99b99c5..006cb96 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -2060,7 +2060,7 @@ static inline void op_st_##insn(TCGv arg1, TCGv arg2, int rt, DisasContext *ctx)
     tcg_gen_movi_tl(t0, rt | ((almask << 3) & 0x20)); \
     tcg_gen_st_tl(t0, cpu_env, offsetof(CPUMIPSState, llreg)); \
     tcg_gen_st_tl(arg1, cpu_env, offsetof(CPUMIPSState, llnewval)); \
-    gen_helper_0e0i(raise_exception, EXCP_SC); \
+    generate_exception_end(ctx, EXCP_SC); \
     gen_set_label(l2); \
     tcg_gen_movi_tl(t0, 0); \
     gen_store_gpr(t0, rt); \