> From: Leon Alrae [mailto:leon.al...@imgtec.com]
> On 28/08/2015 10:08, Pavel Dovgaluk wrote:
> >> From: Aurelien Jarno [mailto:aurel...@aurel32.net]
> >> On 2015-08-13 14:12, Leon Alrae wrote:
> >>> On 10/07/2015 10:57, Pavel Dovgalyuk wrote:
> >>>> @@ -2364,14 +2363,12 @@ static void gen_st_cond (DisasContext *ctx, uint32_t opc, int rt,
> >>>>  #if defined(TARGET_MIPS64)
> >>>>      case OPC_SCD:
> >>>>      case R6_OPC_SCD:
> >>>> -        save_cpu_state(ctx, 1);
> >>>>          op_st_scd(t1, t0, rt, ctx);
> >>>>          opn = "scd";
> >>>>          break;
> >>>>  #endif
> >>>>      case OPC_SC:
> >>>>      case R6_OPC_SC:
> >>>> -        save_cpu_state(ctx, 1);
> >>>>          op_st_sc(t1, t0, rt, ctx);
> >>>>          opn = "sc";
> >>>>          break;
> >>>
> >>> Wouldn't we be better off assuming that conditional stores in linux-user
> >>> always take an exception (we generate fake EXCP_SC exception) and avoid
> >>> retranslation? After applying these changes I observed significant impact
> >>> on performance in linux-user multithreaded apps, for instance
> >>> c11-atomic-exec test before the change took just 2 seconds to finish,
> >>> whereas now more than 30...
> >>
> >> This really show the impact of retranslation and why we should avoid
> >> it when not necessary. Coming back to the issue here, the fact that we
> >> go through retranslation is actually due to the fact that
> >> helper_raise_exception has been changed to go through retranslation.
> >>
> >> Given the code path between user-mode and softmmu is quite different,
> >> we definitely need a different code path wrt exception and retranslation
> >> for the two cases. That said if we want deterministic code execution
> >> (the original purpose of this patch), I don't see how we can do without
> >> forcing retranslation. Pavel, do you have an idea for that?
> >
> > There is only one case when we can execute without retranslation -
> > when the instruction is the last instruction in translation block.
> > Then we can setup PC and flags before this last instruction.
> > If the exception happens, we can just break the execution.
> > The drawback of this method is breaking translation blocks into
> > the smaller parts.
>
> c11-atomic-exec.4 test execution time in linux-user:
>
> * no changes:
> real 0m3.039s
> user 0m2.976s
> sys 0m1.908s
>
> * tb_lock + patch:
> real 1m1.167s
> user 0m57.240s
> sys 0m36.678s
>
> * tb_lock + patch + SC-without-retranslation:
> real 0m3.016s
> user 0m2.988s
> sys 0m1.848s
>
> I had to add tb_lock() to cpu_restore_state() in the first place, otherwise
> all of my multithreaded user mode tests crash QEMU with this patch.
>
> SC-without-retranslation (the diff below) seems to improve the situation,
> and if I understand correctly we retain deterministic code execution.
> Therefore if there are no objections I'll apply this patch + SC correction
> to mips-next.

diff below implements exactly what I meant.

Pavel Dovgalyuk

>
> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index 99b99c5..006cb96 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -2060,7 +2060,7 @@ static inline void op_st_##insn(TCGv arg1, TCGv arg2, int rt, DisasContext *ctx)
>      tcg_gen_movi_tl(t0, rt | ((almask << 3) & 0x20));                         \
>      tcg_gen_st_tl(t0, cpu_env, offsetof(CPUMIPSState, llreg));                \
>      tcg_gen_st_tl(arg1, cpu_env, offsetof(CPUMIPSState, llnewval));           \
> -    gen_helper_0e0i(raise_exception, EXCP_SC);                                \
> +    generate_exception_end(ctx, EXCP_SC);                                     \
>      gen_set_label(l2);                                                        \
>      tcg_gen_movi_tl(t0, 0);                                                   \
>      gen_store_gpr(t0, rt);                                                    \
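
For readers following the thread, here is a rough toy model of the trade-off
discussed above. It is not QEMU code: ToyCpu, ToyBlock and toy_run_block are
hypothetical names made up for this sketch, and "retranslation" is reduced to
a counter. The model assumes the guest PC is written to the CPU state only at
block entry and, optionally, just before the last instruction of a block. An
exception raised by that last instruction then finds a valid PC, while one
raised mid-block has to recover the PC by retranslating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: none of these types or functions exist in QEMU. */

typedef struct {
    uint32_t pc;        /* guest PC as stored in the CPU state */
    bool exception;     /* set once an instruction has raised */
} ToyCpu;

typedef struct {
    uint32_t start_pc;          /* guest address of the first instruction */
    size_t n_insns;             /* number of 4-byte instructions in the block */
    size_t faulting;            /* index of the raising insn, n_insns if none */
    bool pc_saved_before_last;  /* translator stored PC before the last insn */
} ToyBlock;

/*
 * Runs one block and returns how many retranslations were needed to
 * recover a precise PC (0 or 1 in this model).
 */
int toy_run_block(ToyCpu *cpu, const ToyBlock *tb)
{
    int retranslations = 0;

    assert(tb->faulting <= tb->n_insns);
    cpu->pc = tb->start_pc;     /* the stored PC is valid at block entry */
    cpu->exception = false;

    for (size_t i = 0; i < tb->n_insns; i++) {
        uint32_t insn_pc = tb->start_pc + 4u * (uint32_t)i;

        /* Optional state save emitted before the final instruction. */
        if (tb->pc_saved_before_last && i == tb->n_insns - 1) {
            cpu->pc = insn_pc;
        }

        if (i == tb->faulting) {
            if (cpu->pc != insn_pc) {
                /* Stored PC is stale: model the cost of retranslation. */
                retranslations++;
                cpu->pc = insn_pc;
            }
            cpu->exception = true;
            return retranslations;
        }
    }

    /* No exception: fall through to the next block. */
    cpu->pc = tb->start_pc + 4u * (uint32_t)tb->n_insns;
    return retranslations;
}
```

In this model a block that ends at the raising instruction, with the PC saved
beforehand, never retranslates. The price is the one named in the thread:
every such instruction has to end its block, so blocks get smaller.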