On Thu, Mar 29, 2018 at 16:19:39 +0100, Alex Bennée wrote: > > Emilio G. Cota <c...@braap.org> writes: > > > Use the recently-gained QHT feature of returning the matching TB if it > > already exists. This allows us to get rid of the lookup we perform > > right after acquiring tb_lock. > > > > Suggested-by: Richard Henderson <r...@twiddle.net> > > Signed-off-by: Emilio G. Cota <c...@braap.org> > > --- > > accel/tcg/cpu-exec.c | 14 ++------------ > > accel/tcg/translate-all.c | 47 > > ++++++++++++++++++++++++++++++++++++++--------- > > 2 files changed, 40 insertions(+), 21 deletions(-) > > > > diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c > > index 7c83887..8aed38c 100644 > > --- a/accel/tcg/cpu-exec.c > > +++ b/accel/tcg/cpu-exec.c > > @@ -243,10 +243,7 @@ void cpu_exec_step_atomic(CPUState *cpu) > > if (tb == NULL) { > > mmap_lock(); > > tb_lock(); > > - tb = tb_htable_lookup(cpu, pc, cs_base, flags, cf_mask); > > - if (likely(tb == NULL)) { > > - tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); > > - } > > + tb = tb_gen_code(cpu, pc, cs_base, flags, cflags); > > tb_gen_code needs to be renamed to reflect it's semantics. > tb_get_or_gen_code? Or maybe tb_get_code with a sub-helper to do the > generation.
I think it can remain as tb_gen_code. The caller still gets a TB, and whether that TB has been generated by this thread or any other thread is irrelevant. (snip) > > --- a/accel/tcg/translate-all.c > > +++ b/accel/tcg/translate-all.c > > @@ -1503,12 +1503,16 @@ static inline void tb_page_add(PageDesc *p, > > TranslationBlock *tb, > > * (-1) to indicate that only one page contains the TB. > > * > > * Called with mmap_lock held for user-mode emulation. > > + * > > + * Returns @tb or an existing TB that matches @tb. > > That's just confusing to read. So this returns a TB like the @tb we > passed in but actually a different one matching the same conditions? Good point. Here tb_link_page is not a great name, but instead of adding a long name such as tb_link_page_or_get_existing, in v2 I've expanded the above comment. It now looks as follows: * Returns a pointer @tb, or a pointer to an existing TB that matches @tb. * Note that in !user-mode, another thread might have already added a TB * for the same block of guest code that @tb corresponds to. In that case, * the caller should discard the original @tb, and use instead the returned TB. > > @@ -1706,7 +1727,15 @@ TranslationBlock *tb_gen_code(CPUState *cpu, > > * memory barrier is required before tb_link_page() makes the TB > > visible > > * through the physical hash table and physical page list. > > */ > > - tb_link_page(tb, phys_pc, phys_page2); > > + existing_tb = tb_link_page(tb, phys_pc, phys_page2); > > + /* if the TB already exists, discard what we just translated */ > > So are we in the position now that we could potentially do a translation > but be beaten by another thread generating the same code? Exactly. > I suspect we could > do with a bit of explanatory commentary for the tb_gen_code functions. As I said above I don't think tb_gen_code changes at all to its callers, since the caller still gets a TB pointer that it did not have before. tb_link_page is the key here -- I hope the updated comment I quoted above is enough to make things clear. > Also I think the "Translation Blocks" section needs updating in the > MTTCG design document to make this clear. I've added a comment at the bottom of that section: Parallel code generation is supported. QHT is used at insertion time as the synchronization point across threads, thereby ensuring that we only keep track of a single TranslationBlock for each guest code block. > I'm curious if we should be counting unused translations somewhere in > the JIT stats. I'm guessing you need to work at a pathalogical case to > hit this much? This should be extremely rare on most workloads. Given that and the fact that we won't have unused translated code (we discard it by resetting code_gen_ptr), I wouldn't worry too much about this. In the unlikely case that it ever became a problem, TCG profiling time would account for it, and on a perf profile we'd see the slow path in tb_link_page being taken. Thanks, Emilio