On 05/02/2019 15:18, Peter Maydell wrote:

> In commit f7b78602fdc6c6e4be we added the CPU cluster number to the
> cflags field of the TB hash; this included adding it to the value
> kept in tb->cflags, since we pass that field directly into the hash
> calculation in some places. Unfortunately we forgot to check whether
> other parts of the code were doing comparisons against tb->cflags
> that would need to be updated.
> 
> It turns out that there is exactly one such place: the
> tb_lookup__cpu_state() function checks whether the TB it has
> found in the tb_jmp_cache has a tb->cflags matching the cf_mask
> that is passed in. The tb->cflags has the cluster_index in it
> but the cf_mask does not.
> 
> Hoist the "add cluster index to the cf_mask" code up from
> tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
> in the "did this TB match in the jmp cache" condition, as well as
> when we do the full hash lookup by physical PC, flags, etc.
> (tb_htable_lookup() is only called from tb_lookup__cpu_state(),
> so this change doesn't require any further knock-on changes.)
> 
> Fixes: f7b78602fdc6c6e4be ("accel/tcg: Add cluster number to TCG TB hash")
> Reported-by: Howard Spoelstra <hsp.c...@gmail.com>
> Reported-by: Cleber Rosa <cr...@redhat.com>
> Signed-off-by: Peter Maydell <peter.mayd...@linaro.org>
> ---
> Does anybody know why tb_lookup__cpu_state() has that odd
> double-underscore in the middle of its name?
> 
> Since the jmp_cache is per-vcpu we know that we're always going
> to match on the cluster_index, so the other option would be to
> leave the cluster_index bits out of the comparison, and leave the
> "fold in cluster index to cf_mask" code in tb_htable_lookup().
> Or we could require the callers of tb_lookup__cpu_state() to all
> provide the cluster index, but that's more places to change,
> so I prefer this.
> ---
>  include/exec/tb-lookup.h | 4 ++++
>  accel/tcg/cpu-exec.c     | 3 ---
>  2 files changed, 4 insertions(+), 3 deletions(-)
> 
> diff --git a/include/exec/tb-lookup.h b/include/exec/tb-lookup.h
> index 492cb682894..26921b6dafd 100644
> --- a/include/exec/tb-lookup.h
> +++ b/include/exec/tb-lookup.h
> @@ -28,6 +28,10 @@ tb_lookup__cpu_state(CPUState *cpu, target_ulong *pc, target_ulong *cs_base,
>      cpu_get_tb_cpu_state(env, pc, cs_base, flags);
>      hash = tb_jmp_cache_hash_func(*pc);
>      tb = atomic_rcu_read(&cpu->tb_jmp_cache[hash]);
> +
> +    cf_mask &= ~CF_CLUSTER_MASK;
> +    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> +
>      if (likely(tb &&
>                 tb->pc == *pc &&
>                 tb->cs_base == *cs_base &&
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index 7cf1292546f..60d87d5a19b 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -325,9 +325,6 @@ TranslationBlock *tb_htable_lookup(CPUState *cpu, target_ulong pc,
>      struct tb_desc desc;
>      uint32_t h;
>  
> -    cf_mask &= ~CF_CLUSTER_MASK;
> -    cf_mask |= cpu->cluster_index << CF_CLUSTER_SHIFT;
> -
>      desc.env = (CPUArchState *)cpu->env_ptr;
>      desc.cs_base = cs_base;
>      desc.flags = flags;
> 

Looks good to me: without performing a detailed benchmark, with this patch
applied the performance seems to be back to where it was before
f7b78602fdc6c6e4be was merged.
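
For anyone following along, here is a minimal standalone sketch of the
mismatch described in the commit message. It is not the QEMU code: the
helper names are hypothetical, the comparison is reduced to the cflags
check alone, and the CF_CLUSTER_MASK/CF_CLUSTER_SHIFT values are
assumptions picked for illustration (cluster index in the top 8 bits).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout for illustration: cluster index in the top 8 bits. */
#define CF_CLUSTER_MASK  0xff000000u
#define CF_CLUSTER_SHIFT 24

/* Same two statements the patch hoists into tb_lookup__cpu_state(). */
static uint32_t fold_cluster_index(uint32_t cf_mask, uint32_t cluster_index)
{
    cf_mask &= ~CF_CLUSTER_MASK;
    cf_mask |= cluster_index << CF_CLUSTER_SHIFT;
    return cf_mask;
}

/* Hypothetical helper: the jmp cache check as it behaved before the fix.
 * tb_cflags carries the cluster index, the caller's cf_mask does not. */
static bool jmp_cache_hit_before_fix(uint32_t tb_cflags, uint32_t cf_mask)
{
    return tb_cflags == cf_mask;
}

/* Hypothetical helper: the check after the fix, with the cluster index
 * folded into cf_mask before the comparison. */
static bool jmp_cache_hit_after_fix(uint32_t tb_cflags, uint32_t cf_mask,
                                    uint32_t cluster_index)
{
    return tb_cflags == fold_cluster_index(cf_mask, cluster_index);
}
```

In this sketch a TB created on a CPU with a non-zero cluster index never
matches in the "before" check, while a CPU in cluster 0 is unaffected.
A jmp cache miss on every lookup would presumably mean falling back to
the full hash lookup each time, which would be consistent with the
slowdown going away once the patch is applied.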

Tested-by: Mark Cave-Ayland <mark.cave-ayl...@ilande.co.uk>


ATB,

Mark.
