Computing TranslationBlock flags is fairly expensive on ARM, especially for 32-bit code. Because tbflags are computed on every TB lookup, cpu_get_tb_cpu_state often shows up close to the top of the profile now that QHT makes the hash table much more efficient.
However, most tbflags only change when the EL is switched or after MSR instructions. Based on this observation, this series caches these tbflags in CPUARMState, resulting in a 10-15% speedup on 32-bit code.

Paolo

Paolo Bonzini (3):
  target-arm: introduce cpu_dynamic_tb_cpu_flags
  target-arm: add env->tbflags
  target-arm: cache most tbflags

 target-arm/cpu.c           |  2 ++
 target-arm/cpu.h           | 58 ++++++++++++++++++++++++++++++++--------------
 target-arm/helper.c        |  2 ++
 target-arm/helper.h        |  1 +
 target-arm/op_helper.c     |  7 ++++++
 target-arm/translate-a64.c |  4 ++++
 target-arm/translate.c     | 12 ++++++++--
 target-arm/translate.h     |  1 +
 8 files changed, 68 insertions(+), 19 deletions(-)

-- 
2.7.4
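
For readers who want the general shape of the idea, here is a minimal sketch of the caching approach described above. It is not the code from the patches: ToyCPUState, the toy_* functions, the FLAG_* bits and the choice of which state counts as rarely changing are all hypothetical and only illustrate splitting the flags into a cached part and a part recomputed on every lookup.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, simplified CPU state; not QEMU's real CPUARMState. */
typedef struct {
    uint32_t el;            /* current exception level, 0..3 */
    uint32_t sysreg_bit;    /* stand-in for state changed only by MSR */
    uint32_t dynamic_bit;   /* stand-in for state that changes often */
    uint32_t cached_flags;  /* flags derived from rarely changing state */
} ToyCPUState;

/* Hypothetical flag layout: bits 0-1 hold the EL. */
#define FLAG_SYSREG  (1u << 2)
#define FLAG_DYNAMIC (1u << 3)

/* Slow path: derive the rarely changing flags from scratch. */
static uint32_t toy_compute_static_flags(const ToyCPUState *env)
{
    assert(env->el <= 3);
    uint32_t flags = env->el;
    if (env->sysreg_bit) {
        flags |= FLAG_SYSREG;
    }
    return flags;
}

/* Call this after an EL switch or a system register write. */
static void toy_rebuild_cached_flags(ToyCPUState *env)
{
    env->cached_flags = toy_compute_static_flags(env);
}

/* Fast path, run on every lookup: cached bits plus the dynamic ones. */
static uint32_t toy_get_tb_flags(const ToyCPUState *env)
{
    uint32_t flags = env->cached_flags;
    if (env->dynamic_bit) {
        flags |= FLAG_DYNAMIC;
    }
    return flags;
}
```

The trade-off in a scheme like this is that every code path changing the rarely changing state has to call the rebuild function, otherwise the fast path returns stale flags.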