This series is based on an idea forwarded by Emilio, which suggests that a 5-6% speed gain is possible. I have not spent much time measuring this, as the code size gains alone are significant.
I believe that I posted an x86_64-only patch some time ago, but this now includes i386, aarch64 and arm32. In late testing I do see some failures on i386, for sparc guest. I'll follow up on that later.

The main feature here is sharing code to place these out-of-line thunks. We want them to be within reach of a direct call. For x86, this displacement is 2GB, and we've already constrained the whole code_gen_buffer to be in range. For aarch64, this displacement is 128MB; for arm32 it is 16MB. In every case, the range is significant, and for any SMP guest may well cover the entire tcg_region.

Once we've emitted a thunk, we remember it (at least within a given tcg_region) and reuse it until we find that the relocation is out of range, at which point we generate another copy.

The second main change is that the entire TCGMemOpIdx is built into each thunk. There simply are not enough free registers on i386 (or arm32 for that matter) to pass the mmu_idx in to the thunk.

Other than these three targets, I have compile-tested the generic change on ppc64le. I have not even compile-tested mips, s390x, or sparc host.
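The thunk reuse policy described above can be sketched roughly as follows. This is only an illustration and not the code in the series: ThunkCache, ThunkSlot, thunk_for and in_range are hypothetical names, the key merely stands in for the TCGMemOpIdx value built into each thunk, and the caller is assumed to supply an address where a fresh copy could be emitted within reach of the call site.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch; none of these names exist in QEMU. */
#define MAX_THUNK_KINDS 64

typedef struct {
    uint32_t key;      /* stands in for the TCGMemOpIdx baked into the thunk */
    uintptr_t addr;    /* address of the most recently emitted copy */
} ThunkSlot;

typedef struct {
    ThunkSlot slot[MAX_THUNK_KINDS];
    size_t used;       /* number of distinct keys seen so far */
    int64_t max_disp;  /* reach of the host's direct call, in bytes */
    unsigned emitted;  /* how many thunk copies have been generated */
} ThunkCache;

/* True if a direct call at call_site can reach target. */
static bool in_range(const ThunkCache *c, uintptr_t call_site,
                     uintptr_t target)
{
    int64_t disp = (int64_t)target - (int64_t)call_site;
    return disp >= -c->max_disp && disp < c->max_disp;
}

/*
 * Return the address of a thunk for key that is reachable from call_site.
 * The remembered copy is reused while it stays in range; otherwise a new
 * copy is recorded at emit_addr (assumed by the caller to be in range).
 */
static uintptr_t thunk_for(ThunkCache *c, uint32_t key,
                           uintptr_t call_site, uintptr_t emit_addr)
{
    ThunkSlot *s = NULL;

    for (size_t i = 0; i < c->used; i++) {
        if (c->slot[i].key == key) {
            s = &c->slot[i];
            break;
        }
    }
    if (s && in_range(c, call_site, s->addr)) {
        return s->addr;            /* reuse the existing copy */
    }
    if (!s) {
        assert(c->used < MAX_THUNK_KINDS);
        s = &c->slot[c->used++];
        s->key = key;
    }
    s->addr = emit_addr;           /* out of range or first use: new copy */
    c->emitted++;
    return s->addr;
}
```

With max_disp set to the 128MB figure quoted for aarch64, two call sites close together share a single copy, while a call site more than 128MB beyond the remembered copy causes a second one to be recorded. One slot per key reflects the point that the TCGMemOpIdx is part of the thunk itself rather than an argument.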
r~

Richard Henderson (17):
  tcg/i386: Add constraints for r8 and r9
  tcg/i386: Return a base register from tcg_out_tlb_load
  tcg/i386: Change TCG_REG_L[01] to not overlap function arguments
  tcg/i386: Force qemu_ld/st arguments into fixed registers
  tcg: Return success from patch_reloc
  tcg: Add TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/i386: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/aarch64: Add constraints for x0, x1, x2
  tcg/aarch64: Parameterize the temps for tcg_out_tlb_read
  tcg/aarch64: Parameterize the temp for tcg_out_goto_long
  tcg/aarch64: Use B not BL for tcg_out_goto_long
  tcg/aarch64: Use TCG_TARGET_NEED_LDST_OOL_LABELS
  tcg/arm: Parameterize the temps for tcg_out_tlb_read
  tcg/arm: Add constraints for R0-R5
  tcg/arm: Reduce the number of temps for tcg_out_tlb_read
  tcg/arm: Force qemu_ld/st arguments into fixed registers
  tcg/arm: Use TCG_TARGET_NEED_LDST_OOL_LABELS

 tcg/aarch64/tcg-target.h     |   2 +-
 tcg/arm/tcg-target.h         |   2 +-
 tcg/i386/tcg-target.h        |   2 +-
 tcg/tcg.h                    |   4 +
 tcg/aarch64/tcg-target.inc.c | 318 +++++++++---------
 tcg/arm/tcg-target.inc.c     | 535 +++++++++++++++---------------
 tcg/i386/tcg-target.inc.c    | 611 ++++++++++++++++++++---------------
 tcg/mips/tcg-target.inc.c    |  29 +-
 tcg/ppc/tcg-target.inc.c     |  47 +--
 tcg/s390/tcg-target.inc.c    |  37 ++-
 tcg/sparc/tcg-target.inc.c   |  13 +-
 tcg/tcg-ldst-ool.inc.c       |  94 ++++++
 tcg/tcg-pool.inc.c           |   5 +-
 tcg/tcg.c                    |  28 +-
 tcg/tci/tcg-target.inc.c     |   3 +-
 15 files changed, 974 insertions(+), 756 deletions(-)
 create mode 100644 tcg/tcg-ldst-ool.inc.c

-- 
2.17.2