This has been on my to-do list for several years, and I've finally spent a rainy weekend doing something about it.

The current tcg bswap opcode is fairly strict: for swaps smaller than
the TCGv size, it requires zero-extended input and provides
zero-extended output. This has meant that various tcg/ backends have
their own handling of bswap when it comes to memory, to minimize
overhead for stores (which do not care about zero-extended output) or
for signed loads (which would rather not sign-extend after
zero-extending).

Solve this by adding some operation flags to the tcg bswap opcode:

    TCG_BSWAP_IZ  --  Input is Zero extended
    TCG_BSWAP_OZ  --  Output is Zero extended
    TCG_BSWAP_OS  --  Output is Sign extended

For instance, bswap before store would not set any of these flags,
allowing unextended input and producing unextended output.

The patch set can be broken into sections:

  * patches 1 - 16 implement the functionality in the backend,
    but do not provide the interface to use it,
  * patch 17 enables the interface,
  * patches 18 - 25 use the new interface in the front ends,
  * patches 26 - 28 remove some tcg backend complexity, leaving
    the bswap handling to the middle-end.
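
To make the flag semantics above concrete, here is a rough reference model of a 16-bit swap held in a 64-bit value, written as standalone C rather than taken from the patches. The names BSWAP_IZ/BSWAP_OZ/BSWAP_OS, their numeric values, and the function bswap16_model are all illustrative assumptions; only the meaning of each flag comes from the description above. When neither output flag is set, the high bits are unspecified, and this sketch arbitrarily passes the input's high bits through.

```c
#include <stdint.h>

/* Illustrative stand-ins for the TCG_BSWAP_* flags; values are assumed. */
enum {
    BSWAP_IZ = 1,   /* caller promises the input is zero-extended */
    BSWAP_OZ = 2,   /* output must be zero-extended */
    BSWAP_OS = 4,   /* output must be sign-extended */
};

/*
 * Hypothetical reference model of a 16-bit byte swap in a 64-bit value.
 * BSWAP_IZ is only a promise from the caller; the model masks the input
 * anyway, so it gives the same answer whether or not the promise holds.
 */
static uint64_t bswap16_model(uint64_t in, int flags)
{
    /* Swap the two low bytes; everything above bit 15 is dropped here. */
    uint64_t lo = ((in & 0xff) << 8) | ((in >> 8) & 0xff);

    if (flags & BSWAP_OS) {
        /* Sign-extend from bit 15 using unsigned wraparound arithmetic. */
        return (lo ^ 0x8000) - 0x8000;
    }
    if (flags & BSWAP_OZ) {
        /* lo already has zeros above bit 15. */
        return lo;
    }
    /*
     * No output flag: bits above 15 may hold anything. Passing the
     * input's high bits through is an arbitrary choice of this sketch.
     */
    return (in & ~(uint64_t)0xffff) | lo;
}
```

With this model, a swap before a store would call bswap16_model(value, 0) and use only the low 16 bits of the result, while a signed load would pass BSWAP_IZ | BSWAP_OS and get the sign extension as part of the swap instead of as a separate step.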


r~


Richard Henderson (28):
  tcg: Add flags argument to bswap opcodes
  tcg/i386: Support bswap flags
  tcg/aarch64: Support bswap flags
  tcg/arm: Support bswap flags
  tcg/ppc: Split out tcg_out_ext{8,16,32}s
  tcg/ppc: Split out tcg_out_sari{32,64}
  tcg/ppc: Split out tcg_out_bswap16
  tcg/ppc: Split out tcg_out_bswap32
  tcg/ppc: Split out tcg_out_bswap64
  tcg/ppc: Support bswap flags
  tcg/ppc: Use power10 byte-reverse instructions
  tcg/s390: Support bswap flags
  tcg/mips: Support bswap flags in tcg_out_bswap16
  tcg/mips: Support bswap flags in tcg_out_bswap32
  tcg/tci: Support bswap flags
  tcg: Handle new bswap flags during optimize
  tcg: Add flags argument to tcg_gen_bswap16_*, tcg_gen_bswap32_i64
  tcg: Make use of bswap flags in tcg_gen_qemu_ld_*
  tcg: Make use of bswap flags in tcg_gen_qemu_st_*
  target/arm: Improve REV32
  target/arm: Improve vector REV
  target/arm: Improve REVSH
  target/i386: Improve bswap translation
  target/sh4: Improve swap.b translation
  target/mips: Fix gen_mxu_s32ldd_s32lddr
  tcg/arm: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  tcg/aarch64: Unset TCG_TARGET_HAS_MEMORY_BSWAP
  tcg/riscv: Remove MO_BSWAP handling

 include/tcg/tcg-op.h            |   8 +-
 include/tcg/tcg-opc.h           |  10 +-
 include/tcg/tcg.h               |  12 ++
 tcg/aarch64/tcg-target.h        |   2 +-
 tcg/arm/tcg-target.h            |   2 +-
 target/arm/translate-a64.c      |  21 +--
 target/arm/translate.c          |   4 +-
 target/i386/tcg/translate.c     |  14 +-
 target/mips/tcg/mxu_translate.c |   6 +-
 target/s390x/translate.c        |   4 +-
 target/sh4/translate.c          |   3 +-
 tcg/optimize.c                  |  56 ++++++-
 tcg/tcg-op.c                    | 143 +++++++++++------
 tcg/tci.c                       |   3 +-
 tcg/aarch64/tcg-target.c.inc    |  99 +++++-------
 tcg/arm/tcg-target.c.inc        | 272 ++++++++++++--------------------
 tcg/i386/tcg-target.c.inc       |  20 ++-
 tcg/mips/tcg-target.c.inc       |  99 ++++++------
 tcg/ppc/tcg-target.c.inc        | 199 ++++++++++++++---------
 tcg/riscv/tcg-target.c.inc      |  64 ++++----
 tcg/s390/tcg-target.c.inc       |  34 +++-
 tcg/tci/tcg-target.c.inc        |  23 ++-
 tcg/README                      |  18 ++-
 23 files changed, 607 insertions(+), 509 deletions(-)

-- 
2.25.1