This patchset uses the TCG vector ops for some MVE instructions. We can only do this when we know that none of the MVE lanes are predicated, i.e. when neither tail predication nor VPT predication nor ECI partial insn execution is happening.
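
As a rough illustration of the condition above, here is a minimal self-contained sketch. The mve_no_predication() name and the eci and mve_no_pred fields are the ones this series mentions; the ToyDisasState struct, the ToyEmitPath enum and choose_emit_path() are hypothetical stand-ins for illustration only, not the actual QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the translator state; not QEMU's DisasContext. */
typedef struct {
    uint8_t eci;       /* nonzero: ECI partial insn execution in progress */
    bool mve_no_pred;  /* true: no tail predication and no VPT predication */
} ToyDisasState;

/* True only when no MVE lane can be masked off for this insn. */
static bool mve_no_predication(const ToyDisasState *s)
{
    return s->eci == 0 && s->mve_no_pred;
}

/* Hypothetical names for the two code generation strategies. */
typedef enum {
    EMIT_VECTOR_OP,    /* whole-vector op, safe only without predication */
    EMIT_HELPER_CALL   /* per-lane helper that honours the predicate mask */
} ToyEmitPath;

/* Pick the fast path only when every lane is known to be active. */
static ToyEmitPath choose_emit_path(const ToyDisasState *s)
{
    return mve_no_predication(s) ? EMIT_VECTOR_OP : EMIT_HELPER_CALL;
}
```

With eci == 0 and mve_no_pred set, this model picks the vector path; any other combination falls back to the helper path.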

Changes v1->v2:

The major change is that instead of just updating the local
s->mve_no_pred flag when we translate an insn that changes the
predication state, we end the TB with DISAS_UPDATE_NOCHAIN. The
exceptions are the code called from vfp_access_check()
(gen_preserve_fp_state() and gen_update_fp_context()). We can
definitely determine the new flag value in one of these cases, but
in the other we can't always.

So patch 1 is new, and adds support to gen_jmp_tb() for looking at
the existing value of is_jmp so it can honour a preceding request
for an UPDATE_NOCHAIN or UPDATE_EXIT. (We were already assuming
this, because gen_preserve_fp_state() can set is_jmp to
DISAS_UPDATE_EXIT if icount is in use.)

Patch 2 (new) enforces that FPDSCR.LTPSIZE is 4 on inbound
migration, because we now rely on this architectural invariant.

Patch 3 is the old patch 1, updated as noted above.

Patches 4-6 have been reviewed. (They have been very slightly
tweaked to use a new mve_no_predication() function that checks both
s->eci and s->mve_no_pred, rather than v1's direct check of
mve_no_pred.)

Patches 7-12 are new, and add optimized variants of VDUP, VMVN,
various shifts, the shift-and-inserts, and the 1-operand-immediate
insns. I think this should now be the complete set of optimizations
it's worth implementing at this point.

thanks
-- PMM

Peter Maydell (12):
  target/arm: Avoid goto_tb if we're trying to exit to the main loop
  target/arm: Enforce that FPDSCR.LTPSIZE is 4 on inbound migration
  target/arm: Add TB flag for "MVE insns not predicated"
  target/arm: Optimize MVE logic ops
  target/arm: Optimize MVE arithmetic ops
  target/arm: Optimize MVE VNEG, VABS
  target/arm: Optimize MVE VDUP
  target/arm: Optimize MVE VMVN
  target/arm: Optimize MVE VSHL, VSHR immediate forms
  target/arm: Optimize MVE VSHLL and VMOVL
  target/arm: Optimize MVE VSLI and VSRI
  target/arm: Optimize MVE 1op-immediate insns

 target/arm/cpu.h              |   4 +-
 target/arm/translate.h        |   2 +
 target/arm/helper.c           |  33 ++++
 target/arm/machine.c          |  13 ++
 target/arm/translate-m-nocp.c |   8 +-
 target/arm/translate-mve.c    | 310 ++++++++++++++++++++++++++--------
 target/arm/translate-vfp.c    |  33 +++-
 target/arm/translate.c        |  42 ++++-
 8 files changed, 361 insertions(+), 84 deletions(-)

-- 
2.20.1