Hi,

This RFC patch set tries to fix the issue reported at
https://gitlab.com/qemu-project/qemu/-/issues/2137.

In this new version, we added patches that load/store more data at a
time in some of the vector contiguous load/store (unit-stride/whole
register) instructions, under some assumptions (e.g. no masking, no
tail agnostic, virtual address resolution performed once for the entire
vector, etc.), as suggested by Richard Henderson.

With this version, the run time of the test case provided in
https://gitlab.com/qemu-project/qemu/-/issues/2137#note_1757501369
drops from ~13.5 sec to ~1.5 sec in QEMU user mode.

PS: This RFC patch set only focuses on the vle8.v/vse8.v/vl8re8.v/vs8r.v
instructions. The next version will try to cover the other instructions.

Series based on riscv-to-apply.next branch (commit 1806da7).

Max Chou (6):
  target/riscv: Separate vector segment ld/st instructions
  accel/tcg: Avoid unnecessary call overhead from qemu_plugin_vcpu_mem_cb
  target/riscv: Inline vext_ldst_us and corresponding function for performance
  target/riscv: Add check_probe_[read|write] helper functions
  target/riscv: rvv: Optimize v[l|s]e8.v with limitations
  target/riscv: rvv: Optimize vl8re8.v/vs8r.v with limitations

 accel/tcg/ldst_common.c.inc             |   8 +-
 target/riscv/helper.h                   |   8 +
 target/riscv/insn32.decode              |  11 +-
 target/riscv/insn_trans/trans_rvv.c.inc | 454 +++++++++++++++++++++++-
 target/riscv/vector_helper.c            | 142 ++++++--
 5 files changed, 591 insertions(+), 32 deletions(-)

-- 
2.34.1
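
To illustrate the general idea behind the limitation-based fast path for vle8.v, here is a minimal, self-contained C sketch. It is not QEMU code and not how these patches implement it: `GuestPage`, `translate()`, `vle8_generic()` and `vle8()` are hypothetical names, a single flat page stands in for the guest address space, and mask bits are stored one per byte for simplicity. The generic path resolves one address per active element; the fast path is taken only when vm = 1, vta = 0 and the whole range fits in the page, and it resolves the address once and copies vl bytes with memcpy().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model: one guest page backed by a host buffer. */
typedef struct {
    uint64_t base;  /* guest virtual address of the page's first byte */
    uint8_t *host;  /* host memory backing the page */
    size_t size;    /* page size in bytes */
} GuestPage;

/* Counts address resolutions, to compare the two paths. */
static size_t translations;

/* Hypothetical guest-to-host lookup; NULL means the access would fault. */
static uint8_t *translate(const GuestPage *pg, uint64_t addr)
{
    translations++;
    if (addr < pg->base || addr - pg->base >= pg->size) {
        return NULL;
    }
    return pg->host + (addr - pg->base);
}

/* Generic path: resolve and load one active 8-bit element at a time.
 * Mask bits are stored one per byte here (a simplification). */
static bool vle8_generic(uint8_t *vd, const GuestPage *pg, uint64_t addr,
                         size_t vl, const uint8_t *v0, bool vm)
{
    for (size_t i = 0; i < vl; i++) {
        if (!vm && !(v0[i] & 1)) {
            continue; /* inactive element is left undisturbed */
        }
        uint8_t *p = translate(pg, addr + i);
        if (p == NULL) {
            return false; /* an emulator would raise a guest fault here */
        }
        vd[i] = *p;
    }
    return true;
}

/* Fast path under the assumptions listed above: unmasked (vm = 1),
 * vta = 0, and [addr, addr + vl) entirely inside one page. The address
 * is resolved once and vl bytes are copied in bulk; anything else
 * falls back to the generic path. */
static bool vle8(uint8_t *vd, const GuestPage *pg, uint64_t addr, size_t vl,
                 const uint8_t *v0, bool vm, bool vta)
{
    if (vm && !vta && vl > 0) {
        uint8_t *first = translate(pg, addr);
        if (first != NULL && addr - pg->base + vl <= pg->size) {
            memcpy(vd, first, vl);
            return true;
        }
    }
    return vle8_generic(vd, pg, addr, vl, v0, vm);
}
```

For a 16-element unmasked load inside one page, the fast path performs one address resolution where the generic path performs sixteen; a masked load, or one that crosses the end of the page, still takes the generic path.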