Changes since v6:
- Limit access size to element size to address Max Chou's review.
- Fix a typo in the name of a function that this patch now calls.
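
As a rough illustration of what limiting the access size to the element size means, the sketch below copies a buffer one element at a time, so that no single access is wider than one element. It is not the code in target/riscv/vector_helper.c: `copy_by_element` and its parameters are hypothetical names chosen for this example, and it assumes element sizes of 1, 2, 4 or 8 bytes and non-overlapping buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper for illustration only (not QEMU code).
 * Copies nelems elements of esz bytes each, one element per access,
 * so that no single access is wider than the element size.
 */
static void copy_by_element(void *dst, const void *src,
                            size_t nelems, size_t esz)
{
    uint8_t *d = dst;
    const uint8_t *s = src;

    /* Assumption: only power-of-two element sizes up to 8 bytes. */
    assert(esz == 1 || esz == 2 || esz == 4 || esz == 8);

    for (size_t i = 0; i < nelems; i++, d += esz, s += esz) {
        /*
         * A memcpy with a constant size is typically lowered by an
         * optimizing compiler to a single load and store, without a
         * call into the C library.
         */
        switch (esz) {
        case 1: memcpy(d, s, 1); break;
        case 2: memcpy(d, s, 2); break;
        case 4: memcpy(d, s, 4); break;
        case 8: memcpy(d, s, 8); break;
        }
    }
}
```

For example, `copy_by_element(dst, src, 3, 4)` copies three 4-byte elements (12 bytes in total) using three 4-byte accesses and leaves the rest of `dst` untouched.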

With access size limited to element size, this patch still provides a
significant speedup. The `memcpy` benchmark from:

https://github.com/embecosm/rise-rvv-tcg-qemu-tooling/tree/main/strmem-benchmarks

shows up to 75% speedup with this patch:

VLEN | Size | ns/inst (ratio)
-----|------|-----------------
 128 |    1 | 1.50
 128 |    2 | 1.42
 128 |    3 | 1.35
 128 |    4 | 1.29
 128 |    5 | 1.23
 128 |    7 | 1.18
 128 |    8 | 1.09
 128 |    9 | 1.06
 128 |   11 | 1.01

VLEN | Size | ns/inst (ratio)
-----|------|-----------------
1024 |    1 | 1.75
1024 |    2 | 1.62
1024 |    3 | 1.52
1024 |    4 | 1.43
1024 |    5 | 1.35
1024 |    7 | 1.31
1024 |    8 | 1.12
1024 |    9 | 1.12
1024 |   11 | 1.01

It is not clear to me exactly why the patch is now helping. At first I
thought it was due to avoiding `vext_continuous_ldst_host` calling out to
`memcpy` for small sizes, but trying that directly in
`vext_continuous_ldst_host` was much less beneficial:

VLEN | Size | ns/inst (ratio)
-----|------|-----------------
 128 |    1 | 1.06
 128 |    2 | 1.14
 128 |    3 | 1.03
 128 |    4 | 1.04
 128 |    5 | 1.02
 128 |    7 | 1.02
 128 |    8 | 0.91
 128 |    9 | 0.92
 128 |   11 | 1.03

VLEN | Size | ns/inst (ratio)
-----|------|-----------------
1024 |    1 | 1.10
1024 |    2 | 1.14
1024 |    3 | 1.04
1024 |    4 | 1.05
1024 |    5 | 0.96
1024 |    7 | 1.07
1024 |    8 | 0.94
1024 |    9 | 0.93
1024 |   11 | 0.90

Previous versions:
- v1: https://lore.kernel.org/all/20240717153040.11073-1-paolo.sav...@embecosm.com/
- v2: https://lore.kernel.org/all/20241002135708.99146-1-paolo.sav...@embecosm.com/
- v3: https://lore.kernel.org/all/20241014220153.196183-1-paolo.sav...@embecosm.com/
- v4: https://lore.kernel.org/all/20241029194348.59574-1-paolo.sav...@embecosm.com/
- v5: https://lore.kernel.org/all/20241111130324.32487-1-paolo.sav...@embecosm.com/
- v6: https://lore.kernel.org/all/20241204122952.53375-1-craig.blackm...@embecosm.com/

Cc: Richard Henderson <richard.hender...@linaro.org>
Cc: Palmer Dabbelt <pal...@dabbelt.com>
Cc: Alistair Francis <alistair.fran...@wdc.com>
Cc: Bin Meng <bmeng...@gmail.com>
Cc: Weiwei Li <liwei1...@gmail.com>
Cc: Daniel Henrique Barboza <dbarb...@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_...@linux.alibaba.com>
Cc: Helene Chelin <helene.che...@embecosm.com>
Cc: Nathan Egge <ne...@google.com>
Cc: Max Chou <max.c...@sifive.com>
Cc: Paolo Savini <paolo.sav...@embecosm.com>

Craig Blackmore (2):
  target/riscv: rvv: fix typo in vext continuous ldst function names
  target/riscv: rvv: speed up small unit-stride loads and stores

 target/riscv/vector_helper.c | 26 +++++++++++++++++++++-----
 1 file changed, 21 insertions(+), 5 deletions(-)

-- 
2.43.0