Previous versions:

- RFC v1: https://lore.kernel.org/all/20241218170840.1090473-1-paolo.sav...@embecosm.com/
- RFC v2: https://lore.kernel.org/all/20241220153834.16302-1-paolo.sav...@embecosm.com/

Thanks Max for the feedback here:
https://lore.kernel.org/all/258795e9-4e97-4cd7-949f-24e88d24f...@sifive.com/

The previous version had the issue that calls to
tcg_gen_qemu_[ld/st]_i128 and tcg_gen_[ld/st]_i128 did not generate
128-bit loads and stores; they generated pairs of 64-bit loads/stores
instead. As a result, when the second load/store of a pair trapped, we
could not increment vstart in ldst_whole_trans by the number of elements
already processed by the first 64-bit load/store.

I propose the following fixes:

- Split the emulation of whole register loads/stores into smaller sizes:
  at best we get pairs of 64-bit loads/stores anyway, so we now request
  64-bit load/store operations directly and update vstart after each one,
  instead of requesting 128-bit loads and stores that are split under the
  hood.
- Emulate whole register loads/stores in 32-bit blocks on hosts with
  32-bit registers: again, this avoids a split of the generated load or
  store that would leave us unable to set vstart correctly if a trap
  happens.
- Don't generate 32-bit loads/stores, but fall back to the helper
  function, if the host has 32-bit registers and we are loading/storing
  64-bit vector elements. This avoids a trap stopping the execution
  mid-element.

The patch also adds a set of host-architecture-specific conditions that
select between TCG ops and the helper function. We observed a performance
gain in the emulation of whole register loads and stores for all
combinations of vector length, element size and number of fields, apart
from a few cases where the helper function performs better. The added
conditions cover those cases, and more can be added later if other
architectures require them.

The commit message has been changed to better reflect the new behaviour
of the patch.

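To illustrate the chunking scheme described above, here is a minimal
standalone model in C. It is not the code from the patch and uses no QEMU
API: GuestMem, VecState, chunk_bytes() and whole_reg_load() are
hypothetical names made up for this sketch, and a trap is modelled simply
as an access that reaches a configurable fault address. It assumes vstart
is aligned to a chunk boundary on entry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical guest memory: any access reaching past fault_addr traps. */
typedef struct {
    const uint8_t *mem;
    size_t fault_addr;
} GuestMem;

/* Hypothetical CPU state: one vector register group plus vstart. */
typedef struct {
    uint8_t vreg[64];
    uint32_t vstart;
} VecState;

/*
 * Pick the width of each generated access, in bytes.
 * Returns 0 when the helper function should be used instead, i.e. when
 * an element is wider than a host register and one access would have to
 * be split mid-element.
 */
static size_t chunk_bytes(size_t host_reg_bytes, size_t elem_bytes)
{
    if (elem_bytes > host_reg_bytes) {
        return 0;
    }
    return host_reg_bytes < 8 ? host_reg_bytes : 8;
}

/*
 * Load total_bytes from guest memory one chunk at a time, advancing
 * vstart after every chunk that completes. On a trap, vstart names the
 * first element that was not loaded, so the access can be resumed.
 * Assumes vstart is chunk-aligned on entry.
 */
static bool whole_reg_load(VecState *s, const GuestMem *g, size_t base,
                           size_t total_bytes, size_t chunk,
                           size_t elem_bytes)
{
    assert(chunk != 0 && chunk % elem_bytes == 0);
    size_t elems_per_chunk = chunk / elem_bytes;

    for (size_t off = (size_t)s->vstart * elem_bytes;
         off < total_bytes; off += chunk) {
        if (base + off + chunk > g->fault_addr) {
            return false; /* trap: vstart already points at this chunk */
        }
        memcpy(&s->vreg[off], &g->mem[base + off], chunk);
        s->vstart += (uint32_t)elems_per_chunk;
    }
    s->vstart = 0; /* the access completed, so vstart is reset */
    return true;
}
```

With 32-bit elements and 64-bit chunks, a fault at byte 16 of a 32-byte
access leaves vstart at 4 in this model; calling whole_reg_load() again
once the fault is resolved continues from element 4 and ends with
vstart reset to 0.
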
Cc: Richard Henderson <richard.hender...@linaro.org>
Cc: Palmer Dabbelt <pal...@dabbelt.com>
Cc: Alistair Francis <alistair.fran...@wdc.com>
Cc: Bin Meng <bmeng...@gmail.com>
Cc: Weiwei Li <liwei1...@gmail.com>
Cc: Daniel Henrique Barboza <dbarb...@ventanamicro.com>
Cc: Liu Zhiwei <zhiwei_...@linux.alibaba.com>
Cc: Helene Chelin <helene.che...@embecosm.com>
Cc: Nathan Egge <ne...@google.com>
Cc: Max Chou <max.c...@sifive.com>
Cc: Jeremy Bennett <jeremy.benn...@embecosm.com>
Cc: Craig Blackmore <craig.blackm...@embecosm.com>

Paolo Savini (1):
  target/riscv: use tcg ops generation to emulate whole reg rvv
    loads/stores.

 target/riscv/insn_trans/trans_rvv.c.inc | 164 +++++++++++++++++-------
 1 file changed, 119 insertions(+), 45 deletions(-)

-- 
2.34.1