https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111153
--- Comment #6 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <pa...@gcc.gnu.org>:

https://gcc.gnu.org/g:5e0f67b84a615ba186ab234a9bc43df0df5a50b6

commit r14-6528-g5e0f67b84a615ba186ab234a9bc43df0df5a50b6
Author: Juzhe-Zhong <juzhe.zh...@rivai.ai>
Date:   Thu Dec 14 11:23:43 2023 +0800

    RISC-V: Add RVV builtin vectorization cost model

    This patch fixes PR111153:

            ble     a1,zero,.L8
            addiw   a5,a1,-1
            li      a4,4
            addi    sp,sp,-16
            mv      a2,a0
            sext.w  a3,a1
            bleu    a5,a4,.L9
            srliw   a4,a3,2
            slli    a4,a4,4
            mv      a5,a0
            add     a4,a4,a0
            vsetivli        zero,4,e32,m1,ta,ma
            vmv.v.i v1,0
            vse32.v v1,0(sp)
    .L4:
            vle32.v v1,0(a5)
                ---> This loop always processes 4 elements, which is fine
                     for VLEN = 128 bits but wastes a large share of the
                     compute units when VLEN > 128 bits.
            vle32.v v2,0(sp)
            addi    a5,a5,16
            vadd.vv v1,v2,v1
            vse32.v v1,0(sp)
            bne     a4,a5,.L4
            ld      a5,0(sp)
            lw      a4,0(sp)
            andi    a1,a1,-4
            srai    a5,a5,32
            addw    a5,a4,a5
            lw      a4,8(sp)
            addw    a5,a5,a4
            ld      a4,8(sp)
            srai    a4,a4,32
            addw    a0,a5,a4
            beq     a3,a1,.L15
    .L3:
            subw    a3,a3,a1
            slli    a5,a1,32
            slli    a3,a3,32
            srli    a3,a3,32
            srli    a5,a5,30
            add     a2,a2,a5
            vsetvli a5,a3,e8,mf4,tu,mu
            vsetvli a4,zero,e32,m1,ta,ma
            sub     a1,a3,a5
            vmv.v.i v1,0
            vsetvli zero,a3,e32,m1,tu,ma
            vle32.v v2,0(a2)
            vmv.v.v v1,v2
            bne     a3,a5,.L21
    .L7:
            vsetvli a4,zero,e32,m1,ta,ma
            vmv.s.x v2,zero
            vredsum.vs      v1,v1,v2
            vmv.x.s a5,v1
            addw    a0,a0,a5
    .L15:
            addi    sp,sp,16
            jr      ra
    .L21:
            slli    a5,a5,2
            add     a2,a2,a5
            vsetvli zero,a1,e32,m1,tu,ma
            vle32.v v2,0(a2)
            vadd.vv v1,v1,v2
            j       .L7
    .L8:
            li      a0,0
            ret
    .L9:
            li      a1,0
            li      a0,0
            j       .L3

    The root cause is that the RVV builtin vectorization cost model was
    missing.

    After this patch:

            ble     a1,zero,.L4
            vsetvli a5,zero,e32,m1,ta,ma
            vmv.v.i v1,0
    .L3:
            vsetvli a5,a1,e32,m1,tu,ma
            vle32.v v2,0(a0)
            slli    a4,a5,2
            sub     a1,a1,a5
            add     a0,a0,a4
            vadd.vv v1,v2,v1
            bne     a1,zero,.L3
            li      a5,0
            vsetivli        zero,1,e32,m1,ta,ma
            vmv.s.x v2,a5
            vsetvli a5,zero,e32,m1,ta,ma
            vredsum.vs      v1,v1,v2
            vmv.x.s a0,v1
            ret
    .L4:
            li      a0,0
            ret

            PR target/111153

    gcc/ChangeLog:

            * config/riscv/riscv-protos.h (struct common_vector_cost): New struct.
            (struct scalable_vector_cost): Ditto.
            (struct cpu_vector_cost): Ditto.
            * config/riscv/riscv-vector-costs.cc (costs::add_stmt_cost): Add RVV
            builtin vectorization cost.
            * config/riscv/riscv.cc (struct riscv_tune_param): Ditto.
            (get_common_costs): New function.
            (riscv_builtin_vectorization_cost): Ditto.
            (TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST): New targethook.

    gcc/testsuite/ChangeLog:

            * gcc.dg/vect/costmodel/riscv/rvv/pr111153.c: New test.
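
For readers without the test case at hand, the listings are consistent with a simple 32-bit sum reduction over an array (a0 = pointer, a1 = element count). The sketch below is an assumption reconstructed from the assembly, not the contents of pr111153.c. The function names reduc_plus and reduc_plus_stripmined and the vlmax parameter are hypothetical. The second function is a simplified scalar model of the length-agnostic loop at .L3 in the "after" listing: each trip handles vl = min(remaining, vlmax) elements, which is roughly what vsetvli provides. Real hardware is allowed to choose vl slightly differently when the remaining count lies between vlmax and 2*vlmax.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Plain scalar reduction: the kind of loop assumed to be behind the
   listings above.  */
int32_t
reduc_plus (const int32_t *a, int n)
{
  int32_t r = 0;
  for (int i = 0; i < n; ++i)
    r += a[i];
  return r;
}

/* Simplified scalar model of the length-agnostic loop.  Each trip
   processes vl = min (n, vlmax) elements, standing in for vsetvli.
   If trips is non-NULL, the number of loop trips is stored there.  */
int32_t
reduc_plus_stripmined (const int32_t *a, int n, int vlmax, int *trips)
{
  assert (vlmax > 0);
  int32_t r = 0;
  int count = 0;
  while (n > 0)
    {
      int vl = n < vlmax ? n : vlmax;   /* stand-in for vsetvli */
      for (int i = 0; i < vl; ++i)      /* stand-in for vle32.v + vadd.vv */
        r += a[i];
      a += vl;
      n -= vl;
      ++count;
    }
  if (trips != NULL)
    *trips = count;
  return r;
}
```

With e32 elements and LMUL = 1, vlmax is VLEN / 32, so the model would be called with vlmax = 4 for VLEN = 128 and vlmax = 16 for VLEN = 512. The trip count shrinks as VLEN grows, which is the benefit the fixed 4-element loop in the "before" listing cannot deliver.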