LGTM :)
On Fri, Feb 2, 2024 at 9:58 AM Juzhe-Zhong <juzhe.zh...@rivai.ai> wrote:
>
> This patch fixes the following:
>
>         vsetvli a5,a1,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a1,a1,a5
>         vle32.v v2,0(a0)
>         add     a0,a0,a4
>         vadd.vv v1,v2,v1
>         bne     a1,zero,.L3
>         vsetivli        zero,1,e32,m1,ta,ma
>         vmv.s.x v2,zero
>         vsetvli a5,zero,e32,m1,ta,ma     ---> Redundant vsetvl.
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a0,v1
>         ret
>
> The VSETVL pass is able to fuse the avl = 1 of the scalar move with the
> VLMAX avl of the reduction.
>
> However, the following RTL blocks the fusion in the dependence analysis of
> the VSETVL pass:
>
> (insn 49 24 50 5 (set (reg:RVVM1SI 98 v2 [148])
>         (if_then_else:RVVM1SI (unspec:RVVMF32BI [
>                     (const_vector:RVVMF32BI [
>                             (const_int 1 [0x1])
>                             repeat [
>                                 (const_int 0 [0])
>                             ]
>                         ])
>                     (const_int 1 [0x1])
>                     (const_int 2 [0x2]) repeated x2
>                     (const_int 0 [0])
>                     (reg:SI 66 vl)
>                     (reg:SI 67 vtype)
>                 ] UNSPEC_VPREDICATE)
>             (const_vector:RVVM1SI repeat [
>                     (const_int 0 [0])
>                 ])
>             (unspec:RVVM1SI [
>                     (reg:DI 0 zero)
>                 ] UNSPEC_VUNDEF))) 3813 {*pred_broadcastrvvm1si_zero}
>      (nil))
> (insn 50 49 51 5 (set (reg:DI 15 a5 [151])     ----> It sets a5, which blocks fusing the following VLMAX into the scalar move above.
>         (unspec:DI [
>                 (const_int 32 [0x20])
>             ] UNSPEC_VLMAX)) 2566 {vlmax_avldi}
>      (expr_list:REG_EQUIV (unspec:DI [
>                 (const_int 32 [0x20])
>             ] UNSPEC_VLMAX)
>         (nil)))
> (insn 51 50 52 5 (set (reg:RVVM1SI 97 v1 [150])
>         (unspec:RVVM1SI [
>                 (unspec:RVVMF32BI [
>                         (const_vector:RVVMF32BI repeat [
>                                 (const_int 1 [0x1])
>                             ])
>                         (reg:DI 15 a5 [151])
>                         (const_int 2 [0x2])
>                         (const_int 1 [0x1])
>                         (reg:SI 66 vl)
>                         (reg:SI 67 vtype)
>                     ] UNSPEC_VPREDICATE)
>                 (unspec:RVVM1SI [
>                         (reg:RVVM1SI 97 v1 [orig:134 vect_result_14.6 ] [134])
>                         (reg:RVVM1SI 98 v2 [148])
>                     ] UNSPEC_REDUC_SUM)
>                 (unspec:RVVM1SI [
>                         (reg:DI 0 zero)
>                     ] UNSPEC_VUNDEF)
>             ] UNSPEC_REDUC)) 17541 {pred_redsumrvvm1si}
>      (expr_list:REG_DEAD (reg:RVVM1SI 98 v2 [148])
>         (expr_list:REG_DEAD (reg:SI 66 vl)
>             (expr_list:REG_DEAD (reg:DI 15 a5 [151])
>                 (expr_list:REG_DEAD (reg:DI 0 zero)
>                     (nil))))))
>
> Such a situation can only happen with auto-vectorization, never with
> intrinsic code.
> Since the reduction is passed a VLMAX AVL, it is more natural to also pass
> VLMAX to the scalar move which initializes the value of the reduction.
>
> After this patch:
>
>         vsetvli a5,a1,e32,m1,tu,ma
>         slli    a4,a5,2
>         sub     a1,a1,a5
>         vle32.v v2,0(a0)
>         add     a0,a0,a4
>         vadd.vv v1,v2,v1
>         bne     a1,zero,.L3
>         vsetvli a5,zero,e32,m1,ta,ma
>         vmv.s.x v2,zero
>         vredsum.vs      v1,v1,v2
>         vmv.x.s a0,v1
>         ret
>
> Tested on both RV32 and RV64 with no regressions.
>
>         PR target/113697
>
> gcc/ChangeLog:
>
>         * config/riscv/riscv-v.cc (expand_reduction): Pass VLMAX avl to
>         scalar move.
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.target/riscv/rvv/autovec/pr113697.c: New test.
>
> ---
>  gcc/config/riscv/riscv-v.cc                          | 12 +++++++-----
>  .../gcc.target/riscv/rvv/autovec/pr113697.c          | 14 ++++++++++++++
>  2 files changed, 21 insertions(+), 5 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113697.c
>
> diff --git a/gcc/config/riscv/riscv-v.cc b/gcc/config/riscv/riscv-v.cc
> index 4bacb7fea45..0cfbd21ce6f 100644
> --- a/gcc/config/riscv/riscv-v.cc
> +++ b/gcc/config/riscv/riscv-v.cc
> @@ -4151,13 +4151,15 @@ expand_reduction (unsigned unspec, unsigned insn_flags, rtx *ops, rtx init)
>
>    rtx m1_tmp = gen_reg_rtx (m1_mode);
>    rtx scalar_move_ops[] = {m1_tmp, init};
> -  emit_nonvlmax_insn (code_for_pred_broadcast (m1_mode), SCALAR_MOVE_OP,
> -                     scalar_move_ops,
> -                     need_mask_operand_p (insn_flags) ? ops[3]
> -                                                      : CONST1_RTX (Pmode));
> +  insn_code icode = code_for_pred_broadcast (m1_mode);
> +  if (need_mask_operand_p (insn_flags))
> +    emit_nonvlmax_insn (icode, SCALAR_MOVE_OP, scalar_move_ops, ops[3]);
> +  else
> +    emit_vlmax_insn (icode, SCALAR_MOVE_OP, scalar_move_ops);
> +
>    rtx m1_tmp2 = gen_reg_rtx (m1_mode);
>    rtx reduc_ops[] = {m1_tmp2, vector_src, m1_tmp};
> -  insn_code icode = code_for_pred (unspec, vmode);
> +  icode = code_for_pred (unspec, vmode);
>
>    if (need_mask_operand_p (insn_flags))
>      {
> diff --git a/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113697.c b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113697.c
> new file mode 100644
> index 00000000000..588b86c7e6c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/riscv/rvv/autovec/pr113697.c
> @@ -0,0 +1,14 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=rv64gcv -mabi=lp64d -fno-schedule-insns" } */
> +
> +int
> +foo (int *__restrict a, int n)
> +{
> +  int result = 0;
> +  for (int i = 0; i < n; i++)
> +    result += a[i];
> +  return result;
> +}
> +
> +/* { dg-final { scan-assembler-times {vsetvli} 3 } } */
> +/* { dg-final { scan-assembler-not {vsetivli} } } */
> --
> 2.36.3
>
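
For anyone reading the riscv-v.cc hunk without the surrounding code, here is a rough standalone sketch of the AVL choice it changes. The `Avl` enum and the function names are made up for illustration and are not GCC identifiers; `Explicit` stands for the `ops[3]` operand in the diff, and `One` for the old `CONST1_RTX (Pmode)` operand.

```cpp
#include <cassert>

// Hypothetical model of the AVL chosen for the scalar move that seeds the
// reduction. Nothing here is a GCC API; it only mirrors the branch in the diff.
enum class Avl
{
  One,      // AVL = 1, the old CONST1_RTX (Pmode) operand
  Explicit, // AVL taken from ops[3]
  Vlmax     // VLMAX AVL
};

// Before the patch: the unmasked case seeded the scalar move with AVL = 1.
constexpr Avl
scalar_move_avl_before (bool need_mask_operand)
{
  return need_mask_operand ? Avl::Explicit : Avl::One;
}

// After the patch: the unmasked case uses VLMAX, like the reduction that
// consumes the scalar move; the masked case is unchanged.
constexpr Avl
scalar_move_avl_after (bool need_mask_operand)
{
  return need_mask_operand ? Avl::Explicit : Avl::Vlmax;
}

// Sanity checks on the model above.
inline void
check_model ()
{
  assert (scalar_move_avl_before (false) == Avl::One);
  assert (scalar_move_avl_after (false) == Avl::Vlmax);
  // The masked path behaves the same before and after.
  assert (scalar_move_avl_before (true) == scalar_move_avl_after (true));
}
```

Under this reading, the scalar move and the reduction ask for the same VLMAX AVL in the unmasked case, so no vsetivli should be needed between them, which is what the new scan-assembler-not check covers.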