https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438
--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <pa...@gcc.gnu.org>:

https://gcc.gnu.org/g:fb906061e10662280f602886c3659ac1c7522a37

commit r14-5326-gfb906061e10662280f602886c3659ac1c7522a37
Author: Juzhe-Zhong <juzhe.zh...@rivai.ai>
Date:   Fri Nov 10 20:20:11 2023 +0800

    Middle-end: Fix bug of induction variable vectorization for RVV

    PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

    1. The SELECT_VL result is not necessarily VF in a non-final iteration,
       so the current GIMPLE IR is wrong:

      ...
      _35 = .SELECT_VL (ivtmp_33, VF);
      _21 = vect_vec_iv_.8_22 + { VF, ... };

    E.g. consider total iterations N = 6 and VF = 4.
    The SELECT_VL output is defined as not always being VF in a non-final
    iteration; it depends on the hardware implementation.

    Suppose we have an RVV CPU core whose vsetvl distributes the workload
    evenly. It may process 3 elements in the 1st iteration and 3 elements
    in the last iteration. Then the induction variable update here:

      _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };

    is wrong: it adds VF, which is 4, but we didn't actually process 4
    elements. It should add 3, which is the result of SELECT_VL.
    So the correct IR should be:

      _36 = .SELECT_VL (ivtmp_34, VF);
      _22 = (int) _36;
      vect_cst__21 = [vec_duplicate_expr] _22;

    2. This issue only happens on non-SLP vectorization with a single
       rgroup, since:

      if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
        {
          tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
          if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                              OPTIMIZE_FOR_SPEED)
              && LOOP_VINFO_LENS (loop_vinfo).length () == 1
              && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
              && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
                  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
            LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
        }

    3. This issue doesn't appear on nested loops, no matter whether
       LOOP_VINFO_USING_SELECT_VL_P is true or false.
    Since:

      # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
      # vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
      _19 = vect_vec_iv_.6_5 + { 1, ... };
      vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5, vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
      ivtmp_1 = ivtmp_4 + 4294967295;
      ....
      <bb 5> [local count: 6549826]:
      # vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
      # ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
      _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
      goto <bb 3>; [100.00%]

    Note the induction variable IR:

      _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };

    It updates the induction variable independently of VF (that is, it
    doesn't care how many elements are processed in the iteration).
    The update is loop invariant, so it won't be a problem even if
    LOOP_VINFO_USING_SELECT_VL_P is true.

    Testing passed. OK for trunk?

            PR tree-optimization/112438

    gcc/ChangeLog:

            * tree-vect-loop.cc (vectorizable_induction): Bugfix when
            LOOP_VINFO_USING_SELECT_VL_P.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/autovec/pr112438.c: New test.
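
The N = 6, VF = 4 case from point 1 can be replayed with a small scalar
model. This is only a sketch under stated assumptions, not the GCC
implementation and not the pr112438.c test: select_vl_model and
simulate_iv_loop are hypothetical names, the loop body is assumed to be
"out[i] = i", and the vsetvl policy (split the tail evenly once fewer
than 2 * VF elements remain) is just one choice a core might make.

```c
#include <assert.h>
#include <stddef.h>

#define VF 4 /* vectorization factor used in the commit message example */

/* Hypothetical model of .SELECT_VL (remaining, vf).  Assumption: the core
   halves the tail when vf < remaining < 2 * vf, so 6 elements with
   vf = 4 are processed as 3 + 3 instead of 4 + 2.  */
size_t select_vl_model(size_t remaining, size_t vf)
{
    assert(vf > 0);
    if (remaining <= vf)
        return remaining;            /* final iteration takes the rest */
    if (remaining >= 2 * vf)
        return vf;                   /* plenty left: full vector */
    return (remaining + 1) / 2;      /* even split: ceil(remaining / 2) */
}

/* Stores into out[0..n) what a length-controlled vector loop for
   "out[i] = i" would produce.  step_by_vl != 0 advances the vector IV by
   the SELECT_VL result (the corrected IR); step_by_vl == 0 advances it
   by VF (the IR before the fix).  */
void simulate_iv_loop(int *out, size_t n, int step_by_vl)
{
    int iv[VF];
    for (size_t lane = 0; lane < VF; lane++)
        iv[lane] = (int)lane;        /* initial vector IV { 0, 1, 2, 3 } */

    size_t done = 0;
    while (done < n) {
        size_t vl = select_vl_model(n - done, VF);

        /* Only the first vl lanes are active in this iteration.  */
        for (size_t lane = 0; lane < vl; lane++)
            out[done + lane] = iv[lane];

        int step = step_by_vl ? (int)vl : VF;
        for (size_t lane = 0; lane < VF; lane++)
            iv[lane] += step;

        done += vl;
    }
}
```

With n = 6, the model takes 3 elements in each of the two iterations.
Stepping by VF makes the second iteration store 4, 5, 6 into out[3..5],
while stepping by the SELECT_VL result stores the expected 3, 4, 5.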