https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

--- Comment #14 from CVS Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Pan Li <pa...@gcc.gnu.org>:

https://gcc.gnu.org/g:fb906061e10662280f602886c3659ac1c7522a37

commit r14-5326-gfb906061e10662280f602886c3659ac1c7522a37
Author: Juzhe-Zhong <juzhe.zh...@rivai.ai>
Date:   Fri Nov 10 20:20:11 2023 +0800

    Middle-end: Fix bug of induction variable vectorization for RVV

    PR: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=112438

    1. The SELECT_VL result is not necessarily VF in a non-final iteration.

    Current GIMPLE IR is wrong:

    ...
    _35 = .SELECT_VL (ivtmp_33, VF);
    _21 = vect_vec_iv_.8_22 + { VF, ... };

    E.g. consider a total of N = 6 iterations with VF = 4.
    The SELECT_VL output is not guaranteed to be VF in a non-final iteration;
    its value depends on the hardware implementation.

    Suppose we have an RVV CPU core whose vsetvl distributes the workload
    evenly. It may process 3 elements in the first iteration and 3 elements
    in the last iteration.
    Then the induction variable update here:
    _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
    is wrong: it adds VF, which is 4, although we did not process 4 elements.

    It should add 3, which is the result of SELECT_VL.
    So the correct IR should be:

      _36 = .SELECT_VL (ivtmp_34, VF);
      _22 = (int) _36;
      vect_cst__21 = [vec_duplicate_expr] _22;
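
    The N = 6, VF = 4 example above can be replayed with a small scalar
    simulation. This is only a sketch: select_vl_even, fill_iota and
    MAX_LANES are hypothetical names invented for illustration, and the
    "even distribution" policy is an assumed model of one possible vsetvl
    implementation, not the behaviour of any particular core or of GCC.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_LANES = 16 };  /* illustrative upper bound on VF */

/* Hypothetical model of .SELECT_VL on a core that balances the last two
   iterations: when fewer than 2*VF elements remain, take about half.  */
static size_t
select_vl_even (size_t remaining, size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;
  return vf;
}

/* Simulate a length-controlled vector loop computing out[i] = i.
   step_by_vl == 0: the vector IV advances by VF (the wrong IR).
   step_by_vl != 0: the vector IV advances by the SELECT_VL result.  */
static void
fill_iota (int *out, size_t n, size_t vf, int step_by_vl)
{
  int iv[MAX_LANES];
  size_t done = 0;

  assert (vf > 0 && vf <= MAX_LANES);
  for (size_t lane = 0; lane < vf; lane++)
    iv[lane] = (int) lane;              /* { 0, 1, 2, ... } */

  while (done < n)
    {
      size_t vl = select_vl_even (n - done, vf);
      int step = step_by_vl ? (int) vl : (int) vf;

      for (size_t lane = 0; lane < vl; lane++)
        out[done + lane] = iv[lane];    /* store only the active lanes */
      for (size_t lane = 0; lane < vf; lane++)
        iv[lane] += step;               /* vector IV update */
      done += vl;
    }
}
```

    With n = 6 and vf = 4, fill_iota (out, 6, 4, 0) stores 0, 1, 2, 4, 5, 6,
    while fill_iota (out, 6, 4, 1) stores 0, 1, 2, 3, 4, 5.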

    2. This issue only happens with non-SLP vectorization using a single
    rgroup, since LOOP_VINFO_USING_SELECT_VL_P is only set here:

      if (LOOP_VINFO_USING_DECREMENTING_IV_P (loop_vinfo))
        {
          tree iv_type = LOOP_VINFO_RGROUP_IV_TYPE (loop_vinfo);
          if (direct_internal_fn_supported_p (IFN_SELECT_VL, iv_type,
                                              OPTIMIZE_FOR_SPEED)
              && LOOP_VINFO_LENS (loop_vinfo).length () == 1
              && LOOP_VINFO_LENS (loop_vinfo)[0].factor == 1 && !slp
              && (!LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
                  || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant ()))
            LOOP_VINFO_USING_SELECT_VL_P (loop_vinfo) = true;
        }

    3. This issue doesn't appear in nested loops, regardless of whether
    LOOP_VINFO_USING_SELECT_VL_P is true or false.

    Since:

      # vect_vec_iv_.6_5 = PHI <_19(3), { 0, ... }(5)>
      # vect_diff_15.7_20 = PHI <vect_diff_9.8_22(3), vect_diff_18.5_11(5)>
      _19 = vect_vec_iv_.6_5 + { 1, ... };
      vect_diff_9.8_22 = .COND_LEN_ADD ({ -1, ... }, vect_vec_iv_.6_5,
        vect_diff_15.7_20, vect_diff_15.7_20, _28, 0);
      ivtmp_1 = ivtmp_4 + 4294967295;
      ....
      <bb 5> [local count: 6549826]:
      # vect_diff_18.5_11 = PHI <vect_diff_9.8_22(4), { 0, ... }(2)>
      # ivtmp_26 = PHI <ivtmp_27(4), 40(2)>
      _28 = .SELECT_VL (ivtmp_26, POLY_INT_CST [4, 4]);
      goto <bb 3>; [100.00%]

    Note that the induction variable IR:
    _21 = vect_vec_iv_.8_22 + { POLY_INT_CST [4, 4], ... };
    updates the induction variable independently of VF (it does not depend on
    how many elements are processed in the iteration).

    The update is loop invariant, so it is not a problem even if
    LOOP_VINFO_USING_SELECT_VL_P is true.
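
    One way to read the nested-loop IR above (an interpretation, not
    something this commit states) is that each vector lane holds a
    different outer iteration while the inner loop runs in lockstep, so the
    inner induction variable steps by { 1, ... } no matter how many lanes
    SELECT_VL activates. The sketch below is loosely modelled on that
    reading. select_vl_even, nested_sum, base and MAX_LANES are
    hypothetical names, and the computation out[i] = base[i] + sum of j is
    an invented example, not the PR's test case.

```c
#include <assert.h>
#include <stddef.h>

enum { MAX_LANES = 16 };  /* illustrative upper bound on VF */

/* Hypothetical model of .SELECT_VL with even workload distribution.  */
static size_t
select_vl_even (size_t remaining, size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;
  return vf;
}

/* Outer-loop vectorization sketch: lanes are outer iterations i, the
   inner loop over j runs once for all lanes.  Computes
   out[i] = base[i] + (0 + 1 + ... + (m - 1)).  */
static void
nested_sum (int *out, const int *base, size_t n, size_t m, size_t vf)
{
  size_t done = 0;

  assert (vf > 0 && vf <= MAX_LANES);
  while (done < n)
    {
      size_t vl = select_vl_even (n - done, vf);
      int acc[MAX_LANES];
      int iv[MAX_LANES];

      for (size_t lane = 0; lane < vf; lane++)
        {
          acc[lane] = 0;
          iv[lane] = 0;                 /* inner IV starts at { 0, ... } */
        }
      for (size_t j = 0; j < m; j++)
        {
          for (size_t lane = 0; lane < vl; lane++)
            acc[lane] += iv[lane];      /* length-limited add, len = vl */
          for (size_t lane = 0; lane < vf; lane++)
            iv[lane] += 1;              /* step { 1, ... }, independent of vl */
        }
      for (size_t lane = 0; lane < vl; lane++)
        out[done + lane] = base[done + lane] + acc[lane];
      done += vl;
    }
}
```

    Here vl only limits which lanes are accumulated and stored; the inner
    step never uses it, so the result matches the scalar loop for any
    SELECT_VL policy.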

    Testing passed. OK for trunk?

            PR tree-optimization/112438

    gcc/ChangeLog:

            * tree-vect-loop.cc (vectorizable_induction): Bugfix when
            LOOP_VINFO_USING_SELECT_VL_P.

    gcc/testsuite/ChangeLog:

            * gcc.target/riscv/rvv/autovec/pr112438.c: New test.
