https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110449

--- Comment #4 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:7eb260c8a472568912c1e0b83fb402d22977281e

commit r15-7385-g7eb260c8a472568912c1e0b83fb402d22977281e
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Thu Feb 6 10:30:53 2025 +0000

    vect: Move induction IV increments [PR110449]

    In this PR, we used to generate:

         .L6:
              mov     v30.16b, v31.16b
              fadd    v31.4s, v31.4s, v27.4s
              fadd    v29.4s, v30.4s, v28.4s
              stp     q30, q29, [x0]
              add     x0, x0, 32
              cmp     x1, x0
              bne     .L6

    for an unrolled induction in:

      for (int i = 0; i < 1024; i++)
        {
          arr[i] = freq;
          freq += step;
        }

    with the problem being the unnecessary MOV.

    The main induction IV was incremented by VF * step == 2 * nunits * step,
    and then nunits * step was added for the second store to arr.

    The original patch for the PR (r14-2367-g224fd59b2dc8) avoided the MOV
    by incrementing the IV by nunits * step twice.  The problem with that
    approach is that it doubles the loop-carried latency.  This change was
    deliberately not preserved when moving from loop-vect to SLP and so
    the test started failing again after r15-3509-gd34cda720988.
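
The two increment schemes compared above can be sketched in scalar C. This is only an illustration and not code from the patch: the function names, the `NUNITS` and `VF` macros, and the lane-by-lane emulation of a vector register are all assumptions made for the example. It assumes 4 lanes per vector and an unroll factor of 2, matching the `v*.4s` registers and the `stp` of two q registers in the assembly.

```c
#include <assert.h>
#include <stddef.h>

#define NUNITS 4          /* assumed lanes per vector (4 floats in 128 bits) */
#define VF (2 * NUNITS)   /* assumed vectorization factor: unrolled twice */

/* Scalar reference: the loop from the commit message. */
void fill_scalar(float *arr, size_t n, float freq, float step)
{
  for (size_t i = 0; i < n; i++)
    {
      arr[i] = freq;
      freq += step;
    }
}

/* Scheme A: one loop-carried increment of VF * step per iteration.
   The second vector is derived from the IV by adding NUNITS * step,
   which is off the loop-carried dependency chain.  */
void fill_single_increment(float *arr, size_t n, float freq, float step)
{
  assert(n % VF == 0);
  float iv[NUNITS];
  for (int l = 0; l < NUNITS; l++)
    iv[l] = freq + l * step;

  for (size_t i = 0; i < n; i += VF)
    {
      for (int l = 0; l < NUNITS; l++)
        {
          arr[i + l] = iv[l];
          arr[i + NUNITS + l] = iv[l] + NUNITS * step;
        }
      /* Increment emitted last, after every use of the old IV value.  */
      for (int l = 0; l < NUNITS; l++)
        iv[l] += VF * step;
    }
}

/* Scheme B: two chained increments of NUNITS * step per iteration.
   Each store uses the IV directly, but the loop-carried chain is now
   two additions long.  */
void fill_chained_increment(float *arr, size_t n, float freq, float step)
{
  assert(n % VF == 0);
  float iv[NUNITS];
  for (int l = 0; l < NUNITS; l++)
    iv[l] = freq + l * step;

  for (size_t i = 0; i < n; i += VF)
    {
      for (int l = 0; l < NUNITS; l++)
        arr[i + l] = iv[l];
      for (int l = 0; l < NUNITS; l++)
        iv[l] += NUNITS * step;
      for (int l = 0; l < NUNITS; l++)
        arr[i + NUNITS + l] = iv[l];
      for (int l = 0; l < NUNITS; l++)
        iv[l] += NUNITS * step;
    }
}
```

Both schemes store the same sequence when every intermediate value is exactly representable (for example `freq = 1.0f`, `step = 0.5f`). With other inputs the results can differ in rounding from each other and from the scalar loop. The difference that matters here is the length of the loop-carried dependency chain: one addition per iteration in scheme A, two in scheme B.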

    I think the main problem is that we put the IV increment in the wrong
    place.  Normal IVs created by create_iv are placed before the exit
    condition where possible, but vectorizable_induction instead always
    inserted them at the start of the loop body.  The only use of the
    incremented IV is by the phi node, so the effect is to make both
    the old and new IV values live for the whole loop body, which is
    why we need the MOV.

    The simplest fix therefore seems to be to reuse the create_iv logic.

    gcc/
            PR tree-optimization/110449
            * tree-ssa-loop-manip.h (insert_iv_increment): Declare.
            * tree-ssa-loop-manip.cc (insert_iv_increment): New function,
            split out from...
            (create_iv): ...here and generalized to gimple_seqs.
            * tree-vect-loop.cc (vectorizable_induction): Use
            standard_iv_increment_position and insert_iv_increment
            to insert the IV increment.

    gcc/testsuite/
            PR tree-optimization/110449
            * gcc.target/aarch64/pr110449.c: Expect an increment by 8.0,
            but test that there is no MOV.
