https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114107
--- Comment #15 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Richard Biener <rgue...@gcc.gnu.org>:

https://gcc.gnu.org/g:1fe55a1794863b5ad9eeca5062782834716016b2

commit r15-1238-g1fe55a1794863b5ad9eeca5062782834716016b2
Author: Richard Biener <rguent...@suse.de>
Date:   Fri Jun 7 11:29:05 2024 +0200

    tree-optimization/114107 - avoid peeling for gaps in more cases

    The following refactors the code that detects necessary peeling for
    gaps, in particular the PR103116 case where there is no gap but the
    group size is smaller than the vector size.  The testcase in PR114107
    shows we fail to SLP

      for (int i = 0; i < n; i++)
        for (int k = 0; k < 4; k++)
          data[4*i+k] *= factor[i];

    because peeling one scalar iteration isn't enough to cover a gap of
    3 elements of factor[i].  But the code detecting this is placed after
    the logic that detects the cases we already handle properly, since we
    would code generate { factor[i], 0., 0., 0. } for V4DFmode
    vectorization already.  In fact the check that detects when peeling a
    single iteration isn't enough seems improperly guarded, as it should
    apply to all cases.

    I'm not sure we correctly handle VMAT_CONTIGUOUS_REVERSE, but I
    checked that VMAT_STRIDED_SLP and VMAT_ELEMENTWISE correctly avoid
    touching excess elements.

    With this change we can use SLP for the above testcase, and the
    PR103116 testcases no longer require an epilogue on x86-64.  It might
    be different on other targets, so I made those testcases runtime FAIL
    only instead of relying on dump scanning, which there is currently no
    easy way to properly constrain.

            PR tree-optimization/114107
            PR tree-optimization/110445
            * tree-vect-stmts.cc (get_group_load_store_type): Refactor the
            contiguous access case.  Make sure the peeling-for-gaps
            constraints are always tested and consistently relax them when
            we know we can avoid touching excess elements during code
            generation, but rewrite the check to be poly-int aware.
            * gcc.dg/vect/pr114107.c: New testcase.
            * gcc.dg/vect/pr103116-1.c: Adjust.
            * gcc.dg/vect/pr103116-2.c: Likewise.
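For reference, a minimal self-contained sketch of the kernel quoted in the
commit message, written as a standalone C function; the function name, the
use of double, and the flags mentioned below are illustrative assumptions and
are not taken from the actual gcc.dg/vect/pr114107.c testcase:

  /* Each group of 4 consecutive data[] elements is scaled by a single
     factor[i].  When this is SLP-vectorized, the load of factor[i] uses
     only 1 of 4 vector lanes (a gap of 3 elements), which the
     peeling-for-gaps check previously rejected because peeling one
     scalar iteration cannot cover that gap.  */
  void
  scale_groups (double *data, const double *factor, int n)
  {
    for (int i = 0; i < n; i++)
      for (int k = 0; k < 4; k++)
        data[4 * i + k] *= factor[i];
  }

With the fix, compiling such a function with, e.g., -O3 -fopt-info-vec should
report the loop as vectorized (via SLP) on x86-64 without requiring a peeled
epilogue for gaps; as the commit message notes, other targets may behave
differently.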