https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117502

            Bug ID: 117502
           Summary: Fail to SLP gcc.target/aarch64/sve/pr95199.c
           Product: gcc
           Version: 15.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rguenth at gcc dot gnu.org
  Target Milestone: ---

We fail to SLP the non-unit-stride version of the loop in
gcc.target/aarch64/sve/pr95199.c, where we apply stride versioning.  Without
SLP we can successfully use gathers, while with SLP we end up with:

pr95199.c:8:21: missed:   Not using elementwise accesses due to variable vectorization factor.
pr95199.c:10:8: missed:   not vectorized: relevant stmt not supported: _4 = *_3;
pr95199.c:8:21: note:   unsupported SLP instance starting from: *_3 = _10;
pr95199.c:8:21: missed:  unsupported SLP instances

This is easier to see when adding -fno-version-loops-for-strides.
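
For orientation, the kind of loop involved can be sketched as below.  This is
a hypothetical illustration and not the actual pr95199.c testcase: the
function name, parameters and loop body are assumptions, chosen only so that
one location is loaded and stored back with a stride known only at run time.

```c
/* Hypothetical illustration, not the actual pr95199.c testcase:
   a read-modify-write loop whose strides are only known at run time.  */
void
strided_update (double *a, const double *b, double m, int n,
                int inc_a, int inc_b)
{
  int ia = 0, ib = 0;
  for (int i = 0; i < n; ++i)
    {
      /* Load a[ia] and store back to the same address; both accesses
         advance by inc_a elements per iteration.  */
      a[ia] += m * b[ib];
      ia += inc_a;
      ib += inc_b;
    }
}
```

Compiling such a loop for an SVE target with something like
-O3 -march=armv8.2-a+sve -fno-version-loops-for-strides -fdump-tree-vect-details
should give a comparable vectorizer dump, though the exact messages for this
sketch may differ from the ones quoted here.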

We fail to get

pr95199.c:8:21: note:   ==> examining statement: _4 = *_3;
pr95199.c:8:21: note:   using gather/scatter for strided/grouped access, scale =
pr95199.c:8:21: note:   vect_model_load_cost: inside_cost = 2, prologue_cost = 0 .

The reason is that we're using VMAT_STRIDED_SLP but consider gathers only
for VMAT_ELEMENTWISE, failing to realize that the caller (get_load_store_type)
will reject both in case of a variable-length access.  There's also practically
no difference between VMAT_ELEMENTWISE and VMAT_STRIDED_SLP for single-element
accesses.
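
The decision described above can be modelled with a small toy sketch.  This is
not GCC source: the enum and both functions are hypothetical, only the names
echo the VMAT_* access types, and the "adjusted" variant is one possible
reading of the analysis (treating a single-element strided-SLP access like an
elementwise one when looking for a gather), not a confirmed fix.

```c
#include <stdbool.h>

/* Hypothetical toy model; names echo GCC's VMAT_* values but this is
   not GCC code.  */
enum access_kind { ELEMENTWISE, STRIDED_SLP, GATHER_SCATTER, UNSUPPORTED };

/* Behaviour as described in the report: a gather is considered only for
   ELEMENTWISE, and afterwards both ELEMENTWISE and STRIDED_SLP are
   rejected for variable-length vectors.  */
enum access_kind
classify_reported (enum access_kind initial, bool gather_ok,
                   bool variable_length)
{
  enum access_kind kind = initial;
  if (kind == ELEMENTWISE && gather_ok)
    kind = GATHER_SCATTER;
  if ((kind == ELEMENTWISE || kind == STRIDED_SLP) && variable_length)
    return UNSUPPORTED;
  return kind;
}

/* Assumed adjustment: a single-element STRIDED_SLP access is treated
   like ELEMENTWISE when checking whether a gather can be used.  */
enum access_kind
classify_adjusted (enum access_kind initial, bool single_element,
                   bool gather_ok, bool variable_length)
{
  enum access_kind kind = initial;
  bool elementwise_like
    = kind == ELEMENTWISE || (kind == STRIDED_SLP && single_element);
  if (elementwise_like && gather_ok)
    kind = GATHER_SCATTER;
  if ((kind == ELEMENTWISE || kind == STRIDED_SLP) && variable_length)
    return UNSUPPORTED;
  return kind;
}
```

In this model the SLP case (STRIDED_SLP, gather available, variable length)
is rejected by classify_reported but becomes a gather in classify_adjusted,
while the non-SLP case (ELEMENTWISE) yields a gather in both.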
