https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-25
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
At least it doesn't look like a misaligned access.  Instead it looks like an
out-of-bounds access?

OTOH I probably missed the Fortran equivalent of #include "tree-vect.h" and
check_vect() in main.

We do vectorize the outer loop, but with partial vectors (huh, it iterates
exactly 4 times).  I'm not exactly sure the outer loop mask is the correct
one to use in the inner loop though (but we do that).  And this might explain
the issue - we're definitely accessing excess elements of AA in the inner
loop.

In particular we disallowed grouped accesses in inner loops but this
load is basically treated as a grouped access by means of having a load
permutation.  Test coverage seems to be weak here and this restriction
should be lifted ideally (I remember issues with the IV update, but that
only was for the multiple-types case which is rejected separately).

I'm not sure how one would deal with an outer .WHILE_ULT loop mask in the
inner loop.  This is AA(I,J) with I evolving in the outer loop and J
in the inner loop.  The outer loop evolution is contiguous, the inner
loop evolution applies a stride of four.
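
To make the access pattern above concrete, here is a minimal Python model of
it.  It is only a sketch under assumptions, not a reproduction of the code GCC
generates: the array shape (4 rows, column-major), N_COLS, the vector length
VL and the helper names are all hypothetical, and it models a plain contiguous
vector load per inner iteration rather than the actual load permutation.  It
shows how a load in the inner loop that ignores the outer .WHILE_ULT mask
would touch elements past the end of AA.

```python
# Hypothetical sizes: 4 rows matches the outer trip count mentioned above;
# N_COLS and VL are invented for illustration only.
N_ROWS = 4   # extent of I (outer loop, vectorized)
N_COLS = 3   # extent of J (inner loop)
VL = 8       # lanes in the (partial) vector


def while_ult(start, end, vl):
    """Model of a WHILE_ULT-style mask: lane k is active iff start + k < end."""
    return [start + lane < end for lane in range(vl)]


def offset(i, j):
    """Zero-based linear offset of AA(i, j), column-major, i and j 1-based."""
    return (i - 1) + N_ROWS * (j - 1)


def touched_offsets(use_mask):
    """Offsets read when the outer loop over I is vectorized across lanes."""
    mask = while_ult(0, N_ROWS, VL)
    touched = set()
    for j in range(1, N_COLS + 1):      # inner loop: base advances by N_ROWS
        base = offset(1, j)
        for lane in range(VL):          # lanes are consecutive values of I
            if use_mask and not mask[lane]:
                continue
            touched.add(base + lane)
    return touched


size = N_ROWS * N_COLS
masked = touched_offsets(use_mask=True)
unmasked = touched_offsets(use_mask=False)

print("masked load stays in bounds:", max(masked) < size)
print("unmasked out-of-bounds offsets:",
      sorted(o for o in unmasked if o >= size))
```

In this model the masked variant never leaves the array, while the unmasked
variant reads VL - N_ROWS extra elements in the last inner iteration.  Whether
the real miscompile comes from a missing or wrong mask is exactly the open
question here.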

If you force GCN to use fixed length vectors (how?), does it work?  How's
it behaving on aarch64 with SVE?  (the CI was happy, but maybe doesn't
enable SVE)

Confirmed as in, it looks like wrong generated code.
