https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115640
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
   Last reconfirmed|                            |2024-06-25
             Status|UNCONFIRMED                 |NEW
                 CC|                            |rsandifo at gcc dot gnu.org
     Ever confirmed|0                           |1

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
It doesn't look like a misaligned access at least.  Instead it looks like an
out-of-bound access?  OTOH I probably missed the fortran equivalent of
#include "tree-vect.h" and check_vect() in main.

We do vectorize the outer loop, but with partial vectors (huh, it iterates
exactly 4 times).  I'm not exactly sure the outer loop mask is the correct
one to use in the inner loop though (but we do that).  And this might explain
the issue - we're definitely accessing excess elements of AA in the inner
loop.

In particular we disallowed grouped accesses in inner loops but this load is
basically treated as a grouped access by means of having a load permutation.
Test coverage seems to be weak here and this restriction should be lifted
ideally (I remember issues with the IV update, but that only was for the
multiple-types case which is rejected separately).

I'm not sure how one would deal with an outer .WHILE_ULT loop mask in the
inner loop.  This is AA(I,J) with I evolving in the outer loop and J in the
inner loop.  The outer loop evolution is contiguous, the inner loop evolution
applies a stride of four.

If you force GCN to use fixed length vectors (how?), does it work?  How's it
behaving on aarch64 with SVE?  (the CI was happy, but maybe doesn't enable
SVE)

Confirmed as in, it looks like wrong generated code.