https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115192
Richard Biener <rguenth at gcc dot gnu.org> changed:
           What    |Removed                       |Added
----------------------------------------------------------------------------
             Status|NEW                           |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm looking into the first issue. Interesting fact:
> /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec -fno-tree-slp-vectorize --param vect-epilogues-nomask=0
t.C:7:21: optimized: loop vectorized using 16 byte vectors
t.C:7:21: optimized: loop versioned for vectorization because of possible aliasing
> ./a.out
> /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec -fno-tree-slp-vectorize --param vect-epilogues-nomask=1
t.C:7:21: optimized: loop vectorized using 16 byte vectors
t.C:7:21: optimized: loop versioned for vectorization because of possible aliasing
t.C:7:21: optimized: loop vectorized using 8 byte vectors
> ./a.out
Aborted (core dumped)
So disabling the vectorized epilogue (--param vect-epilogues-nomask=0) avoids
the abort. (I've also placed #pragma GCC novector on the loop in main and
marked foo noipa.)
C testcase:
typedef float float4_t __attribute__((vector_size(4 * sizeof(float))));

void __attribute__((noipa))
foo(int n, const float *d, float4_t * __restrict a)
{
  for (int y = 1; y < n; y++)
    for (int c = 0; c < 2; c++)
      a[y * n][c] = d[y * n] + a[(y - 1) * n][c];
}

int main()
{
  const int n = 3;
  float d[n*n];
  float4_t a[n*n];
#pragma GCC novector
  for (int i = 0; i < n * n; ++i)
    d[i] = i;
  foo(n, d, a);
  if (a[6][1] != 9)
    __builtin_abort();
}
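
For reference, a scalar sketch of where the expected value 9 comes from. This
is not part of the testcase: foo_ref and ref_a6_lane1 are hypothetical names,
each float4_t is modelled as a plain float[4] row, and a[] is assumed to start
out zeroed (the testcase itself does not initialize it).

```c
/* Hypothetical scalar reference: same recurrence as foo, with each
   float4_t modelled as a plain float[4] row.  */
void
foo_ref (int n, const float *d, float (*a)[4])
{
  for (int y = 1; y < n; y++)
    for (int c = 0; c < 2; c++)
      a[y * n][c] = d[y * n] + a[(y - 1) * n][c];
}

/* Runs the n = 3 case from main and returns a[6][1].
   Assumption: a[] starts zeroed, which the testcase does not guarantee.  */
float
ref_a6_lane1 (void)
{
  enum { N = 3 };
  float d[N * N];
  float a[N * N][4] = { { 0 } };

  for (int i = 0; i < N * N; ++i)
    d[i] = i;
  foo_ref (N, d, a);
  /* y = 1: a[3][1] = d[3] + a[0][1] = 3 + 0
     y = 2: a[6][1] = d[6] + a[3][1] = 6 + 3  */
  return a[6][1];
}
```

Under that assumption ref_a6_lane1 () returns 9, which is the value the
testcase checks after calling foo.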