https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115192

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED
           Assignee|unassigned at gcc dot gnu.org |rguenth at gcc dot gnu.org

--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
I'm looking into the first issue.  Interesting fact:

> /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec 
> -fno-tree-slp-vectorize --param vect-epilogues-nomask=0
t.C:7:21: optimized: loop vectorized using 16 byte vectors
t.C:7:21: optimized:  loop versioned for vectorization because of possible aliasing
rguenther@localhost:/tmp> ./a.out
> /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec 
> -fno-tree-slp-vectorize --param vect-epilogues-nomask=1
t.C:7:21: optimized: loop vectorized using 16 byte vectors
t.C:7:21: optimized:  loop versioned for vectorization because of possible aliasing
t.C:7:21: optimized: loop vectorized using 8 byte vectors
rguenther@localhost:/tmp> ./a.out 
Aborted (core dumped)

So disabling the vectorized epilogue (--param vect-epilogues-nomask=0) avoids
the abort.  (I've also placed #pragma GCC novector on the loop in main and
marked foo noipa.)

C testcase:

typedef float float4_t __attribute__((vector_size(4 * sizeof(float))));

void __attribute__((noipa))
foo(int n, const float *d, float4_t * __restrict a)
{
  for (int y = 1; y < n; y++)
    for (int c = 0; c < 2; c++)
      a[y * n][c] = d[y * n] + a[(y - 1) * n][c];
}

int main()
{ 
  const int n = 3;
  float d[n*n];
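  float4_t a[n*n];
  /* Zero a: foo reads a[0], and the check below assumes it starts at 0.  */
  __builtin_memset (a, 0, sizeof (a));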
#pragma GCC novector
  for (int i = 0; i < n * n; ++i)
    d[i] = i;
  foo(n, d, a);
  if (a[6][1] != 9)
    __builtin_abort();
}
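
For reference, the expected value in the check can be worked out with a
scalar model of the recurrence in foo.  This is only a sketch: the helper
name reference_lane is hypothetical, and it assumes d[i] == i (as set up in
main) and that a[0][c] starts out as zero.

```c
/* Scalar model of one lane c of the loop in foo:
     a[y * n][c] = d[y * n] + a[(y - 1) * n][c]
   Assumptions (not part of the original testcase): d[i] == i and
   a[0][c] == 0.  Returns the value expected in a[last_y * n][c].  */
float
reference_lane (int n, int last_y)
{
  float acc = 0.0f;             /* a[0][c], assumed to be zero */
  for (int y = 1; y <= last_y; y++)
    acc += (float) (y * n);     /* d[y * n] == y * n */
  return acc;
}
```

With n = 3 the loop in foo runs for y = 1 and y = 2, so a[3][c] = 3 and
a[6][c] = 6 + 3 = 9, which is what reference_lane (3, 2) computes and what
the check on a[6][1] in main expects.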
