[Bug c++/115192] [11/12/13/14/15 regression] -O3 miscompilation on x86-64 (loops with vectors and scalars) since r11-6380

rguenth at gcc dot gnu.org via Gcc-bugs Thu, 23 May 2024 01:09:53 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115192


--- Comment #9 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Richard Biener from comment #7)
> I'm looking into the first issue.  Interesting fact:
> 
> > /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec -fno-tree-slp-vectorize --param vect-epilogues-nomask=0
> t.C:7:21: optimized: loop vectorized using 16 byte vectors
> t.C:7:21: optimized:  loop versioned for vectorization because of possible aliasing
> rguenther@localhost:/tmp> ./a.out
> > /space/rguenther/install/gcc-14.1/bin/g++ t.C -O3 -fopt-info-vec -fno-tree-slp-vectorize --param vect-epilogues-nomask=1
> t.C:7:21: optimized: loop vectorized using 16 byte vectors
> t.C:7:21: optimized:  loop versioned for vectorization because of possible aliasing
> t.C:7:21: optimized: loop vectorized using 8 byte vectors
> rguenther@localhost:/tmp> ./a.out 
> Aborted (core dumped)
> 
> so avoiding the vectorized epilog fixes this (I've also placed #pragma GCC
> novector on the loop in main and noipa on foo).

Actually, with -fno-vect-cost-model, even --param vect-epilogues-nomask=0 fails.
Since we are vectorizing

  for (int y = 1; y < n; y++)
    {
      a[y * n][0] = d[y * n] + a[(y - 1) * n][0];
      a[y * n][1] = d[y * n] + a[(y - 1) * n][1];
    }

with a VF of two, this is a failure to identify the dependence between
the iterations (iteration y reads the row that iteration y - 1 stores), so
it is possibly related to r11-6380 as well.
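
To make the dependence concrete, here is a minimal sketch, not the testcase from the PR. It assumes a stand-in type float4 (std::array instead of the float4_t vector type), and the names foo_scalar, foo_vf2_ignoring_dependence and schedules_agree are hypothetical. The second function models what a VF = 2 schedule does when it ignores the dependence: both loads of a pair of iterations happen before either store, so iteration y + 1 sees a stale row.

```cpp
#include <array>
#include <cassert>
#include <vector>

// Hypothetical stand-in for the float4_t vector type seen in the dumps.
using float4 = std::array<float, 4>;

// The loop as written: iteration y reads the row stored by iteration y - 1.
void foo_scalar(float4 *a, const float *d, int n)
{
  for (int y = 1; y < n; y++)
    {
      a[y * n][0] = d[y * n] + a[(y - 1) * n][0];
      a[y * n][1] = d[y * n] + a[(y - 1) * n][1];
    }
}

// Models a VF = 2 schedule that ignores the dependence: all loads for
// iterations y and y + 1 are done before any of their stores.
void foo_vf2_ignoring_dependence(float4 *a, const float *d, int n)
{
  int y = 1;
  for (; y + 1 < n; y += 2)
    {
      float4 prev_y = a[(y - 1) * n];  // input of iteration y
      float4 prev_y1 = a[y * n];       // input of iteration y + 1, stale
      a[y * n][0] = d[y * n] + prev_y[0];
      a[y * n][1] = d[y * n] + prev_y[1];
      a[(y + 1) * n][0] = d[(y + 1) * n] + prev_y1[0];
      a[(y + 1) * n][1] = d[(y + 1) * n] + prev_y1[1];
    }
  for (; y < n; y++)  // scalar epilogue for a leftover iteration
    {
      a[y * n][0] = d[y * n] + a[(y - 1) * n][0];
      a[y * n][1] = d[y * n] + a[(y - 1) * n][1];
    }
}

// Runs both schedules on the same input and reports whether they agree.
bool schedules_agree(int n)
{
  assert(n >= 1);
  std::vector<float4> ref(n * n, float4{});
  ref[0][0] = ref[0][1] = 1.0f;
  std::vector<float4> vec = ref;
  std::vector<float> d(n * n, 1.0f);
  foo_scalar(ref.data(), d.data(), n);
  foo_vf2_ignoring_dependence(vec.data(), d.data(), n);
  return ref == vec;
}
```

With this input, schedules_agree(2) holds because only the scalar epilogue runs, while any n >= 3 executes at least one paired step and the results diverge.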

(compute_affine_dependence
  ref_a: BIT_FIELD_REF <*_37, 32, 0>, stmt_a: _38 = BIT_FIELD_REF <*_37, 32, 0>;
  ref_b: BIT_FIELD_REF <*_40, 32, 0>, stmt_b: BIT_FIELD_REF <*_40, 32, 0> = _41;
) -> dependence analysis failed

Creating dr for BIT_FIELD_REF <*_37, 32, 0>
analyze_innermost: success.
        base_address: a_23(D)
        offset from base address: 0
        constant offset from base address: 0
        step: (ssizetype) ((long unsigned int) n_20(D) * 16)
        base alignment: 16
        base misalignment: 0
        offset alignment: 128
        step alignment: 16
        base_object: BIT_FIELD_REF <*_37, 32, 0>

Creating dr for BIT_FIELD_REF <*_40, 32, 0>
analyze_innermost: success.
        base_address: (float4_t *) a_23(D) + (sizetype) n_20(D) * 16
        offset from base address: 0
        constant offset from base address: 0
        step: (ssizetype) ((long unsigned int) n_20(D) * 16)
        base alignment: 16
        base misalignment: 0
        offset alignment: 128
        step alignment: 16
        base_object: BIT_FIELD_REF <*_40, 32, 0>

and for reference

Creating dr for BIT_FIELD_REF <*_37, 32, 32>
analyze_innermost: success.
        base_address: a_23(D)
        offset from base address: 0
        constant offset from base address: 4
        step: (ssizetype) ((long unsigned int) n_20(D) * 16)
        base alignment: 16
        base misalignment: 0
        offset alignment: 128
        step alignment: 16
        base_object: BIT_FIELD_REF <*_37, 32, 32>

that looks sensible.  And 'a' is indeed properly aligned.

t.c:6:21: note:   recording new base alignment for d_22(D) + (sizetype) n_20(D) * 4
  alignment:    4
  misalignment: 0
  based on:     _32 = *_31;
t.c:6:21: note:   recording new base alignment for a_23(D)
  alignment:    16
  misalignment: 0
  based on:     _38 = BIT_FIELD_REF <*_37, 32, 0>;
t.c:6:21: note:   recording new base alignment for (float4_t *) a_23(D) + (sizetype) n_20(D) * 16
  alignment:    16
  misalignment: 0
  based on:     BIT_FIELD_REF <*_40, 32, 0> = _41;
t.c:6:21: note:   vect_compute_data_ref_alignment:
t.c:6:21: missed:   step doesn't divide the vector alignment.
t.c:6:21: missed:   Unknown alignment for access: *_31
t.c:6:21: note:   vect_compute_data_ref_alignment:
t.c:6:21: missed:   Unknown alignment for access: BIT_FIELD_REF <*_37, 32, 0>
t.c:6:21: note:   vect_compute_data_ref_alignment:
t.c:6:21: missed:   Unknown alignment for access: BIT_FIELD_REF <*_40, 32, 0>
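
Reading the two BIT_FIELD_REF <..., 32, 0> data references by hand: both have step n * 16, and the store's base address is the load's base address plus n * 16, i.e. exactly one step. So the value stored in iteration y is loaded in iteration y + 1, a dependence distance of 1, which is below a VF of two. A minimal sketch of that arithmetic follows. It is not GCC's data-dependence code. The names AffineRef, dependence_distance and blocks_vf are hypothetical, and treating every nonzero distance below VF as unsafe is a simplifying assumption.

```cpp
#include <cassert>
#include <cstdint>

// One affine access: address = base + iteration * step (in bytes,
// relative to a common base pointer such as a_23(D)).
struct AffineRef
{
  std::int64_t base;
  std::int64_t step;
};

// Computes how many iterations after the store the load touches the
// same address.  Returns false when this sketch cannot tell (different
// or zero steps, or bases not a whole number of steps apart).
bool dependence_distance(const AffineRef &load, const AffineRef &store,
                         std::int64_t &distance)
{
  if (load.step != store.step || load.step == 0)
    return false;
  std::int64_t delta = store.base - load.base;
  if (delta % load.step != 0)
    return false;
  distance = delta / load.step;
  return true;
}

// Simplified rule: a nonzero distance smaller than VF means iterations
// inside one vector step depend on each other.
bool blocks_vf(std::int64_t distance, int vf)
{
  assert(vf >= 1);
  std::int64_t magnitude = distance < 0 ? -distance : distance;
  return distance != 0 && magnitude < vf;
}
```

For example, with n = 5 the load is {0, 80} and the store is {80, 80}, giving distance 1. blocks_vf(1, 2) is then true, while blocks_vf(1, 1) is false.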
