https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116765

Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #3)
> So -mavx2 is sufficient to reproduce the issue.
> There's a cross-iteration dependence in the inner loop
>  ok[i][j] = ok[i][j] | ok[i + 1][j] | ok[i][j - 1];
> 
> The loop vectorizer shouldn't use 256-bit for vectorization (128-bit is fine
> since std::bitset<105> takes 128 bits, it's SLP inside the loop)

dependence analysis computes no dependence and zero distances only.  One
issue is that we compute nearly the same base_object but have different
dimension access functions:

Creating dr for MEM <_WordT> [(struct bitset *)_128 + 8B]
analyze_innermost: success.
        base_address: (struct bitset *) &ok + (sizetype) i_148 * 1696
        offset from base address: 0
        constant offset from base address: 8
        step: 16
        base alignment: 32
        base misalignment: 0
        offset alignment: 256
        step alignment: 16 
        base_object: MEM <_WordT> [(struct bitset *)(struct bitset *) &ok + (sizetype) i_148 * 1696]
        Access function 0: {8B, +, 16}_5
Creating dr for MEM[(const struct _Base_bitset &)_131]._M_w[0]
analyze_innermost: success.
        base_address: (struct bitset *) &ok + (sizetype) i_148 * 1696
        offset from base address: 0
        constant offset from base address: 1680
        step: 16
        base alignment: 32 
        base misalignment: 0
        offset alignment: 256
        step alignment: 16
        base_object: MEM[(const struct _Base_bitset &)(struct bitset *) &ok + (sizetype) i_148 * 1696]
        Access function 0: 0
        Access function 1: 0
        Access function 2: {1680B, +, 16}_5

also

Creating dr for ok[i_148][j_141].D.51656._M_w[0]
analyze_innermost: success.
        base_address: &ok
        offset from base address: (ssizetype) ((sizetype) i_148 * 1696)
        constant offset from base address: 0
        step: 16
        base alignment: 32
        base misalignment: 0
        offset alignment: 32
        step alignment: 16
        base_object: ok
        Access function 0: 0
        Access function 1: 0
        Access function 2: 0
        Access function 3: {i_148, +, 1}_5
        Access function 4: i_148

we do match up base_object but also have "clever" code there to deal
with some mismatches.

I think the following should be an equivalent C testcase, but that one is OK;
there we fail dependence analysis even with smaller vector modes:

t.c:6:23: note:   dependence distance  = 0.
t.c:6:23: note:   dependence distance == 0 between ok[i_63][j_59][0] and ok[i_63][j_59][0]
t.c:8:44: missed:   versioning for alias required: can't determine dependence between ok[_60][j_59][0] and ok[i_63][j_59][0]
consider run-time aliasing test between ok[_60][j_59][0] and ok[i_63][j_59][0]
t.c:6:23: note:   dependence distance  = 1.
t.c:8:62: missed:   not vectorized, possible dependence between data-refs ok[i_63][_54][0] and ok[i_63][j_59][0]
t.c:6:23: missed:  bad data dependence.

unsigned long ok[105][105][2];
int n = 5;
int main() {
  ok[2][2][0] |= 1 << 2;
  for (int i = n; i; i--)
    for (int j = i; j <= n; j++) {
        for (int k = 0; k <= 1; ++k)
          ok[i][j][k] = ok[i][j][k] | ok[i + 1][j][k] | ok[i][j - 1][k];
    }
  if ((ok[2][5][0] & (1 << 2)) == 0)
    __builtin_abort ();
  return 0;
}
