https://gcc.gnu.org/bugzilla/show_bug.cgi?id=116765
Richard Biener <rguenth at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |needs-bisection
                 CC|                            |rguenth at gcc dot gnu.org,
                   |                            |rsandifo at gcc dot gnu.org

--- Comment #4 from Richard Biener <rguenth at gcc dot gnu.org> ---
(In reply to Hongtao Liu from comment #3)
> So -mavx2 is sufficient to reproduce the issue.
> There're cross-iteration dependence for the inner loop
> ok[i][j] = ok[i][j] | ok[i + 1][j] | ok[i][j - 1];
>
> loop vectorizer shouldn't use 256-bit for vectorization(128-bit is fine
> since std::bitset<105> takes 128-bit, it's slp inside the loop)

dependence analysis computes no dependence and zero distances only.  One issue
is that we compute nearly the same base_object but have different
dimension access functions:

Creating dr for MEM <_WordT> [(struct bitset *)_128 + 8B]
analyze_innermost: success.
        base_address: (struct bitset *) &ok + (sizetype) i_148 * 1696
        offset from base address: 0
        constant offset from base address: 8
        step: 16
        base alignment: 32
        base misalignment: 0
        offset alignment: 256
        step alignment: 16
        base_object: MEM <_WordT> [(struct bitset *)(struct bitset *) &ok + (sizetype) i_148 * 1696]
        Access function 0: {8B, +, 16}_5

Creating dr for MEM[(const struct _Base_bitset &)_131]._M_w[0]
analyze_innermost: success.
        base_address: (struct bitset *) &ok + (sizetype) i_148 * 1696
        offset from base address: 0
        constant offset from base address: 1680
        step: 16
        base alignment: 32
        base misalignment: 0
        offset alignment: 256
        step alignment: 16
        base_object: MEM[(const struct _Base_bitset &)(struct bitset *) &ok + (sizetype) i_148 * 1696]
        Access function 0: 0
        Access function 1: 0
        Access function 2: {1680B, +, 16}_5

also

Creating dr for ok[i_148][j_141].D.51656._M_w[0]
analyze_innermost: success.
        base_address: &ok
        offset from base address: (ssizetype) ((sizetype) i_148 * 1696)
        constant offset from base address: 0
        step: 16
        base alignment: 32
        base misalignment: 0
        offset alignment: 32
        step alignment: 16
        base_object: ok
        Access function 0: 0
        Access function 1: 0
        Access function 2: 0
        Access function 3: {i_148, +, 1}_5
        Access function 4: i_148

we do match up base_object but also have "clever" code there to deal with
some mismatches.

I think the following should be an equivalent C testcase but that's OK,
we even fail dependence analysis with smaller vector modes here:

t.c:6:23: note:   dependence distance  = 0.
t.c:6:23: note:   dependence distance == 0 between ok[i_63][j_59][0] and ok[i_63][j_59][0]
t.c:8:44: missed:   versioning for alias required: can't determine dependence between ok[_60][j_59][0] and ok[i_63][j_59][0]
consider run-time aliasing test between ok[_60][j_59][0] and ok[i_63][j_59][0]
t.c:6:23: note:   dependence distance  = 1.
t.c:8:62: missed:   not vectorized, possible dependence between data-refs ok[i_63][_54][0] and ok[i_63][j_59][0]
t.c:6:23: missed:  bad data dependence.

unsigned long ok[105][105][2];
int n = 5;
int main()
{
  ok[2][2][0] |= 1 << 2;
  for (int i = n; i; i--)
    for (int j = i; j <= n; j++)
      {
        for (int k = 0; k <= 1; ++k)
          ok[i][j][k] = ok[i][j][k] | ok[i + 1][j][k] | ok[i][j - 1][k];
      }
  if (ok[2][5][0] & (1 << 2) != 1)
    __builtin_abort ();
  return 0;
}