https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81740
--- Comment #4 from amker at gcc dot gnu.org ---
(In reply to Jakub Jelinek from comment #3)
> Testcase modified for the testsuite:
>
> int a[8][10] = { [2][5] = 4 }, c;
>
> int
> main ()
> {
>   short b;
>   int i, d;
>   for (b = 4; b >= 0; b--)
>     for (c = 0; c <= 6; c++)
>       a[c + 1][b + 2] = a[c][b + 1];
>   for (i = 0; i < 8; i++)
>     for (d = 0; d < 10; d++)
>       if (a[i][d] != (i == 3 && d == 6) * 4)
>         __builtin_abort ();
>   return 0;
> }

So without reversing the inner loop, this loop nest is illegal for
vectorization.  The issue is in the vectorizer's data dependence checking;
I believe the mentioned revision just exposed it.  Previously, vectorization
was skipped because of an unsupported memory operation.

Outer loop vectorization unrolls the outer loop into:

  for (b = 4; b >= 0; b -= 4)
    {
      for (c = 0; c <= 6; c++)
        a[c + 1][6] = a[c][5];
      for (c = 0; c <= 6; c++)
        a[c + 1][5] = a[c][4];
      for (c = 0; c <= 6; c++)
        a[c + 1][4] = a[c][3];
      for (c = 0; c <= 6; c++)
        a[c + 1][3] = a[c][2];
    }

Then the four inner loops are fused into:

  for (b = 4; b >= 0; b -= 4)
    {
      for (c = 0; c <= 6; c++)
        {
          a[c + 1][6] = a[c][5];  // S1
          a[c + 1][5] = a[c][4];  // S2
          a[c + 1][4] = a[c][3];
          a[c + 1][3] = a[c][2];
        }
    }

The loop fusion needs to meet the dependence requirement.  GCC's data
dependence analyzer doesn't model dependences between references in sibling
loops, but in practice the fusion requirement can be checked by analyzing
all data references after fusion and verifying that there is no backward
data dependence.  Here the requirement is violated, because there is a
backward data dependence between the references (a[c][5], a[c+1][5]) in
S1/S2: after fusion, S2 in iteration c writes a[c+1][5] before S1 reads it in
iteration c+1, whereas in the original nest all of S1's reads come before
any of S2's writes.

Note that if we reverse the inner loop, the outer loop becomes legal for
vectorization.

As for the fix, we need to enforce dependence checking in the vectorizer for
outer loop vectorization.  Preparing a patch now.

Thanks
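
For illustration only, the effect of the fusion can be reproduced in plain C without involving the vectorizer at all. The sketch below is not GCC code; the helper names (init_array, run_original, run_fused, run_fused_reversed, check_fusion_demo) are made up for this example. It assumes a vectorization factor of 4 and only covers the four outer iterations b = 4..1 that the first vector iteration would handle. It compares the original nest against the fused body, and against the fused body with the inner loop reversed.

```c
#include <assert.h>
#include <string.h>

enum { ROWS = 8, COLS = 10 };

/* Reset the array to the testcase's initial state: only a[2][5] is set. */
static void init_array(int a[ROWS][COLS])
{
    memset(a, 0, sizeof(int) * ROWS * COLS);
    a[2][5] = 4;
}

/* Original scalar nest, restricted to b = 4..1 (assumed VF = 4). */
static void run_original(int a[ROWS][COLS])
{
    for (int b = 4; b >= 1; b--)
        for (int c = 0; c <= 6; c++)
            a[c + 1][b + 2] = a[c][b + 1];
}

/* The four unrolled inner loops fused into one, forward inner loop. */
static void run_fused(int a[ROWS][COLS])
{
    for (int c = 0; c <= 6; c++) {
        a[c + 1][6] = a[c][5];  /* S1 */
        a[c + 1][5] = a[c][4];  /* S2: clobbers a[c+1][5] before S1 reads it */
        a[c + 1][4] = a[c][3];
        a[c + 1][3] = a[c][2];
    }
}

/* Same fused body, but with the inner loop running backwards. */
static void run_fused_reversed(int a[ROWS][COLS])
{
    for (int c = 6; c >= 0; c--) {
        a[c + 1][6] = a[c][5];
        a[c + 1][5] = a[c][4];
        a[c + 1][4] = a[c][3];
        a[c + 1][3] = a[c][2];
    }
}

/* Hypothetical driver: asserts that forward fusion changes the result
   while the reversed variant matches the original nest. */
static void check_fusion_demo(void)
{
    int orig[ROWS][COLS], fused[ROWS][COLS], rev[ROWS][COLS];

    init_array(orig);  run_original(orig);
    init_array(fused); run_fused(fused);
    init_array(rev);   run_fused_reversed(rev);

    assert(orig[3][6] == 4);   /* value the testcase expects */
    assert(fused[3][6] == 0);  /* the 4 in a[2][5] was overwritten too early */
    assert(memcmp(orig, rev, sizeof orig) == 0);
}
```

Calling check_fusion_demo () from a small main should pass all three assertions: in the forward fused loop, S2 at c = 1 zeroes a[2][5] before S1 at c = 2 reads it, so the 4 never reaches a[3][6]. With the inner loop reversed, the read at c = 2 happens before the write at c = 1, which matches the order in the original nest.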