Hi, I have a question about a loop that when vectorized does not get unrolled (by the rtl-level unroller), whereas the same loop when not vectorized does get unrolled. This is the testcase:

    #define N 40
    #define M 10
    float in[N+M], coeff[M], out[N];

    void fir (){
      int i,j,k;
      float diff;

      for (i = 0; i < N; i++) {
        diff = 0;
        for (j = 0; j < M; j++) {
          diff += in[j+i]*coeff[j];
        }
        out[i] = diff;
      }
    }

When compiled as follows:

    gcc -O3 -funroll-loops vect-outer-fir-kernel.c -S \
        --param max-completely-peeled-insns=5000 \
        --param max-completely-peel-times=40 \
        [-ftree-vectorize -maltivec]

...the inner loop gets completely unrolled (by the tree-level unroller) but the outer-loop (the i-loop, that is now an inner-loop) does not get later unrolled by the rtl unroller, although without vectorization it does.

In both cases we start from this iv form (after iv_canon):

    [1]
    loop:
      iv = phi (iv, 40)
      iv = iv - 1;
      if (iv != 0) then goto loop else exit

In the case that does get unrolled (i.e. without vectorization), after ivopts it looks like this:

    [2]
    loop:
      iv = phi (iv, 0)
      iv = iv + 4;
      if (iv != 160) then goto loop else exit

And finally just before rtl unrolling it looks like this:

    [3]
    r258 = 0;
    loop:
      r258 = r258 + 4
      if (r258 != 160) then goto loop else exit

In the case that the loop gets vectorized, then we start from [1], and after vectorization we have:

    [4]
    loop:
      iv = phi (iv, 0)
      iv = iv + 1;
      if (iv < 10) then goto loop else exit

After that, ivopts transforms it to the following:

    [5]
    loop:
      iv0 = &out
      iv = phi (iv0, iv)
      iv = iv + 16;
      LB = &out + 160;
      if (iv != LB) then goto loop else exit

And finally at the stage of rtl unrolling it looks like this:

    [6]
    r186 = r2 + C;
    r318 = r186 + 160;
    loop:
      r186 = r186 + 16
      if (r186 != r318) then goto loop else exit

Then, in loop-unroll.c we call iv_number_of_iterations, which eventually calls iv_analyze_biv (with r258/r186), which in turn calls latch_dominating_def.
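
As a sanity check on the loop forms above, here is a small standalone C sketch (not GCC code; the function names are made up for illustration) that replays the induction-variable updates of [1], [3], [4] and [6] and counts iterations. It assumes the loops are bottom-tested, as the pseudo-code suggests, and treats the base address in [6] as an arbitrary integer. The point is only that [6] iterates a constant number of times regardless of the base, just like [3].

```c
#include <assert.h>
#include <stdint.h>

/* Form [1]: count down from 40 to 0 in steps of 1. */
static int trips_form1(void) {
    int iv = 40, n = 0;
    do { iv = iv - 1; n++; } while (iv != 0);
    return n;
}

/* Form [3]: byte offset from 0 to 160 in steps of 4 (sizeof(float)). */
static int trips_form3(void) {
    int r258 = 0, n = 0;
    do { r258 = r258 + 4; n++; } while (r258 != 160);
    return n;
}

/* Form [4]: vectorized loop, 4 floats per iteration, 0 up to 10. */
static int trips_form4(void) {
    int iv = 0, n = 0;
    do { iv = iv + 1; n++; } while (iv < 10);
    return n;
}

/* Form [6]: pointer walk from base to base + 160 in steps of 16 bytes.
   'base' stands in for r2 + C; its value is an arbitrary assumption. */
static int trips_form6(uintptr_t base) {
    uintptr_t r186 = base;
    uintptr_t r318 = r186 + 160;
    int n = 0;
    do { r186 = r186 + 16; n++; } while (r186 != r318);
    return n;
}

/* Checks that each pair of forms agrees on the trip count. */
static void check_trip_counts(void) {
    assert(trips_form1() == 40);
    assert(trips_form3() == 40);
    assert(trips_form4() == 10);
    assert(trips_form6(0x1000) == 10);
    assert(trips_form6(0xdeadbee0u) == 10); /* independent of the base */
}
```

Calling check_trip_counts() from a small main should pass silently: the scalar forms run 40 times and the vectorized forms run 10 times, whatever base is used in [6].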

In latch_dominating_def, when processing the vectorized case, we encounter the def in the loop ('r186+16'), and then the def outside the loop ('r2+C'), at which point it fails because it found two defs (and so we get that this is not a simple iv, and not a simple loop, and unrolling fails: "Unable to prove that the loop iterates constant times").

When processing the unvectorized case, we also first encounter the def in the loop ('r258+4'), and then the def outside the loop ('0'), but this def satisfies the test "if (!bitmapset (bb_info->out,....))", and so we don't fail when we encounter the second def, and all is well.

So one question I have is what that bitmap is exactly, and why loop [6] fails rtl iv-analysis. The other question is what seems to be the most appropriate place to fix this: the vectorizer, ivopts, or the rtl iv-analysis?

Note that even when I change the vectorizer to use the same iv form as in [1], so that it produces this:

    [4]
    loop:
      iv = phi (iv, 10)
      iv = iv - 1;
      if (iv != 0) then goto loop else exit

ivopts still changes it to [5], and the loop still doesn't get unrolled, so I don't see what can be done in the vectorizer (?).

thanks,
dorit
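
For reference, the control flow described above for latch_dominating_def can be modelled with a toy C sketch. This is not the GCC source: the struct, the field names and the toy_ functions are hypothetical, and the flag bit_set_in_out only records whether a def's bit is set in bb_info->out, without claiming what that bitmap means. The flag values for the two cases are taken from the behaviour observed in the post.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one definition of the iv register. */
struct toy_def {
    const char *text;     /* e.g. "r186 = r186 + 16" */
    bool bit_set_in_out;  /* models the bit tested in bb_info->out */
};

/* Walks all defs, skipping those whose bit is clear. Returns false as
   soon as a second surviving def is seen; otherwise stores the unique
   surviving def (or NULL) in *single and returns true. */
static bool toy_latch_dominating_def(const struct toy_def *defs, size_t n,
                                     const struct toy_def **single) {
    *single = NULL;
    for (size_t i = 0; i < n; i++) {
        if (!defs[i].bit_set_in_out)
            continue;              /* def is filtered out by the test */
        if (*single != NULL)
            return false;          /* more than one def: not a simple iv */
        *single = &defs[i];
    }
    return true;
}

/* Unvectorized case [3]: the def outside the loop is filtered out. */
static bool toy_unvectorized_case(void) {
    const struct toy_def defs[] = {
        { "r258 = r258 + 4", true  },
        { "r258 = 0",        false },
    };
    const struct toy_def *single;
    return toy_latch_dominating_def(defs, 2, &single) && single == &defs[0];
}

/* Vectorized case [6]: both defs survive the test, so analysis fails. */
static bool toy_vectorized_case(void) {
    const struct toy_def defs[] = {
        { "r186 = r186 + 16", true },
        { "r186 = r2 + C",    true },
    };
    const struct toy_def *single;
    return toy_latch_dominating_def(defs, 2, &single);
}
```

In this model toy_unvectorized_case() returns true and toy_vectorized_case() returns false, which matches the two outcomes reported; why the outside def in [6] is not filtered out is exactly the open question.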