Hi,

I have a question about a loop that when vectorized does not get unrolled
(by the rtl-level unroller), whereas the same loop when not vectorized does
get unrolled. This is the testcase:

#define N 40
#define M 10
float in[N+M], coeff[M], out[N];
void fir (){
  int i,j,k;
  float diff;
  for (i = 0; i < N; i++) {
    diff = 0;
    for (j = 0; j < M; j++) {
      diff += in[j+i]*coeff[j];
    }
    out[i] = diff;
  }
}

When compiled as follows (the bracketed options are the ones added for the
vectorized case):

  gcc -O3 -funroll-loops vect-outer-fir-kernel.c -S
      --param max-completely-peeled-insns=5000
      --param max-completely-peel-times=40
      [-ftree-vectorize -maltivec]

...the inner loop gets completely unrolled (by the tree-level unroller), but
the outer loop (the i-loop, which is now an innermost loop) is not unrolled
later by the rtl unroller, although without vectorization it is.
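
For readers less familiar with complete unrolling, here is a hand-written,
source-level sketch of roughly what it amounts to for the j-loop in the scalar
(non-vectorized) case. It is only an illustration, not what GCC emits. All
names with an _s suffix are made up for this sketch, and the input values are
arbitrary small integers chosen so that the float arithmetic is exact.

```c
#define N 40
#define M 10

/* Names with a _s suffix are local to this sketch, not part of the testcase. */
static float in_s[N + M], coeff_s[M], out_ref_s[N], out_unr_s[N];

/* Small integer-valued inputs, so every product and sum is exact in float. */
static void fill_inputs_s(void)
{
  for (int k = 0; k < N + M; k++)
    in_s[k] = (float)(k % 7);
  for (int j = 0; j < M; j++)
    coeff_s[j] = (float)(j + 1);
}

/* The loop nest as written in the testcase. */
static void fir_ref_s(void)
{
  for (int i = 0; i < N; i++) {
    float diff = 0;
    for (int j = 0; j < M; j++)
      diff += in_s[j + i] * coeff_s[j];
    out_ref_s[i] = diff;
  }
}

/* The j-loop peeled completely: M == 10 copies of the body, no inner branch.
   Only the i-loop remains, so it is now an innermost loop. */
static void fir_unrolled_s(void)
{
  for (int i = 0; i < N; i++) {
    float diff = 0;
    diff += in_s[i + 0] * coeff_s[0];
    diff += in_s[i + 1] * coeff_s[1];
    diff += in_s[i + 2] * coeff_s[2];
    diff += in_s[i + 3] * coeff_s[3];
    diff += in_s[i + 4] * coeff_s[4];
    diff += in_s[i + 5] * coeff_s[5];
    diff += in_s[i + 6] * coeff_s[6];
    diff += in_s[i + 7] * coeff_s[7];
    diff += in_s[i + 8] * coeff_s[8];
    diff += in_s[i + 9] * coeff_s[9];
    out_unr_s[i] = diff;
  }
}

/* Returns 1 if both versions produced identical outputs, 0 otherwise. */
static int outputs_match_s(void)
{
  for (int i = 0; i < N; i++)
    if (out_ref_s[i] != out_unr_s[i])
      return 0;
  return 1;
}
```

Calling fill_inputs_s(), then fir_ref_s() and fir_unrolled_s(), and finally
outputs_match_s() checks that the two versions agree. The remaining i-loop is
the one the rtl unroller is then expected to handle.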

In both cases we start from this iv form (after iv_canon):
[1]   loop:
      iv = phi (iv, 40)
      iv = iv - 1;
      if (iv != 0) then goto loop else exit

In the case that does get unrolled (i.e. without vectorization), after
ivopts it looks like this:
[2]   loop:
      iv = phi (iv, 0)
      iv = iv + 4;
      if (iv != 160) then goto loop else exit

And finally just before rtl unrolling it looks like this:
[3]   r258 = 0;
      loop:
      r258 = r258 + 4
      if (r258 != 160) then goto loop else exit

In the case that the loop gets vectorized, then we start from [1], and
after vectorization we have:
[4]   loop:
      iv = phi (iv, 0)
      iv = iv + 1;
      if (iv < 10) then goto loop else exit

After that, ivopts transforms it to the following:
[5]   loop:
      iv0 = &out
      iv = phi (iv0, iv)
      iv = iv + 16;
      LB = &out + 160;
      if (iv != LB) then goto loop else exit

And finally at the stage of rtl unrolling it looks like this:
[6]   r186 = r2 + C;
      r318 = r186 + 160;
      loop:
      r186 = r186 + 16
      if (r186 != r318) then goto loop else exit
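
To make the difference between [3] and [6] concrete, here is a small C sketch
of the two loop shapes. It only illustrates the arithmetic and is not a claim
about how the rtl iv-analysis reasons. The function names are made up. In the
offset form, both the start (0) and the bound (160) are constants. In the
pointer form, the start and the bound are both derived from a base address
that is not a compile-time constant, even though their difference (160) and
the step (16) are.

```c
#include <stddef.h>

/* Shape of [3]: a byte offset that starts at the constant 0, advances by
   4 (one float) per iteration, and is compared against the constant 160. */
static int trips_offset_form(void)
{
  int trips = 0;
  size_t off = 0;
  do {
    off += 4;
    trips++;
  } while (off != 160);
  return trips;
}

/* Shape of [6]: a pointer that starts at some base address, advances by
   16 (one vector of four floats) per iteration, and is compared against
   base + 160. Neither the start nor the bound is a constant. */
static int trips_pointer_form(const char *base)
{
  int trips = 0;
  const char *p = base;
  const char *bound = base + 160;
  do {
    p += 16;
    trips++;
  } while (p != bound);
  return trips;
}
```

With any 160-byte buffer as base, the pointer form runs 160 / 16 = 10 times,
and the offset form runs 160 / 4 = 40 times, matching the trip counts in [4]
and [1] respectively.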

Then, in loop-unroll.c we call iv_number_of_iterations, which eventually
calls iv_analyze_biv (with r258/r186), which in turn calls
latch_dominating_def.
There, when processing the vectorized case, it encounters the def in the
loop ('r186+16'), and then the def outside the loop ('r2+C'), at which
point it fails because it has found two defs (so we conclude that this is not
a simple iv and not a simple loop, and unrolling fails: "Unable to prove
that the loop iterates constant times").
When processing the unvectorized case, we also first encounter the def in
the loop ('r258+4'), and then the def outside the loop ('0'), but this def
satisfies the test "if (!bitmapset (bb_info->out,....))", so we don't
fail when we encounter the second def, and all is well.
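
The behaviour described above can be mimicked with a toy model. This is not
the code in GCC. The struct, the function, and the flag values are all
assumptions made up to reproduce what I observe: a def is skipped when its bit
is not set in the "out" set, and the scan fails as soon as a second def
survives that filter.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one definition of a register. */
struct toy_def {
  const char *text;   /* the defining expression, for illustration only */
  bool in_out_set;    /* assumed: bit set in the block's "out" bitmap */
};

/* Scan all defs of one register. Defs whose bit is not set are skipped.
   Returns true and stores the def in *found if exactly one def survives;
   returns false (with *found set to NULL) if none or more than one does. */
static bool toy_single_def(const struct toy_def *defs, size_t n,
                           const struct toy_def **found)
{
  const struct toy_def *single = NULL;
  *found = NULL;
  for (size_t i = 0; i < n; i++) {
    if (!defs[i].in_out_set)
      continue;               /* filtered out by the bitmap test */
    if (single != NULL)
      return false;           /* a second def survived: give up */
    single = &defs[i];
  }
  *found = single;
  return single != NULL;
}

/* Flag values chosen to match the observed outcome, not derived from GCC. */
static const struct toy_def toy_unvectorized[] = {
  { "r258 + 4", true  },      /* def inside the loop */
  { "0",        false },      /* def outside the loop, skipped */
};

static const struct toy_def toy_vectorized[] = {
  { "r186 + 16", true },      /* def inside the loop */
  { "r2 + C",    true },      /* def outside the loop, not skipped */
};
```

In this model, toy_single_def succeeds on toy_unvectorized and fails on
toy_vectorized. What I cannot tell is why the outside def would be in the set
in one case and not in the other.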

So one question I have is what is that bitmap exactly, and why does loop
[6] fail rtl iv-analysis?

The other question is what seems to be the most appropriate place to fix
this - the vectorizer, the ivopts, or the rtl iv-analysis?
Note that even when I change the vectorizer to use the same iv form as in
[1], so that it produces this:
[4']  loop:
      iv = phi (iv,10)
      iv = iv - 1;
      if (iv != 0) then goto loop else exit
ivopts still changes it to [5], and the loop still doesn't get unrolled, so
I don't see what can be done in the vectorizer (?).

thanks,
dorit
