Changed the option to -floop-unroll-and-jam as you suggested.

> > The patch takes advantage of the new isl based code generator introduced
> > recently in GCC (in fact of the possible options for building the AST).
> >
> > The code generated for this optimization in the case of non-constant loop
> > bounds initially looks as below. This is not very useful because the
> > standard GCC unrolling does not succeed in unrolling the innermost loop.
> >
> > ISL AST generated by ISL:
> > for (int c0 = 0; c0 < HEIGHT; c0 += 4)
> >   for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
> >     for (int c2 = c0; c2 <= min(HEIGHT - 1, c0 + 3); c2 += 1)
> 
> Hmm, so this iterates at most 4 times, right?  Eventually the body is
> considered too large by GCC or it fails to compute an upper bound for the
> number of iterations.  Is that (an upper bound for the number of iterations)
> available readily from ISL at code-generation time?  If so you can transfer
> this knowledge to the GCC loop information.
> 

The problem was not explained well. It is not only the unrolling, it is also
the loop separation (which the latest version of the patch does). Even if the
GCC unrolling succeeds in unrolling the inner loop, you will get code similar
to that produced by the previous version of this patch, which is not what is
wanted.

The last time I checked, GCC unrolling was not able to unroll the inner loop.
In my opinion it is the min and max that prevent it (graphite emits such code
for blocking, strip-mining, and unroll-and-jam). The bounds of the iteration
domain are expressed in terms of min and max.
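
To make the point about the bounds concrete, here is a minimal sketch in plain
C of the min-bounded nest quoted above. The names min_int and
max_inner_trip_count are hypothetical, and S_4 is replaced by a counter. It
only illustrates that the innermost loop never runs more than 4 times even
though its bound is written with min; it says nothing about what the GCC
unroller is actually able to prove.

```c
#include <assert.h>

/* Hypothetical helper: C has no built-in integer min. */
static int min_int(int a, int b) { return a < b ? a : b; }

/* Walk the strip-mined nest with the min() upper bound and return the
   largest trip count observed for the innermost loop.  If `total` is
   non-null it receives the number of statement instances executed. */
int max_inner_trip_count(int height, int length, long *total)
{
  int max_trips = 0;
  long count = 0;

  for (int c0 = 0; c0 < height; c0 += 4)
    for (int c1 = 0; c1 < length - 3; c1 += 1) {
      int trips = 0;
      for (int c2 = c0; c2 <= min_int(height - 1, c0 + 3); c2 += 1) {
        trips++;   /* stands in for S_4(c2, c1) */
        count++;
      }
      assert(trips <= 4);   /* the strip size bounds the trip count */
      if (trips > max_trips)
        max_trips = trips;
    }

  if (total)
    *total = count;
  return max_trips;
}
```

With height = 10 the last strip starts at c0 = 8 and runs only 2 iterations,
which is the partial tile that the min in the bound is there to handle.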
  
> I'm curious to see a testcase (and a way to generate the above form) to see
> what is actually the problem.
> 

Of course. Take the code from the unroll-and-jam patch and the attached test
case (but, as I said, other graphite options will generate similar code).
However, it seems that the new isl based code generator can handle such
transformations more easily.

Mircea



> Thanks,
> Richard.
> 
> >       S_4(c2, c1);
> >
> > Now, the "separating class" option (set for unroll and jam) produces this
> > nice loop structure:
> > ISL AST generated by ISL:
> > for (int c0 = 0; c0 < HEIGHT; c0 += 4)
> >   for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
> >     if (HEIGHT >= c0 + 4) {
> >       for (int c2 = c0; c2 <= c0 + 3; c2 += 1)
> >         S_4(c2, c1);
> >     } else
> >       for (int c2 = c0; c2 < HEIGHT; c2 += 1)
> >         S_4(c2, c1);
> >
> > The "unroll" option (set for unroll and jam) produces:
> > ISL AST generated by ISL:
> > for (int c0 = 0; c0 < HEIGHT; c0 += 4)
> >   for (int c1 = 0; c1 < LENGTH - 3; c1 += 1)
> >     if (HEIGHT >= c0 + 4) {
> >       S_4(c0, c1);
> >       S_4(c0 + 1, c1);
> >       S_4(c0 + 2, c1);
> >       S_4(c0 + 3, c1);
> >     } else {
> >       S_4(c0, c1);
> >       if (HEIGHT >= c0 + 2) {
> >         S_4(c0 + 1, c1);
> >         if (4 * floord(HEIGHT - 3, 4) + 3 == HEIGHT && c0 + 3 == HEIGHT)
> >           S_4(HEIGHT - 1, c1);
> >       }
> >     }
> >
> > The "separate" option (set by default for all dimensions for the new isl
> > based code generator) does not succeed in removing the ifs from the loops
> > and generating two loop structures (this would have been highly desirable).
> >
> > As stage 1 is going to close soon, quick feedback on this patch would be
> > greatly appreciated.
> > Many thanks, Mircea Namolaru
> 
int
f1(int v[1024][1024], int HEIGHT, int LENGTH)
{
  int i, j;

  for (i=0; i<HEIGHT; i++) {
    for (j=3; j< LENGTH; j++) {
      v[i][j] = v[i][j-3] + v[i][j-2] + v[i][j];
    }
  }

  return 0;
}
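
For illustration only, here is what unroll-and-jam by 4 with separation could
look like when written by hand for f1 above. This is a sketch under
assumptions, not output of the patch: the names f1_ref and f1_unroll_jam and
the macro N are hypothetical, the parameters are renamed to lower case, and
the remainder rows are split into a second loop nest (the two-loop-structure
form described above as desirable) rather than guarded by an if inside the
inner loop.

```c
#include <assert.h>

#define N 1024

/* Reference: the loop nest from the test case, unchanged. */
void f1_ref(int v[N][N], int height, int length)
{
  for (int i = 0; i < height; i++)
    for (int j = 3; j < length; j++)
      v[i][j] = v[i][j-3] + v[i][j-2] + v[i][j];
}

/* Hand-written unroll-and-jam by 4 on the outer loop. */
void f1_unroll_jam(int v[N][N], int height, int length)
{
  assert(height <= N && length <= N);

  int i = 0;

  /* Full tiles: four rows are processed per iteration of the j loop. */
  for (; i + 4 <= height; i += 4)
    for (int j = 3; j < length; j++) {
      v[i][j]     = v[i][j-3]     + v[i][j-2]     + v[i][j];
      v[i + 1][j] = v[i + 1][j-3] + v[i + 1][j-2] + v[i + 1][j];
      v[i + 2][j] = v[i + 2][j-3] + v[i + 2][j-2] + v[i + 2][j];
      v[i + 3][j] = v[i + 3][j-3] + v[i + 3][j-2] + v[i + 3][j];
    }

  /* Partial tile: at most three remaining rows, left rolled. */
  for (; i < height; i++)
    for (int j = 3; j < length; j++)
      v[i][j] = v[i][j-3] + v[i][j-2] + v[i][j];
}
```

Jamming the rows is legal here because every dependence in f1 runs along j
within a single row, so interleaving rows does not reorder any dependent
accesses.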
