Re: complete_unrolli / complete_unroll

Richard Guenther Wed, 19 Aug 2009 05:07:20 -0700

2009/8/19 Albert Cohen <albert.co...@inria.fr>:
> When debugging graphite, we ran into code bloat issues due to
> pass_complete_unrolli being called very early in the non-ipa
> optimization sequence. Much later, the full-blown pass_complete_unroll
> is scheduled, and this one does not do any harm.
>
> Strangely, this early unrolling pass (tuned to only unroll inner loops)
> is only enabled at -O3, independently of the -funroll-loops flag.
>
> Does anyone remember why it is there, for which platform it is useful,
> and what are the perf regressions if we remove it?


The early loop unrolling pass is very important for removing the abstraction
penalty of C++ programs that chose not to implement manual
unrolling by relying on the inliner and template metaprogramming.

In tramp3d, for example, you see something like this (very much simplified;
the intermediate state after some inlining):

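void foo (int i, int j, int k)
{
  double a[][][];                  /* dimensions elided */
  int index[3];
  const int dX[3] = { 1, 0, 0 };
  int m;
  ...
  /* Reset the index vector, then set it to (i, j, k).  */
  for (m = 0; m < 3; ++m)
    index[m] = 0;
  index[0] = i;
  index[1] = j;
  index[2] = k;
  ... a[index[0]][index[1]][index[2]];
  /* Step one cell along the first dimension.  */
  for (m = 0; m < 3; ++m)
    index[m] += dX[m];
  ... a[index[0]][index[1]][index[2]];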

etc. to access a[i][j][k] and a[i+1][j][k].

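There is an absolute need to unroll these simple loops before
CSE; otherwise the loop optimizers have no chance of optimizing
anything here. Only once both three-iteration loops have become
straight-line code can CSE and constant propagation see that index[]
is just (i, j, k) and then (i+1, j, k).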

Another benchmark that degrades considerably without early
unrolling is 454.calculix (in fact, that one was the reason this
pass was added).

> My guess is that it may only harm... disabling or damaging the
> effectiveness of the (loop-level) vectorizer and increasing compilation
> time.

No, it definitely does not. But it has one small issue: IIRC it sometimes
also unrolls an outermost loop, which could be fixed.

Richard.

>
> Thanks,
> Albert
>
> PS: When this question is solved, it will also be interesting to start a
> serious discussion on how to improve the flexibility in customizing pass
> ordering and parameterization of passes depending on the target. Grigori
> Fursin's work shows the strong benefits and already provides a working
> prototype. This question is independent of whether the customization is
> done by experts or machine-learning/statistical techniques.
>
