Richard Guenther wrote:
> 2009/8/19 Albert Cohen <albert.co...@inria.fr>:
>> When debugging graphite, we ran into code bloat issues due to
>> pass_complete_unrolli being called very early in the non-ipa
>> optimization sequence. Much later, the full-blown pass_complete_unroll
>> is scheduled, and this one does not do any harm.
>>
>> Strangely, this early unrolling pass (tuned to only unroll inner loops)
>> is only enabled at -O3, independently of the -funroll-loops flag.
>>
>> Does anyone remember why it is there, for which platform it is useful,
>> and what are the perf regressions if we remove it?
>
> The early loop unrolling pass is very important to remove abstraction
> penalty for C++ programs that choose not to implement manual
> unrolling by relying on the inliner and template metaprogramming.
>
> In tramp3d you for example see (very much simplified, intermediate
> state after some inlining):
>
> foo (int i, int j, int k)
> {
>   double a[][][];
>   int index[3];
>   const int dX[3] = { 1, 0, 0 };
>   ...
>   for (m=0; m<3; ++m)
>     index[m] = 0;
>   index[0] = i;
>   index[1] = j;
>   index[2] = k;
>   ... a[index[0]][index[1]][index[2]];
>   for (m=0; m<3; ++m)
>     index[m] += dX[m];
>   ... a[index[0]][index[1]][index[2]];
>
> etc. to access a[i][j][k] and a[i+1][j][k].
>
> There is an absolute need to unroll these simple loops before
> CSE, otherwise loop optimizations have no chance of optimizing
> anything here.
>
> Another benchmark that degrades considerably without early
> unrolling is 454.calculix (in fact that one was the reason to
> add this pass).
>
>> My guess is that it may only harm... disabling or damaging the
>> effectiveness of the (loop-level) vectorizer and increasing compilation
>> time.
>
> No it definitely does not. But it has one small issue in that it sometimes
> also unrolls an outermost loop IIRC, that could be fixed.

Thanks a lot for the quick and detailed response. It is more difficult
than I thought, then :-(

We'll think more, and maybe come up with yet another pass ordering
proposal, but definitely this tramp3d code deserves to be processed by
graphite AFTER unrolling+CSE has done its specialization trick.

Albert
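
For readers following the thread, here is a minimal, compilable C sketch of the pattern in the quoted tramp3d fragment. It is an illustration only, not code from tramp3d or GCC: the array extent N, the function names access_rolled, access_unrolled and forms_agree, and the choice to sum the two loads are all assumptions made to get something self-contained. The first function keeps the small constant-trip-count loops over index[]; the second is roughly what one would hope remains once those loops are fully unrolled and the index computations are simplified.

```c
#include <assert.h>

#define N 4  /* hypothetical array extent, chosen only for this sketch */

/* Rolled form: same shape as the simplified fragment quoted above.
   The index[] array and the two 3-iteration loops hide the fact that
   the accesses are simply a[i][j][k] and a[i+1][j][k]. */
static double access_rolled(double a[N][N][N], int i, int j, int k)
{
    const int dX[3] = { 1, 0, 0 };
    int index[3];
    int m;

    /* Precondition for this sketch: both accesses stay in bounds. */
    assert(i >= 0 && i + 1 < N && j >= 0 && j < N && k >= 0 && k < N);

    for (m = 0; m < 3; ++m)
        index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    double first = a[index[0]][index[1]][index[2]];

    for (m = 0; m < 3; ++m)
        index[m] += dX[m];
    double second = a[index[0]][index[1]][index[2]];

    return first + second;
}

/* Hand-written equivalent: what is left once the loops are unrolled,
   dX[] is folded to constants and index[] is eliminated. */
static double access_unrolled(double a[N][N][N], int i, int j, int k)
{
    return a[i][j][k] + a[i + 1][j][k];
}

/* Returns 1 if both forms agree for every in-bounds (i, j, k), else 0. */
static int forms_agree(void)
{
    double a[N][N][N];
    int x, y, z;

    for (x = 0; x < N; ++x)
        for (y = 0; y < N; ++y)
            for (z = 0; z < N; ++z)
                a[x][y][z] = 100.0 * x + 10.0 * y + z;

    for (x = 0; x + 1 < N; ++x)
        for (y = 0; y < N; ++y)
            for (z = 0; z < N; ++z)
                if (access_rolled(a, x, y, z) != access_unrolled(a, x, y, z))
                    return 0;
    return 1;
}
```

Both functions compute the same value; the difference is only in how much work the compiler has to do to see the plain a[i][j][k] and a[i+1][j][k] accesses in the rolled form.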