2009/8/19 Albert Cohen <albert.co...@inria.fr>:
> When debugging graphite, we ran into code bloat issues due to
> pass_complete_unrolli being called very early in the non-ipa
> optimization sequence. Much later, the full-blown pass_complete_unroll
> is scheduled, and this one does not do any harm.
>
> Strangely, this early unrolling pass (tuned to only unroll inner loops)
> is only enabled at -O3, independently of the -funroll-loops flag.
>
> Does anyone remember why it is there, for which platform it is useful,
> and what are the perf regressions if we remove it?

The early loop unrolling pass is very important for removing the
abstraction penalty in C++ programs that chose not to implement manual
unrolling by relying on the inliner and template metaprogramming. In
tramp3d, for example, you see (very much simplified, intermediate state
after some inlining):

  foo (int i, int j, int k)
  {
    double a[][][];
    int index[3];
    const int dX[3] = { 1, 0, 0 };
    ...
    for (m=0; m<3; ++m)
      index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    ... a[index[0]][index[1]][index[2]];
    for (m=0; m<3; ++m)
      index[m] += dX[m];
    ... a[index[0]][index[1]][index[2]];

etc. to access a[i][j][k] and a[i+1][j][k]. There is an absolute need
to unroll these simple loops before CSE; otherwise loop optimizations
have no chance of optimizing anything here.

Another benchmark that degrades considerably without early unrolling
is 454.calculix (in fact that one was the reason to add this pass).

> My guess is that it may only harm... disabling or damaging the
> effectiveness of the (loop-level) vectorizer and increasing compilation
> time.

No, it definitely does not. But it has one small issue in that it
sometimes also unrolls an outermost loop IIRC; that could be fixed.

Richard.

>
> Thanks,
> Albert
>
> PS: When this question is solved, it will also be interesting to start a
> serious discussion on how to improve the flexibility in customizing pass
> ordering and parameterization of passes depending on the target. Grigori
> Fursin's work shows the strong benefits and already provides a working
> prototype. This question is independent of whether the customization is
> done by experts or machine-learning/statistical techniques.
>
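
To make the tramp3d pattern above concrete, here is a minimal self-contained sketch. It is not taken from tramp3d: the grid size N and both function names are hypothetical and chosen only for illustration. The first function has the shape the code takes after inlining, with two tiny fixed-trip-count loops. The second is what the same computation reduces to once those loops are fully unrolled and constants are propagated.

```cpp
#include <cassert>

// Hypothetical grid size, chosen only for illustration.
constexpr int N = 4;

// Shape of the code after inlining: the index vector is initialised and
// advanced by small loops with a constant trip count of 3.
double neighbour_sum_loops(const double (&a)[N][N][N], int i, int j, int k)
{
    assert(i >= 0 && i + 1 < N && j >= 0 && j < N && k >= 0 && k < N);

    int index[3];
    const int dX[3] = { 1, 0, 0 };

    for (int m = 0; m < 3; ++m)
        index[m] = 0;
    index[0] = i;
    index[1] = j;
    index[2] = k;
    double sum = a[index[0]][index[1]][index[2]];

    for (int m = 0; m < 3; ++m)
        index[m] += dX[m];
    sum += a[index[0]][index[1]][index[2]];

    return sum;
}

// The same computation with both loops unrolled by hand and the constant
// dX values folded in: index[] disappears and only the two array accesses
// remain.
double neighbour_sum_unrolled(const double (&a)[N][N][N], int i, int j, int k)
{
    assert(i >= 0 && i + 1 < N && j >= 0 && j < N && k >= 0 && k < N);

    return a[i][j][k] + a[i + 1][j][k];
}
```

Both functions return the same value for any in-range (i, j, k). As long as the loops are still present, the second access reads index[] after a loop has modified it, so CSE has a hard time relating the two accesses to i, j and k.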