Richard, There is a regression in the induct polyhedron benchmark execution when gfortran compiled with -ffast-math -O3 introduced with gcc 4.3 that isn't present in gcc 4.2.4.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=36599 According to the SuSe polyhedron benchmark servers, the bottleneck in the induct benchmark was eliminated between 2008-04-27 and 2008-04-28 in gcc trunk. My guess is that your changes... r134730 | rguenth | 2008-04-27 12:27:08 -0400 (Sun, 27 Apr 2008) | 42 lines 2008-04-27 Richard Guenther <[EMAIL PROTECTED]> PR tree-optimization/18754 PR tree-optimization/34223 * tree-pass.h (pass_complete_unrolli): Declare. * tree-ssa-loop-ivcanon.c (try_unroll_loop_completely): Print loop size before and after unconditionally of UL_NO_GROWTH in effect. Rewrite loop into loop closed SSA form if it is not already. (tree_unroll_loops_completely): Re-structure to iterate over innermost loops with intermediate CFG cleanups. Unroll outermost loops only if requested or the code does not grow doing so. * tree-ssa-loop.c (gate_tree_vectorize): Don't shortcut if no loops are available. (tree_vectorize): Instead do so here. (tree_complete_unroll): Also unroll outermost loops. (tree_complete_unroll_inner): New function. (gate_tree_complete_unroll_inner): Likewise. (pass_complete_unrolli): New pass. * tree-ssa-loop-manip.c (find_uses_to_rename_use): Only record uses outside of the loop. (tree_duplicate_loop_to_header_edge): Only verify loop-closed SSA form if it is available. * tree-flow.h (tree_unroll_loops_completely): Add extra parameter. * passes.c (init_optimization_passes): Schedule complete inner loop unrolling pass before the first CCP pass after final inlining. * gcc.dg/tree-ssa/loop-36.c: New testcase. * gcc.dg/tree-ssa/loop-37.c: Likewise. * gcc.dg/vect/vect-118.c: Likewise. * gcc.dg/Wunreachable-8.c: XFAIL bogus warning. * gcc.dg/vect/vect-66.c: Increase loop trip count. * gcc.dg/vect/no-section-anchors-vect-66.c: Likewise. * gcc.dg/vect/no-section-anchors-vect-69.c: Likewise. * gcc.dg/vect/vect-76.c: Likewise. * gcc.dg/vect/vect-outer-6.c: Likewise. * gcc.dg/vect/vect-outer-1.c: Likewise. * gcc.dg/vect/vect-outer-1a.c: Likewise. * gcc.dg/vect/vect-11a.c: Likewise. * gcc.dg/vect/vect-shift-1.c: Likewise. * gcc.target/i386/vectorize1.c: Likewise. ...solved this performance regression. Is it possible that these changes could be backported to gcc 4.3.2 to eliminate the performance regression in gcc 4.3 compared to gcc 4.2? Jack