On Tue, Oct 16, 2012 at 10:32 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> here is the third revised version of the complete unrolling changes. While
> working on the RTL variant I noticed PR54937 and the fact that I was overly
> aggressive in forcing the single exit of the last iteration to be taken,
> because the loop may terminate otherwise (by EH or by exiting the program).
> The same thinko is in loop-niter.
>
> This patch adds loop_edge_to_cancel, which is more conservative: it looks
> for the exit conditional whose non-exiting edge leads to the latch and
> verifies that the latch contains no statement with side effects that may
> terminate the loop. This still matches quite a few non-single-exit loops
> and works well in practice.
>
> Unlike the previous revision, it also enables complete unrolling when code
> size does not grow, even for non-innermost loops (with an update in
> tree_unroll_loops_completely to walk them). This is something we did in RTL
> land but missed in trees. It enables quite a few optimizations when things
> can be propagated into the tiny inner loop body.
>
> I also fixed the accounting in tree_estimate_loop_size for the cases where
> the last iteration is not going to be updated.
>
> Finally, I added code constructing __builtin_unreachable as suggested by
> Ian.
>
> Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror
> disabled, and benchmarked. Among the best benefits is about a 7% improvement
> on Applu, and it gives up to 15% improvements on vectorized loops with small
> iteration counts (by completely peeling the precondition code). There are no
> real performance regressions, but there is some code size bloat.
>
> I plan to follow up by strengthening the heuristic to disable unrolling when
> the benefits are abysmal. An easy step is to limit unrolling of loops with
> CFG and/or calls in them. We already have quite informed analysis in place.
> I also plan to move simple FDO-guided loop peeling from the RTL level to
> trees to enable more propagation into peeled sequences.
>
> The patch also triggers a bug in niter and requires xfailing the do_1.f90
> testcase. I filed PR 54932 to track this.
>
> There are also confused array bound warnings that I hope to track
> incrementally, too, by recording statements that are known to become
> unreachable in the last iteration and adding __builtin_unreachable in front
> of them. This is also important to avoid duplication leading to dead code:
> no other optimizer forces paths leading to out-of-bounds accesses not to
> happen.
>
> Honza
>
>         * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add
>         edge_to_cancel parameter and use it to estimate code optimized
>         out in the final iteration.
>         (loop_edge_to_cancel): New function.
>         (try_unroll_loop_completely): New IRRED_INVALIDATED parameter;
>         handle unrolling loops with bounds given via max_loop_iterations;
>         handle unrolling non-inner loops when code size shrinks;
>         tidy dump output; when the last iteration loop still stays
>         as loop in the CFG forcibly redirect the latch to
>         __builtin_unreachable.
>         (canonicalize_loop_induction_variables): Add irred_invalidated
>         parameter; record niter bound derived; dump
>         max_loop_iterations bounds; call try_unroll_loop_completely
>         even if no niter bound is given.
>         (canonicalize_induction_variables): Handle irred_invalidated.
>         (tree_unroll_loops_completely): Handle non-innermost loops;
>         handle irred_invalidated.
>         * cfgloop.h (unloop): Declare.
>         * cfgloopmanip.c (unloop): Export.
>         * tree.c (build_common_builtin_nodes): Build BUILT_IN_UNREACHABLE.
>
This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55051

--
H.J.