On Tue, Oct 16, 2012 at 10:32 AM, Jan Hubicka <hubi...@ucw.cz> wrote:
> Hi,
> here is the third revised version of the complete unrolling changes.  While
> working on the RTL variant I noticed PR54937 and the fact that I was overly
> aggressive in forcing the single exit of the last iteration to be taken,
> because the loop may terminate otherwise (by EH or by exiting the program).
> The same thinko is in loop-niter.
>
> This patch adds loop_edge_to_cancel, which is more conservative: it looks for
> the exit conditional where the non-exiting edge leads to the latch and
> verifies that the latch contains no statement with side effects that may
> terminate the loop.  This still matches quite a few non-single-exit loops and
> works well in practice.
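
A minimal sketch of the kind of non-single-exit loop the paragraph above seems to describe. The function name, the array size N, and the loop itself are illustrative assumptions and are not taken from the patch. The counting test is the exit conditional whose non-exiting edge leads to the latch, and the latch holds only the increment, so nothing in it can terminate the loop.

```c
/* Hypothetical example, not taken from the patch. */
#define N 4

/* Two exits: the counting test (i < N) and the early return.
   The latch only increments i, so no statement in it has a side
   effect that could end the loop. */
int first_negative(const int a[N])
{
    for (int i = 0; i < N; i++) {
        if (a[i] < 0)
            return i;   /* second exit, taken from the loop body */
    }
    return -1;          /* reached through the counting exit */
}
```

A loop whose latch called a function that might throw or exit the program would presumably not qualify under the same description.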
>
> Unlike the previous revision, it also enables complete unrolling when code
> size does not grow even for non-innermost loops (with an update in
> tree_unroll_loops_completely to walk them).  This is something we did in RTL
> land but missed in trees.  It enables quite a few optimizations when
> things can be propagated into the tiny inner loop body.
>
> I also fixed the accounting in tree_estimate_loop_size for the cases where
> the last iteration is not going to be updated.
>
> Finally, I added code constructing __builtin_unreachable as suggested by
> Ian.
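
A hand-written sketch of the effect at source level, assuming a loop whose trip count is known to be at most 3. The function names and the bound are illustrative assumptions; the patch itself works on GIMPLE, not on C source. After peeling as many copies as the bound allows, the path that would go around the loop again is marked with __builtin_unreachable (a GCC builtin) instead of keeping a loop in the CFG.

```c
/* Hypothetical example, not taken from the patch.
   Precondition for both functions: n <= 3. */

/* Reference loop. */
int sum_loop(const int *a, int n)
{
    int s = 0;
    for (int i = 0; i < n; i++)
        s += a[i];
    return s;
}

/* The same loop unrolled by hand.  Each exit test is kept; only the
   path past the last possible iteration is declared unreachable. */
int sum_unrolled(const int *a, int n)
{
    int s = 0;
    if (0 >= n) return s;
    s += a[0];
    if (1 >= n) return s;
    s += a[1];
    if (2 >= n) return s;
    s += a[2];
    if (3 >= n) return s;
    __builtin_unreachable();    /* would need n > 3 */
}
```

Calling sum_unrolled with n greater than 3 is undefined behaviour, which is exactly the assumption the bound encodes.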
>
> Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror
> disabled and benchmarked.  Among the best benefits is about a 7% improvement
> on Applu, and it causes up to 15% improvements on vectorized loops with small
> iteration counts (by completely peeling the precondition code).  There are no
> real performance regressions, but there is some code size bloat.
>
> I plan to follow up by strengthening the heuristic to disable unrolling when
> the benefits are abysmal.  An easy step is to limit unrolling of loops with
> CFG and/or calls in them.  We already have quite informed analysis in place.
> I also plan to move simple FDO-guided loop peeling from the RTL level to trees
> to enable more propagation into peeled sequences.
>
> The patch also triggers a bug in niter and requires xfailing the do_1.f90
> testcase.  I filed PR 54932 to track this.
>
> There are also confused array bound warnings that I hope to track
> incrementally, too, by recording statements that are known to become
> unreachable in the last iteration and adding __builtin_unreachable in front
> of them.  This is also important to avoid duplication leading to dead code:
> no other optimizers force paths leading to out-of-bound accesses to not happen.
>
> Honza
>
>
>         * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add
>         edge_to_cancel parameter and use it to estimate code optimized
>         out in the final iteration.
>         (loop_edge_to_cancel): New function.
>         (try_unroll_loop_completely): New IRRED_INVALIDATED parameter;
>         handle unrolling loops with bounds given via max_loop_iterations;
>         handle unrolling non-inner loops when code size shrinks;
>         tidy dump output; when the last iteration loop still stays
>         as loop in the CFG forcibly redirect the latch to
>         __builtin_unreachable.
>         (canonicalize_loop_induction_variables): Add irred_invalidated
>         parameter; record niter bound derived; dump
>         max_loop_iterations bounds; call try_unroll_loop_completely
>         even if no niter bound is given.
>         (canonicalize_induction_variables): Handle irred_invalidated.
>         (tree_unroll_loops_completely): Handle non-innermost loops;
>         handle irred_invalidated.
>         * cfgloop.h (unloop): Declare.
>         * cfgloopmanip.c (unloop): Export.
>         * tree.c (build_common_builtin_nodes): Build BUILT_IN_UNREACHABLE.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=55051


-- 
H.J.
