On Thu, 11 Oct 2012, Jan Hubicka wrote: > Hi, > while looking into RTL loop peeling micopmilation I found that we now do a > lot of > RTL loop peeling for loops of the form > > int a[2]; > test(int c) > { > int i; > for (i=0;i<c;i++) > a[i]=5; > } > this is because tree-ssa-loop-niter is able to prove that the loop iterates at > most twice. This is better to be done on gimple level because destroying the > loop enables more propagation. > > This patch started as simple attempt to make try_unroll_loop_completely > to use max_loop_iterations_int. I had to track several issues > > 1) try_unroll_loop_completely handles only inner loops. I am fine with not > peeling loops with subloops, but if the loop is known to iterate 0 times, > we should turn it into non-loop. > 2) try_unroll_loop_completely now handles removal of the loop by making > exit edge conditional to be true and relying on cleanup_cfg to do > the job. This does not work for all the cases we can handle > by max_loop_iterations_int. When loop has multiple possible exits > we don't really know what exit will be taken (we know it will be one > of them). > > I added logic handling simple case where loop has single exit and > in generic case I unloop it by adding __builtin_unreachable on the > loopback path. While this seems bit involved it works well. > 3) The last iteration has parts that are provably unreachable and should > be optimized out later. VRP however tends to report array bound warnings > that breaks -O3 bootstrap. > I sent separate fix for that. Still I need to add 3 initializations > to avoid maybe used uninitialized warning to get -O3 bootstrap working > with this patch. > > The patch got quite a lot of snowballing effect, so I decided to not include: > > 1) the cost model is skewed for last iteration. It is often just duplicated > loop header - i.e. all code dominated by the optmized out exit test should > not be accounted. This would make cunroll-4.c testcase bellow to be > unlooped > during cunrolli rather than during complette unroling. > 2) profile updating is broken for any loop containing non-trivial control > flow. > We need to teach loop duplication code to update sanely the profile when > wont_exit is true and we need to update profile > 3) we probably want to do less unrolling when result is not a single basic > block. Current limit of 400 isnsns seems bit too large; making one basic > block is important win. Making sequence of basic blocks with exits is > less so.
Nice - this was on my TODO list as well, btw ;) > * gcc.dg/tree-ssa/cunroll-1.c: New testcase. > * gcc.dg/tree-ssa/cunroll-2.c: New testcase. > * gcc.dg/tree-ssa/cunroll-3.c: New testcase. > * gcc.dg/tree-ssa/cunroll-4.c: New testcase. > * gcc.dg/tree-ssa/cunroll-5.c: New testcase. > > * cfgloopmanip.c (unloop): Export. > * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Estimate > also with unknown exit conditoinal. > (try_unroll_loop_completely): Use max_loop_iterations_int to unroll > also loops with low upper bound; handle unlooping of the last loop > even when exit conditional is not known; unloop loop that do not loop > even if they are not innermost. > (canonicalize_loop_induction_variables): Record niter bounds known; > try unrolling even if number of iterations is not known; > (canonicalize_induction_variables): Be ready for loops disappearing. > (tree_unroll_loops_completely): Likewise. > * cfgloop.h (unloop): Declare. > > * f95-lang.c (gfc_init_builtin_functions): Build __builtin_unreachable. I wonder if other languages need similar adjustment? + /* Now destroy the loop. First try to do so by cancelling the + patch from exit conditional if we identified good candidate. + you mean 'path' here, right? Similar in other places. + /* Unloop destroys the latch edge. */ + unloop (loop, &irred_invalidated); + if (irred_invalidated) + mark_irreducible_loops (); this makes the whole thing quadratic in the number of loops. Please pass down a flag and handle this in the place we update SSA form (thus, once per function / unroll iteration). Or provide a more optimial implementation that only considers parent loops of 'loop' (even that is excessive though, quadratic in the loop depth). - FOR_EACH_LOOP (li, loop, 0) + for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;) FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) but I'm sure that IV canonicalization should happen for all loops. So, why do you need this change? Esp. we should get rid of single-iteration outer loops here, too. - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) + for (fel_init (&li, &next, 0); next;) see above - FOR_EACH_LOOP (li, loop, 0) Otherwise looks ok. Please update and re-post. Thanks, Richard. > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int a[2]; > +test(int c) > +{ > + int i; > + for (i=0;i<c;i++) > + a[i]=5; > +} > +/* Array bounds says the loop will not roll much. */ > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 > times). Last iteration exit edge was proved true." "cunroll"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int a[2]; > +test(int c) > +{ > + int i; > + for (i=0;i<c;i++) > + { > + a[i]=5; > + if (test2()) > + return; > + } > +} > +/* We are not able to get rid of the final conditional because the loop has > two exits. */ > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 2 > times)." "cunroll"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0) > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */ > +int a[1]; > +test(int c) > +{ > + int i; > + for (i=0;i<c;i++) > + { > + a[i]=5; > + } > +} > +/* If we start duplicating headers prior curoll, this loop will have 0 > iterations. */ > + > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 1 > times)." "cunrolli"} } */ > +/* { dg-final { cleanup-tree-dump "cunrolli" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0) > @@ -0,0 +1,20 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int a[1]; > +test(int c) > +{ > + int i=0,j; > + for (i=0;i<c;i++) > + { > + for (j=0;j<c;j++) > + { > + a[i]=5; > + test2(); > + } > + } > +} > + > +/* We should do this as part of cunrolli, but our cost model do not take > into account early exit > + from the last iteration. */ > +/* { dg-final { scan-tree-dump-not "Turned loop 1 to non-loop; it never > loops. Last iteration exit edge was proved true." "cunrolli"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0) > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int *a; > +test(int c) > +{ > + int i; > + for (i=0;i<6;i++) > + a[i]=5; > +} > +/* Basic testcase for complette unrolling. */ > +/* { dg-final { scan-tree-dump-not "Unrolled loop 1 completely (duplicated 5 > times). Exit condition of peeled iterations was eliminated. Last iteration > exit edge was proved true." "cunroll"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: cfgloopmanip.c > =================================================================== > --- cfgloopmanip.c (revision 192360) > +++ cfgloopmanip.c (working copy) > @@ -37,7 +37,6 @@ static int find_path (edge, basic_block > static void fix_loop_placements (struct loop *, bool *); > static bool fix_bb_placement (basic_block); > static void fix_bb_placements (basic_block, bool *); > -static void unloop (struct loop *, bool *); > > /* Checks whether basic block BB is dominated by DATA. */ > static bool > @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed > If this may cause the information about irreducible regions to become > invalid, IRRED_INVALIDATED is set to true. */ > > -static void > +void > unloop (struct loop *loop, bool *irred_invalidated) > { > basic_block *body; > Index: tree-ssa-loop-ivcanon.c > =================================================================== > --- tree-ssa-loop-ivcanon.c (revision 192360) > +++ tree-ssa-loop-ivcanon.c (working copy) > @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo > /* Look for reasons why we might optimize this stmt away. */ > > /* Exit conditional. */ > - if (body[i] == exit->src && stmt == last_stmt (exit->src)) > + if (exit && body[i] == exit->src && stmt == last_stmt (exit->src)) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, " Exit condition will be eliminated.\n"); > @@ -326,13 +326,43 @@ try_unroll_loop_completely (struct loop > unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns; > gimple cond; > struct loop_size size; > + bool n_unroll_found = false; > + HOST_WIDE_INT maxiter; > + basic_block latch; > + edge latch_edge; > + location_t locus; > + int flags; > + bool irred_invalidated = false; > + gimple stmt; > + gimple_stmt_iterator gsi; > + edge other_edge = NULL, last_exit; > + int num = loop->num; > + bool last_iteration_updated = false; > + > + /* See if we proved number of iterations to be low constant. */ > + if (host_integerp (niter, 1)) > + { > + n_unroll = tree_low_cst (niter, 1); > + n_unroll_found = true; > + } > > - if (loop->inner) > + /* See if we can improve our estimate by using recorded loop bounds. */ > + maxiter = max_loop_iterations_int (loop); > + if (maxiter >= 0 > + && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll)) > + { > + n_unroll = maxiter; > + n_unroll_found = true; > + } > + > + if (!n_unroll_found) > return false; > > - if (!host_integerp (niter, 1)) > + /* We unroll only inner loops, because we do not consider it profitable > + otheriwse. We still can cancel loopback edge of not rolling loop; > + this is always a good idea. */ > + if (loop->inner && n_unroll) > return false; > - n_unroll = tree_low_cst (niter, 1); > > max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES); > if (n_unroll > max_unroll) > @@ -340,6 +370,10 @@ try_unroll_loop_completely (struct loop > > if (n_unroll) > { > + sbitmap wont_exit; > + edge e; > + unsigned i; > + VEC (edge, heap) *to_remove = NULL; > if (ul == UL_SINGLE_ITER) > return false; > > @@ -372,14 +408,6 @@ try_unroll_loop_completely (struct loop > fprintf (dump_file, "Not unrolling loop %d.\n", loop->num); > return false; > } > - } > - > - if (n_unroll) > - { > - sbitmap wont_exit; > - edge e; > - unsigned i; > - VEC (edge, heap) *to_remove = NULL; > > initialize_original_copy_tables (); > wont_exit = sbitmap_alloc (n_unroll + 1); > @@ -408,24 +436,143 @@ try_unroll_loop_completely (struct loop > free_original_copy_tables (); > } > > - cond = last_stmt (exit->src); > - if (exit->flags & EDGE_TRUE_VALUE) > - gimple_cond_make_true (cond); > - else > - gimple_cond_make_false (cond); > - update_stmt (cond); > + /* After complette unrolling we still may get rid of the conditional > + on exit in the last copy even if we have no idea what it does. > + This is quite common case for loops of form > + > + int a[5]; > + for (i=0;i<b;i++) > + a[i]=0; > + > + Here we prove the loop to iterate 5 times but we do not know > + it from induction variable. > + > + We want to make the conditional of exit true, so we can only > + consider exits that are not part of the inner (irreducible) loops > + so we know that the conditional is tested at most once per iteration. > + > + Situation is more complicated withloops with multiple exists: > + > + int a[5]; > + for (i=0;i<b;i++) > + { > + if (blah) > + break; > + a[i]=0; > + } > + > + Here we could cancel the conditional of if "(blah)". And: > + > + int a[5]; > + for (i=0;i<b;i++) > + { > + a[i]=0; > + if (blah) > + break; > + } > + > + Here we can cancel the last i<b test. > + > + To handle these we need to track all statements containing code that > + can not execute on last iteration (in tree-ssa-loop-niter). > + Then we can use control dependnecy (not computed here) to cancel all > + the paths leading to them unless they are part of the inner loops. > + This however seems bit like an overkill so we handle only the > + simple case of single exit until interesting testcases appears. */ > + last_exit = exit; > + if (!last_exit) > + { > + last_exit = single_exit (loop); > + if (!last_exit || last_exit->src->loop_father != loop > + || !(last_exit->flags & (EDGE_TRUE_VALUE | EDGE_FALSE_VALUE))) > + last_exit = NULL; > + else > + { > + if (last_exit->src->flags & BB_IRREDUCIBLE_LOOP) > + last_exit = NULL; > + } > + } > + > + /* If loop has only one exit edge, we can remove the conditional from > + the last copy of the loop. > + TODO: We should account this update into cost model. */ > + if (last_exit) > + { > + cond = last_stmt (last_exit->src); > + if (last_exit->flags & EDGE_TRUE_VALUE) > + gimple_cond_make_true (cond); > + else > + gimple_cond_make_false (cond); > + update_stmt (cond); > + other_edge = EDGE_SUCC (last_exit->src, 0); > + if (other_edge == last_exit) > + other_edge = EDGE_SUCC (last_exit->src, 1); > + last_iteration_updated = true; > + } > + > + /* Now destroy the loop. First try to do so by cancelling the > + patch from exit conditional if we identified good candidate. > + > + TODO: We should update the loop profile for the fact that > + the last iteration no longer executes. */ > + if (!other_edge || !remove_path (other_edge)) > + { > + /* We did not manage to cancel the loop by removing the patch. > + The loop latch remains reachable even if it will never be reached > + at runtime. We must redirect it to somewhere, so create basic > + block containg __builtin_unreachable call for this reason. */ > + latch = loop->latch; > + latch_edge = loop_latch_edge (loop); > + flags = latch_edge->flags; > + locus = latch_edge->goto_locus; > + > + /* Unloop destroys the latch edge. */ > + unloop (loop, &irred_invalidated); > + if (irred_invalidated) > + mark_irreducible_loops (); > + > + /* Create new basic block for the latch edge destination and wire > + it in. */ > + stmt = gimple_build_call (builtin_decl_implicit > (BUILT_IN_UNREACHABLE), 0); > + latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), > flags); > + gsi = gsi_start_bb (latch_edge->dest); > + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); > + latch_edge->dest->loop_father = current_loops->tree_root; > + set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, > latch_edge->src); > + latch_edge->probability = 0; > + latch_edge->count = 0; > + latch_edge->flags |= flags; > + latch_edge->goto_locus = locus; > + } > > if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num); > + { > + if (!n_unroll) > + fprintf (dump_file, "Turned loop %d to non-loop; it never > loops.%s\n", num, > + last_iteration_updated > + ? " Last iteration exit edge was proved true." : ""); > + else > + { > + fprintf (dump_file, "Unrolled loop %d completely " > + "(duplicated %i times).%s%s\n", num, (int)n_unroll, > + exit > + ? " Exit condition of peeled iterations was eliminated." : > "", > + last_iteration_updated > + ? " Last iteration exit edge was proved true." : ""); > + } > + } > > - return true; > + return true; > } > > /* Adds a canonical induction variable to LOOP if suitable. > CREATE_IV is true if we may create a new iv. UL determines > which loops we are allowed to completely unroll. If TRY_EVAL is true, we > try > to determine the number of iterations of a loop by direct evaluation. > - Returns true if cfg is changed. */ > + Returns true if cfg is changed. > + > + IRREDUCIBLE_LOOPS_FOUND is used to bookkeep if we discovered irreducible > + loops. This is used in a special case of the exit condition analysis. */ > > static bool > canonicalize_loop_induction_variables (struct loop *loop, > @@ -455,22 +602,34 @@ canonicalize_loop_induction_variables (s > || TREE_CODE (niter) != INTEGER_CST)) > niter = find_loop_niter_by_eval (loop, &exit); > > - if (chrec_contains_undetermined (niter) > - || TREE_CODE (niter) != INTEGER_CST) > - return false; > + if (TREE_CODE (niter) != INTEGER_CST) > + exit = NULL; > } > > - if (dump_file && (dump_flags & TDF_DETAILS)) > + /* We work exceptionally hard here to estimate the bound > + by find_loop_niter_by_eval. Be sure to keep it for future. */ > + if (niter && TREE_CODE (niter) == INTEGER_CST) > + record_niter_bound (loop, tree_to_double_int (niter), false, true); > + > + if (dump_file && (dump_flags & TDF_DETAILS) > + && TREE_CODE (niter) == INTEGER_CST) > { > fprintf (dump_file, "Loop %d iterates ", loop->num); > print_generic_expr (dump_file, niter, TDF_SLIM); > fprintf (dump_file, " times.\n"); > } > + if (dump_file && (dump_flags & TDF_DETAILS) > + && max_loop_iterations_int (loop) >= 0) > + { > + fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num, > + (int)max_loop_iterations_int (loop)); > + } > > if (try_unroll_loop_completely (loop, exit, niter, ul)) > return true; > > - if (create_iv) > + if (create_iv > + && niter && !chrec_contains_undetermined (niter)) > create_canonical_iv (loop, exit, niter); > > return false; > @@ -483,11 +642,14 @@ unsigned int > canonicalize_induction_variables (void) > { > loop_iterator li; > - struct loop *loop; > + struct loop *loop, *next; > bool changed = false; > > - FOR_EACH_LOOP (li, loop, 0) > + for (fel_init (&li, &next, LI_ONLY_INNERMOST); next;) > { > + loop = next; > + fel_next (&li, &next); > + > changed |= canonicalize_loop_induction_variables (loop, > true, UL_SINGLE_ITER, > true); > @@ -591,14 +753,18 @@ tree_unroll_loops_completely (bool may_i > bool changed; > enum unroll_level ul; > int iteration = 0; > + struct loop *next; > > do > { > changed = false; > > - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) > + for (fel_init (&li, &next, 0); next;) > { > - struct loop *loop_father = loop_outer (loop); > + struct loop *loop_father = loop_outer (next); > + > + loop = next; > + fel_next (&li, &next); > > if (may_increase_size && optimize_loop_for_speed_p (loop) > /* Unroll outermost loops only if asked to do so or they do > Index: fortran/f95-lang.c > =================================================================== > --- fortran/f95-lang.c (revision 192360) > +++ fortran/f95-lang.c (working copy) > @@ -908,6 +908,11 @@ gfc_init_builtin_functions (void) > gfc_define_builtin ("__builtin_expect", ftype, BUILT_IN_EXPECT, > "__builtin_expect", ATTR_CONST_NOTHROW_LEAF_LIST); > > + ftype = build_function_type_list (void_type_node, NULL_TREE); > + > + gfc_define_builtin ("__builtin_unreachable", ftype, BUILT_IN_EXPECT, > + "__builtin_unreachable", ATTR_NORETURN_NOTHROW_LEAF_LIST); > + > ftype = build_function_type_list (void_type_node, > pvoid_type_node, NULL_TREE); > gfc_define_builtin ("__builtin_free", ftype, BUILT_IN_FREE, > Index: cfgloop.h > =================================================================== > --- cfgloop.h (revision 192360) > +++ cfgloop.h (working copy) > @@ -319,7 +319,8 @@ extern struct loop *loopify (edge, edge, > struct loop * loop_version (struct loop *, void *, > basic_block *, unsigned, unsigned, unsigned, bool); > extern bool remove_path (edge); > -void scale_loop_frequencies (struct loop *, int, int); > +extern void scale_loop_frequencies (struct loop *, int, int); > +extern void unloop (struct loop *, bool *); > > /* Induction variable analysis. */ > > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend