On Tue, 16 Oct 2012, Jan Hubicka wrote: > Hi, > here is third revised version of the complette unroling changes. While > working > on the RTL variant I noticed PR54937 and the fact that I was overly aggressive > on forcing single exit of the last iteration to be taken, because loop may > terminate > otherwise (by EH or by exitting the program). Same thinko is in loop-niter. > > This patch adds loop_edge_to_cancel that is more conservative: it looks for > the > exit conditional where the non-exitting edges leads to latch and verifies that > latch contains no statement with side effect that may terminate the loop. > This still actually matches quite few non-single-exit loops and works well in > practice. > > Unlike previous revision it also enables complette unrolling when code size > does not grow even for non-innermost loops (with update in > tree_unroll_loops_completely to walk them). This is something we did on RTL > land but missed in trees. This actually enables quite some optimizations when > things can be propagated to the tiny inner loop body. > > I also fixed accounting in tree_estimate_loop_size for the cases where last > iteration is not going to be updated. > > Finally I added code constructing __bulitin_unreachable as suggested by > Ian. > > Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror > disabled and benchmarked. Among best benefits is about 7% improvement on > Applu, > and it causes up to 15% improvements on vectorized loops with small iteration > counts (by completelly peeling the precondition code). There are no real > performance regressions but there is some code size bloat. > > I plan to followup with strenghtening the heuristic to disable unrolling when > benefits are absymal. Easy is to limit unrolling on loops with CFG and/or > calls in them. We already have quite informed analysis in place. I also plan > to move simple FDO guided loop peeling from RTL level to trees to enable more > propagation into peeled sequences. > > The patch also triggers bug in niter and requires xfailing do_1.f90 testcase. > I filled PR 54932 to track this. > > There are also confused array bound warnings I hope to track incrementally, > too, > by recording statements that are known to become unreachable in the last > iteration and adding __buitin_unreachable in front of them. This is also > important to avoid duplication leading to dead code: no other optimizers > force paths leading to out of bound accesses to not happen.
Quite a large patch ... it could have been split into bugfixes and enhancements ;) Still - ok! Thanks, Richard. > Honza > > > * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add edge_to_cancel > parameter and use it to estimate code optimized out in the final > iteration. > (loop_edge_to_cancel): New function. > (try_unroll_loop_completely): New IRRED_IVALIDATED parameter; > handle unrolling loops with bounds given via max_loop_iteratins; > handle unrolling non-inner loops when code size shrinks; > tidy dump output; when the last iteration loop still stays > as loop in the CFG forcongly redirect the latch to > __builtin_unreachable. > (canonicalize_loop_induction_variables): Add irred_invlaidated > parameter; record niter bound derrived; dump > max_loop_iterations bounds; call try_unroll_loop_completely > even if no niter bound is given. > (canonicalize_induction_variables): Handle irred_invalidated. > (tree_unroll_loops_completely): Handle non-innermost loops; > handle irred_invalidated. > * cfgloop.h (unlop): Declare. > * cfgloopmanip.c (unloop): Export. > * tree.c (build_common_builtin_nodes): Build BULTIN_UNREACHABLE. > > * gcc.target/i386/l_fma_float_?.c: Update. > * gcc.target/i386/l_fma_double_?.c: Update. > * gfortran.dg/do_1.f90: XFAIL > * gcc.dg/tree-ssa/cunroll-1.c: New testcase. > * gcc.dg/tree-ssa/cunroll-2.c: New testcase. > * gcc.dg/tree-ssa/cunroll-3.c: New testcase. > * gcc.dg/tree-ssa/cunroll-4.c: New testcase. > * gcc.dg/tree-ssa/cunroll-5.c: New testcase. > * gcc.dg/tree-ssa/ldist-17.c: Block cunroll to make testcase still > valid. > Index: tree-ssa-loop-ivcanon.c > =================================================================== > --- tree-ssa-loop-ivcanon.c (revision 192483) > +++ tree-ssa-loop-ivcanon.c (working copy) > @@ -192,7 +192,7 @@ constant_after_peeling (tree op, gimple > Return results in SIZE, estimate benefits for complete unrolling exiting > by EXIT. */ > > static void > -tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size > *size) > +tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, > struct loop_size *size) > { > basic_block *body = get_loop_body (loop); > gimple_stmt_iterator gsi; > @@ -208,8 +208,8 @@ tree_estimate_loop_size (struct loop *lo > fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num); > for (i = 0; i < loop->num_nodes; i++) > { > - if (exit && body[i] != exit->src > - && dominated_by_p (CDI_DOMINATORS, body[i], exit->src)) > + if (edge_to_cancel && body[i] != edge_to_cancel->src > + && dominated_by_p (CDI_DOMINATORS, body[i], edge_to_cancel->src)) > after_exit = true; > else > after_exit = false; > @@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo > /* Look for reasons why we might optimize this stmt away. */ > > /* Exit conditional. */ > - if (body[i] == exit->src && stmt == last_stmt (exit->src)) > + if (exit && body[i] == exit->src && stmt == last_stmt (exit->src)) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > fprintf (dump_file, " Exit condition will be eliminated.\n"); > @@ -314,36 +314,161 @@ estimated_unrolled_size (struct loop_siz > return unr_insns; > } > > +/* Loop LOOP is known to not loop. See if there is an edge in the loop > + body that can be remove to make the loop to always exit and at > + the same time it does not make any code potentially executed > + during the last iteration dead. > + > + After complette unrolling we still may get rid of the conditional > + on the exit in the last copy even if we have no idea what it does. > + This is quite common case for loops of form > + > + int a[5]; > + for (i=0;i<b;i++) > + a[i]=0; > + > + Here we prove the loop to iterate 5 times but we do not know > + it from induction variable. > + > + For now we handle only simple case where there is exit condition > + just before the latch block and the latch block contains no statements > + with side effect that may otherwise terminate the execution of loop > + (such as by EH or by terminating the program or longjmp). > + > + In the general case we may want to cancel the paths leading to statements > + loop-niter identified as having undefined effect in the last iteration. > + The other cases are hopefully rare and will be cleaned up later. */ > + > +edge > +loop_edge_to_cancel (struct loop *loop) > +{ > + VEC (edge, heap) *exits; > + unsigned i; > + edge edge_to_cancel; > + gimple_stmt_iterator gsi; > + > + /* We want only one predecestor of the loop. */ > + if (EDGE_COUNT (loop->latch->preds) > 1) > + return NULL; > + > + exits = get_loop_exit_edges (loop); > + > + FOR_EACH_VEC_ELT (edge, exits, i, edge_to_cancel) > + { > + /* Find the other edge than the loop exit > + leaving the conditoinal. */ > + if (EDGE_COUNT (edge_to_cancel->src->succs) != 2) > + continue; > + if (EDGE_SUCC (edge_to_cancel->src, 0) == edge_to_cancel) > + edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 1); > + else > + edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 0); > + > + /* We should never have conditionals in the loop latch. */ > + gcc_assert (edge_to_cancel->dest != loop->header); > + > + /* Check that it leads to loop latch. */ > + if (edge_to_cancel->dest != loop->latch) > + continue; > + > + VEC_free (edge, heap, exits); > + > + /* Verify that the code in loop latch does nothing that may end program > + execution without really reaching the exit. This may include > + non-pure/const function calls, EH statements, volatile ASMs etc. */ > + for (gsi = gsi_start_bb (loop->latch); !gsi_end_p (gsi); gsi_next > (&gsi)) > + if (gimple_has_side_effects (gsi_stmt (gsi))) > + return NULL; > + return edge_to_cancel; > + } > + VEC_free (edge, heap, exits); > + return NULL; > +} > + > /* Tries to unroll LOOP completely, i.e. NITER times. > UL determines which loops we are allowed to unroll. > - EXIT is the exit of the loop that should be eliminated. */ > + EXIT is the exit of the loop that should be eliminated. > + IRRED_INVALIDATED is used to bookkeep if information about > + irreducible regions may become invalid as a result > + of the transformation. */ > > static bool > try_unroll_loop_completely (struct loop *loop, > edge exit, tree niter, > - enum unroll_level ul) > + enum unroll_level ul, > + bool *irred_invalidated) > { > unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns; > gimple cond; > struct loop_size size; > + bool n_unroll_found = false; > + HOST_WIDE_INT maxiter; > + basic_block latch; > + edge latch_edge; > + location_t locus; > + int flags; > + gimple stmt; > + gimple_stmt_iterator gsi; > + edge edge_to_cancel = NULL; > + int num = loop->num; > > - if (loop->inner) > - return false; > + /* See if we proved number of iterations to be low constant. > > - if (!host_integerp (niter, 1)) > + EXIT is an edge that will be removed in all but last iteration of > + the loop. > + > + EDGE_TO_CACNEL is an edge that will be removed from the last iteration > + of the unrolled sequence and is expected to make the final loop not > + rolling. > + > + If the number of execution of loop is determined by standard induction > + variable test, then EXIT and EDGE_TO_CANCEL are the two edges leaving > + from the iv test. */ > + if (host_integerp (niter, 1)) > + { > + n_unroll = tree_low_cst (niter, 1); > + n_unroll_found = true; > + edge_to_cancel = EDGE_SUCC (exit->src, 0); > + if (edge_to_cancel == exit) > + edge_to_cancel = EDGE_SUCC (exit->src, 1); > + } > + /* We do not know the number of iterations and thus we can not eliminate > + the EXIT edge. */ > + else > + exit = NULL; > + > + /* See if we can improve our estimate by using recorded loop bounds. */ > + maxiter = max_loop_iterations_int (loop); > + if (maxiter >= 0 > + && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll)) > + { > + n_unroll = maxiter; > + n_unroll_found = true; > + /* Loop terminates before the IV variable test, so we can not > + remove it in the last iteration. */ > + edge_to_cancel = NULL; > + } > + > + if (!n_unroll_found) > return false; > - n_unroll = tree_low_cst (niter, 1); > > max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES); > if (n_unroll > max_unroll) > return false; > > + if (!edge_to_cancel) > + edge_to_cancel = loop_edge_to_cancel (loop); > + > if (n_unroll) > { > + sbitmap wont_exit; > + edge e; > + unsigned i; > + VEC (edge, heap) *to_remove = NULL; > if (ul == UL_SINGLE_ITER) > return false; > > - tree_estimate_loop_size (loop, exit, &size); > + tree_estimate_loop_size (loop, exit, edge_to_cancel, &size); > ninsns = size.overall; > > unr_insns = estimated_unrolled_size (&size, n_unroll); > @@ -354,6 +479,18 @@ try_unroll_loop_completely (struct loop > (int) unr_insns); > } > > + /* We unroll only inner loops, because we do not consider it profitable > + otheriwse. We still can cancel loopback edge of not rolling loop; > + this is always a good idea. */ > + if (loop->inner && unr_insns > ninsns) > + { > + if (dump_file && (dump_flags & TDF_DETAILS)) > + fprintf (dump_file, "Not unrolling loop %d:" > + "it is not innermost and code would grow.\n", > + loop->num); > + return false; > + } > + > if (unr_insns > ninsns > && (unr_insns > > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS))) > @@ -369,17 +506,10 @@ try_unroll_loop_completely (struct loop > && unr_insns > ninsns) > { > if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "Not unrolling loop %d.\n", loop->num); > + fprintf (dump_file, "Not unrolling loop %d: size would grow.\n", > + loop->num); > return false; > } > - } > - > - if (n_unroll) > - { > - sbitmap wont_exit; > - edge e; > - unsigned i; > - VEC (edge, heap) *to_remove = NULL; > > initialize_original_copy_tables (); > wont_exit = sbitmap_alloc (n_unroll + 1); > @@ -408,15 +538,67 @@ try_unroll_loop_completely (struct loop > free_original_copy_tables (); > } > > - cond = last_stmt (exit->src); > - if (exit->flags & EDGE_TRUE_VALUE) > - gimple_cond_make_true (cond); > - else > - gimple_cond_make_false (cond); > - update_stmt (cond); > + /* Remove the conditional from the last copy of the loop. */ > + if (edge_to_cancel) > + { > + cond = last_stmt (edge_to_cancel->src); > + if (edge_to_cancel->flags & EDGE_TRUE_VALUE) > + gimple_cond_make_false (cond); > + else > + gimple_cond_make_true (cond); > + update_stmt (cond); > + /* Do not remove the path. Doing so may remove outer loop > + and confuse bookkeeping code in tree_unroll_loops_completelly. */ > + } > + /* We did not manage to cancel the loop. > + The loop latch remains reachable even if it will never be reached > + at runtime. We must redirect it to somewhere, so create basic > + block containg __builtin_unreachable call for this reason. */ > + else > + { > + latch = loop->latch; > + latch_edge = loop_latch_edge (loop); > + flags = latch_edge->flags; > + locus = latch_edge->goto_locus; > + > + /* Unloop destroys the latch edge. */ > + unloop (loop, irred_invalidated); > + > + /* Create new basic block for the latch edge destination and wire > + it in. */ > + stmt = gimple_build_call (builtin_decl_implicit > (BUILT_IN_UNREACHABLE), 0); > + latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), > flags); > + latch_edge->probability = 0; > + latch_edge->count = 0; > + latch_edge->flags |= flags; > + latch_edge->goto_locus = locus; > + > + latch_edge->dest->loop_father = current_loops->tree_root; > + latch_edge->dest->count = 0; > + latch_edge->dest->frequency = 0; > + set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, > latch_edge->src); > + > + gsi = gsi_start_bb (latch_edge->dest); > + gsi_insert_after (&gsi, stmt, GSI_NEW_STMT); > + } > > if (dump_file && (dump_flags & TDF_DETAILS)) > - fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num); > + { > + if (!n_unroll) > + fprintf (dump_file, "Turned loop %d to non-loop; it never loops.\n", > + num); > + else > + fprintf (dump_file, "Unrolled loop %d completely " > + "(duplicated %i times).\n", num, (int)n_unroll); > + if (exit) > + fprintf (dump_file, "Exit condition of peeled iterations was " > + "eliminated.\n"); > + if (edge_to_cancel) > + fprintf (dump_file, "Last iteration exit edge was proved true.\n"); > + else > + fprintf (dump_file, "Latch of last iteration was marked by " > + "__builtin_unreachable ().\n"); > + } > > return true; > } > @@ -425,12 +608,15 @@ try_unroll_loop_completely (struct loop > CREATE_IV is true if we may create a new iv. UL determines > which loops we are allowed to completely unroll. If TRY_EVAL is true, we > try > to determine the number of iterations of a loop by direct evaluation. > - Returns true if cfg is changed. */ > + Returns true if cfg is changed. > + > + IRRED_INVALIDATED is used to keep if irreducible reginos needs to be > recomputed. */ > > static bool > canonicalize_loop_induction_variables (struct loop *loop, > bool create_iv, enum unroll_level ul, > - bool try_eval) > + bool try_eval, > + bool *irred_invalidated) > { > edge exit = NULL; > tree niter; > @@ -455,22 +641,34 @@ canonicalize_loop_induction_variables (s > || TREE_CODE (niter) != INTEGER_CST)) > niter = find_loop_niter_by_eval (loop, &exit); > > - if (chrec_contains_undetermined (niter) > - || TREE_CODE (niter) != INTEGER_CST) > - return false; > + if (TREE_CODE (niter) != INTEGER_CST) > + exit = NULL; > } > > - if (dump_file && (dump_flags & TDF_DETAILS)) > + /* We work exceptionally hard here to estimate the bound > + by find_loop_niter_by_eval. Be sure to keep it for future. */ > + if (niter && TREE_CODE (niter) == INTEGER_CST) > + record_niter_bound (loop, tree_to_double_int (niter), false, true); > + > + if (dump_file && (dump_flags & TDF_DETAILS) > + && TREE_CODE (niter) == INTEGER_CST) > { > fprintf (dump_file, "Loop %d iterates ", loop->num); > print_generic_expr (dump_file, niter, TDF_SLIM); > fprintf (dump_file, " times.\n"); > } > + if (dump_file && (dump_flags & TDF_DETAILS) > + && max_loop_iterations_int (loop) >= 0) > + { > + fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num, > + (int)max_loop_iterations_int (loop)); > + } > > - if (try_unroll_loop_completely (loop, exit, niter, ul)) > + if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated)) > return true; > > - if (create_iv) > + if (create_iv > + && niter && !chrec_contains_undetermined (niter)) > create_canonical_iv (loop, exit, niter); > > return false; > @@ -485,15 +683,21 @@ canonicalize_induction_variables (void) > loop_iterator li; > struct loop *loop; > bool changed = false; > + bool irred_invalidated = false; > > FOR_EACH_LOOP (li, loop, 0) > { > changed |= canonicalize_loop_induction_variables (loop, > true, UL_SINGLE_ITER, > - true); > + true, > + &irred_invalidated); > } > gcc_assert (!need_ssa_update_p (cfun)); > > + if (irred_invalidated > + && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)) > + mark_irreducible_loops (); > + > /* Clean up the information about numbers of iterations, since brute force > evaluation could reveal new information. */ > scev_reset (); > @@ -594,9 +798,10 @@ tree_unroll_loops_completely (bool may_i > > do > { > + bool irred_invalidated = false; > changed = false; > > - FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST) > + FOR_EACH_LOOP (li, loop, 0) > { > struct loop *loop_father = loop_outer (loop); > > @@ -609,7 +814,8 @@ tree_unroll_loops_completely (bool may_i > ul = UL_NO_GROWTH; > > if (canonicalize_loop_induction_variables (loop, false, ul, > - !flag_tree_loop_ivcanon)) > + !flag_tree_loop_ivcanon, > + &irred_invalidated)) > { > changed = true; > /* If we'll continue unrolling, we need to propagate constants > @@ -629,6 +835,10 @@ tree_unroll_loops_completely (bool may_i > struct loop **iter; > unsigned i; > > + if (irred_invalidated > + && loops_state_satisfies_p > (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS)) > + mark_irreducible_loops (); > + > update_ssa (TODO_update_ssa); > > /* Propagate the constants within the new basic blocks. */ > Index: tree.c > =================================================================== > --- tree.c (revision 192483) > +++ tree.c (working copy) > @@ -9524,6 +9524,15 @@ build_common_builtin_nodes (void) > tree tmp, ftype; > int ecf_flags; > > + if (!builtin_decl_explicit_p (BUILT_IN_UNREACHABLE)) > + { > + ftype = build_function_type (void_type_node, void_list_node); > + local_define_builtin ("__builtin_unreachable", ftype, > BUILT_IN_UNREACHABLE, > + "__builtin_unreachable", > + ECF_NOTHROW | ECF_LEAF | ECF_NORETURN > + | ECF_CONST | ECF_LEAF); > + } > + > if (!builtin_decl_explicit_p (BUILT_IN_MEMCPY) > || !builtin_decl_explicit_p (BUILT_IN_MEMMOVE)) > { > Index: cfgloop.h > =================================================================== > --- cfgloop.h (revision 192483) > +++ cfgloop.h (working copy) > @@ -320,7 +321,8 @@ extern struct loop *loopify (edge, edge, > struct loop * loop_version (struct loop *, void *, > basic_block *, unsigned, unsigned, unsigned, bool); > extern bool remove_path (edge); > -void scale_loop_frequencies (struct loop *, int, int); > +extern void unloop (struct loop *, bool *); > +extern void scale_loop_frequencies (struct loop *, int, int); > > /* Induction variable analysis. */ > > Index: cfgloopmanip.c > =================================================================== > --- cfgloopmanip.c (revision 192483) > +++ cfgloopmanip.c (working copy) > @@ -37,7 +37,6 @@ static int find_path (edge, basic_block > static void fix_loop_placements (struct loop *, bool *); > static bool fix_bb_placement (basic_block); > static void fix_bb_placements (basic_block, bool *); > -static void unloop (struct loop *, bool *); > > /* Checks whether basic block BB is dominated by DATA. */ > static bool > @@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed > If this may cause the information about irreducible regions to become > invalid, IRRED_INVALIDATED is set to true. */ > > -static void > +void > unloop (struct loop *loop, bool *irred_invalidated) > { > basic_block *body; > Index: testsuite/gcc.target/i386/l_fma_float_5.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_float_5.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_float_5.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ > Index: testsuite/gcc.target/i386/l_fma_double_4.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_double_4.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_double_4.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ > Index: testsuite/gcc.target/i386/l_fma_float_2.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_float_2.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_float_2.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ > Index: testsuite/gcc.target/i386/l_fma_float_6.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_float_6.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_float_6.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ > Index: testsuite/gcc.target/i386/l_fma_double_1.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_double_1.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_double_1.c (working copy) > @@ -16,11 +16,11 @@ > /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd213sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub213sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd213sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub213sd" 8 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfmadd213sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfmsub213sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd213sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub213sd" 20 } } */ > Index: testsuite/gcc.target/i386/l_fma_double_5.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_double_5.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_double_5.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ > Index: testsuite/gcc.target/i386/l_fma_float_3.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_float_3.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_float_3.c (working copy) > @@ -16,11 +16,11 @@ > /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd213ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub213ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd213ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub213ss" 8 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfmadd213ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfmsub213ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd213ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub213ss" 36 } } */ > Index: testsuite/gcc.target/i386/l_fma_double_2.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_double_2.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_double_2.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ > Index: testsuite/gcc.target/i386/l_fma_double_6.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_double_6.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_double_6.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132pd" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132pd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132sd" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132sd" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132sd" 40 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132sd" 40 } } */ > Index: testsuite/gcc.target/i386/l_fma_float_4.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_float_4.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_float_4.c (working copy) > @@ -12,7 +12,7 @@ > /* { dg-final { scan-assembler-times "vfmsub132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmadd132ps" 8 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132ps" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132ss" 16 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132ss" 16 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132ss" 72 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132ss" 72 } } */ > Index: testsuite/gcc.target/i386/l_fma_double_3.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_double_3.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_double_3.c (working copy) > @@ -16,11 +16,11 @@ > /* { dg-final { scan-assembler-times "vfnmadd231pd" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132pd" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub231pd" 4 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd213sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub213sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd213sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132sd" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub213sd" 8 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfmadd213sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfmsub213sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd213sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132sd" 20 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub213sd" 20 } } */ > Index: testsuite/gcc.target/i386/l_fma_float_1.c > =================================================================== > --- testsuite/gcc.target/i386/l_fma_float_1.c (revision 192483) > +++ testsuite/gcc.target/i386/l_fma_float_1.c (working copy) > @@ -16,11 +16,11 @@ > /* { dg-final { scan-assembler-times "vfnmadd231ps" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub132ps" 4 } } */ > /* { dg-final { scan-assembler-times "vfnmsub231ps" 4 } } */ > -/* { dg-final { scan-assembler-times "vfmadd132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmadd213ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfmsub213ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmadd213ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub132ss" 8 } } */ > -/* { dg-final { scan-assembler-times "vfnmsub213ss" 8 } } */ > +/* { dg-final { scan-assembler-times "vfmadd132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfmadd213ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfmsub132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfmsub213ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmadd213ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub132ss" 36 } } */ > +/* { dg-final { scan-assembler-times "vfnmsub213ss" 36 } } */ > Index: testsuite/gfortran.dg/do_1.f90 > =================================================================== > --- testsuite/gfortran.dg/do_1.f90 (revision 192483) > +++ testsuite/gfortran.dg/do_1.f90 (working copy) > @@ -1,4 +1,5 @@ > -! { dg-do run } > +! { dg-do run { xfail *-*-* } } > +! XFAIL is tracked in PR 54932 > ! Program to check corner cases for DO statements. > program do_1 > implicit none > Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-1.c (revision 0) > @@ -0,0 +1,13 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int a[2]; > +test(int c) > +{ > + int i; > + for (i=0;i<c;i++) > + a[i]=5; > +} > +/* Array bounds says the loop will not roll much. */ > +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 > times.." "cunroll"} } */ > +/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." > "cunroll"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-2.c (revision 0) > @@ -0,0 +1,16 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int a[2]; > +test(int c) > +{ > + int i; > + for (i=0;i<c;i++) > + { > + a[i]=5; > + if (test2()) > + return; > + } > +} > +/* We are not able to get rid of the final conditional because the loop has > two exits. */ > +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 2 > times.." "cunroll"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-3.c (revision 0) > @@ -0,0 +1,15 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */ > +int a[1]; > +test(int c) > +{ > + int i; > + for (i=0;i<c;i++) > + { > + a[i]=5; > + } > +} > +/* If we start duplicating headers prior curoll, this loop will have 0 > iterations. */ > + > +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 > times.." "cunrolli"} } */ > +/* { dg-final { cleanup-tree-dump "cunrolli" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-4.c (revision 0) > @@ -0,0 +1,21 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int a[1]; > +test(int c) > +{ > + int i=0,j; > + for (i=0;i<c;i++) > + { > + for (j=0;j<c;j++) > + { > + a[i]=5; > + test2(); > + } > + } > +} > + > +/* We should do this as part of cunrolli, but our cost model do not take > into account early exit > + from the last iteration. */ > +/* { dg-final { scan-tree-dump "Turned loop 1 to non-loop; it never loops." > "cunrolli"} } */ > +/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." > "cunrolli"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0) > +++ testsuite/gcc.dg/tree-ssa/cunroll-5.c (revision 0) > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */ > +int *a; > +test(int c) > +{ > + int i; > + for (i=0;i<6;i++) > + a[i]=5; > +} > +/* Basic testcase for complette unrolling. */ > +/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 5 > times.." "cunroll"} } */ > +/* { dg-final { scan-tree-dump "Exit condition of peeled iterations was > eliminated." "cunroll"} } */ > +/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." > "cunroll"} } */ > +/* { dg-final { cleanup-tree-dump "cunroll" } } */ > Index: testsuite/gcc.dg/tree-ssa/ldist-17.c > =================================================================== > --- testsuite/gcc.dg/tree-ssa/ldist-17.c (revision 192483) > +++ testsuite/gcc.dg/tree-ssa/ldist-17.c (working copy) > @@ -1,5 +1,5 @@ > /* { dg-do compile } */ > -/* { dg-options "-O2 -ftree-loop-distribution > -ftree-loop-distribute-patterns -fdump-tree-ldist-details" } */ > +/* { dg-options "-O2 -ftree-loop-distribution > -ftree-loop-distribute-patterns -fdump-tree-ldist-details > fdisable-tree-cunroll -fdisable-tree-cunrolli" } */ > > typedef int mad_fixed_t; > struct mad_pcm > > -- Richard Biener <rguent...@suse.de> SUSE / SUSE Labs SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746 GF: Jeff Hawn, Jennifer Guild, Felix Imend