Hi,
here is third revised version of the complette unroling changes.  While working
on the RTL variant I noticed PR54937 and the fact that I was overly aggressive
on forcing single exit of the last iteration to be taken, because loop may 
terminate
otherwise (by EH or by exitting the program).  Same thinko is in loop-niter.

This patch adds loop_edge_to_cancel that is more conservative: it looks for the
exit conditional where the non-exitting edges leads to latch and verifies that
latch contains no statement with side effect that may terminate the loop.
This still actually matches quite few non-single-exit loops and works well in
practice.

Unlike previous revision it also enables complette unrolling when code size
does not grow even for non-innermost loops (with update in
tree_unroll_loops_completely to walk them). This is something we did on RTL
land but missed in trees.  This actually enables quite some optimizations when
things can be propagated to the tiny inner loop body.

I also fixed accounting in tree_estimate_loop_size for the cases where last
iteration is not going to be updated.

Finally I added code constructing __bulitin_unreachable as suggested by
Ian.

Bootstrapped/regtested x86_64-linux, also bootstrapped with -O3 and -Werror
disabled and benchmarked. Among best benefits is about 7% improvement on Applu,
and it causes up to 15% improvements on vectorized loops with small iteration
counts (by completelly peeling the precondition code).  There are no real
performance regressions but there is some code size bloat.  

I plan to followup with strenghtening the heuristic to disable unrolling when
benefits are absymal.  Easy is to limit unrolling on loops with CFG and/or
calls in them.  We already have quite informed analysis in place.  I also plan
to move simple FDO guided loop peeling from RTL level to trees to enable more
propagation into peeled sequences.

The patch also triggers bug in niter and requires xfailing do_1.f90 testcase.
I filled PR 54932 to track this.

There are also confused array bound warnings I hope to track incrementally, too,
by recording statements that are known to become unreachable in the last
iteration and adding __buitin_unreachable in front of them. This is also
important to avoid duplication leading to dead code: no other optimizers
force paths leading to out of bound accesses to not happen.

Honza


        * tree-ssa-loop-ivcanon.c (tree_estimate_loop_size): Add edge_to_cancel
        parameter and use it to estimate code optimized out in the final 
iteration.
        (loop_edge_to_cancel): New function.
        (try_unroll_loop_completely): New IRRED_IVALIDATED parameter;
        handle unrolling loops with bounds given via max_loop_iteratins;
        handle unrolling non-inner loops when code size shrinks;
        tidy dump output; when the last iteration loop still stays
        as loop in the CFG forcongly redirect the latch to
        __builtin_unreachable.
        (canonicalize_loop_induction_variables): Add irred_invlaidated
        parameter; record niter bound derrived; dump
        max_loop_iterations bounds; call try_unroll_loop_completely
        even if no niter bound is given.
        (canonicalize_induction_variables): Handle irred_invalidated.
        (tree_unroll_loops_completely): Handle non-innermost loops;
        handle irred_invalidated.
        * cfgloop.h (unlop): Declare.
        * cfgloopmanip.c (unloop): Export.
        * tree.c (build_common_builtin_nodes): Build BULTIN_UNREACHABLE.

        * gcc.target/i386/l_fma_float_?.c: Update.
        * gcc.target/i386/l_fma_double_?.c: Update.
        * gfortran.dg/do_1.f90: XFAIL
        * gcc.dg/tree-ssa/cunroll-1.c: New testcase.
        * gcc.dg/tree-ssa/cunroll-2.c: New testcase.
        * gcc.dg/tree-ssa/cunroll-3.c: New testcase.
        * gcc.dg/tree-ssa/cunroll-4.c: New testcase.
        * gcc.dg/tree-ssa/cunroll-5.c: New testcase.
        * gcc.dg/tree-ssa/ldist-17.c: Block cunroll to make testcase still
        valid.
Index: tree-ssa-loop-ivcanon.c
===================================================================
--- tree-ssa-loop-ivcanon.c     (revision 192483)
+++ tree-ssa-loop-ivcanon.c     (working copy)
@@ -192,7 +192,7 @@ constant_after_peeling (tree op, gimple 
    Return results in SIZE, estimate benefits for complete unrolling exiting by 
EXIT.  */
 
 static void
-tree_estimate_loop_size (struct loop *loop, edge exit, struct loop_size *size)
+tree_estimate_loop_size (struct loop *loop, edge exit, edge edge_to_cancel, 
struct loop_size *size)
 {
   basic_block *body = get_loop_body (loop);
   gimple_stmt_iterator gsi;
@@ -208,8 +208,8 @@ tree_estimate_loop_size (struct loop *lo
     fprintf (dump_file, "Estimating sizes for loop %i\n", loop->num);
   for (i = 0; i < loop->num_nodes; i++)
     {
-      if (exit && body[i] != exit->src
-         && dominated_by_p (CDI_DOMINATORS, body[i], exit->src))
+      if (edge_to_cancel && body[i] != edge_to_cancel->src
+         && dominated_by_p (CDI_DOMINATORS, body[i], edge_to_cancel->src))
        after_exit = true;
       else
        after_exit = false;
@@ -231,7 +231,7 @@ tree_estimate_loop_size (struct loop *lo
          /* Look for reasons why we might optimize this stmt away. */
 
          /* Exit conditional.  */
-         if (body[i] == exit->src && stmt == last_stmt (exit->src))
+         if (exit && body[i] == exit->src && stmt == last_stmt (exit->src))
            {
              if (dump_file && (dump_flags & TDF_DETAILS))
                fprintf (dump_file, "   Exit condition will be eliminated.\n");
@@ -314,36 +314,161 @@ estimated_unrolled_size (struct loop_siz
   return unr_insns;
 }
 
+/* Loop LOOP is known to not loop.  See if there is an edge in the loop
+   body that can be remove to make the loop to always exit and at
+   the same time it does not make any code potentially executed 
+   during the last iteration dead.  
+
+   After complette unrolling we still may get rid of the conditional
+   on the exit in the last copy even if we have no idea what it does.
+   This is quite common case for loops of form
+
+     int a[5];
+     for (i=0;i<b;i++)
+       a[i]=0;
+
+   Here we prove the loop to iterate 5 times but we do not know
+   it from induction variable.
+
+   For now we handle only simple case where there is exit condition
+   just before the latch block and the latch block contains no statements
+   with side effect that may otherwise terminate the execution of loop
+   (such as by EH or by terminating the program or longjmp).
+
+   In the general case we may want to cancel the paths leading to statements
+   loop-niter identified as having undefined effect in the last iteration.
+   The other cases are hopefully rare and will be cleaned up later.  */
+
+edge
+loop_edge_to_cancel (struct loop *loop)
+{
+  VEC (edge, heap) *exits;
+  unsigned i;
+  edge edge_to_cancel;
+  gimple_stmt_iterator gsi;
+
+  /* We want only one predecestor of the loop.  */
+  if (EDGE_COUNT (loop->latch->preds) > 1)
+    return NULL;
+
+  exits = get_loop_exit_edges (loop);
+
+  FOR_EACH_VEC_ELT (edge, exits, i, edge_to_cancel)
+    {
+       /* Find the other edge than the loop exit
+          leaving the conditoinal.  */
+       if (EDGE_COUNT (edge_to_cancel->src->succs) != 2)
+         continue;
+       if (EDGE_SUCC (edge_to_cancel->src, 0) == edge_to_cancel)
+         edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 1);
+       else
+         edge_to_cancel = EDGE_SUCC (edge_to_cancel->src, 0);
+
+      /* We should never have conditionals in the loop latch. */
+      gcc_assert (edge_to_cancel->dest != loop->header);
+
+      /* Check that it leads to loop latch.  */
+      if (edge_to_cancel->dest != loop->latch)
+        continue;
+
+      VEC_free (edge, heap, exits);
+
+      /* Verify that the code in loop latch does nothing that may end program
+         execution without really reaching the exit.  This may include
+        non-pure/const function calls, EH statements, volatile ASMs etc.  */
+      for (gsi = gsi_start_bb (loop->latch); !gsi_end_p (gsi); gsi_next (&gsi))
+       if (gimple_has_side_effects (gsi_stmt (gsi)))
+          return NULL;
+      return edge_to_cancel;
+    }
+  VEC_free (edge, heap, exits);
+  return NULL;
+}
+
 /* Tries to unroll LOOP completely, i.e. NITER times.
    UL determines which loops we are allowed to unroll.
-   EXIT is the exit of the loop that should be eliminated.  */
+   EXIT is the exit of the loop that should be eliminated.  
+   IRRED_INVALIDATED is used to bookkeep if information about
+   irreducible regions may become invalid as a result
+   of the transformation.  */
 
 static bool
 try_unroll_loop_completely (struct loop *loop,
                            edge exit, tree niter,
-                           enum unroll_level ul)
+                           enum unroll_level ul,
+                           bool *irred_invalidated)
 {
   unsigned HOST_WIDE_INT n_unroll, ninsns, max_unroll, unr_insns;
   gimple cond;
   struct loop_size size;
+  bool n_unroll_found = false;
+  HOST_WIDE_INT maxiter;
+  basic_block latch;
+  edge latch_edge;
+  location_t locus;
+  int flags;
+  gimple stmt;
+  gimple_stmt_iterator gsi;
+  edge edge_to_cancel = NULL;
+  int num = loop->num;
 
-  if (loop->inner)
-    return false;
+  /* See if we proved number of iterations to be low constant.
 
-  if (!host_integerp (niter, 1))
+     EXIT is an edge that will be removed in all but last iteration of 
+     the loop.
+
+     EDGE_TO_CACNEL is an edge that will be removed from the last iteration
+     of the unrolled sequence and is expected to make the final loop not
+     rolling. 
+
+     If the number of execution of loop is determined by standard induction
+     variable test, then EXIT and EDGE_TO_CANCEL are the two edges leaving
+     from the iv test.  */
+  if (host_integerp (niter, 1))
+    {
+      n_unroll = tree_low_cst (niter, 1);
+      n_unroll_found = true;
+      edge_to_cancel = EDGE_SUCC (exit->src, 0);
+      if (edge_to_cancel == exit)
+       edge_to_cancel = EDGE_SUCC (exit->src, 1);
+    }
+  /* We do not know the number of iterations and thus we can not eliminate
+     the EXIT edge.  */
+  else
+    exit = NULL;
+
+  /* See if we can improve our estimate by using recorded loop bounds.  */
+  maxiter = max_loop_iterations_int (loop);
+  if (maxiter >= 0
+      && (!n_unroll_found || (unsigned HOST_WIDE_INT)maxiter < n_unroll))
+    {
+      n_unroll = maxiter;
+      n_unroll_found = true;
+      /* Loop terminates before the IV variable test, so we can not
+        remove it in the last iteration.  */
+      edge_to_cancel = NULL;
+    }
+
+  if (!n_unroll_found)
     return false;
-  n_unroll = tree_low_cst (niter, 1);
 
   max_unroll = PARAM_VALUE (PARAM_MAX_COMPLETELY_PEEL_TIMES);
   if (n_unroll > max_unroll)
     return false;
 
+  if (!edge_to_cancel)
+    edge_to_cancel = loop_edge_to_cancel (loop);
+
   if (n_unroll)
     {
+      sbitmap wont_exit;
+      edge e;
+      unsigned i;
+      VEC (edge, heap) *to_remove = NULL;
       if (ul == UL_SINGLE_ITER)
        return false;
 
-      tree_estimate_loop_size (loop, exit, &size);
+      tree_estimate_loop_size (loop, exit, edge_to_cancel, &size);
       ninsns = size.overall;
 
       unr_insns = estimated_unrolled_size (&size, n_unroll);
@@ -354,6 +479,18 @@ try_unroll_loop_completely (struct loop 
                   (int) unr_insns);
        }
 
+      /* We unroll only inner loops, because we do not consider it profitable
+        otheriwse.  We still can cancel loopback edge of not rolling loop;
+        this is always a good idea.  */
+      if (loop->inner && unr_insns > ninsns)
+       {
+         if (dump_file && (dump_flags & TDF_DETAILS))
+           fprintf (dump_file, "Not unrolling loop %d:"
+                    "it is not innermost and code would grow.\n",
+                    loop->num);
+         return false;
+       }
+
       if (unr_insns > ninsns
          && (unr_insns
              > (unsigned) PARAM_VALUE (PARAM_MAX_COMPLETELY_PEELED_INSNS)))
@@ -369,17 +506,10 @@ try_unroll_loop_completely (struct loop 
          && unr_insns > ninsns)
        {
          if (dump_file && (dump_flags & TDF_DETAILS))
-           fprintf (dump_file, "Not unrolling loop %d.\n", loop->num);
+           fprintf (dump_file, "Not unrolling loop %d: size would grow.\n",
+                    loop->num);
          return false;
        }
-    }
-
-  if (n_unroll)
-    {
-      sbitmap wont_exit;
-      edge e;
-      unsigned i;
-      VEC (edge, heap) *to_remove = NULL;
 
       initialize_original_copy_tables ();
       wont_exit = sbitmap_alloc (n_unroll + 1);
@@ -408,15 +538,67 @@ try_unroll_loop_completely (struct loop 
       free_original_copy_tables ();
     }
 
-  cond = last_stmt (exit->src);
-  if (exit->flags & EDGE_TRUE_VALUE)
-    gimple_cond_make_true (cond);
-  else
-    gimple_cond_make_false (cond);
-  update_stmt (cond);
+  /* Remove the conditional from the last copy of the loop.  */
+  if (edge_to_cancel)
+    {
+      cond = last_stmt (edge_to_cancel->src);
+      if (edge_to_cancel->flags & EDGE_TRUE_VALUE)
+       gimple_cond_make_false (cond);
+      else
+       gimple_cond_make_true (cond);
+      update_stmt (cond);
+      /* Do not remove the path. Doing so may remove outer loop
+        and confuse bookkeeping code in tree_unroll_loops_completelly.  */
+    }
+  /* We did not manage to cancel the loop.
+     The loop latch remains reachable even if it will never be reached
+     at runtime.  We must redirect it to somewhere, so create basic
+     block containg __builtin_unreachable call for this reason.  */
+  else
+    {
+      latch = loop->latch;
+      latch_edge = loop_latch_edge (loop);
+      flags = latch_edge->flags;
+      locus = latch_edge->goto_locus;
+
+      /* Unloop destroys the latch edge.  */
+      unloop (loop, irred_invalidated);
+
+      /* Create new basic block for the latch edge destination and wire
+        it in.  */
+      stmt = gimple_build_call (builtin_decl_implicit (BUILT_IN_UNREACHABLE), 
0);
+      latch_edge = make_edge (latch, create_basic_block (NULL, NULL, latch), 
flags);
+      latch_edge->probability = 0;
+      latch_edge->count = 0;
+      latch_edge->flags |= flags;
+      latch_edge->goto_locus = locus;
+
+      latch_edge->dest->loop_father = current_loops->tree_root;
+      latch_edge->dest->count = 0;
+      latch_edge->dest->frequency = 0;
+      set_immediate_dominator (CDI_DOMINATORS, latch_edge->dest, 
latch_edge->src);
+
+      gsi = gsi_start_bb (latch_edge->dest);
+      gsi_insert_after (&gsi, stmt, GSI_NEW_STMT);
+    }
 
   if (dump_file && (dump_flags & TDF_DETAILS))
-    fprintf (dump_file, "Unrolled loop %d completely.\n", loop->num);
+    {
+      if (!n_unroll)
+        fprintf (dump_file, "Turned loop %d to non-loop; it never loops.\n",
+                num);
+      else
+        fprintf (dump_file, "Unrolled loop %d completely "
+                "(duplicated %i times).\n", num, (int)n_unroll);
+      if (exit)
+        fprintf (dump_file, "Exit condition of peeled iterations was "
+                "eliminated.\n");
+      if (edge_to_cancel)
+        fprintf (dump_file, "Last iteration exit edge was proved true.\n");
+      else
+        fprintf (dump_file, "Latch of last iteration was marked by "
+                "__builtin_unreachable ().\n");
+    }
 
   return true;
 }
@@ -425,12 +608,15 @@ try_unroll_loop_completely (struct loop 
    CREATE_IV is true if we may create a new iv.  UL determines
    which loops we are allowed to completely unroll.  If TRY_EVAL is true, we 
try
    to determine the number of iterations of a loop by direct evaluation.
-   Returns true if cfg is changed.  */
+   Returns true if cfg is changed.  
+
+   IRRED_INVALIDATED is used to keep if irreducible reginos needs to be 
recomputed.  */
 
 static bool
 canonicalize_loop_induction_variables (struct loop *loop,
                                       bool create_iv, enum unroll_level ul,
-                                      bool try_eval)
+                                      bool try_eval,
+                                      bool *irred_invalidated)
 {
   edge exit = NULL;
   tree niter;
@@ -455,22 +641,34 @@ canonicalize_loop_induction_variables (s
              || TREE_CODE (niter) != INTEGER_CST))
        niter = find_loop_niter_by_eval (loop, &exit);
 
-      if (chrec_contains_undetermined (niter)
-         || TREE_CODE (niter) != INTEGER_CST)
-       return false;
+      if (TREE_CODE (niter) != INTEGER_CST)
+       exit = NULL;
     }
 
-  if (dump_file && (dump_flags & TDF_DETAILS))
+  /* We work exceptionally hard here to estimate the bound
+     by find_loop_niter_by_eval.  Be sure to keep it for future.  */
+  if (niter && TREE_CODE (niter) == INTEGER_CST)
+    record_niter_bound (loop, tree_to_double_int (niter), false, true);
+
+  if (dump_file && (dump_flags & TDF_DETAILS)
+      && TREE_CODE (niter) == INTEGER_CST)
     {
       fprintf (dump_file, "Loop %d iterates ", loop->num);
       print_generic_expr (dump_file, niter, TDF_SLIM);
       fprintf (dump_file, " times.\n");
     }
+  if (dump_file && (dump_flags & TDF_DETAILS)
+      && max_loop_iterations_int (loop) >= 0)
+    {
+      fprintf (dump_file, "Loop %d iterates at most %i times.\n", loop->num,
+              (int)max_loop_iterations_int (loop));
+    }
 
-  if (try_unroll_loop_completely (loop, exit, niter, ul))
+  if (try_unroll_loop_completely (loop, exit, niter, ul, irred_invalidated))
     return true;
 
-  if (create_iv)
+  if (create_iv
+      && niter && !chrec_contains_undetermined (niter))
     create_canonical_iv (loop, exit, niter);
 
   return false;
@@ -485,15 +683,21 @@ canonicalize_induction_variables (void)
   loop_iterator li;
   struct loop *loop;
   bool changed = false;
+  bool irred_invalidated = false;
 
   FOR_EACH_LOOP (li, loop, 0)
     {
       changed |= canonicalize_loop_induction_variables (loop,
                                                        true, UL_SINGLE_ITER,
-                                                       true);
+                                                       true,
+                                                       &irred_invalidated);
     }
   gcc_assert (!need_ssa_update_p (cfun));
 
+  if (irred_invalidated
+      && loops_state_satisfies_p (LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
+    mark_irreducible_loops ();
+
   /* Clean up the information about numbers of iterations, since brute force
      evaluation could reveal new information.  */
   scev_reset ();
@@ -594,9 +798,10 @@ tree_unroll_loops_completely (bool may_i
 
   do
     {
+      bool irred_invalidated = false;
       changed = false;
 
-      FOR_EACH_LOOP (li, loop, LI_ONLY_INNERMOST)
+      FOR_EACH_LOOP (li, loop, 0)
        {
          struct loop *loop_father = loop_outer (loop);
 
@@ -609,7 +814,8 @@ tree_unroll_loops_completely (bool may_i
            ul = UL_NO_GROWTH;
 
          if (canonicalize_loop_induction_variables (loop, false, ul,
-                                                    !flag_tree_loop_ivcanon))
+                                                    !flag_tree_loop_ivcanon,
+                                                    &irred_invalidated))
            {
              changed = true;
              /* If we'll continue unrolling, we need to propagate constants
@@ -629,6 +835,10 @@ tree_unroll_loops_completely (bool may_i
          struct loop **iter;
          unsigned i;
 
+         if (irred_invalidated
+             && loops_state_satisfies_p 
(LOOPS_HAVE_MARKED_IRREDUCIBLE_REGIONS))
+           mark_irreducible_loops ();
+
          update_ssa (TODO_update_ssa);
 
          /* Propagate the constants within the new basic blocks.  */
Index: tree.c
===================================================================
--- tree.c      (revision 192483)
+++ tree.c      (working copy)
@@ -9524,6 +9524,15 @@ build_common_builtin_nodes (void)
   tree tmp, ftype;
   int ecf_flags;
 
+  if (!builtin_decl_explicit_p (BUILT_IN_UNREACHABLE))
+    {
+      ftype = build_function_type (void_type_node, void_list_node);
+      local_define_builtin ("__builtin_unreachable", ftype, 
BUILT_IN_UNREACHABLE,
+                           "__builtin_unreachable",
+                           ECF_NOTHROW | ECF_LEAF | ECF_NORETURN
+                           | ECF_CONST | ECF_LEAF);
+    }
+
   if (!builtin_decl_explicit_p (BUILT_IN_MEMCPY)
       || !builtin_decl_explicit_p (BUILT_IN_MEMMOVE))
     {
Index: cfgloop.h
===================================================================
--- cfgloop.h   (revision 192483)
+++ cfgloop.h   (working copy)
@@ -320,7 +321,8 @@ extern struct loop *loopify (edge, edge,
 struct loop * loop_version (struct loop *, void *,
                            basic_block *, unsigned, unsigned, unsigned, bool);
 extern bool remove_path (edge);
-void scale_loop_frequencies (struct loop *, int, int);
+extern void unloop (struct loop *, bool *);
+extern void scale_loop_frequencies (struct loop *, int, int);
 
 /* Induction variable analysis.  */
 
Index: cfgloopmanip.c
===================================================================
--- cfgloopmanip.c      (revision 192483)
+++ cfgloopmanip.c      (working copy)
@@ -37,7 +37,6 @@ static int find_path (edge, basic_block 
 static void fix_loop_placements (struct loop *, bool *);
 static bool fix_bb_placement (basic_block);
 static void fix_bb_placements (basic_block, bool *);
-static void unloop (struct loop *, bool *);
 
 /* Checks whether basic block BB is dominated by DATA.  */
 static bool
@@ -895,7 +894,7 @@ loopify (edge latch_edge, edge header_ed
    If this may cause the information about irreducible regions to become
    invalid, IRRED_INVALIDATED is set to true.  */
 
-static void
+void
 unloop (struct loop *loop, bool *irred_invalidated)
 {
   basic_block *body;
Index: testsuite/gcc.target/i386/l_fma_float_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_5.c   (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_float_5.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfmsub132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ss" 72  } } */
Index: testsuite/gcc.target/i386/l_fma_double_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_4.c  (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_double_4.c  (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfmsub132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132sd" 40  } } */
Index: testsuite/gcc.target/i386/l_fma_float_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_2.c   (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_float_2.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfmsub132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ss" 72  } } */
Index: testsuite/gcc.target/i386/l_fma_float_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_6.c   (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_float_6.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfmsub132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ss" 72  } } */
Index: testsuite/gcc.target/i386/l_fma_double_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_1.c  (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_double_1.c  (working copy)
@@ -16,11 +16,11 @@
 /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd213sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub213sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd213sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub213sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfmadd132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfmadd213sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfmsub132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfmsub213sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmadd213sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmsub213sd" 20  } } */
Index: testsuite/gcc.target/i386/l_fma_double_5.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_5.c  (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_double_5.c  (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfmsub132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132sd" 40  } } */
Index: testsuite/gcc.target/i386/l_fma_float_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_3.c   (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_float_3.c   (working copy)
@@ -16,11 +16,11 @@
 /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd213ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub213ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd213ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub213ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfmadd132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfmadd213ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfmsub132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfmsub213ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmadd213ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmsub213ss" 36  } } */
Index: testsuite/gcc.target/i386/l_fma_double_2.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_2.c  (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_double_2.c  (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfmsub132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132sd" 40  } } */
Index: testsuite/gcc.target/i386/l_fma_double_6.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_6.c  (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_double_6.c  (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132pd" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132sd" 40 } } */
+/* { dg-final { scan-assembler-times "vfmsub132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132sd" 40  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132sd" 40  } } */
Index: testsuite/gcc.target/i386/l_fma_float_4.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_4.c   (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_float_4.c   (working copy)
@@ -12,7 +12,7 @@
 /* { dg-final { scan-assembler-times "vfmsub132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmadd132ps" 8  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 16  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 16  } } */
+/* { dg-final { scan-assembler-times "vfmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfmsub132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ss" 72  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ss" 72  } } */
Index: testsuite/gcc.target/i386/l_fma_double_3.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_double_3.c  (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_double_3.c  (working copy)
@@ -16,11 +16,11 @@
 /* { dg-final { scan-assembler-times "vfnmadd231pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132pd" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231pd" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd213sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub213sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd213sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132sd" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub213sd" 8  } } */
+/* { dg-final { scan-assembler-times "vfmadd132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfmadd213sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfmsub132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfmsub213sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmadd213sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132sd" 20  } } */
+/* { dg-final { scan-assembler-times "vfnmsub213sd" 20  } } */
Index: testsuite/gcc.target/i386/l_fma_float_1.c
===================================================================
--- testsuite/gcc.target/i386/l_fma_float_1.c   (revision 192483)
+++ testsuite/gcc.target/i386/l_fma_float_1.c   (working copy)
@@ -16,11 +16,11 @@
 /* { dg-final { scan-assembler-times "vfnmadd231ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub132ps" 4  } } */
 /* { dg-final { scan-assembler-times "vfnmsub231ps" 4  } } */
-/* { dg-final { scan-assembler-times "vfmadd132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmadd213ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfmsub213ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmadd213ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub132ss" 8  } } */
-/* { dg-final { scan-assembler-times "vfnmsub213ss" 8  } } */
+/* { dg-final { scan-assembler-times "vfmadd132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfmadd213ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfmsub132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfmsub213ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmadd132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmadd213ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmsub132ss" 36  } } */
+/* { dg-final { scan-assembler-times "vfnmsub213ss" 36  } } */
Index: testsuite/gfortran.dg/do_1.f90
===================================================================
--- testsuite/gfortran.dg/do_1.f90      (revision 192483)
+++ testsuite/gfortran.dg/do_1.f90      (working copy)
@@ -1,4 +1,5 @@
-! { dg-do run }
+! { dg-do run { xfail *-*-* } }
+! XFAIL is tracked in PR 54932
 ! Program to check corner cases for DO statements.
 program do_1
   implicit none
Index: testsuite/gcc.dg/tree-ssa/cunroll-1.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-1.c       (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-1.c       (revision 0)
@@ -0,0 +1,13 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int a[2];
+test(int c)
+{ 
+  int i;
+  for (i=0;i<c;i++)
+    a[i]=5;
+}
+/* Array bounds says the loop will not roll much.  */
+/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 
times.." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." 
"cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-2.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-2.c       (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-2.c       (revision 0)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int a[2];
+test(int c)
+{ 
+  int i;
+  for (i=0;i<c;i++)
+    {
+      a[i]=5;
+      if (test2())
+       return;
+    }
+}
+/* We are not able to get rid of the final conditional because the loop has 
two exits.  */
+/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 2 
times.." "cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-3.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-3.c       (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-3.c       (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-cunrolli-details" } */
+int a[1];
+test(int c)
+{ 
+  int i;
+  for (i=0;i<c;i++)
+    {
+      a[i]=5;
+    }
+}
+/* If we start duplicating headers prior curoll, this loop will have 0 
iterations.  */
+
+/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 1 
times.." "cunrolli"} } */
+/* { dg-final { cleanup-tree-dump "cunrolli" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-4.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-4.c       (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-4.c       (revision 0)
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int a[1];
+test(int c)
+{ 
+  int i=0,j;
+  for (i=0;i<c;i++)
+    {
+      for (j=0;j<c;j++)
+       {
+          a[i]=5;
+         test2();
+       }
+    }
+}
+
+/* We should do this as part of cunrolli, but our cost model do not take into 
account early exit
+   from the last iteration.  */
+/* { dg-final { scan-tree-dump "Turned loop 1 to non-loop; it never loops." 
"cunrolli"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." 
"cunrolli"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/cunroll-5.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/cunroll-5.c       (revision 0)
+++ testsuite/gcc.dg/tree-ssa/cunroll-5.c       (revision 0)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+int *a;
+test(int c)
+{ 
+  int i;
+  for (i=0;i<6;i++)
+    a[i]=5;
+}
+/* Basic testcase for complette unrolling.  */
+/* { dg-final { scan-tree-dump "Unrolled loop 1 completely .duplicated 5 
times.." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Exit condition of peeled iterations was 
eliminated." "cunroll"} } */
+/* { dg-final { scan-tree-dump "Last iteration exit edge was proved true." 
"cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */
Index: testsuite/gcc.dg/tree-ssa/ldist-17.c
===================================================================
--- testsuite/gcc.dg/tree-ssa/ldist-17.c        (revision 192483)
+++ testsuite/gcc.dg/tree-ssa/ldist-17.c        (working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns 
-fdump-tree-ldist-details" } */
+/* { dg-options "-O2 -ftree-loop-distribution -ftree-loop-distribute-patterns 
-fdump-tree-ldist-details fdisable-tree-cunroll -fdisable-tree-cunrolli" } */
 
 typedef int mad_fixed_t;
 struct mad_pcm

Reply via email to