With Honza's change to make the x86 backend consider the actual operation
when costing vector stmts, it becomes apparent that
vect_compute_single_scalar_iteration_cost still uses the old-style target
cost hook, which doesn't get enough information to distinguish different
operations.  For the testcase this means the scalar iteration is costed
with a generic scalar-stmt cost rather than the actual scalar
multiplication cost, while the vectorized body is costed as a vector
multiplication, resulting in an apples-to-oranges comparison.
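
To make the mismatch concrete, here is a toy sketch, not GCC code: the
enum, the function names and the cost numbers are all made up for
illustration and do not correspond to any real target's cost tables.  It
only shows how a hook that cannot see the operation and one that can will
disagree on the same statement list.

```c
/* Hypothetical statement kinds, for illustration only.  */
enum toy_op { TOY_ADD, TOY_MULT, TOY_STORE };

/* Old-style hook: only knows "this is a scalar stmt", so every
   operation gets the same generic cost (assumed to be 1 here).  */
int
toy_generic_scalar_cost (enum toy_op op)
{
  (void) op;
  return 1;
}

/* Operation-aware hook: can tell a multiplication from the rest.
   The value 3 for a multiplication is an arbitrary assumption.  */
int
toy_op_aware_cost (enum toy_op op)
{
  switch (op)
    {
    case TOY_MULT:
      return 3;
    default:
      return 1;
    }
}

/* Sum the cost of N statements using the given hook.  */
int
toy_sum_cost (const enum toy_op *ops, int n, int (*cost) (enum toy_op))
{
  int total = 0;
  for (int i = 0; i < n; i++)
    total += cost (ops[i]);
  return total;
}
```

With a statement list of one multiplication and one store, the generic
hook and the operation-aware hook in this sketch produce different totals,
so comparing a scalar cost from the former against a vector cost from the
latter is not meaningful.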

Fixed as follows.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-02-27  Richard Biener  <rguent...@suse.de>

        PR tree-optimization/84512
        * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
        Do not use the estimate returned from record_stmt_cost for
        the scalar iteration cost but sum properly using add_stmt_cost.

        * gcc.dg/tree-ssa/pr84512.c: New testcase.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c        (revision 258030)
+++ gcc/tree-vect-loop.c        (working copy)
@@ -1384,16 +1384,10 @@ vect_compute_single_scalar_iteration_cos
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes, factor, scalar_single_iter_cost = 0;
+  int nbbs = loop->num_nodes, factor;
   int innerloop_iters, i;
 
-  /* Count statements in scalar loop.  Using this as scalar cost for a single
-     iteration for now.
-
-     TODO: Add outer loop support.
-
-     TODO: Consider assigning different costs to different scalar
-     statements.  */
+  /* Gather costs for statements in the scalar loop.  */
 
   /* FORNOW.  */
   innerloop_iters = 1;
@@ -1437,13 +1431,28 @@ vect_compute_single_scalar_iteration_cos
           else
             kind = scalar_stmt;
 
-         scalar_single_iter_cost
-           += record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-                                factor, kind, stmt_info, 0, vect_prologue);
+         record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+                           factor, kind, stmt_info, 0, vect_prologue);
         }
     }
-  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
-    = scalar_single_iter_cost;
+
+  /* Now accumulate cost.  */
+  void *target_cost_data = init_cost (loop);
+  stmt_info_for_cost *si;
+  int j;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+                   j, si)
+    {
+      struct _stmt_vec_info *stmt_info
+       = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+      (void) add_stmt_cost (target_cost_data, si->count,
+                           si->kind, stmt_info, si->misalign,
+                           vect_body);
+    }
+  unsigned dummy, body_cost = 0;
+  finish_cost (target_cost_data, &dummy, &body_cost, &dummy);
+  destroy_cost_data (target_cost_data);
+  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo) = body_cost;
 }
 
 
Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c     (nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c     (working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int foo()
+{
+  int a[10];
+  for(int i = 0; i < 10; ++i)
+    a[i] = i*i;
+  int res = 0;
+  for(int i = 0; i < 10; ++i)
+    res += a[i];
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */
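
For reference, the constant the testcase scans for is just the sum of the
squares 0*0 + 1*1 + ... + 9*9.  A standalone sketch that computes the same
value (the function name is made up here and is not part of the patch):

```c
/* Sum of i*i for 0 <= i < n, i.e. the two loops of the testcase
   fused into one, with the bound as a parameter.  */
int
sum_of_squares_below (int n)
{
  int res = 0;
  for (int i = 0; i < n; ++i)
    res += i * i;
  return res;
}
```

Calling it with 10 gives the 285 the dg-final directive expects to see
folded into the return statement.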
