With Honza's change to make the x86 backend consider the actual operation when costing vector stmts, it becomes apparent that vect_compute_single_scalar_iteration_cost uses the old-style target cost hook, which doesn't get enough information to distinguish different operations. For the testcase this means the scalar iteration is costed as a general scalar-stmt rather than as an actual scalar multiplication, while the vectorized body is costed as a vector multiplication, resulting in an apples-to-oranges comparison in the end.
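To illustrate the mismatch, here is a toy sketch in C. It is not GCC code: every name (toy_op, toy_stmt, the toy_* functions) and every cost number is made up for illustration, and the direction in which the decision flips here is only an example. It shows how summing a generic per-statement cost for the scalar side while using an operation-aware cost for the vector side can change the outcome of the profitability comparison.

```c
#include <stddef.h>

/* Hypothetical operation kinds; these are NOT GCC's enums.  */
enum toy_op { TOY_ADD, TOY_MULT };

/* One statement in the loop body and how often it runs per iteration.  */
struct toy_stmt { enum toy_op op; unsigned count; };

/* Old-style view: every scalar statement costs the same (invented value).  */
static unsigned
toy_generic_scalar_cost (void)
{
  return 1;
}

/* Operation-aware scalar cost (invented values).  */
static unsigned
toy_scalar_op_cost (enum toy_op op)
{
  return op == TOY_MULT ? 3 : 1;
}

/* Operation-aware vector cost (invented values).  */
static unsigned
toy_vector_op_cost (enum toy_op op)
{
  return op == TOY_MULT ? 5 : 1;
}

/* Scalar iteration cost when the hook cannot see the operation.  */
static unsigned
toy_scalar_iter_cost_generic (const struct toy_stmt *s, size_t n)
{
  unsigned cost = 0;
  for (size_t i = 0; i < n; ++i)
    cost += s[i].count * toy_generic_scalar_cost ();
  return cost;
}

/* Scalar iteration cost when the hook sees each operation.  */
static unsigned
toy_scalar_iter_cost_by_op (const struct toy_stmt *s, size_t n)
{
  unsigned cost = 0;
  for (size_t i = 0; i < n; ++i)
    cost += s[i].count * toy_scalar_op_cost (s[i].op);
  return cost;
}

/* Cost of one vector iteration, always operation-aware.  */
static unsigned
toy_vector_body_cost (const struct toy_stmt *s, size_t n)
{
  unsigned cost = 0;
  for (size_t i = 0; i < n; ++i)
    cost += s[i].count * toy_vector_op_cost (s[i].op);
  return cost;
}

/* One vector iteration replaces VF scalar iterations.  */
static int
toy_profitable (unsigned scalar_iter_cost, unsigned vector_body_cost,
		unsigned vf)
{
  return vector_body_cost < scalar_iter_cost * vf;
}
```

For a body consisting of a single multiplication and a vectorization factor of 4, the generic scalar cost gives 1 * 4 = 4 against a vector cost of 5, while the operation-aware scalar cost gives 3 * 4 = 12 against the same 5, so the two ways of costing disagree about profitability.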

Fixed as follows.

Bootstrap and regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-02-27  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/84512
	* tree-vect-loop.c (vect_compute_single_scalar_iteration_cost):
	Do not use the estimate returned from record_stmt_cost for
	the scalar iteration cost but sum properly using add_stmt_cost.

	* gcc.dg/tree-ssa/pr84512.c: New testcase.

Index: gcc/tree-vect-loop.c
===================================================================
--- gcc/tree-vect-loop.c	(revision 258030)
+++ gcc/tree-vect-loop.c	(working copy)
@@ -1384,16 +1384,10 @@ vect_compute_single_scalar_iteration_cos
 {
   struct loop *loop = LOOP_VINFO_LOOP (loop_vinfo);
   basic_block *bbs = LOOP_VINFO_BBS (loop_vinfo);
-  int nbbs = loop->num_nodes, factor, scalar_single_iter_cost = 0;
+  int nbbs = loop->num_nodes, factor;
   int innerloop_iters, i;
 
-  /* Count statements in scalar loop.  Using this as scalar cost for a single
-     iteration for now.
-
-     TODO: Add outer loop support.
-
-     TODO: Consider assigning different costs to different scalar
-     statements.  */
+  /* Gather costs for statements in the scalar loop.  */
 
   /* FORNOW.  */
   innerloop_iters = 1;
@@ -1437,13 +1431,28 @@ vect_compute_single_scalar_iteration_cos
           else
             kind = scalar_stmt;
 
-          scalar_single_iter_cost
-            += record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
-                                 factor, kind, stmt_info, 0, vect_prologue);
+          record_stmt_cost (&LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+                            factor, kind, stmt_info, 0, vect_prologue);
         }
     }
-  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo)
-    = scalar_single_iter_cost;
+
+  /* Now accumulate cost.  */
+  void *target_cost_data = init_cost (loop);
+  stmt_info_for_cost *si;
+  int j;
+  FOR_EACH_VEC_ELT (LOOP_VINFO_SCALAR_ITERATION_COST (loop_vinfo),
+                    j, si)
+    {
+      struct _stmt_vec_info *stmt_info
+        = si->stmt ? vinfo_for_stmt (si->stmt) : NULL;
+      (void) add_stmt_cost (target_cost_data, si->count,
+                            si->kind, stmt_info, si->misalign,
+                            vect_body);
+    }
+  unsigned dummy, body_cost = 0;
+  finish_cost (target_cost_data, &dummy, &body_cost, &dummy);
+  destroy_cost_data (target_cost_data);
+  LOOP_VINFO_SINGLE_SCALAR_ITERATION_COST (loop_vinfo) = body_cost;
 }
 
 
Index: gcc/testsuite/gcc.dg/tree-ssa/pr84512.c
===================================================================
--- gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(nonexistent)
+++ gcc/testsuite/gcc.dg/tree-ssa/pr84512.c	(working copy)
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-optimized" } */
+
+int foo()
+{
+  int a[10];
+  for(int i = 0; i < 10; ++i)
+    a[i] = i*i;
+  int res = 0;
+  for(int i = 0; i < 10; ++i)
+    res += a[i];
+  return res;
+}
+
+/* { dg-final { scan-tree-dump "return 285;" "optimized" } } */