I have been asked to push this change, which (somewhat) fixes the imprecise costing of constant/invariant vector uses in SLP stmts. The previous code always considered just a single constant to be generated in the prologue, irrespective of how many we'd actually need. With this patch we properly handle that count and optimize for the case where we can use a vector splat. It doesn't yet handle CSE (or CSE among stmts), which means it could in theory regress cases it previously costed correctly overall by accident ("optimistically"). But at least the costing now matches code generation.
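To illustrate the splat distinction (a hypothetical example of mine, not taken from the patch or the PR): in the first function below the SLP lanes use different constants, so a constant-pool load or a full vector construction is needed, while in the second all lanes share one external operand, which the new costing accounts for as a cheaper scalar_to_vec splat.

  /* Hypothetical illustration only -- 'foo' and 'bar' are made-up names.  */
  void
  foo (int * restrict a)
  {
    /* Different constant per lane: the { 1, 2, 3, 4 } operand needs a
       vector_load from the constant pool (or a vec_construct).  */
    a[0] += 1;
    a[1] += 2;
    a[2] += 3;
    a[3] += 4;
  }

  void
  bar (int * restrict a, int x)
  {
    /* The same external operand in every lane: a scalar_to_vec splat of
       'x' suffices, which the old code costed like a full construction.  */
    a[0] += x;
    a[1] += x;
    a[2] += x;
    a[3] += x;
  }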
Bootstrapped and tested on x86_64-unknown-linux-gnu.  On x86_64 Haswell
with AVX2, SPEC 2k6 shows no off-noise changes.

The patch is said to help the case in the PR when additional backend
costing changes are done (for AVX512).

Ok for trunk at this stage?

Thanks,
Richard.

2018-01-30  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/83008
	* tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost
	invariant and constant vector uses in stmts when they need
	more than one stmt.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 257047)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -1911,18 +1911,56 @@ vect_analyze_slp_cost_1 (slp_instance in
 	  enum vect_def_type dt;
 	  if (!op || op == lhs)
 	    continue;
-	  if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt))
+	  if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt)
+	      && (dt == vect_constant_def || dt == vect_external_def))
 	    {
 	      /* Without looking at the actual initializer a vector of
 		 constants can be implemented as load from the constant pool.
-		 ???  We need to pass down stmt_info for a vector type
-		 even if it points to the wrong stmt.  */
-	      if (dt == vect_constant_def)
-		record_stmt_cost (prologue_cost_vec, 1, vector_load,
-				  stmt_info, 0, vect_prologue);
-	      else if (dt == vect_external_def)
-		record_stmt_cost (prologue_cost_vec, 1, vec_construct,
-				  stmt_info, 0, vect_prologue);
+		 When all elements are the same we can use a splat.  */
+	      tree vectype = get_vectype_for_scalar_type (TREE_TYPE (op));
+	      unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+	      unsigned num_vects_to_check;
+	      unsigned HOST_WIDE_INT const_nunits;
+	      unsigned nelt_limit;
+	      if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits)
+		  && ! multiple_p (const_nunits, group_size))
+		{
+		  num_vects_to_check = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
+		  nelt_limit = const_nunits;
+		}
+	      else
+		{
+		  /* If either the vector has variable length or the vectors
+		     are composed of repeated whole groups we only need to
+		     cost construction once.  All vectors will be the same.  */
+		  num_vects_to_check = 1;
+		  nelt_limit = group_size;
+		}
+	      tree elt = NULL_TREE;
+	      unsigned nelt = 0;
+	      for (unsigned j = 0; j < num_vects_to_check * nelt_limit; ++j)
+		{
+		  unsigned si = j % group_size;
+		  if (nelt == 0)
+		    elt = gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i);
+		  /* ???  We're just tracking whether all operands of a single
+		     vector initializer are the same, ideally we'd check if
+		     we emitted the same one already.  */
+		  else if (elt != gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i))
+		    elt = NULL_TREE;
+		  nelt++;
+		  if (nelt == nelt_limit)
+		    {
+		      /* ???  We need to pass down stmt_info for a vector type
+			 even if it points to the wrong stmt.  */
+		      record_stmt_cost (prologue_cost_vec, 1,
+					dt == vect_external_def
+					? (elt ? scalar_to_vec : vec_construct)
+					: vector_load,
+					stmt_info, 0, vect_prologue);
+		      nelt = 0;
+		    }
+		}
 	    }
 	}
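For the record, my reading of the counting logic above as a worked example
(not part of the submission): with group_size == 2 and V4SI vectors
(const_nunits == 4), 4 is a multiple of 2, so every generated vector is the
whole group repeated and construction is costed once (num_vects_to_check == 1,
nelt_limit == group_size).  With group_size == 8 and V4SI, 4 is not a multiple
of 8, so each of the SLP_TREE_NUMBER_OF_VEC_STMTS vectors covers a different
slice of the group and each one gets its own prologue cost
(nelt_limit == const_nunits).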