I have been asked to push this change, which (somewhat) fixes the imprecise costing of constant/invariant vector uses in SLP stmts. The previous code always considered just a single constant to be generated in the prologue, irrespective of how many we'd actually need. With this patch we properly handle that count and optimize for the case where we can use a vector splat. It doesn't yet handle CSE (or CSE among stmts), which means it could in theory regress cases it previously costed correctly overall by accident ("optimistically"). But at least the costing now matches code generation.
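To illustrate the splat distinction (a hypothetical example of mine, not taken from the patch or the PR): in the first function below the SLP lanes use different constants, so a constant-pool load or a full vector construction is needed, while in the second all lanes share one external operand, which the new costing accounts for as a cheaper scalar_to_vec splat.

  /* Hypothetical illustration only -- 'foo' and 'bar' are made-up names.  */
  void
  foo (int * restrict a)
  {
    /* Different constant per lane: the { 1, 2, 3, 4 } operand needs a
       vector_load from the constant pool (or a vec_construct).  */
    a[0] += 1;
    a[1] += 2;
    a[2] += 3;
    a[3] += 4;
  }

  void
  bar (int * restrict a, int x)
  {
    /* The same external operand in every lane: a scalar_to_vec splat of
       'x' suffices, which the old code costed like a full construction.  */
    a[0] += x;
    a[1] += x;
    a[2] += x;
    a[3] += x;
  }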
Bootstrapped and tested on x86_64-unknown-linux-gnu.  On x86_64 Haswell
with AVX2, SPEC 2k6 shows no off-noise changes.

The patch is said to help the case in the PR when additional backend
costing changes are done (for AVX512).

Ok for trunk at this stage?

Thanks,
Richard.

2018-01-30  Richard Biener  <rguent...@suse.de>

	PR tree-optimization/83008
	* tree-vect-slp.c (vect_analyze_slp_cost_1): Properly cost
	invariant and constant vector uses in stmts when they need
	more than one stmt.

Index: gcc/tree-vect-slp.c
===================================================================
--- gcc/tree-vect-slp.c	(revision 257047)
+++ gcc/tree-vect-slp.c	(working copy)
@@ -1911,18 +1911,56 @@ vect_analyze_slp_cost_1 (slp_instance in
 	  enum vect_def_type dt;
 	  if (!op || op == lhs)
 	    continue;
-	  if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt))
+	  if (vect_is_simple_use (op, stmt_info->vinfo, &def_stmt, &dt)
+	      && (dt == vect_constant_def || dt == vect_external_def))
 	    {
 	      /* Without looking at the actual initializer a vector of
 		 constants can be implemented as load from the constant pool.
-		 ???  We need to pass down stmt_info for a vector type
-		 even if it points to the wrong stmt.  */
-	      if (dt == vect_constant_def)
-		record_stmt_cost (prologue_cost_vec, 1, vector_load,
-				  stmt_info, 0, vect_prologue);
-	      else if (dt == vect_external_def)
-		record_stmt_cost (prologue_cost_vec, 1, vec_construct,
-				  stmt_info, 0, vect_prologue);
+		 When all elements are the same we can use a splat.  */
+	      tree vectype = get_vectype_for_scalar_type (TREE_TYPE (op));
+	      unsigned group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+	      unsigned num_vects_to_check;
+	      unsigned HOST_WIDE_INT const_nunits;
+	      unsigned nelt_limit;
+	      if (TYPE_VECTOR_SUBPARTS (vectype).is_constant (&const_nunits)
+		  && ! multiple_p (const_nunits, group_size))
+		{
+		  num_vects_to_check = SLP_TREE_NUMBER_OF_VEC_STMTS (node);
+		  nelt_limit = const_nunits;
+		}
+	      else
+		{
+		  /* If either the vector has variable length or the vectors
+		     are composed of repeated whole groups we only need to
+		     cost construction once.  All vectors will be the same.  */
+		  num_vects_to_check = 1;
+		  nelt_limit = group_size;
+		}
+	      tree elt = NULL_TREE;
+	      unsigned nelt = 0;
+	      for (unsigned j = 0; j < num_vects_to_check * nelt_limit; ++j)
+		{
+		  unsigned si = j % group_size;
+		  if (nelt == 0)
+		    elt = gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i);
+		  /* ???  We're just tracking whether all operands of a single
+		     vector initializer are the same, ideally we'd check if
+		     we emitted the same one already.  */
+		  else if (elt != gimple_op (SLP_TREE_SCALAR_STMTS (node)[si], i))
+		    elt = NULL_TREE;
+		  nelt++;
+		  if (nelt == nelt_limit)
+		    {
+		      /* ???  We need to pass down stmt_info for a vector type
+			 even if it points to the wrong stmt.  */
+		      record_stmt_cost (prologue_cost_vec, 1,
+					dt == vect_external_def
+					? (elt ? scalar_to_vec : vec_construct)
+					: vector_load,
+					stmt_info, 0, vect_prologue);
+		      nelt = 0;
+		    }
+		}
 	    }
 	}
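For the record, my reading of the counting logic above as a worked example
(not part of the submission): with group_size == 2 and V4SI vectors
(const_nunits == 4), 4 is a multiple of 2, so every generated vector is the
whole group repeated and construction is costed once (num_vects_to_check == 1,
nelt_limit == group_size).  With group_size == 8 and V4SI, 4 is not a multiple
of 8, so each of the SLP_TREE_NUMBER_OF_VEC_STMTS vectors covers a different
slice of the group and each one gets its own prologue cost
(nelt_limit == const_nunits).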