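The following also accounts for a GPR->XMM move cost for splat
operations.  It also properly guards eliding that cost for elements
loaded from memory: the elision is only valid with SSE4.1 or for
HImode or larger operands, since QImode memory can be handled
directly only with SSE4.1 pinsrb.  This doesn't fix the PR fully yet.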
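
For reference, the guard after this patch can be sketched as a
standalone predicate.  This is only an illustration of the condition,
not the code in i386.cc: ElementDef, needs_gpr_to_xmm_move and
have_sse4_1 are hypothetical names standing in for the gimple
predicates and TARGET_SSE4_1 used in the real hunk.

```cpp
#include <cstddef>

// Hypothetical summary of what the cost hook inspects about the
// statement defining one element of an external vector operand.
struct ElementDef {
  bool is_assign;                // stands in for is_gimple_assign (def)
  bool is_load;                  // stands in for gimple_assign_load_p (def)
  bool is_vector_bit_field_ref;  // BIT_FIELD_REF extracting from a vector
  std::size_t elem_size;         // element mode size in bytes
};

// Returns true when a GPR->XMM move cost should be added for the element.
// A load only avoids the GPR with SSE4.1 or for elements wider than one
// byte; a BIT_FIELD_REF of a vector register is hoped to avoid it too.
constexpr bool needs_gpr_to_xmm_move (const ElementDef &d, bool have_sse4_1)
{
  return !d.is_assign
         || (!(d.is_load && (have_sse4_1 || d.elem_size != 1))
             && !d.is_vector_bit_field_ref);
}
```

Under these assumptions a QImode load without SSE4.1 is charged the
move, while the same load with SSE4.1, or an HImode load, is not.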

Bootstrapped and tested on x86_64-unknown-linux-gnu, OK?

Thanks,
Richard.

        PR target/109944
        * config/i386/i386.cc (ix86_vector_costs::add_stmt_cost):
        For vector construction or splats apply GPR->XMM move
        costing.  QImode memory can be handled directly only
        with SSE4.1 pinsrb.
---
 gcc/config/i386/i386.cc | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/gcc/config/i386/i386.cc b/gcc/config/i386/i386.cc
index 38125ce284a..011a1fb0d6d 100644
--- a/gcc/config/i386/i386.cc
+++ b/gcc/config/i386/i386.cc
@@ -23654,7 +23654,7 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
       stmt_cost = ix86_builtin_vectorization_cost (kind, vectype, misalign);
       stmt_cost *= (TYPE_VECTOR_SUBPARTS (vectype) + 1);
     }
-  else if (kind == vec_construct
+  else if ((kind == vec_construct || kind == scalar_to_vec)
           && node
           && SLP_TREE_DEF_TYPE (node) == vect_external_def
           && INTEGRAL_TYPE_P (TREE_TYPE (vectype)))
@@ -23687,7 +23687,9 @@ ix86_vector_costs::add_stmt_cost (int count, vect_cost_for_stmt kind,
             Likewise with a BIT_FIELD_REF extracting from a vector
             register we can hope to avoid using a GPR.  */
          if (!is_gimple_assign (def)
-             || (!gimple_assign_load_p (def)
+             || ((!gimple_assign_load_p (def)
+                  || (!TARGET_SSE4_1
+                      && GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (op))) == 1))
                  && (gimple_assign_rhs_code (def) != BIT_FIELD_REF
                      || !VECTOR_TYPE_P (TREE_TYPE
                                (TREE_OPERAND (gimple_assign_rhs1 (def), 0))))))
-- 
2.35.3
