Hi,

On 2020/3/18 11:10 PM, Richard Biener wrote:
> On Wed, Mar 18, 2020 at 2:56 PM Kewen.Lin <li...@linux.ibm.com> wrote:
>>
>> Hi Richi,
>>
>> Thanks for your comments.
>>
>> On 2020/3/18 6:39 PM, Richard Biener wrote:
>>> On Wed, Mar 18, 2020 at 11:06 AM Kewen.Lin <li...@linux.ibm.com> wrote:
>>>>
>> This path can set overrun_p to false, so some cases can fall into
>> the "no peeling for gaps" hunk in vectorizable_load.  Since I used
>> DR_GROUP_HALF_MODE to save the half mode, if some case matches
>> this condition, the vectorizable_load hunk can see an uninitialized
>> DR_GROUP_HALF_MODE.  But even with the proposed recomputation, I
>> think we still need to check the vec_init optab here when the
>> known_eq half-size conditions hold?
> 
> Hmm, but for the above case it's fine to access the excess elements.
> 
> I guess the vectorizable_load code needs to be amended with
> the alignment check or we do need to store somewhere our
> decision to use smaller loads.
> 

OK, thanks.  I'll investigate it separately.

>>
>>> I don't like storing DR_GROUP_HALF_MODE very much, later
>>> you need a vector type and it looks cheap enough to recompute
>>> it where you need it?  Iff then it doesn't belong to DR_GROUP
>>> but to the stmt-info.
>>>
>>
>> OK, I intended not to recompute it to save compile time, but I'll
>> throw it away.
>>
>>> I realize the original optimization was kind of a hack (and I was too
>>> lazy to implement the integer mode construction path ...).
>>>
>>> So, can you factor out the existing code into a function returning
>>> the vector type for construction for a vector type and a
>>> pieces size?  So for V16QI and a pieces-size of 4 we'd
>>> get either V16QI back (then construction from V4QI pieces
>>> should work) or V4SI (then construction from SImode pieces
>>> should work)?  Eventually as secondary output provide that
>>> piece type (SI / V4QI).
>>
>> Sure.  I'm not good at picking function names; does
>> suitable_vector_and_pieces sound good?
>>   i.e. tree suitable_vector_and_pieces (tree vtype, tree *ptype);
> 
> tree vector_vector_composition_type (tree vtype, poly_uint64 nelts,
> tree *ptype);
> 
> where nelts specifies the number of vtype elements in a piece.
> 

Thanks, yep, I forgot to include "nelts".
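To make the size arithmetic behind the proposed helper concrete, here is a rough standalone model in plain C (not GCC internals; `composition_pieces` and its signature are invented for illustration): it splits a vector of total_bits into nelts equal pieces and derives how many elements each piece holds, mirroring the V16QI/nelts=4 example above.

```c
/* Hypothetical model (not GCC code) of the piece-size arithmetic: split a
   vector of TOTAL_BITS bits holding ELEM_BITS-bit elements into NELTS equal
   pieces.  Reports the piece size in bits and the elements per piece, or
   returns 0 if no exact split exists.  For (128, 8, 4), i.e. V16QI split
   into 4 pieces, it yields 32-bit pieces of 4 elements each, matching the
   V4QI / SImode candidates discussed above.  */
static int
composition_pieces (unsigned total_bits, unsigned elem_bits, unsigned nelts,
                    unsigned *piece_bits, unsigned *elems_per_piece)
{
  if (nelts == 0 || total_bits % nelts != 0)
    return 0;  /* No exact split into NELTS pieces.  */
  *piece_bits = total_bits / nelts;
  if (*piece_bits % elem_bits != 0)
    return 0;  /* A piece would not hold a whole number of elements.  */
  *elems_per_piece = *piece_bits / elem_bits;
  return 1;
}
```

The real helper additionally has to consult the vec_init optab for both candidate piece modes; this sketch only covers the divisibility checks.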

The new version with the refactoring is attached.
Bootstrapped/regtested on powerpc64le-linux-gnu (LE) P8 and P9.

Is it ok for trunk?

BR,
Kewen
---------
gcc/ChangeLog

2020-MM-DD  Kewen Lin  <li...@gcc.gnu.org>

        PR tree-optimization/90332
        * tree-vect-stmts.c (vector_vector_composition_type): New function.
        (get_group_load_store_type): Adjust to call
        vector_vector_composition_type, extend it to construct with scalar
        types.
        (vectorizable_load): Likewise.

-------------
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 2ca8e494680..9e00b6f24cd 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -2220,6 +2220,61 @@ vect_get_store_rhs (stmt_vec_info stmt_info)
   gcc_unreachable ();
 }
 
+/* Function VECTOR_VECTOR_COMPOSITION_TYPE
+
+   This function returns a vector type which can be composed with NELTS
+   pieces, whose type is recorded in PTYPE.  VTYPE should be a vector type
+   with the same vector size as the returned vector.  It first checks
+   whether the target supports a pieces-size vector mode for construction;
+   if not, it then checks a pieces-size scalar mode for construction.  It
+   returns NULL_TREE if no available composition can be found.
+
+   For example, for (vtype=V16QI, nelts=4), we can probably get:
+     - V16QI with PTYPE V4QI.
+     - V4SI with PTYPE SI.
+     - NULL_TREE.  */
+
+static tree
+vector_vector_composition_type (tree vtype, poly_uint64 nelts, tree *ptype)
+{
+  gcc_assert (VECTOR_TYPE_P (vtype));
+  gcc_assert (known_gt (nelts, 0U));
+
+  machine_mode vmode = TYPE_MODE (vtype);
+  if (!VECTOR_MODE_P (vmode))
+    return NULL_TREE;
+
+  poly_uint64 vbsize = GET_MODE_BITSIZE (vmode);
+  unsigned int pbsize;
+  if (constant_multiple_p (vbsize, nelts, &pbsize))
+    {
+      /* First check if vec_init optab supports construction from pieces.  */
+      scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vtype));
+      poly_uint64 inelts = pbsize / GET_MODE_BITSIZE (elmode);
+      machine_mode rmode;
+      if (related_vector_mode (vmode, elmode, inelts).exists (&rmode)
+         && (convert_optab_handler (vec_init_optab, vmode, rmode)
+             != CODE_FOR_nothing))
+       {
+         *ptype = build_vector_type (TREE_TYPE (vtype), inelts);
+         return vtype;
+       }
+
+      /* Otherwise check if there exists an integer mode of the same piece
+        size and if vec_init optab supports construction from it.  */
+      if (int_mode_for_size (pbsize, 0).exists (&elmode)
+         && related_vector_mode (vmode, elmode, nelts).exists (&rmode)
+         && (convert_optab_handler (vec_init_optab, rmode, elmode)
+             != CODE_FOR_nothing))
+       {
+         *ptype = build_nonstandard_integer_type (pbsize, 1);
+         return build_vector_type (*ptype, nelts);
+       }
+    }
+
+  return NULL_TREE;
+}
+
 /* A subroutine of get_load_store_type, with a subset of the same
    arguments.  Handle the case where STMT_INFO is part of a grouped load
    or store.
@@ -2300,8 +2355,7 @@ get_group_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
             by simply loading half of the vector only.  Usually
             the construction with an upper zero half will be elided.  */
          dr_alignment_support alignment_support_scheme;
-         scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
-         machine_mode vmode;
+         tree half_vtype;
          if (overrun_p
              && !masked_p
              && (((alignment_support_scheme
@@ -2310,12 +2364,8 @@ get_group_load_store_type (stmt_vec_info stmt_info, tree vectype, bool slp,
                  || alignment_support_scheme == dr_unaligned_supported)
              && known_eq (nunits, (group_size - gap) * 2)
              && known_eq (nunits, group_size)
-             && VECTOR_MODE_P (TYPE_MODE (vectype))
-             && related_vector_mode (TYPE_MODE (vectype), elmode,
-                                     group_size - gap).exists (&vmode)
-             && (convert_optab_handler (vec_init_optab,
-                                        TYPE_MODE (vectype), vmode)
-                 != CODE_FOR_nothing))
+             && (vector_vector_composition_type (vectype, 2, &half_vtype)
+                 != NULL_TREE))
            overrun_p = false;
 
          if (overrun_p && !can_overrun_p)
@@ -8915,47 +8965,24 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
        {
          if (group_size < const_nunits)
            {
-             /* First check if vec_init optab supports construction from
-                vector elts directly.  */
-             scalar_mode elmode = SCALAR_TYPE_MODE (TREE_TYPE (vectype));
-             machine_mode vmode;
-             if (VECTOR_MODE_P (TYPE_MODE (vectype))
-                 && related_vector_mode (TYPE_MODE (vectype), elmode,
-                                         group_size).exists (&vmode)
-                 && (convert_optab_handler (vec_init_optab,
-                                            TYPE_MODE (vectype), vmode)
-                     != CODE_FOR_nothing))
+             /* First check if vec_init optab supports construction from vector
+                elts directly.  Otherwise avoid emitting a constructor of
+                vector elements by performing the loads using an integer type
+                of the same size, constructing a vector of those and then
+                re-interpreting it as the original vector type.  This avoids a
+                huge runtime penalty due to the general inability to perform
+                store forwarding from smaller stores to a larger load.  */
+             tree ptype;
+             tree vtype
+               = vector_vector_composition_type (vectype,
+                                                 const_nunits / group_size,
+                                                 &ptype);
+             if (vtype != NULL_TREE)
                {
                  nloads = const_nunits / group_size;
                  lnel = group_size;
-                 ltype = build_vector_type (TREE_TYPE (vectype), group_size);
-               }
-             else
-               {
-                 /* Otherwise avoid emitting a constructor of vector elements
-                    by performing the loads using an integer type of the same
-                    size, constructing a vector of those and then
-                    re-interpreting it as the original vector type.
-                    This avoids a huge runtime penalty due to the general
-                    inability to perform store forwarding from smaller stores
-                    to a larger load.  */
-                 unsigned lsize
-                   = group_size * TYPE_PRECISION (TREE_TYPE (vectype));
-                 unsigned int lnunits = const_nunits / group_size;
-                 /* If we can't construct such a vector fall back to
-                    element loads of the original vector type.  */
-                 if (int_mode_for_size (lsize, 0).exists (&elmode)
-                     && VECTOR_MODE_P (TYPE_MODE (vectype))
-                     && related_vector_mode (TYPE_MODE (vectype), elmode,
-                                             lnunits).exists (&vmode)
-                     && (convert_optab_handler (vec_init_optab, vmode, elmode)
-                         != CODE_FOR_nothing))
-                   {
-                     nloads = lnunits;
-                     lnel = group_size;
-                     ltype = build_nonstandard_integer_type (lsize, 1);
-                     lvectype = build_vector_type (ltype, nloads);
-                   }
+                 lvectype = vtype;
+                 ltype = ptype;
                }
            }
          else
@@ -9541,6 +9568,7 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
                    else
                      {
                        tree ltype = vectype;
+                       tree new_vtype = NULL_TREE;
                        /* If there's no peeling for gaps but we have a gap
                           with slp loads then load the lower half of the
                           vector only.  See get_group_load_store_type for
@@ -9553,10 +9581,14 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
                                         (group_size
                                          - DR_GROUP_GAP (first_stmt_info)) * 2)
                            && known_eq (nunits, group_size))
-                         ltype = build_vector_type (TREE_TYPE (vectype),
-                                                    (group_size
-                                                     - DR_GROUP_GAP
-                                                         (first_stmt_info)));
+                         {
+                           tree half_vtype;
+                           new_vtype
+                             = vector_vector_composition_type (vectype, 2,
+                                                               &half_vtype);
+                           if (new_vtype != NULL_TREE)
+                             ltype = half_vtype;
+                         }
                        data_ref
                          = fold_build2 (MEM_REF, ltype, dataref_ptr,
                                         dataref_offset
@@ -9584,10 +9616,21 @@ vectorizable_load (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
                            CONSTRUCTOR_APPEND_ELT (v, NULL_TREE, tem);
                            CONSTRUCTOR_APPEND_ELT (v, NULL_TREE,
                                                    build_zero_cst (ltype));
-                           new_stmt
-                             = gimple_build_assign (vec_dest,
-                                                    build_constructor
-                                                      (vectype, v));
+                           gcc_assert (new_vtype != NULL_TREE);
+                           if (new_vtype == vectype)
+                             new_stmt = gimple_build_assign (
+                               vec_dest, build_constructor (vectype, v));
+                           else
+                             {
+                               tree new_vname = make_ssa_name (new_vtype);
+                               new_stmt = gimple_build_assign (
+                                 new_vname, build_constructor (new_vtype, v));
+                               vect_finish_stmt_generation (stmt_info,
+                                                            new_stmt, gsi);
+                               new_stmt = gimple_build_assign (
+                                 vec_dest, build1 (VIEW_CONVERT_EXPR, vectype,
+                                                   new_vname));
+                             }
                          }
                      }
                    break;
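As background on why the integer-pieces path matters at run time: the sketch below (hand-written plain C using GNU vector extensions; `gather_by_elements` and `gather_by_pieces` are invented names, not output of the patch) contrasts the element-wise constructor the vectorizer otherwise falls back to with the wider-pieces composition selected above, for group_size = 2 loads feeding a V8HI vector. Fewer, wider loads per group are friendlier to store forwarding than many narrow element accesses feeding one vector.

```c
#include <stdint.h>
#include <string.h>

/* V8HI: eight 16-bit elements in a 128-bit vector (GNU extension).  */
typedef int16_t v8hi __attribute__ ((vector_size (16)));

/* Element-wise gather: eight 16-bit loads feed a vector constructor.  */
static v8hi
gather_by_elements (const int16_t *p, long stride)
{
  return (v8hi) { p[0], p[1],
                  p[stride], p[stride + 1],
                  p[2 * stride], p[2 * stride + 1],
                  p[3 * stride], p[3 * stride + 1] };
}

/* Pieces-based gather: one 32-bit load per 2-element group, composed into
   four integer pieces and re-interpreted as V8HI, as the integer-mode
   composition path arranges.  */
static v8hi
gather_by_pieces (const int16_t *p, long stride)
{
  uint32_t piece[4];
  for (int i = 0; i < 4; i++)
    memcpy (&piece[i], p + i * stride, sizeof (uint32_t));
  v8hi result;
  memcpy (&result, piece, sizeof (result));
  return result;
}
```

Both functions yield byte-identical vectors on either endianness, since the pieces copy the same bytes in the same order.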
