RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

Tamar Christina via Gcc-patches Mon, 21 Jun 2021 01:12:26 -0700

Ping


> -----Original Message-----
> From: Gcc-patches <gcc-patches-
> bounces+tamar.christina=arm....@gcc.gnu.org> On Behalf Of Tamar
> Christina via Gcc-patches
> Sent: Monday, June 14, 2021 1:06 PM
> To: Richard Sandiford <richard.sandif...@arm.com>
> Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>; Richard Biener
> <rguent...@suse.de>
> Subject: RE: [PATCH 1/4]middle-end Vect: Add support for dot-product
> where the sign for the multiplicant changes.
> 
> Hi Richard,
> 
> I've attached a new version of the patch with the changes.
> I have also added 7 new tests in the testsuite to check the cases you
> mentioned.
> 
> Bootstrapped Regtested on aarch64-none-linux-gnu and no issues.
> 
> Ok for master?
> 
> Thanks,
> Tamar
> 
> gcc/ChangeLog:
> 
>       * optabs.def (usdot_prod_optab): New.
>       * doc/md.texi: Document it and clarify other dot prod optabs.
>       * optabs-tree.h (enum optab_subtype): Add
> optab_vector_mixed_sign.
>       * optabs-tree.c (optab_for_tree_code): Support usdot_prod_optab.
>       * optabs.c (expand_widen_pattern_expr): Likewise.
>       * tree-cfg.c (verify_gimple_assign_ternary): Likewise.
>       * tree-vect-loop.c (vectorizable_reduction): Query dot-product kind.
>       * tree-vect-patterns.c (vect_supportable_direct_optab_p): Take
> optional
>       optab subtype.
>       (vect_widened_op_tree): Optionally ignore
>       mismatch types.
>       (vect_recog_dot_prod_pattern): Support usdot_prod_optab.
> 
> --- inline copy of patch ---
> 
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi index
> 00caf3844ccf8ea289d581839766502d51b9e8d7..1356afb7f903f17c198103562b
> 5cd145ecb9966f 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -5446,13 +5446,55 @@ Like @samp{fold_left_plus_@var{m}}, but takes
> an additional mask operand
> 
>  @cindex @code{sdot_prod@var{m}} instruction pattern  @item
> @samp{sdot_prod@var{m}}
> +
> +Compute the sum of the products of two signed elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +sdot<signed c, signed a, signed b> ==
> +   res = sign-ext (a) * sign-ext (b) + c @dots{} @end smallexample
> +
>  @cindex @code{udot_prod@var{m}} instruction pattern -@itemx
> @samp{udot_prod@var{m}} -Compute the sum of the products of two
> signed/unsigned elements.
> -Operand 1 and operand 2 are of the same mode. Their product, which is of a
> -wider mode, is computed and added to operand 3. Operand 3 is of a mode
> equal or -wider than the mode of the product. The result is placed in operand
> 0, which -is of the same mode as operand 3.
> +@item @samp{udot_prod@var{m}}
> +
> +Compute the sum of the products of two unsigned elements.
> +Operand 1 and operand 2 are of the same mode. Their product, which is
> +of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the
> +following signs
> +
> +@smallexample
> +udot<unsigned c, unsigned a, unsigned b> ==
> +   res = zero-ext (a) * zero-ext (b) + c @dots{} @end smallexample
> +
> +
> +
> +@cindex @code{usdot_prod@var{m}} instruction pattern
> +@item @samp{usdot_prod@var{m}}
> +Compute the sum of the products of elements of different signs.
> +Operand 1 must be unsigned and operand 2 signed. Their
> +product, which is of a wider mode, is computed and added to operand 3.
> +Operand 3 is of a mode equal or wider than the mode of the product. The
> +result is placed in operand 0, which is of the same mode as operand 3.
> +
> +Semantically the expressions perform the multiplication in the following
> signs
> +
> +@smallexample
> +usdot<unsigned c, unsigned a, signed b> ==
> +   res = ((unsigned-conv) sign-ext (a)) * zero-ext (b) + c
> +@dots{}
> +@end smallexample
> 
>  @cindex @code{ssad@var{m}} instruction pattern
>  @item @samp{ssad@var{m}}
> diff --git a/gcc/optabs-tree.h b/gcc/optabs-tree.h
> index
> c3aaa1a416991e856d3e24da45968a92ebada82c..fbd2b06b8dbfd560dfb66b31
> 4830e6b564b37abb 100644
> --- a/gcc/optabs-tree.h
> +++ b/gcc/optabs-tree.h
> @@ -29,7 +29,8 @@ enum optab_subtype
>  {
>    optab_default,
>    optab_scalar,
> -  optab_vector
> +  optab_vector,
> +  optab_vector_mixed_sign
>  };
> 
>  /* Return the optab used for computing the given operation on the type
> given by
> diff --git a/gcc/optabs-tree.c b/gcc/optabs-tree.c
> index
> 95ffe397c23e80c105afea52e9d47216bf52f55a..eeb5aeed3202cc6971b6447994
> bc5311e9c010bb 100644
> --- a/gcc/optabs-tree.c
> +++ b/gcc/optabs-tree.c
> @@ -127,7 +127,12 @@ optab_for_tree_code (enum tree_code code,
> const_tree type,
>        return TYPE_UNSIGNED (type) ? usum_widen_optab :
> ssum_widen_optab;
> 
>      case DOT_PROD_EXPR:
> -      return TYPE_UNSIGNED (type) ? udot_prod_optab : sdot_prod_optab;
> +      {
> +     if (subtype == optab_vector_mixed_sign)
> +       return usdot_prod_optab;
> +
> +     return (TYPE_UNSIGNED (type) ? udot_prod_optab :
> sdot_prod_optab);
> +      }
> 
>      case SAD_EXPR:
>        return TYPE_UNSIGNED (type) ? usad_optab : ssad_optab;
> diff --git a/gcc/optabs.c b/gcc/optabs.c
> index
> 62a6bdb4c59bf8263c499245795576199606d372..14d8ad2f33fd75388435fe9123
> 80e177f8f3c54b 100644
> --- a/gcc/optabs.c
> +++ b/gcc/optabs.c
> @@ -262,6 +262,11 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>    bool sbool = false;
> 
>    oprnd0 = ops->op0;
> +  if (nops >= 2)
> +    oprnd1 = ops->op1;
> +  if (nops >= 3)
> +    oprnd2 = ops->op2;
> +
>    tmode0 = TYPE_MODE (TREE_TYPE (oprnd0));
>    if (ops->code == VEC_UNPACK_FIX_TRUNC_HI_EXPR
>        || ops->code == VEC_UNPACK_FIX_TRUNC_LO_EXPR)
> @@ -285,6 +290,27 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>          ? vec_unpacks_sbool_hi_optab : vec_unpacks_sbool_lo_optab);
>        sbool = true;
>      }
> +  else if (ops->code == DOT_PROD_EXPR)
> +    {
> +      enum optab_subtype subtype = optab_default;
> +      signop sign1 = TYPE_SIGN (TREE_TYPE (oprnd0));
> +      signop sign2 = TYPE_SIGN (TREE_TYPE (oprnd1));
> +      if (sign1 == sign2)
> +     ;
> +      else if (sign1 == SIGNED && sign2 == UNSIGNED)
> +     {
> +       subtype = optab_vector_mixed_sign;
> +       /* Same as optab_vector_mixed_sign but flip the operands.  */
> +       std::swap (op0, op1);
> +     }
> +      else if (sign1 == UNSIGNED && sign2 == SIGNED)
> +     subtype = optab_vector_mixed_sign;
> +      else
> +     gcc_unreachable ();
> +
> +      widen_pattern_optab
> +     = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), subtype);
> +    }
>    else
>      widen_pattern_optab
>        = optab_for_tree_code (ops->code, TREE_TYPE (oprnd0), optab_default);
> @@ -298,10 +324,7 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>    gcc_assert (icode != CODE_FOR_nothing);
> 
>    if (nops >= 2)
> -    {
> -      oprnd1 = ops->op1;
> -      tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
> -    }
> +    tmode1 = TYPE_MODE (TREE_TYPE (oprnd1));
>    else if (sbool)
>      {
>        nops = 2;
> @@ -316,7 +339,6 @@ expand_widen_pattern_expr (sepops ops, rtx op0,
> rtx op1, rtx wide_op,
>      {
>        gcc_assert (tmode1 == tmode0);
>        gcc_assert (op1);
> -      oprnd2 = ops->op2;
>        wmode = TYPE_MODE (TREE_TYPE (oprnd2));
>      }
> 
> diff --git a/gcc/optabs.def b/gcc/optabs.def
> index
> b192a9d070b8aa72e5676b2eaa020b5bdd7ffcc8..f470c2168378cec840edf7fbd
> b7c18615baae928 100644
> --- a/gcc/optabs.def
> +++ b/gcc/optabs.def
> @@ -352,6 +352,7 @@ OPTAB_D (uavg_ceil_optab, "uavg$a3_ceil")
>  OPTAB_D (sdot_prod_optab, "sdot_prod$I$a")
>  OPTAB_D (ssum_widen_optab, "widen_ssum$I$a3")
>  OPTAB_D (udot_prod_optab, "udot_prod$I$a")
> +OPTAB_D (usdot_prod_optab, "usdot_prod$I$a")
>  OPTAB_D (usum_widen_optab, "widen_usum$I$a3")
>  OPTAB_D (usad_optab, "usad$I$a")
>  OPTAB_D (ssad_optab, "ssad$I$a")
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index
> 02256580c986be426564adc1105ed2e1c69b0efc..f250f0fe99bec5278a0963e92b
> c1d2a61d9eee70 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -4412,7 +4412,8 @@ verify_gimple_assign_ternary (gassign *stmt)
>                 && !SCALAR_FLOAT_TYPE_P (rhs1_type))
>                || (!INTEGRAL_TYPE_P (lhs_type)
>                    && !SCALAR_FLOAT_TYPE_P (lhs_type))))
> -         || !types_compatible_p (rhs1_type, rhs2_type)
> +         /* rhs1_type and rhs2_type may differ in sign.  */
> +         || !tree_nop_conversion_p (rhs1_type, rhs2_type)
>           || !useless_type_conversion_p (lhs_type, rhs3_type)
>           || maybe_lt (GET_MODE_SIZE (element_mode (rhs3_type)),
>                        2 * GET_MODE_SIZE (element_mode (rhs1_type))))
> diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
> index
> ee79808472cea88786e5c04756980b456c3f5a02..d2accf3c35ade25e8d2ff4ee88
> 136651e3e87c74 100644
> --- a/gcc/tree-vect-loop.c
> +++ b/gcc/tree-vect-loop.c
> @@ -6663,6 +6663,12 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>    bool lane_reduc_code_p
>      = (code == DOT_PROD_EXPR || code == WIDEN_SUM_EXPR || code ==
> SAD_EXPR);
>    int op_type = TREE_CODE_LENGTH (code);
> +  enum optab_subtype optab_query_kind = optab_vector;
> +  if (code == DOT_PROD_EXPR
> +      && TYPE_SIGN (TREE_TYPE (gimple_assign_rhs1 (stmt)))
> +        != TYPE_SIGN (TREE_TYPE (gimple_assign_rhs2 (stmt))))
> +    optab_query_kind = optab_vector_mixed_sign;
> +
> 
>    scalar_dest = gimple_assign_lhs (stmt);
>    scalar_type = TREE_TYPE (scalar_dest);
> @@ -7190,7 +7196,7 @@ vectorizable_reduction (loop_vec_info loop_vinfo,
>        bool ok = true;
> 
>        /* 4.1. check support for the operation in the loop  */
> -      optab optab = optab_for_tree_code (code, vectype_in, optab_vector);
> +      optab optab = optab_for_tree_code (code, vectype_in,
> optab_query_kind);
>        if (!optab)
>       {
>         if (dump_enabled_p ())
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index
> c6b6feadb8d8d5cc57ded192cd68dd54b9185aef..77605e55dec7b4f6b0a1e1fd
> afa6313b987fa12c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -191,9 +191,9 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> var)
>  }
> 
>  /* Return true if the target supports a vector version of CODE,
> -   where CODE is known to map to a direct optab.  ITYPE specifies
> -   the type of (some of) the scalar inputs and OTYPE specifies the
> -   type of the scalar result.
> +   where CODE is known to map to a direct optab with the given SUBTYPE.
> +   ITYPE specifies the type of (some of) the scalar inputs and OTYPE
> +   specifies the type of the scalar result.
> 
>     If CODE allows the inputs and outputs to have different type
>     (such as for WIDEN_SUM_EXPR), it is the input mode rather
> @@ -208,7 +208,8 @@ vect_get_external_def_edge (vec_info *vinfo, tree
> var)
>  static bool
>  vect_supportable_direct_optab_p (vec_info *vinfo, tree otype, tree_code
> code,
>                                tree itype, tree *vecotype_out,
> -                              tree *vecitype_out = NULL)
> +                              tree *vecitype_out = NULL,
> +                              enum optab_subtype subtype =
> optab_default)
>  {
>    tree vecitype = get_vectype_for_scalar_type (vinfo, itype);
>    if (!vecitype)
> @@ -218,7 +219,7 @@ vect_supportable_direct_optab_p (vec_info *vinfo,
> tree otype, tree_code code,
>    if (!vecotype)
>      return false;
> 
> -  optab optab = optab_for_tree_code (code, vecitype, optab_default);
> +  optab optab = optab_for_tree_code (code, vecitype, subtype);
>    if (!optab)
>      return false;
> 
> @@ -521,6 +522,9 @@ vect_joust_widened_type (tree type, tree new_type,
> tree *common_type)
>    unsigned int precision = MAX (TYPE_PRECISION (*common_type),
>                               TYPE_PRECISION (new_type));
>    precision *= 2;
> +
> +  /* The resulting application is unsigned, check if we have enough
> +     precision to perform the operation.  */
>    if (precision * 2 > TYPE_PRECISION (type))
>      return false;
> 
> @@ -539,6 +543,10 @@ vect_joust_widened_type (tree type, tree
> new_type, tree *common_type)
>     to a type that (a) is narrower than the result of STMT_INFO and
>     (b) can hold all leaf operand values.
> 
> +   If SUBTYPE then allow that the signs of the operands
> +   may differ in signs but not in precision.  SUBTYPE is updated to reflect
> +   this.
> +
>     Return 0 if STMT_INFO isn't such a tree, or if no such COMMON_TYPE
>     exists.  */
> 
> @@ -546,7 +554,8 @@ static unsigned int
>  vect_widened_op_tree (vec_info *vinfo, stmt_vec_info stmt_info,
> tree_code code,
>                     tree_code widened_code, bool shift_p,
>                     unsigned int max_nops,
> -                   vect_unpromoted_value *unprom, tree *common_type)
> +                   vect_unpromoted_value *unprom, tree *common_type,
> +                   enum optab_subtype *subtype = NULL)
>  {
>    /* Check for an integer operation with the right code.  */
>    gassign *assign = dyn_cast <gassign *> (stmt_info->stmt);
> @@ -607,7 +616,8 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
>               = vinfo->lookup_def (this_unprom->op);
>             nops = vect_widened_op_tree (vinfo, def_stmt_info, code,
>                                          widened_code, shift_p, max_nops,
> -                                        this_unprom, common_type);
> +                                        this_unprom, common_type,
> +                                        subtype);
>             if (nops == 0)
>               return 0;
> 
> @@ -625,7 +635,24 @@ vect_widened_op_tree (vec_info *vinfo,
> stmt_vec_info stmt_info, tree_code code,
>               *common_type = this_unprom->type;
>             else if (!vect_joust_widened_type (type, this_unprom->type,
>                                                common_type))
> -             return 0;
> +             {
> +               if (subtype)
> +                 {
> +                   tree new_type = *common_type;
> +                   /* See if we can sign extend the smaller type.  */
> +                   if (TYPE_PRECISION (this_unprom->type) >
> TYPE_PRECISION (new_type)
> +                       && (TYPE_UNSIGNED (this_unprom->type)
> && !TYPE_UNSIGNED (new_type)))
> +                     new_type = build_nonstandard_integer_type
> (TYPE_PRECISION (this_unprom->type), true);
> +
> +                   if (tree_nop_conversion_p (this_unprom->type,
> new_type))
> +                     {
> +                       *subtype = optab_vector_mixed_sign;
> +                       *common_type = new_type;
> +                     }
> +                 }
> +               else
> +                 return 0;
> +             }
>           }
>       }
>        next_op += nops;
> @@ -806,12 +833,15 @@ vect_convert_input (vec_info *vinfo,
> stmt_vec_info stmt_info, tree type,
>  }
> 
>  /* Invoke vect_convert_input for N elements of UNPROM and store the
> -   result in the corresponding elements of RESULT.  */
> +   result in the corresponding elements of RESULT.
> +
> +   If SUBTYPE then don't convert the types if they only
> +   differ by sign.  */
> 
>  static void
>  vect_convert_inputs (vec_info *vinfo, stmt_vec_info stmt_info, unsigned
> int n,
>                    tree *result, tree type, vect_unpromoted_value *unprom,
> -                  tree vectype)
> +                  tree vectype, enum optab_subtype subtype =
> optab_default)
>  {
>    for (unsigned int i = 0; i < n; ++i)
>      {
> @@ -819,8 +849,12 @@ vect_convert_inputs (vec_info *vinfo,
> stmt_vec_info stmt_info, unsigned int n,
>        for (j = 0; j < i; ++j)
>       if (unprom[j].op == unprom[i].op)
>         break;
> +
>        if (j < i)
>       result[i] = result[j];
> +      else if (subtype == optab_vector_mixed_sign
> +            && tree_nop_conversion_p (type, unprom[i].type))
> +     result[i] = unprom[i].op;
>        else
>       result[i] = vect_convert_input (vinfo, stmt_info,
>                                       type, &unprom[i], vectype);
> @@ -895,7 +929,8 @@ vect_reassociating_reduction_p (vec_info *vinfo,
> 
>     Try to find the following pattern:
> 
> -     type x_t, y_t;
> +     type1a x_t
> +     type1b y_t;
>       TYPE1 prod;
>       TYPE2 sum = init;
>     loop:
> @@ -908,8 +943,10 @@ vect_reassociating_reduction_p (vec_info *vinfo,
>       [S6  prod = (TYPE2) prod;  #optional]
>       S7  sum_1 = prod + sum_0;
> 
> -   where 'TYPE1' is exactly double the size of type 'type', and 'TYPE2' is 
> the
> -   same size of 'TYPE1' or bigger. This is a special case of a reduction
> +   where 'TYPE1' is exactly double the size of type 'type1a' and 'type1b',
> +   the sign of 'TYPE1' must be one of 'type1a' or 'type1b' but the sign of
> +   'type1a' and 'type1b' can differ. 'TYPE2' is the same size of 'TYPE1' or
> +   bigger and must be the same sign. This is a special case of a reduction
>     computation.
> 
>     Input:
> @@ -946,15 +983,15 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
> 
>    /* Look for the following pattern
>            DX = (TYPE1) X;
> -          DY = (TYPE1) Y;
> +       DY = (TYPE1) Y;
>            DPROD = DX * DY;
> -          DDPROD = (TYPE2) DPROD;
> +       DDPROD = (TYPE2) DPROD;
>            sum_1 = DDPROD + sum_0;
>       In which
>       - DX is double the size of X
>       - DY is double the size of Y
>       - DX, DY, DPROD all have the same type but the sign
> -       between DX, DY and DPROD can differ.
> +       between X, Y and DPROD can differ.
>       - sum is the same size of DPROD or bigger
>       - sum has been recognized as a reduction variable.
> 
> @@ -992,21 +1029,27 @@ vect_recog_dot_prod_pattern (vec_info *vinfo,
>    /* FORNOW.  Can continue analyzing the def-use chain when this stmt in a
> phi
>       inside the loop (in case we are analyzing an outer-loop).  */
>    vect_unpromoted_value unprom0[2];
> +  enum optab_subtype subtype = optab_vector;
>    if (!vect_widened_op_tree (vinfo, mult_vinfo, MULT_EXPR,
> WIDEN_MULT_EXPR,
> -                          false, 2, unprom0, &half_type))
> +                          false, 2, unprom0, &half_type, &subtype))
> +    return NULL;
> +
> +  if (subtype == optab_vector_mixed_sign
> +      && TYPE_UNSIGNED (unprom_mult.type)
> +      && TYPE_PRECISION (half_type) * 4 > TYPE_PRECISION
> (unprom_mult.type))
>      return NULL;
> 
>    vect_pattern_detected ("vect_recog_dot_prod_pattern", last_stmt);
> 
>    tree half_vectype;
>    if (!vect_supportable_direct_optab_p (vinfo, type, DOT_PROD_EXPR,
> half_type,
> -                                     type_out, &half_vectype))
> +                                     type_out, &half_vectype, subtype))
>      return NULL;
> 
>    /* Get the inputs in the appropriate types.  */
>    tree mult_oprnd[2];
>    vect_convert_inputs (vinfo, stmt_vinfo, 2, mult_oprnd, half_type,
> -                    unprom0, half_vectype);
> +                    unprom0, half_vectype, subtype);
> 
>    var = vect_recog_temp_ssa_var (type, NULL);
>    pattern_stmt = gimple_build_assign (var, DOT_PROD_EXPR,

RE: [PATCH 1/4]middle-end Vect: Add support for dot-product where the sign for the multiplicant changes.

Reply via email to