"Andre Vieira (lists) via Gcc-patches" <gcc-patches@gcc.gnu.org> writes: > On 22/11/2021 11:41, Richard Biener wrote: >> >>> On 18/11/2021 11:05, Richard Biener wrote: >>>> This is a good shout and made me think about something I hadn't before... I >>>> thought I could handle the vector forms later, but the problem is if I add >>>> support for the scalar, it will stop the vectorizer. It seems >>>> vectorizable_call expects all arguments to have the same type, which >>>> doesn't >>>> work with passing the integer type as an operand work around. >> We already special case some IFNs there (masked load/store and gather) >> to ignore some args, so that would just add to this set. >> >> Richard. > Hi, > > Reworked it to add support of the new IFN to the vectorizer. Was > initially trying to make vectorizable_call and > vectorizable_internal_function handle IFNs with different inputs more > generically, using the information we have in the <IFN>_direct structs > regarding what operands to get the modes from. Unfortunately, that > wasn't straightforward because of how vectorizable_call assumes operands > have the same type and uses the type of the DEF_STMT_INFO of the > non-constant operands (either output operand or non-constant inputs) to > determine the type of constants. I assume there is some reason why we > use the DEF_STMT_INFO and not always use get_vectype_for_scalar_type on > the argument types. That is why I ended up with this sort of half-way > mix of both, which still allows room to add more IFNs that don't take > inputs of the same type, but require adding a bit of special casing > similar to the IFN_FTRUNC_INT and masking ones. > > Bootstrapped on aarch64-none-linux.
Still leaving the match.pd stuff to Richard, but otherwise:

> index bdc8ba3576cf2c9b4ae96b45a382234e4e25b13f..51f00344b02d0d1d4adf97463f6a46f9fd0fb43f 100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -160,7 +160,11 @@ (define_mode_iterator VHSDF_HSDF [(V4HF "TARGET_SIMD_F16INST")
>                                SF DF])
>
> ;; Scalar and vetor modes for SF, DF.
> -(define_mode_iterator VSFDF [V2SF V4SF V2DF DF SF])
> +(define_mode_iterator VSFDF [ (V2SF "TARGET_SIMD")

Nit: excess space between [ and (.

> +                              (V4SF "TARGET_SIMD")
> +                              (V2DF "TARGET_SIMD")
> +                              (DF "TARGET_FLOAT")
> +                              (SF "TARGET_FLOAT")])
>
> ;; Advanced SIMD single Float modes.
> (define_mode_iterator VDQSF [V2SF V4SF])
> […]
> diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
> index 41f1850bf6e95005647ca97a495a97d7e184d137..d50d09b0ae60d98537b9aece4396a490f33f174c 100644
> --- a/gcc/doc/md.texi
> +++ b/gcc/doc/md.texi
> @@ -6175,6 +6175,15 @@ operands; otherwise, it may not.
>
> This pattern is not allowed to @code{FAIL}.
>
> +@cindex @code{ftrunc@var{m}@var{n}2} instruction pattern
> +@item @samp{ftrunc@var{m}@var{n}2}
> +Truncate operand 1 to a @var{n} mode signed integer, towards zero, and store
> +the result in operand 0.  Both operands have mode @var{m}, which is a scalar or
> +vector floating-point mode.  Exception must be thrown if operand 1 does not fit

Maybe “An exception must be raised”?  “thrown” makes it sound like a
signal must be raised or a C++ exception thrown.

> +in a @var{n} mode signed integer as it would have if the truncation happened
> +through separate floating point to integer conversion.
> +
> +
> @cindex @code{round@var{m}2} instruction pattern
> @item @samp{round@var{m}2}
> Round operand 1 to the nearest integer, rounding away from zero in the
> […]
> @@ -3688,6 +3712,15 @@ multi_vector_optab_supported_p (convert_optab optab, tree_pair types,
>            != CODE_FOR_nothing);
> }
>
> +static bool direct_ftrunc_int_optab_supported_p (convert_optab optab,
> +                                                 tree_pair types,
> +                                                 optimization_type opt_type)

Formatting nit: there should be a line break after “bool”.

> +{
> +  return (convert_optab_handler (optab, TYPE_MODE (types.first),
> +                                 TYPE_MODE (element_type (types.second)),
> +                                 opt_type) != CODE_FOR_nothing);
> +}
> +
> #define direct_unary_optab_supported_p direct_optab_supported_p
> #define direct_binary_optab_supported_p direct_optab_supported_p
> #define direct_ternary_optab_supported_p direct_optab_supported_p
> […]
> diff --git a/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> new file mode 100644
> index 0000000000000000000000000000000000000000..b93304eb2acb3d3d954eebee51d77ff23fee68ac
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/frintnz_vec.c
> @@ -0,0 +1,47 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -march=armv8.5-a" } */
> +/* { dg-require-effective-target aarch64_frintnzx_ok } */
> +/* { dg-final { check-function-bodies "**" "" } } */
> +
> +#define TEST(name,float_type,int_type)                                  \
> +void                                                                    \
> +name (float_type * __restrict__ x, float_type * __restrict__ y, int n)  \
> +{                                                                       \
> +  for (int i = 0; i < n; ++i)                                           \
> +    {                                                                   \
> +      int_type x_i = x[i];                                              \
> +      y[i] = (float_type) x_i;                                          \
> +    }                                                                   \
> +}
> +
> +/*
> +** f1:
> +**      ...
> +**      frint32z        v0.4s, v0.4s

I don't think we can rely on v0 being used here.  v[0-9]+\.4s would be
safer.
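For instance, a sketch of the adjusted f1 body check, following that
suggestion:

  /*
  ** f1:
  **      ...
  **      frint32z        v[0-9]+\.4s, v[0-9]+\.4s
  **      ...
  */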
> +**      ...
> +*/
> +TEST(f1, float, int)
> +
> +/*
> +** f2:
> +**      ...
> +**      frint64z        v0.4s, v0.4s
> +**      ...
> +*/
> +TEST(f2, float, long long)
> +
> +/*
> +** f3:
> +**      ...
> +**      frint32z        v0.2d, v0.2d
> +**      ...
> +*/
> +TEST(f3, double, int)
> +
> +/*
> +** f4:
> +**      ...
> +**      frint64z        v0.2d, v0.2d
> +**      ...
> +*/
> +TEST(f4, double, long long)
> […]
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 03cc7267cf80d4ce73c0d89ab86b07e84752456a..35bb1f70f7b173ad0d1e9f70ce0ac9da891dbe62 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -1625,7 +1625,8 @@ vect_finish_stmt_generation (vec_info *vinfo,
>
> static internal_fn
> vectorizable_internal_function (combined_fn cfn, tree fndecl,
> -                               tree vectype_out, tree vectype_in)
> +                               tree vectype_out, tree vectype_in,
> +                               tree *vectypes)
> {
>   internal_fn ifn;
>   if (internal_fn_p (cfn))
> @@ -1637,8 +1638,12 @@ vectorizable_internal_function (combined_fn cfn, tree fndecl,
>       const direct_internal_fn_info &info = direct_internal_fn (ifn);
>       if (info.vectorizable)
>         {
> -         tree type0 = (info.type0 < 0 ? vectype_out : vectype_in);
> -         tree type1 = (info.type1 < 0 ? vectype_out : vectype_in);
> +         tree type0 = (info.type0 < 0 ? vectype_out : vectypes[info.type0]);
> +         if (!type0)
> +           type0 = vectype_in;
> +         tree type1 = (info.type1 < 0 ? vectype_out : vectypes[info.type1]);
> +         if (!type1)
> +           type1 = vectype_in;
>           if (direct_internal_fn_supported_p (ifn, tree_pair (type0, type1),
>                                               OPTIMIZE_FOR_SPEED))
>             return ifn;
> @@ -3252,16 +3257,31 @@ vectorizable_call (vec_info *vinfo,
>       rhs_type = unsigned_type_node;
>     }
>
> -  int mask_opno = -1;
> +  /* The argument that is not of the same type as the others.  */
> +  int diff_opno = -1;
> +  bool masked = false;
>   if (internal_fn_p (cfn))
> -    mask_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +    {
> +      if (cfn == CFN_FTRUNC_INT)
> +       /* For FTRUNC this represents the argument that carries the type of
> +          the intermediate signed integer.  */
> +       diff_opno = 1;
> +      else
> +       {
> +         /* For masked operations this represents the argument that carries
> +            the mask.  */
> +         diff_opno = internal_fn_mask_index (as_internal_fn (cfn));
> +         masked = diff_opno >= 0;
> +       }
> +    }

I think it would be cleaner not to process argument 1 at all for
CFN_FTRUNC_INT.  There's no particular need to vectorise it.
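One possible shape for that, in the argument loop quoted below (an
untested sketch of the suggestion, not code from the patch):

      /* Untested sketch: IFN_FTRUNC_INT's second argument merely encodes
         the intermediate integer type, so skip it here and leave it in
         its scalar form, instead of computing a vector type for it.  */
      if (cfn == CFN_FTRUNC_INT && (int) i == 1)
        continue;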
>   for (i = 0; i < nargs; i++)
>     {
> -      if ((int) i == mask_opno)
> +      if ((int) i == diff_opno && masked)
>         {
> -         if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node, mask_opno,
> -                                      &op, &slp_op[i], &dt[i], &vectypes[i]))
> +         if (!vect_check_scalar_mask (vinfo, stmt_info, slp_node,
> +                                      diff_opno, &op, &slp_op[i], &dt[i],
> +                                      &vectypes[i]))
>             return false;
>           continue;
>         }
> […]
> diff --git a/gcc/tree.c b/gcc/tree.c
> index 845228a055b2cfac0c9ca8c0cda1b9df4b0095c6..f1e9a1eb48769cb11aa69730e2480ed5522f78c1 100644
> --- a/gcc/tree.c
> +++ b/gcc/tree.c
> @@ -6645,11 +6645,11 @@ valid_constant_size_p (const_tree size, cst_size_error *perr /* = NULL */)
>   return true;
> }
>
> -/* Return the precision of the type, or for a complex or vector type the
> -   precision of the type of its elements.  */
> +/* Return the type, or for a complex or vector type the type of its
> +   elements.  */
>
> -unsigned int
> -element_precision (const_tree type)
> +tree
> +element_type (const_tree type)
> {
>   if (!TYPE_P (type))
>     type = TREE_TYPE (type);
> @@ -6657,7 +6657,16 @@ element_precision (const_tree type)
>   if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
>     type = TREE_TYPE (type);
>
> -  return TYPE_PRECISION (type);
> +  return (tree) type;

I think we should stick a const_cast in element_precision and make
element_type take a plain “type”.  As it stands element_type is an
implicit const_cast for many cases.

Thanks,
Richard

> +}
> +
> +/* Return the precision of the type, or for a complex or vector type the
> +   precision of the type of its elements.  */
> +
> +unsigned int
> +element_precision (const_tree type)
> +{
> +  return TYPE_PRECISION (element_type (type));
> +}
>
> /* Return true if CODE represents an associative tree code.  Otherwise
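A minimal sketch of the arrangement suggested above, assuming GCC's
CONST_CAST_TREE macro (an illustration, not part of the posted patch):

  /* element_type takes a plain "tree", so it no longer launders away
     constness for its callers.  */
  tree
  element_type (tree type)
  {
    if (!TYPE_P (type))
      type = TREE_TYPE (type);
    enum tree_code code = TREE_CODE (type);
    if (code == COMPLEX_TYPE || code == VECTOR_TYPE)
      type = TREE_TYPE (type);
    return type;
  }

  unsigned int
  element_precision (const_tree type)
  {
    /* Keep the const_tree interface here and confine the cast to this
       one caller, rather than making element_type an implicit
       const_cast.  */
    return TYPE_PRECISION (element_type (CONST_CAST_TREE (type)));
  }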