Kyrylo Tkachov <ktkac...@nvidia.com> writes:
> Hi Victor,
>
>> On 10 Jul 2024, at 16:05, Victor Do Nascimento <victor.donascime...@arm.com> 
>> wrote:
>> 
>> External email: Use caution opening links or attachments
>> 
>> 
>> Given recent changes to the dot_prod standard pattern name, this patch
>> fixes the aarch64 back-end by implementing the following changes:
>> 
>> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files.
>> 2. Rewrite initialization and function expansion mechanism for simd
>> builtins.
>> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE
>> builtins.
>> 
>> Finally, given that it is now possible for the compiler to
>> differentiate between the two- and four-way dot product, we add a test
>> to ensure that autovectorization picks up on dot-product patterns
>> where the result is twice the width of the operands.
>> 
>> gcc/ChangeLog:
>> 
>>        * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins):
>>        New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI,
>>        UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI.
>>        (aarch64_init_builtin_dotprod_functions): New.
>>        (aarch64_init_simd_builtins): Add call to
>>        `aarch64_init_builtin_dotprod_functions'.
>>        (aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR
>>        handling.
>>        * config/aarch64/aarch64-simd-builtins.def: Remove macro
>>        expansion-based initialization and expansion
>>        of (u|s|us)dot_prod builtins.
>>        * config/aarch64/aarch64-simd.md
>>        (<sur>dot_prod<vsi2qi><vczle><vczbe>): Deleted.
>>        (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): New.
>>        (usdot_prod<vsi2qi><vczle><vczbe>): Deleted.
>>        (usdot_prod<mode><vsi2qi><vczle><vczbe>): New.
>>        (<su>sadv16qi): Adjust call to gen_udot_prod take second mode.
>>        (popcount<mode2>): fix use of `udot_prod_optab'.
>>        * config/aarch64/aarch64-sve-builtins-base.cc
>>        (svdot_impl::expand): s/direct/convert/ in
>>        `convert_optab_handler_for_sign' function call.
>>        (svusdot_impl::expand): add second mode argument in call to
>>        `code_for_dot_prod'.
>>        * config/aarch64/aarch64-sve-builtins.cc
>>        (function_expander::convert_optab_handler_for_sign): New class
>>        method.
>>        * config/aarch64/aarch64-sve-builtins.h
>>        (class function_expander): Add prototype for new
>>        `convert_optab_handler_for_sign' method.
>>        * gcc/config/aarch64/aarch64-sve.md
>>        (<sur>dot_prod<vsi2qi>): Deleted.
>>        (<sur>dot_prod<mode><vsi2qi>): New.
>>        (@<sur>dot_prod<vsi2qi>): Deleted.
>>        (@<sur>dot_prod<mode><vsi2qi>): New.
>>        (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode.
>>        * gcc/config/aarch64/aarch64-sve2.md
>>        (@aarch64_sve_<sur>dotvnx4sivnx8hi): Deleted.
>>        (<sur>dot_prodvnx4sivnx8hi): New.
>> 
>> gcc/testsuite/ChangeLog:
>>        * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New.
>> ---
>> gcc/config/aarch64/aarch64-builtins.cc        | 71 +++++++++++++++++++
>> gcc/config/aarch64/aarch64-simd-builtins.def  |  4 --
>> gcc/config/aarch64/aarch64-simd.md            |  9 +--
>> .../aarch64/aarch64-sve-builtins-base.cc      | 13 ++--
>> gcc/config/aarch64/aarch64-sve-builtins.cc    | 17 +++++
>> gcc/config/aarch64/aarch64-sve-builtins.h     |  3 +
>> gcc/config/aarch64/aarch64-sve.md             |  6 +-
>> gcc/config/aarch64/aarch64-sve2.md            |  2 +-
>> gcc/config/aarch64/iterators.md               |  1 +
>> .../aarch64/sme/vect-dotprod-twoway.c         | 25 +++++++
>> 10 files changed, 133 insertions(+), 18 deletions(-)
>> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c
>> 
>> diff --git a/gcc/config/aarch64/aarch64-builtins.cc 
>> b/gcc/config/aarch64/aarch64-builtins.cc
>> index 30669f8aa18..6c7c86d0e6e 100644
>> --- a/gcc/config/aarch64/aarch64-builtins.cc
>> +++ b/gcc/config/aarch64/aarch64-builtins.cc
>> @@ -783,6 +783,12 @@ enum aarch64_builtins
>>   AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1,
>>   AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START
>>                              + ARRAY_SIZE (aarch64_simd_builtin_data) - 1,
>> +  AARCH64_BUILTIN_SDOTV8QI,
>> +  AARCH64_BUILTIN_SDOTV16QI,
>> +  AARCH64_BUILTIN_UDOTV8QI,
>> +  AARCH64_BUILTIN_UDOTV16QI,
>> +  AARCH64_BUILTIN_USDOTV8QI,
>> +  AARCH64_BUILTIN_USDOTV16QI,
>>   AARCH64_CRC32_BUILTIN_BASE,
>>   AARCH64_CRC32_BUILTINS
>>   AARCH64_CRC32_BUILTIN_MAX,
>> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void)
>>   aarch64_init_simd_intrinsics ();
>> }
>> 
>> +void
>> +aarch64_init_builtin_dotprod_functions (void)
>> +{
>> +  tree fndecl = NULL;
>> +  tree ftype = NULL;
>> +
>> +  tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned);
>> +  tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none);
>> +  tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned);
>> +  tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none);
>> +  tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned);
>> +  tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none);
>> +  tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned);
>> +  tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none);
>> +
>> +  struct builtin_decls_data
>> +  {
>> +    tree out_type_node;
>> +    tree in_type1_node;
>> +    tree in_type2_node;
>> +    const char *builtin_name;
>> +    int function_code;
>> +  };
>> +
>> +#define NAME(A) "__builtin_aarch64_" #A
>> +#define ENUM(B) AARCH64_BUILTIN_##B
>> +
>> +  builtin_decls_data bdda[] =
>> +  {
>> +    { sv2si, sv8qi,  sv8qi,  NAME (sdot_prodv8qi),       ENUM (SDOTV8QI)   
>> },
>> +    { uv2si, uv8qi,  uv8qi,  NAME (udot_prodv8qi_uuuu),   ENUM (UDOTV8QI)   
>> },
>> +    { sv2si, uv8qi,  sv8qi,  NAME (usdot_prodv8qi_suss),  ENUM (USDOTV8QI)  
>> },
>> +    { sv4si, sv16qi, sv16qi, NAME (sdot_prodv16qi),      ENUM (SDOTV16QI)  
>> },
>> +    { uv4si, uv16qi, uv16qi, NAME (udot_prodv16qi_uuuu),  ENUM (UDOTV16QI)  
>> },
>> +    { sv4si, uv16qi, sv16qi, NAME (usdot_prodv16qi_suss), ENUM (USDOTV16QI) 
>> },
>> +  };
>> +
>> +#undef NAME
>> +#undef ENUM
>> +
>> +  builtin_decls_data *bdd = bdda;
>> +  builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda));
>> +
>> +  for (; bdd < bdd_end; bdd++)
>> +  {
>> +    ftype = build_function_type_list (bdd->out_type_node, 
>> bdd->in_type1_node,
>> +                                     bdd->in_type2_node, bdd->out_type_node,
>> +                                     NULL_TREE);
>> +    fndecl = aarch64_general_add_builtin (bdd->builtin_name,
>> +                                         ftype, bdd->function_code);
>> +    aarch64_builtin_decls[bdd->function_code] = fndecl;
>> +  }
>> +}
>> +
>> static void
>> aarch64_init_simd_builtins (void)
>> {
>> @@ -1654,6 +1714,8 @@ aarch64_init_simd_builtins (void)
>>   aarch64_init_simd_builtin_scalar_types ();
>> 
>>   aarch64_init_simd_builtin_functions (false);
>> +  aarch64_init_builtin_dotprod_functions ();
>> +
>
> Perhaps we should take this opportunity to instead migrate the dot-product 
> intrinsics to the simulate_builtin_function_decl framework instead so that 
> they get created as part of “#pragma GCC aarch64 “arm_neon.h””.
>
> That’s the direction of travel we want with these builtins so I’d rather not 
> complicate the legacy builtin handling code here.
> I think it shouldn’t be much more work than this patch as you’ve already got 
> the various static bookkeeping data on hand.

To avoid mission creep, it might be simpler to change:

  BUILTIN_VB (TERNOP, sdot_prod, 10, NONE)
  BUILTIN_VB (TERNOPU, udot_prod, 10, NONE)
  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE)

to:

  BUILTIN_VB (TERNOP, sdot_prod, 0, NONE)
  BUILTIN_VB (TERNOPU, udot_prod, 0, NONE)
  BUILTIN_VB (TERNOP_SUSS, usdot_prod, 0, NONE)

so that the internal names are aarch64_udot_prodv8qi etc., and then add:

constexpr insn_code CODE_FOR_aarch64_udot_prodv8qi
    = CODE_FOR_udot_prodv2siv8qi;

etc. to aarch64-builtins.cc.  I agree that moving to the pragma approach
would be a good thing long-term, but at heart this patch is meant to be
a renaming exercise.

Thanks,
Richard

Reply via email to