Kyrylo Tkachov <ktkac...@nvidia.com> writes: > Hi Victor, > >> On 10 Jul 2024, at 16:05, Victor Do Nascimento <victor.donascime...@arm.com> >> wrote: >> >> External email: Use caution opening links or attachments >> >> >> Given recent changes to the dot_prod standard pattern name, this patch >> fixes the aarch64 back-end by implementing the following changes: >> >> 1. Add 2nd mode to all (u|s|us)dot_prod patterns in .md files. >> 2. Rewrite initialization and function expansion mechanism for simd >> builtins. >> 3. Fix all direct calls to back-end `dot_prod' patterns in SVE >> builtins. >> >> Finally, given that it is now possible for the compiler to >> differentiate between the two- and four-way dot product, we add a test >> to ensure that autovectorization picks up on dot-product patterns >> where the result is twice the width of the operands. >> >> gcc/ChangeLog: >> >> * config/aarch64/aarch64-builtins.cc (enum aarch64_builtins): >> New AARCH64_BUILTIN_* enum values: SDOTV8QI, SDOTV16QI, >> UDOTV8QI, UDOTV16QI, USDOTV8QI, USDOTV16QI. >> (aarch64_init_builtin_dotprod_functions): New. >> (aarch64_init_simd_builtins): Add call to >> `aarch64_init_builtin_dotprod_functions'. >> (aarch64_general_gimple_fold_builtin): Add DOT_PROD_EXPR >> handling. >> * config/aarch64/aarch64-simd-builtins.def: Remove macro >> expansion-based initialization and expansion >> of (u|s|us)dot_prod builtins. >> * config/aarch64/aarch64-simd.md >> (<sur>dot_prod<vsi2qi><vczle><vczbe>): Deleted. >> (<sur>dot_prod<mode><vsi2qi><vczle><vczbe>): New. >> (usdot_prod<vsi2qi><vczle><vczbe>): Deleted. >> (usdot_prod<mode><vsi2qi><vczle><vczbe>): New. >> (<su>sadv16qi): Adjust call to gen_udot_prod take second mode. >> (popcount<mode2>): fix use of `udot_prod_optab'. >> * config/aarch64/aarch64-sve-builtins-base.cc >> (svdot_impl::expand): s/direct/convert/ in >> `convert_optab_handler_for_sign' function call. >> (svusdot_impl::expand): add second mode argument in call to >> `code_for_dot_prod'. >> * config/aarch64/aarch64-sve-builtins.cc >> (function_expander::convert_optab_handler_for_sign): New class >> method. >> * config/aarch64/aarch64-sve-builtins.h >> (class function_expander): Add prototype for new >> `convert_optab_handler_for_sign' method. >> * gcc/config/aarch64/aarch64-sve.md >> (<sur>dot_prod<vsi2qi>): Deleted. >> (<sur>dot_prod<mode><vsi2qi>): New. >> (@<sur>dot_prod<vsi2qi>): Deleted. >> (@<sur>dot_prod<mode><vsi2qi>): New. >> (<su>sad<vsi2qi>): Adjust call to gen_udot_prod take second mode. >> * gcc/config/aarch64/aarch64-sve2.md >> (@aarch64_sve_<sur>dotvnx4sivnx8hi): Deleted. >> (<sur>dot_prodvnx4sivnx8hi): New. >> >> gcc/testsuite/ChangeLog: >> * gcc.target/aarch64/sme/vect-dotprod-twoway.c (udot2): New. >> --- >> gcc/config/aarch64/aarch64-builtins.cc | 71 +++++++++++++++++++ >> gcc/config/aarch64/aarch64-simd-builtins.def | 4 -- >> gcc/config/aarch64/aarch64-simd.md | 9 +-- >> .../aarch64/aarch64-sve-builtins-base.cc | 13 ++-- >> gcc/config/aarch64/aarch64-sve-builtins.cc | 17 +++++ >> gcc/config/aarch64/aarch64-sve-builtins.h | 3 + >> gcc/config/aarch64/aarch64-sve.md | 6 +- >> gcc/config/aarch64/aarch64-sve2.md | 2 +- >> gcc/config/aarch64/iterators.md | 1 + >> .../aarch64/sme/vect-dotprod-twoway.c | 25 +++++++ >> 10 files changed, 133 insertions(+), 18 deletions(-) >> create mode 100644 gcc/testsuite/gcc.target/aarch64/sme/vect-dotprod-twoway.c >> >> diff --git a/gcc/config/aarch64/aarch64-builtins.cc >> b/gcc/config/aarch64/aarch64-builtins.cc >> index 30669f8aa18..6c7c86d0e6e 100644 >> --- a/gcc/config/aarch64/aarch64-builtins.cc >> +++ b/gcc/config/aarch64/aarch64-builtins.cc >> @@ -783,6 +783,12 @@ enum aarch64_builtins >> AARCH64_SIMD_PATTERN_START = AARCH64_SIMD_BUILTIN_LANE_CHECK + 1, >> AARCH64_SIMD_BUILTIN_MAX = AARCH64_SIMD_PATTERN_START >> + ARRAY_SIZE (aarch64_simd_builtin_data) - 1, >> + AARCH64_BUILTIN_SDOTV8QI, >> + AARCH64_BUILTIN_SDOTV16QI, >> + AARCH64_BUILTIN_UDOTV8QI, >> + AARCH64_BUILTIN_UDOTV16QI, >> + AARCH64_BUILTIN_USDOTV8QI, >> + AARCH64_BUILTIN_USDOTV16QI, >> AARCH64_CRC32_BUILTIN_BASE, >> AARCH64_CRC32_BUILTINS >> AARCH64_CRC32_BUILTIN_MAX, >> @@ -1642,6 +1648,60 @@ handle_arm_neon_h (void) >> aarch64_init_simd_intrinsics (); >> } >> >> +void >> +aarch64_init_builtin_dotprod_functions (void) >> +{ >> + tree fndecl = NULL; >> + tree ftype = NULL; >> + >> + tree uv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_unsigned); >> + tree sv8qi = aarch64_simd_builtin_type (V8QImode, qualifier_none); >> + tree uv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_unsigned); >> + tree sv16qi = aarch64_simd_builtin_type (V16QImode, qualifier_none); >> + tree uv2si = aarch64_simd_builtin_type (V2SImode, qualifier_unsigned); >> + tree sv2si = aarch64_simd_builtin_type (V2SImode, qualifier_none); >> + tree uv4si = aarch64_simd_builtin_type (V4SImode, qualifier_unsigned); >> + tree sv4si = aarch64_simd_builtin_type (V4SImode, qualifier_none); >> + >> + struct builtin_decls_data >> + { >> + tree out_type_node; >> + tree in_type1_node; >> + tree in_type2_node; >> + const char *builtin_name; >> + int function_code; >> + }; >> + >> +#define NAME(A) "__builtin_aarch64_" #A >> +#define ENUM(B) AARCH64_BUILTIN_##B >> + >> + builtin_decls_data bdda[] = >> + { >> + { sv2si, sv8qi, sv8qi, NAME (sdot_prodv8qi), ENUM (SDOTV8QI) >> }, >> + { uv2si, uv8qi, uv8qi, NAME (udot_prodv8qi_uuuu), ENUM (UDOTV8QI) >> }, >> + { sv2si, uv8qi, sv8qi, NAME (usdot_prodv8qi_suss), ENUM (USDOTV8QI) >> }, >> + { sv4si, sv16qi, sv16qi, NAME (sdot_prodv16qi), ENUM (SDOTV16QI) >> }, >> + { uv4si, uv16qi, uv16qi, NAME (udot_prodv16qi_uuuu), ENUM (UDOTV16QI) >> }, >> + { sv4si, uv16qi, sv16qi, NAME (usdot_prodv16qi_suss), ENUM (USDOTV16QI) >> }, >> + }; >> + >> +#undef NAME >> +#undef ENUM >> + >> + builtin_decls_data *bdd = bdda; >> + builtin_decls_data *bdd_end = bdd + (ARRAY_SIZE (bdda)); >> + >> + for (; bdd < bdd_end; bdd++) >> + { >> + ftype = build_function_type_list (bdd->out_type_node, >> bdd->in_type1_node, >> + bdd->in_type2_node, bdd->out_type_node, >> + NULL_TREE); >> + fndecl = aarch64_general_add_builtin (bdd->builtin_name, >> + ftype, bdd->function_code); >> + aarch64_builtin_decls[bdd->function_code] = fndecl; >> + } >> +} >> + >> static void >> aarch64_init_simd_builtins (void) >> { >> @@ -1654,6 +1714,8 @@ aarch64_init_simd_builtins (void) >> aarch64_init_simd_builtin_scalar_types (); >> >> aarch64_init_simd_builtin_functions (false); >> + aarch64_init_builtin_dotprod_functions (); >> + > > Perhaps we should take this opportunity to instead migrate the dot-product > intrinsics to the simulate_builtin_function_decl framework instead so that > they get created as part of “#pragma GCC aarch64 “arm_neon.h””. > > That’s the direction of travel we want with these builtins so I’d rather not > complicate the legacy builtin handling code here. > I think it shouldn’t be much more work than this patch as you’ve already got > the various static bookkeeping data on hand.
To avoid mission creep, it might be simpler to change: BUILTIN_VB (TERNOP, sdot_prod, 10, NONE) BUILTIN_VB (TERNOPU, udot_prod, 10, NONE) BUILTIN_VB (TERNOP_SUSS, usdot_prod, 10, NONE) to: BUILTIN_VB (TERNOP, sdot_prod, 0, NONE) BUILTIN_VB (TERNOPU, udot_prod, 0, NONE) BUILTIN_VB (TERNOP_SUSS, usdot_prod, 0, NONE) so that the internal names are aarch64_udot_prodv8qi etc., and then add: constexpr insn_code CODE_FOR_aarch64_udot_prodv8qi = CODE_FOR_udot_prodv2siv8qi; etc. to aarch64-builtins.cc. I agree that moving to the pragma approach would be a good thing long-term, but at heart this patch is meant to be a renaming exercise. Thanks, Richard