<saurabh....@arm.com> writes: > The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and > mandatory from Armv9.5-a. It introduces instructions for computing the > floating point absolute maximum and minimum of the two vectors element-wise. > > This patch introduces AdvSIMD faminmax intrinsics. The intrinsics of > this extension are implemented as the following builtin functions: > * vamax_f16 > * vamaxq_f16 > * vamax_f32 > * vamaxq_f32 > * vamaxq_f64 > * vamin_f16 > * vaminq_f16 > * vamin_f32 > * vaminq_f32 > * vaminq_f64 > > We are defining a new way to add AArch64 AdvSIMD intrinsics by listing > all the intrinsics in a .def file and then using that .def file to > initialise various data structures. This would lead to more concise code > and easier addition of the new AdvSIMD intrinsics in future. > > The faminmax intrinsics are defined using the new approach. > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.cc > (ENTRY): Macro to parse the contents of > aarch64-simd-pragma-builtins.def. > (enum aarch64_builtins): New enum values for faminmax builtins > via aarch64-simd-pragma-builtins.def. > (enum aarch64_builtin_signatures): Enum to specify the > number of operands a builtin will take. > (ENTRY_VHSDF): Macro to parse the contents of > aarch64-simd-pragma-builtins.def. > (struct aarch64_pragma_builtins_data): Struct to hold data from > aarch64-simd-pragma-builtins.def. > (aarch64_fntype): New function to define function types of > intrinsics given an object of type aarch64_pragma_builtins_data. > (aarch64_init_pragma_builtins): New function to define pragma > builtins. > (aarch64_get_pragma_builtin): New function to get a row of > aarch64_pragma_builtins, given code. > (handle_arm_neon_h): Modify to call > aarch64_init_pragma_builtins. > (aarch64_general_check_builtin_call): Modify to check whether > required flag is being used for pragma builtins. > (aarch64_expand_pragma_builtin): New function to emit > instructions of pragma_builtin. > (aarch64_general_expand_builtin): Modify to call > aarch64_expand_pragma_builtin. > * config/aarch64/aarch64-option-extensions.def > (AARCH64_OPT_EXTENSION): Introduce new flag for this extension. > * config/aarch64/aarch64-simd.md > (@aarch64_<faminmax_uns_op><mode>): Instruction pattern for > faminmax intrinsics. > * config/aarch64/aarch64.h > (TARGET_FAMINMAX): Introduce new flag for this extension. > * config/aarch64/iterators.md: New iterators and unspecs. > * config/arm/types.md: Introduce neon_fp_aminmax<q> attributes. > * doc/invoke.texi: Document extension in AArch64 Options. > * config/aarch64/aarch64-simd-pragma-builtins.def: New file to > list pragma builtins. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test. > * gcc.target/aarch64/simd/faminmax-builtins.c: New test. > --- > gcc/config/aarch64/aarch64-builtins.cc | 123 ++++++++++++++++++ > .../aarch64/aarch64-option-extensions.def | 2 + > .../aarch64/aarch64-simd-pragma-builtins.def | 23 ++++ > gcc/config/aarch64/aarch64-simd.md | 11 ++ > gcc/config/aarch64/aarch64.h | 4 + > gcc/config/aarch64/iterators.md | 9 ++ > gcc/config/arm/types.md | 5 + > gcc/doc/invoke.texi | 2 + > .../aarch64/simd/faminmax-builtins-no-flag.c | 10 ++ > .../aarch64/simd/faminmax-builtins.c | 115 ++++++++++++++++ > 10 files changed, 304 insertions(+) > create mode 100644 gcc/config/aarch64/aarch64-simd-pragma-builtins.def > create mode 100644 > gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c > > diff --git a/gcc/config/aarch64/aarch64-builtins.cc > b/gcc/config/aarch64/aarch64-builtins.cc > index eb878b933fe..6e64ae86c52 100644 > --- a/gcc/config/aarch64/aarch64-builtins.cc > +++ b/gcc/config/aarch64/aarch64-builtins.cc > @@ -757,6 +757,18 @@ typedef struct > #define VAR1(T, N, MAP, FLAG, A) \ > AARCH64_SIMD_BUILTIN_##T##_##N##A, > > +#undef ENTRY > +#define ENTRY(N, S, M, U, F) \ > + AARCH64_##N, > + > +#undef ENTRY_VHSDF > +#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC, EXTENSIONS) \ > + AARCH64_##NAME##_f16, \ > + AARCH64_##NAME##q_f16, \ > + AARCH64_##NAME##_f32, \ > + AARCH64_##NAME##q_f32, \ > + AARCH64_##NAME##q_f64, > + > enum aarch64_builtins > { > AARCH64_BUILTIN_MIN, > @@ -829,6 +841,10 @@ enum aarch64_builtins > AARCH64_RBIT, > AARCH64_RBITL, > AARCH64_RBITLL, > + /* Pragma builtins. */ > + AARCH64_PRAGMA_BUILTIN_START, > +#include "aarch64-simd-pragma-builtins.def" > + AARCH64_PRAGMA_BUILTIN_END, > /* System register builtins. */ > AARCH64_RSR, > AARCH64_RSRP, > @@ -947,6 +963,7 @@ const char *aarch64_scalar_builtin_types[] = { > > extern GTY(()) aarch64_simd_type_info aarch64_simd_types[]; > > +#undef ENTRY > #define ENTRY(E, M, Q, G) \ > {E, "__" #E, #G "__" #E, NULL_TREE, NULL_TREE, E_##M##mode, qualifier_##Q}, > struct aarch64_simd_type_info aarch64_simd_types [] = { > @@ -1547,6 +1564,80 @@ aarch64_init_simd_builtin_functions (bool > called_from_pragma) > } > } > > +enum aarch64_builtin_signatures > +{ > + AARCH64_BUILTIN_SIGNATURE_MIN, > + binary, > + AARCH64_BUILTIN_SIGNATURE_MAX, > +};
"binary" on its own probably isn't unique enough, and similar names could clash with target-independent code. One way of avoiding that would be to use: AARCH64_BUILTIN_SIGNATURE_binary and use token pasting in ENTRY to add the "AARCH64_BUILTIN_SIGNATURE_" prefix. But I guess a more C++ way would be to use enum classes: enum class aarch64_builtin_signature { binary, }; which is probably a bit nicer. (I don't think we need the min and max values.) Then: > + > +#undef ENTRY > +#define ENTRY(N, S, M, U, F) \ > + {#N, S, E_##M##mode, U, F}, ...this would be: {#N, aarch64_builtin_signature::S, E_##M##mode, U, F}, > + > +#undef ENTRY_VHSDF > +#define ENTRY_VHSDF(NAME, SIGNATURE, UNSPEC, EXTENSIONS) \ > + ENTRY (NAME##_f16, SIGNATURE, V4HF, UNSPEC, EXTENSIONS) \ > + ENTRY (NAME##q_f16, SIGNATURE, V8HF, UNSPEC, EXTENSIONS) \ > + ENTRY (NAME##_f32, SIGNATURE, V2SF, UNSPEC, EXTENSIONS) \ > + ENTRY (NAME##q_f32, SIGNATURE, V4SF, UNSPEC, EXTENSIONS) \ > + ENTRY (NAME##q_f64, SIGNATURE, V2DF, UNSPEC, EXTENSIONS) > + > +/* Initialize pragma builtins. */ > + > +struct aarch64_pragma_builtins_data > +{ > + const char *name; > + aarch64_builtin_signatures signature; > + machine_mode mode; > + int unspec; > + aarch64_feature_flags required_extensions; > +}; > + > +static aarch64_pragma_builtins_data aarch64_pragma_builtins[] = { > +#include "aarch64-simd-pragma-builtins.def" > +}; > + > +static tree > +aarch64_fntype (const aarch64_pragma_builtins_data &builtin_data) > +{ > + auto type = aarch64_simd_builtin_type (builtin_data.mode, qualifier_none); > + switch (builtin_data.signature) > + { > + case binary: And similarly here. > + return build_function_type_list (type, type, type, NULL_TREE); > + default: > + gcc_unreachable (); > + } > +} > + > [...] > @@ -3189,6 +3287,25 @@ aarch64_expand_builtin_data_intrinsic (unsigned int > fcode, tree exp, rtx target) > return ops[0].value; > } > > +static rtx > +aarch64_expand_pragma_builtin (tree exp, rtx target, > + const aarch64_pragma_builtins_data* builtin_data) Formatting, should be: const aarch64_pragma_builtins_data *builtin_data) with the "*" after the space. > +{ > + expand_operand ops[3]; > + auto mode = builtin_data->mode; > + auto op1 = expand_normal (CALL_EXPR_ARG (exp, 0)); > + auto op2 = expand_normal (CALL_EXPR_ARG (exp, 1)); > + create_output_operand (&ops[0], target, mode); > + create_input_operand (&ops[1], op1, mode); > + create_input_operand (&ops[2], op2, mode); > + > + auto unspec = builtin_data->unspec; > + auto icode = code_for_aarch64 (unspec, mode); > + expand_insn (icode, 3, ops); > + > + return target; > +} > + > /* Expand an expression EXP as fpsr or fpcr setter (depending on > UNSPEC) using MODE. */ > static void > @@ -3369,6 +3486,11 @@ aarch64_general_expand_builtin (unsigned int fcode, > tree exp, rtx target, > && fcode <= AARCH64_RBITLL) > return aarch64_expand_builtin_data_intrinsic (fcode, exp, target); > > + if (auto builtin_data = aarch64_get_pragma_builtin (fcode)) > + { > + return aarch64_expand_pragma_builtin (exp, target, builtin_data); > + } > + Formatting nit, should be no braces for single statements: if (auto builtin_data = aarch64_get_pragma_builtin (fcode)) return aarch64_expand_pragma_builtin (exp, target, builtin_data); > gcc_unreachable (); > } > > [...] > diff --git a/gcc/config/aarch64/aarch64-simd.md > b/gcc/config/aarch64/aarch64-simd.md > index 23c03a96371..7542c81ed91 100644 > --- a/gcc/config/aarch64/aarch64-simd.md > +++ b/gcc/config/aarch64/aarch64-simd.md > @@ -9910,3 +9910,14 @@ > "shl\\t%d0, %d1, #16" > [(set_attr "type" "neon_shift_imm")] > ) > + > +;; faminmax > +(define_insn "@aarch64_<faminmax_uns_op><mode>" > + [(set (match_operand:VHSDF 0 "register_operand" "=w") > + (unspec:VHSDF [(match_operand:VHSDF 1 "register_operand" "w") > + (match_operand:VHSDF 2 "register_operand" "w")] > + FAMINMAX_UNS))] > + "TARGET_FAMINMAX" > + "<faminmax_uns_op>\t%0.<Vtype>, %1.<Vtype>, %2.<Vtype>" > + [(set_attr "type" "neon_fp_aminmax<q>")] > +) Sorry for not noticing this last time, but: rather than add a new scheduling type for these instructions, let's just assume for now that they have the same performance characteristics as fmax and fmin. We can tweak it later if a scheduling model needs to distinguish them. Reusing an existing type is better because existing scheduling models will recognise it as a known operation, rather than treating it as an unknown operation (which is usually done conservatively). Not that it matters for modern OoO cores, but still. :) > diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h > index 2dfb999bea5..de14f57071a 100644 > --- a/gcc/config/aarch64/aarch64.h > +++ b/gcc/config/aarch64/aarch64.h > @@ -457,6 +457,10 @@ constexpr auto AARCH64_FL_DEFAULT_ISA_MODE > ATTRIBUTE_UNUSED > enabled through +gcs. */ > #define TARGET_GCS AARCH64_HAVE_ISA (GCS) > > +/* Floating Point Absolute Maximum/Minimum extension instructions are > + enabled through +faminmax. */ I realise this happens in pre-existing code too, but: there should just be one space after "/*": /* Floating Point Absolute Maximum/Minimum extension instructions are enabled through +faminmax. */ Thanks, Richard > +#define TARGET_FAMINMAX AARCH64_HAVE_ISA (FAMINMAX) > + > [...]