Hi Saurabh, > On 1 Aug 2024, at 11:14, Saurabh Jha wrote: > > External email: Use caution opening links or attachments > > > The AArch64 FEAT_FAMINMAX extension is optional from Armv9.2-a and > mandatory from Armv9.5-a. It introduces instructions for computing the > floating point absolute maximum and minimum of the two vectors element-wise. > > This patch does three things: > 1. Introduces AdvSIMD faminmax intrinsics. > 2. Adds code generation support for famax and famin in terms of the > existing operators. > 3. Move report_missing_extension and reported_missing_extension_p to > make it more usable. > > The intrinsics of this extension are implemented as the following > builtin functions: > * vamax_f16 > * vamaxq_f16 > * vamax_f32 > * vamaxq_f32 > * vamaxq_f64 > * vamin_f16 > * vaminq_f16 > * vamin_f32 > * vaminq_f32 > * vaminq_f64 > > For code generation, famax/famin is equivalent to first taking fabs of > the operands and then taking fmax/fmin of the results of fabs. > famax/famin (a, b) = fmax/fmin (fabs (a), fabs (b)) > This is correct because NaN/Inf handling of famax/famin and fmax/fmin > are same. We cannot use fmaxnm/fminnm here as Nan/Inf are handled > differently in them. >
Thank you for the explanation. > We moved the definition of `report_missing_extension` from > gcc/config/aarch64/aarch64-sve-builtins.cc to > gcc/config/aarch64/aarch64-builtins.cc and its declaration to > gcc/config/aarch64/aarch64-builtins.h. We also moved the declaration > of `reported_missing_extension_p` from > gcc/config/aarch64/aarch64-sve-builtins.cc > to gcc/config/aarch64/aarch64-builtins.cc, closer to the definition of > `report_missing_extension`. In the exsiting code structure, this leads > to `report_missing_extension` being usable from both normal builtins > and sve builtins. Thanks, this looks much better. > > gcc/ChangeLog: > > * config/aarch64/aarch64-builtins.cc > (enum aarch64_builtins): New enum values for faminmax builtins. > (aarch64_init_faminmax_builtins): New function to declare new builtins. > (handle_arm_neon_h): Modify to call aarch64_init_faminmax_builtins. > (aarch64_general_check_builtin_call): Modify to check whether > +faminmax flag is being used and printing error message if not being used. > (aarch64_expand_builtin_faminmax): New function to emit instructions > of this extension. > (aarch64_general_expand_builtin): Modify to call > aarch64_expand_builtin_faminmax. > (report_missing_extension): Move from > config/aarch64/aarch64-sve-builtins.cc. > * config/aarch64/aarch64-builtins.h > (report_missing_extension): Declaration for this function so > that it can be used wherever this header is included. > (reported_missing_extension_p): Move from > config/aarch64/aarch64-sve-builtins.cc > * config/aarch64/aarch64-option-extensions.def > (AARCH64_OPT_EXTENSION): Introduce new flag for this extension. > * config/aarch64/aarch64-simd.md > (aarch64_<faminmax><mode>): Introduce instruction pattern for this > extension. > * config/aarch64/aarch64-sve-builtins.cc > (reported_missing_extension_p): Move to > config/aarch64/aarch64-builtins.cc > (report_missing_extension): Move to > config/aarch64/aarch64-builtins.cc. > * config/aarch64/aarch64.h > (TARGET_FAMINMAX): Introduce new flag for this extension. > * config/aarch64/iterators.md: Introduce new iterators for this > extension. > * config/arm/types.md: Introduce neon_fp_aminmax<q> attributes. > * doc/invoke.texi: Document extension in AArch64 Options. > > gcc/testsuite/ChangeLog: > > * gcc.target/aarch64/simd/faminmax-builtins-no-flag.c: New test. > * gcc.target/aarch64/simd/faminmax-builtins.c: New test. > * gcc.target/aarch64/simd/faminmax-codegen-no-flag.c: New test. > * gcc.target/aarch64/simd/faminmax-codegen.c: New test. The aarch64 backend parts look good, but I think the tests could be improved. Sorry for missing it out in the first review. > > --- > Hi, > > Regression tested for aarch64-none-linux-gnu and found no regressions. > > This patch is a revised version of an earlier patch > https://gcc.gnu.org/pipermail/gcc-patches/2024-July/657914.html but has > more scope than that. That's why I didn't add "v2" in the subject line. > > Ok for master? I don't have commit access so can someone please commit > on my behalf? > > Regards, > Saurabh > --- > gcc/config/aarch64/aarch64-builtins.cc | 173 +++++++++++++++++- > gcc/config/aarch64/aarch64-builtins.h | 5 +- > .../aarch64/aarch64-option-extensions.def | 2 + > gcc/config/aarch64/aarch64-simd.md | 12 ++ > gcc/config/aarch64/aarch64-sve-builtins.cc | 22 --- > gcc/config/aarch64/aarch64.h | 4 + > gcc/config/aarch64/iterators.md | 8 + > gcc/config/arm/types.md | 6 + > gcc/doc/invoke.texi | 2 + > .../aarch64/simd/faminmax-builtins-no-flag.c | 10 + > .../aarch64/simd/faminmax-builtins.c | 75 ++++++++ > .../aarch64/simd/faminmax-codegen-no-flag.c | 54 ++++++ > .../aarch64/simd/faminmax-codegen.c | 104 +++++++++++ > 13 files changed, 445 insertions(+), 32 deletions(-) > create mode 100644 > gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c > create mode 100644 > gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen-no-flag.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/simd/faminmax-codegen.c > diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c new file mode 100644 index 00000000000..63ed1508c23 --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins-no-flag.c @@ -0,0 +1,10 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-march=armv9-a" } */ + +#include "arm_neon.h" + +void +test (float32x4_t a, float32x4_t b) +{ + vamaxq_f32 (a, b); /* { dg-error {ACLE function 'vamaxq_f32' requires ISA extension 'faminmax'} } */ +} diff --git a/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c new file mode 100644 index 00000000000..f2b5bafb81c --- /dev/null +++ b/gcc/testsuite/gcc.target/aarch64/simd/faminmax-builtins.c @@ -0,0 +1,75 @@ +/* { dg-do assemble} */ +/* { dg-additional-options "-march=armv9-a+faminmax" } */ + +#include "arm_neon.h" + +float16x4_t +test_vamax_f16 (float16x4_t a, float16x4_t b) +{ + return vamax_f16 (a, b); +} + +float16x8_t +test_vamaxq_f16 (float16x8_t a, float16x8_t b) +{ + return vamaxq_f16 (a, b); +} + +float32x2_t +test_vamax_f32 (float32x2_t a, float32x2_t b) +{ + return vamax_f32 (a, b); +} + +float32x4_t +test_vamaxq_f32 (float32x4_t a, float32x4_t b) +{ + return vamaxq_f32 (a, b); +} + +float64x2_t +test_vamaxq_f64 (float64x2_t a, float64x2_t b) +{ + return vamaxq_f64 (a, b); +} + +float16x4_t +test_vamin_f16 (float16x4_t a, float16x4_t b) +{ + return vamin_f16 (a, b); +} + +float16x8_t +test_vaminq_f16 (float16x8_t a, float16x8_t b) +{ + return vaminq_f16 (a, b); +} + +float32x2_t +test_vamin_f32 (float32x2_t a, float32x2_t b) +{ + return vamin_f32 (a, b); +} + +float32x4_t +test_vaminq_f32 (float32x4_t a, float32x4_t b) +{ + return vaminq_f32 (a, b); +} + +float64x2_t +test_vaminq_f64 (float64x2_t a, float64x2_t b) +{ + return vaminq_f64 (a, b); +} + +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamax\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.4h, v[0-9]+.4h, v[0-9]+.4h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.8h, v[0-9]+.8h, v[0-9]+.8h} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.2s, v[0-9]+.2s, v[0-9]+.2s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.4s, v[0-9]+.4s, v[0-9]+.4s} 1 } } */ +/* { dg-final { scan-assembler-times {\tfamin\tv[0-9]+.2d, v[0-9]+.2d, v[0-9]+.2d} 1 } } */ We should use the check-function-bodies infrastructure to match the exact assembly for each of the test functions. You can grep around the gcc.target/aarch64 directory for examples of how to use that. It would also be good to have some tests for generating these instructions from straight C code using __builtin_fmax/fmin and __builtin_fabs to make sure we’re not missing out on general codegen improvements from this work. Thanks, Kyrill