From: Abhishek Kaushik <[email protected]>

The FMA folds in match.pd currently match (negate @0) only when it
appears directly as an operand.  When the negated operand is wrapped
in a type conversion (e.g. (convert (negate @0))), the simplification
to IFN_FNMA does not trigger.

This prevents folding of patterns such as:

*c = *c - (v8u)(*a * *b);

when the multiply operands undergo vector type conversions before
being passed to FMA.  In such cases the expression lowers to neg + mla
instead of a single msb on AArch64 SVE, because the canonicalization
step cannot see through the casts.

Extend the match pattern to allow optional conversions on the negated
operand and the second multiplicand:

(fmas:c (convert? (negate @0)) (convert? @1) @2)

and explicitly rebuild the converted operands in the IFN_FNMA
replacement. This enables recognition of the subtraction-of-product form
even when vector element type casts are present.

With this change, AArch64 SVE code generation is able to select msb
instead of emitting a separate neg followed by mla.

Bootstrapped and regression-tested on aarch64-linux-gnu.

gcc/
        PR target/123897
        * match.pd: Allow optional conversions in FMA-to-FNMA
        canonicalization and reconstruct converted operands in
        the replacement.

gcc/testsuite/
        PR target/123897
        * gcc.target/aarch64/sve/fnma_match.c: New test.
        * gcc.target/aarch64/sve/pr123897.c: Update to scan for FNMA
        in the tree dump.
---
 gcc/match.pd                                  |  4 +--
 .../gcc.target/aarch64/sve/fnma_match.c       | 28 +++++++++++++++++++
 .../gcc.target/aarch64/sve/pr123897.c         |  3 +-
 3 files changed, 32 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c

diff --git a/gcc/match.pd b/gcc/match.pd
index 7f16fd4e081..4cce9463f8f 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -10255,8 +10255,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (canonicalize_math_after_vectorization_p ())
  (for fmas (FMA)
   (simplify
-   (fmas:c (negate @0) @1 @2)
-   (IFN_FNMA @0 @1 @2))
+   (fmas:c (convert? (negate @0)) (convert? @1) @2)
+   (IFN_FNMA (convert @0) (convert @1) @2))
   (simplify
    (fmas @0 @1 (negate @2))
    (IFN_FMS @0 @1 @2))
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c b/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
new file mode 100644
index 00000000000..08607b172e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/fnma_match.c
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=armv9-a -msve-vector-bits=256" } */
+
+typedef __attribute__((__vector_size__(sizeof(int)*8))) signed int v8i;
+typedef __attribute__((__vector_size__(sizeof(int)*8))) unsigned int v8u;
+
+void g(v8i *a,v8i *b,v8u *c)
+{
+  *c = *c - (v8u)(*a * *b);
+}
+
+void h(v8u *a,v8u *b,v8i *c)
+{
+  *c = *c - (v8i)(*a * *b);
+}
+
+void x(v8i *a,v8i *b,v8i *c)
+{
+  *c = *c - (*a * *b);
+}
+
+void y(v8u *a,v8u *b,v8u *c)
+{
+  *c = *c - (*a * *b);
+}
+
+/* { dg-final { scan-assembler-times "\\tmsb\\t" 4 } } */
+/* { dg-final { scan-assembler-not "\\tneg\\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c b/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
index d74efabb7f8..45bc52522a9 100644
--- a/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
+++ b/gcc/testsuite/gcc.target/aarch64/sve/pr123897.c
@@ -13,4 +13,5 @@ void g(v8i *a,v8i *b,v8u *c)
   *c = *c - (v8u)(*a * *b);
 }
 
-/* { dg-final { scan-tree-dump-times "\.FMA" 2 "widening_mul" } } */
+/* { dg-final { scan-tree-dump-times "\.FMA" 1 "widening_mul" } } */
+/* { dg-final { scan-tree-dump-times "\.FNMA" 1 "widening_mul" } } */
-- 
2.43.0
