On 28/10/25 22:03, Alex Coplan wrote:
Hi Dhruv,
Sorry for the long wait on this one. Comments below ...
Hi Alex,
Thanks for the review. I have attached a new version of the patch
to this email.
On 18/08/2025 21:31, [email protected] wrote:
From: Dhruv Chawla <[email protected]>
This patch folds the following patterns:
- umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
- umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
- umax (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? diff : a
- umin (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? a : diff
Where ovf is the overflow flag. The adds and subs instructions are
produced by emitting a parallel compare+plus/minus that maps to the
add<mode>3_compareC pattern. sub<mode>3_compareC is also added as an
equivalent pattern for the subs instruction. In the case of subs, the
overflow flag represents underflow.
This patch is a respin of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
Richard Biener's suggestion to turn it into a target-specific
transform.
Bootstrapped and regtested on aarch64-unknown-linux-gnu.
Signed-off-by: Dhruv Chawla <[email protected]>
PR middle-end/116815
gcc/ChangeLog:
* config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern.
(*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise.
(*aarch64_minus_within_<optab><mode>3): Likewise.
* config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
(ovf_sub_cmp): Likewise.
(ovf_commutate): New iterator.
(ovf_comm_opp): New int attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr116815-1.c: New test.
* gcc.target/aarch64/pr116815-2.c: Likewise.
* gcc.target/aarch64/pr116815-3.c: Likewise.
* gcc.target/aarch64/pr116815-4.c: Likewise.
* gcc.target/aarch64/pr116815-5.c: Likewise.
* gcc.target/aarch64/pr116815-6.c: Likewise.
---
gcc/config/aarch64/aarch64.md | 76 +++++++++++
gcc/config/aarch64/iterators.md | 9 ++
gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 93 ++++++++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 62 +++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 48 +++++++
gcc/testsuite/gcc.target/aarch64/pr116815-5.c | 44 +++++++
gcc/testsuite/gcc.target/aarch64/pr116815-6.c | 60 +++++++++
8 files changed, 512 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-6.c
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8e431a10cb1..a55fe5c85d8 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3715,6 +3715,20 @@
[(set_attr "type" "alus_sreg")]
)
+;; An equivalent to add<mode>3_compareC
+(define_insn "sub<mode>3_compareC"
+ [(set (reg:CC_C CC_REGNUM)
+ (compare:CC_C
+ (minus:GPI
+ (match_operand:GPI 1 "register_operand" "r")
+ (match_operand:GPI 2 "register_operand" "r"))
+ (match_dup 1)))
+ (set (match_operand:GPI 0 "register_operand" "=r")
+ (minus:GPI (match_dup 1) (match_dup 2)))]
+ ""
+ "subs\t%<w>0, %<w>1, %<w>2"
+)
+
I don't think we want/need this pattern. subs(a,b) underflows precisely
when a < b, i.e. when a - b < 0 so you can just use a plain CCmode
comparison. I think you can make use of the existing
sub<mode>3_compare1 pattern here.
This should simplify the define_insn_and_split for the subs case below.
(define_peephole2
[(set (match_operand:GPI 0 "aarch64_general_reg")
(minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
@@ -4455,6 +4469,68 @@
[(set_attr "type" "<su>div")]
)
+;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
+;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
+;; ... along with the commutative version of add (a, b) i.e. add (b, a).
For the sake of saving one comment line, it might be better to actually
spell out the commutated variant, AIUI e.g. for umax that's:
umax (b, add (a, b)) => [sum, ovf] = adds (b, a); !ovf ? sum : b
Done. I also added this to the patch description.
+(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (UMAXMIN:GPI
+ (plus:GPI (match_operand:GPI 1 "register_operand" "r")
+ (match_operand:GPI 2 "register_operand" "r"))
+ (match_dup ovf_commutate)))
+ (clobber (match_scratch:GPI 3))]
I think you might want an "=r" constraint on the match_scratch here.
I am a bit confused here - I expect this pattern to only execute pre-RA
because it requires being able to call gen_reg_rtx (which only works
pre-RA). Does RA populate scratch registers by itself? Otherwise, I
don't really see any other way for this split to run post-RA.
Uros: IIRC I ran into issues where unconditionally calling gen_reg_rtx
still caused an ICE, maybe because the code ran during reload itself.
I guess I could replace "!reload_completed" with "can_create_pseudo_p ()"?
+ "!TARGET_CSSC"
+ "#"
+ "&& !reload_completed"
+ [(parallel
+ [(set (reg:CC_C CC_REGNUM)
+ (compare:CC_C (plus:GPI (match_dup ovf_commutate)
+ (match_dup <ovf_comm_opp>))
+ (match_dup ovf_commutate)))
+ (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
+ (match_dup <ovf_comm_opp>)))])
+ (set (match_dup 0)
+ (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
+ (const_int 0))
+ (match_dup 3)
+ (match_dup ovf_commutate)))]
+ {
+ if (GET_CODE (operands[3]) == SCRATCH)
+ operands[3] = gen_reg_rtx (<MODE>mode);
At first I was a bit unsure about this idiom, since md.texi:9358 has
this to say about define_split (and therefore about
define_insn_and_split):
The @var{preparation-statements} are similar to those statements that
are specified for @code{define_expand} (@pxref{Expander Definitions})
and are executed before the new RTL is generated to prepare for the
generated code or emit some insns whose pattern is not fixed. Unlike
those in @code{define_expand}, however, these statements must not
generate any new pseudo-registers. Once reload has completed, they also
must not allocate any space in the stack frame.
but I see that we already use this idiom for a few patterns in aarch64-sve.md.
Would appreciate a second opinion on this. Either the existing SVE patterns are
wrong or (perhaps more likely) the md.texi docs are out of date.
+ }
+)
+
+;; In this case the ovf represents an underflow.
+;; umax (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? diff : a
+;; umin (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? a : diff
+(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (UMAXMIN:GPI
+ (minus:GPI (match_operand:GPI 1 "register_operand" "r")
+ (match_operand:GPI 2 "register_operand" "r"))
+ (match_dup 1)))
+ (clobber (match_scratch:GPI 3))]
+ "!TARGET_CSSC"
+ "#"
+ "&& !reload_completed"
+ [(parallel
+ [(set (reg:CC_C CC_REGNUM)
+ (compare:CC_C (minus:GPI (match_dup 1) (match_dup 2))
+ (match_dup 1)))
+ (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
+ ;; There is an inverse mapping here from CC_C -> CC as this requires the
+ ;; inverse of the comparison from the above pattern.
As mentioned above you should be able to do the whole thing in CCmode, which
should simplify things here.
+ (set (match_dup 0)
+ (if_then_else:GPI (<ovf_sub_cmp> (reg:CC CC_REGNUM)
+ (const_int 0))
+ (match_dup 3)
+ (match_dup 1)))]
+ {
+ if (GET_CODE (operands[3]) == SCRATCH)
+ operands[3] = gen_reg_rtx (<MODE>mode);
+ }
+)
+
;; -------------------------------------------------------------------
;; Comparison insns
;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index c3771d9402b..a29243247bd 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2804,6 +2804,8 @@
(define_code_iterator FMAXMIN [smax smin])
+(define_code_iterator UMAXMIN [umax umin])
+
;; Signed and unsigned max operations.
(define_code_iterator USMAX [smax umax])
@@ -3092,6 +3094,9 @@
(define_code_attr maxminand [(smax "bic") (smin "and")])
+(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
+(define_code_attr ovf_sub_cmp [(umax "ltu") (umin "geu")])
+
;; MLA/MLS attributes.
(define_code_attr as [(ss_plus "a") (ss_minus "s")])
@@ -5007,3 +5012,7 @@
(UNSPEC_F2CVT "f2cvt")
(UNSPEC_F1CVTLT "f1cvtlt")
(UNSPEC_F2CVTLT "f2cvtlt")])
+
+;; Operand numbers for commutative operations
+(define_int_iterator ovf_commutate [1 2])
Note that this iterator already exists with the name BSL_DUP. It might
be better to rename it to something more general and re-purpose it here,
rather than duplicating it.
Yeah, I saw that as well - using BSL_DUP and bsl_mov would achieve the same
purpose, but I am not sure what commonality between my usage and BSL can be
used for naming? ONE_TWO / two_one are the obvious names, but they don't seem
very good.
+(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
new file mode 100644
index 00000000000..f3bdb794318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
@@ -0,0 +1,120 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/* PR middle-end/116815 */
+
+/* Single-use tests. */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+ unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+/*
+** umaxadd1:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 1, a, a + b)
+
+/*
+** umaxadd2:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 2, a, b + a)
+
+/*
+** umaxadd3:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 3, a + b, a)
+
+/*
+** umaxadd4:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 4, b + a, a)
+
+/*
+** uminadd1:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 1, a, a + b)
+
+/*
+** uminadd2:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 2, a, b + a)
+
+/*
+** uminadd3:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 3, a + b, a)
+
+/*
+** uminadd4:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 4, b + a, a)
+
+/* sub requires the inverse of the comparison from add. */
+
+/*
+** umaxsub1:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, sub, 1, a, a - b)
+
+/*
+** umaxsub2:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, sub, 2, a - b, a)
+
+/*
+** uminsub1:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, sub, 1, a, a - b)
+
+/*
+** uminsub2:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, sub, 2, a - b, a)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
new file mode 100644
index 00000000000..f2dd7b0bbb3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
@@ -0,0 +1,93 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Negative tests. */
I don't see the benefit in including negative tests here, the tests will
get optimized out by the middle-end anyway due to C overflow rules.
Sure. I have removed this testcase in the new revision.
+
+static inline int __attribute__ ((always_inline))
+smax (int a, int b)
+{
+ return a > b ? a : b;
+}
+
+static inline int __attribute__ ((always_inline))
+smin (int a, int b)
+{
+ return a < b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+umax (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+umin (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
+
+/* This transformation does not trigger on signed types. */
+
+int
+smax_add (int a, int b)
+{
+ ASSUME (b >= 0);
+ return smax (a, a + b);
+}
+
+int
+smin_add (int a, int b)
+{
+ ASSUME (b >= 0);
+ return smin (a, a + b);
+}
+
+int
+smax_sub (int a, int b)
+{
+ ASSUME (b >= 0);
+ return smax (a, a - b);
+}
+
+int
+smin_sub (int a, int b)
+{
+ ASSUME (b >= 0);
+ return smin (a, a - b);
+}
+
+/* Invalid patterns. */
+
+/* This can potentially be matched, but the RHS gets factored to
+ (a + b) * b. */
+unsigned
+umax_factored (unsigned a, unsigned b)
+{
+ return umax (a * b, a * b + b * b);
+}
+
+unsigned
+umin_mult (unsigned a, unsigned b)
+{
+ return umin (a, a * b);
+}
+
+unsigned
+umax_sub (unsigned a, unsigned b)
+{
+ return umax (a, b - a);
+}
+
+unsigned
+umin_sub (unsigned a, unsigned b)
+{
+ return umin (a, b - a);
+}
+
+/* { dg-final { scan-assembler-not "adds\\t" } } */
+/* { dg-final { scan-assembler-not "subs\\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
new file mode 100644
index 00000000000..f8050377116
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
@@ -0,0 +1,62 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/* PR middle-end/116815 */
+
+/* Multi-use tests. */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+/* FIXME: This should only generate one adds. */
I appreciate you raising the shortcomings of the patch, but perhaps better just
to mention it in the cover letter rather than adding a test that explicitly
defends the suboptimal codegen. AIUI, check-function-bodies tests are intended
for when we're confident there is one optimal sequence of codegen that we want
to generate.
I have added the FIXME as a comment in the commit message, does that seem okay?
This testcase is actually a leftover from when the patch was a GIMPLE transform,
because this used to work before and is a regression after switching to a
target-specific transform. I have removed this testcase now.
You could also raise a missed-optimization PR once this patch lands to track the
opportunity for improvement. That's probably the best way to avoid this getting
forgotten about.
Makes sense, I will do that.
Note another possible opportunity for improvement is perhaps to extend
this to handle cases where one of the operands is a constant, but that
can be left for follow-on work.
+
+/*
+** umax_add_umin_add:
+** adds (w[0-9]+), w0, w1
+** csel \1, \1, w0, cc
+** adds (w[0-9]+), w1, w0
+** csel w0, \2, w1, cs
+** add w0, \1, \2
+** ret
+*/
+unsigned
+umax_add_umin_add (unsigned a, unsigned b)
+{
+ return max (a, a + b) + min (a + b, b);
+}
+
+/*
+** umin_add_umax_add:
+** adds (w[0-9]+), w0, w1
+** csel \1, \1, w0, cs
+** adds (w[0-9]+), w1, w0
+** csel \2, \2, w1, cc
+** add w0, \1, \2
+** ret
+*/
+unsigned
+umin_add_umax_add (unsigned a, unsigned b)
+{
+ return min (a, b + a) + max (b + a, b);
+}
+
+/* FIXME: This pattern does not get optimized. */
+
+unsigned
+multiple_paths (unsigned a, unsigned b)
+{
+ if (a > 5)
+ return max (a, a + b);
+ else
+ return min (a, a + b);
+}
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-4.c b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
new file mode 100644
index 00000000000..c1be921334b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Single-use tests with a use of the min-max in an if-condition. */
What exactly is this test trying to achieve? It's not clear from a first glance.
This is also a leftover from the prior version which checks that
multiple uses across basic blocks get optimized correctly. I have
updated the comment to specify this more clearly, but I can remove the
testcase if it doesn't seem strictly necessary.
Thanks,
Alex
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+  unsigned u##op##type##N (unsigned a, unsigned b, unsigned c, unsigned d, \
+                           unsigned e) \
+  { \
+    unsigned result = op (exp1, exp2); \
+    if (result == c || result == c * 2) \
+      return d; \
+    else \
+      return e; \
+  }
+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-assembler-times "adds\\t" 8 } } */
+/* { dg-final { scan-assembler-times "subs\\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-5.c b/gcc/testsuite/gcc.target/aarch64/pr116815-5.c
new file mode 100644
index 00000000000..015c868aec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-5.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+cssc"
+
+/* PR middle-end/116815 */
+
+/* Make sure that umax/umin instructions are generated with CSSC. */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+ unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-assembler-times "umax\\t" 6 } } */
+/* { dg-final { scan-assembler-times "umin\\t" 6 } } */
+/* { dg-final { scan-assembler-not "adds\\t" } } */
+/* { dg-final { scan-assembler-not "subs\\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-6.c b/gcc/testsuite/gcc.target/aarch64/pr116815-6.c
new file mode 100644
index 00000000000..2b5407bcfc2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-6.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Verify that the transformation gives correct results.  */
+
+[[gnu::always_inline]]
+inline unsigned min (unsigned a, unsigned b)
+{
+ return (a < b) ? a : b;
+}
+
+[[gnu::always_inline]]
+inline unsigned max (unsigned a, unsigned b)
+{
+ return (a > b) ? a : b;
+}
+
+[[gnu::noipa]] unsigned
+umaxadd (unsigned a, unsigned b)
+{
+ return max (a + b, a);
+}
+
+[[gnu::noipa]] unsigned
+umaxsub (unsigned a, unsigned b)
+{
+ return max (a - b, a);
+}
+
+[[gnu::noipa]] unsigned
+uminadd (unsigned a, unsigned b)
+{
+ return min (a + b, a);
+}
+
+[[gnu::noipa]] unsigned
+uminsub (unsigned a, unsigned b)
+{
+ return min (a - b, a);
+}
+
+int
+main ()
+{
+ /* Overflows to 0x30000000. */
+ if (umaxadd (0x90000000, 0xa0000000) != 0x90000000)
+ __builtin_abort ();
+
+ if (uminadd (0x90000000, 0xa0000000) != 0x30000000)
+ __builtin_abort ();
+
+ /* Underflows to 0x60000000. */
+ if (umaxsub (0x00000000, 0xa0000000) != 0x60000000)
+ __builtin_abort ();
+
+ if (uminsub (0x00000000, 0xa0000000) != 0x00000000)
+ __builtin_abort ();
+}
--
2.44.0
--
Regards,
Dhruv
-- >8 --
From 588dca91b719168f8ff98b7e8ee4b395bccc7cda Mon Sep 17 00:00:00 2001
From: Dhruv Chawla <[email protected]>
Date: Wed, 23 Jul 2025 01:41:51 -0700
Subject: [PATCH] [aarch64] Make better use of overflowing operations in
max/min(a, add/sub(a, b)) [PR116815]
This patch folds the following patterns:
- For add:
- umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
- umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
... along with the commutated versions:
- umax (a, add (b, a)) -> [sum, ovf] = adds (b, a); !ovf ? sum : a
- umin (a, add (b, a)) -> [sum, ovf] = adds (b, a); !ovf ? a : sum
- For sub:
- umax (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? diff : a
- umin (a, sub (a, b)) -> [diff, udf] = subs (a, b); udf ? a : diff
Where ovf is the overflow flag and udf is the underflow flag. The adds
and subs instructions are produced by emitting a parallel
compare+plus/minus which maps to add<mode>3_compareC and
sub<mode>3_compare1 respectively.
This patch is a respin of the patch posted at
https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
Richard Biener's suggestion to turn it into a target-specific
transform.
FIXME: This pattern cannot currently factor multiple occurrences of the
add expression into a single adds, e.g. max (a, a + b) + min (a + b, b)
ends up generating two adds instructions. This is something that was
lost when going from GIMPLE to a target-specific transform.
Bootstrapped and regtested on aarch64-unknown-linux-gnu.
Signed-off-by: Dhruv Chawla <[email protected]>
PR middle-end/116815
gcc/ChangeLog:
* config/aarch64/aarch64.md
(*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): New pattern.
(*aarch64_minus_within_<optab><mode>3): Likewise.
* config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
(udf_sub_cmp): Likewise.
(UMAXMIN): New code iterator.
(ovf_commutate): New iterator.
(ovf_comm_opp): New int attribute.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr116815-1.c: New test.
* gcc.target/aarch64/pr116815-2.c: Likewise.
* gcc.target/aarch64/pr116815-3.c: Likewise.
* gcc.target/aarch64/pr116815-4.c: Likewise.
---
gcc/config/aarch64/aarch64.md | 60 +++++++++
gcc/config/aarch64/iterators.md | 9 ++
gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 48 +++++++
gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 44 +++++++
gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 60 +++++++++
6 files changed, 341 insertions(+)
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 98c65a74c8e..307df4e92ba 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -4488,6 +4488,66 @@
[(set_attr "type" "<su>div")]
)
+;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
+;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
+;; ... and the commutated versions:
+;; umax (a, add (b, a)) => [sum, ovf] = adds (b, a); !ovf ? sum : a
+;; umin (a, add (b, a)) => [sum, ovf] = adds (b, a); !ovf ? a : sum
+(define_insn_and_split "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (UMAXMIN:GPI
+ (plus:GPI (match_operand:GPI 1 "register_operand" "r")
+ (match_operand:GPI 2 "register_operand" "r"))
+ (match_dup ovf_commutate)))
+ (clobber (match_scratch:GPI 3))]
+ "!TARGET_CSSC"
+ "#"
+ "&& !reload_completed"
+ [(parallel
+ [(set (reg:CC_C CC_REGNUM)
+ (compare:CC_C (plus:GPI (match_dup ovf_commutate)
+ (match_dup <ovf_comm_opp>))
+ (match_dup ovf_commutate)))
+ (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
+ (match_dup <ovf_comm_opp>)))])
+ (set (match_dup 0)
+ (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
+ (const_int 0))
+ (match_dup 3)
+ (match_dup ovf_commutate)))]
+ {
+ if (GET_CODE (operands[3]) == SCRATCH)
+ operands[3] = gen_reg_rtx (<MODE>mode);
+ }
+)
+
+;; umax (a, sub (a, b)) => [diff, udf] = subs (a, b); udf ? diff : a
+;; umin (a, sub (a, b)) => [diff, udf] = subs (a, b); udf ? a : diff
+(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
+ [(set (match_operand:GPI 0 "register_operand" "=r")
+ (UMAXMIN:GPI
+ (minus:GPI (match_operand:GPI 1 "register_operand" "r")
+ (match_operand:GPI 2 "register_operand" "r"))
+ (match_dup 1)))
+ (clobber (match_scratch:GPI 3))]
+ "!TARGET_CSSC"
+ "#"
+ "&& !reload_completed"
+ [(parallel
+ [(set (reg:CC CC_REGNUM)
+ (compare:CC (match_dup 1) (match_dup 2)))
+ (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
+ (set (match_dup 0)
+ (if_then_else:GPI (<udf_sub_cmp> (reg:CC CC_REGNUM)
+ (const_int 0))
+ (match_dup 3)
+ (match_dup 1)))]
+ {
+ if (GET_CODE (operands[3]) == SCRATCH)
+ operands[3] = gen_reg_rtx (<MODE>mode);
+ }
+)
+
;; -------------------------------------------------------------------
;; Comparison insns
;; -------------------------------------------------------------------
diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
index 517b2808b5f..74186cfae8c 100644
--- a/gcc/config/aarch64/iterators.md
+++ b/gcc/config/aarch64/iterators.md
@@ -2827,6 +2827,8 @@
(define_code_iterator FMAXMIN [smax smin])
+(define_code_iterator UMAXMIN [umax umin])
+
;; Signed and unsigned max operations.
(define_code_iterator USMAX [smax umax])
@@ -3115,6 +3117,9 @@
(define_code_attr maxminand [(smax "bic") (smin "and")])
+(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
+(define_code_attr udf_sub_cmp [(umax "ltu") (umin "geu")])
+
;; MLA/MLS attributes.
(define_code_attr as [(ss_plus "a") (ss_minus "s")])
@@ -5126,3 +5131,7 @@
(UNSPEC_F2CVT "f2cvt")
(UNSPEC_F1CVTLT "f1cvtlt")
(UNSPEC_F2CVTLT "f2cvtlt")])
+
+;; Operand numbers for commutative operations
+(define_int_iterator ovf_commutate [1 2])
+(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
new file mode 100644
index 00000000000..f3bdb794318
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
@@ -0,0 +1,120 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+/* { dg-final { check-function-bodies "**" "" "" } } */
+
+/* PR middle-end/116815 */
+
+/* Single-use tests. */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+ unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+/*
+** umaxadd1:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 1, a, a + b)
+
+/*
+** umaxadd2:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 2, a, b + a)
+
+/*
+** umaxadd3:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 3, a + b, a)
+
+/*
+** umaxadd4:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, add, 4, b + a, a)
+
+/*
+** uminadd1:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 1, a, a + b)
+
+/*
+** uminadd2:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 2, a, b + a)
+
+/*
+** uminadd3:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 3, a + b, a)
+
+/*
+** uminadd4:
+** adds (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, add, 4, b + a, a)
+
+/* sub requires the inverse of the comparison from add. */
+
+/*
+** umaxsub1:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, sub, 1, a, a - b)
+
+/*
+** umaxsub2:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cc
+** ret
+*/
+OPERATION (max, sub, 2, a - b, a)
+
+/*
+** uminsub1:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, sub, 1, a, a - b)
+
+/*
+** uminsub2:
+** subs (w[0-9]+), w0, w1
+** csel w0, \1, w0, cs
+** ret
+*/
+OPERATION (min, sub, 2, a - b, a)
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
new file mode 100644
index 00000000000..eb1fdb94797
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
@@ -0,0 +1,48 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Multi-use testcase across basic blocks. */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+  unsigned u##op##type##N (unsigned a, unsigned b, unsigned c, unsigned d, \
+                           unsigned e) \
+  { \
+    unsigned result = op (exp1, exp2); \
+    if (result == c || result == c * 2) \
+      return d; \
+    else \
+      return e; \
+  }
+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-assembler-times "adds\\t" 8 } } */
+/* { dg-final { scan-assembler-times "subs\\t" 4 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
new file mode 100644
index 00000000000..015c868aec2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
@@ -0,0 +1,44 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+#pragma GCC target "+cssc"
+
+/* PR middle-end/116815 */
+
+/* Make sure that umax/umin instructions are generated with CSSC. */
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return a > b ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return a < b ? a : b;
+}
+
+#define OPERATION(op, type, N, exp1, exp2) \
+ unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, exp2); }
+
+OPERATION (max, add, 1, a, a + b)
+OPERATION (max, add, 2, a, b + a)
+OPERATION (max, add, 3, a + b, a)
+OPERATION (max, add, 4, b + a, a)
+
+OPERATION (min, add, 1, a, a + b)
+OPERATION (min, add, 2, a, b + a)
+OPERATION (min, add, 3, a + b, a)
+OPERATION (min, add, 4, b + a, a)
+
+OPERATION (max, sub, 1, a, a - b)
+OPERATION (max, sub, 2, a - b, a)
+
+OPERATION (min, sub, 1, a, a - b)
+OPERATION (min, sub, 2, a - b, a)
+
+/* { dg-final { scan-assembler-times "umax\\t" 6 } } */
+/* { dg-final { scan-assembler-times "umin\\t" 6 } } */
+/* { dg-final { scan-assembler-not "adds\\t" } } */
+/* { dg-final { scan-assembler-not "subs\\t" } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-4.c b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
new file mode 100644
index 00000000000..d262d2170f3
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
@@ -0,0 +1,60 @@
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+/* PR middle-end/116815 */
+
+/* Verify that the transformation gives correct results.  */
+
+static inline unsigned __attribute__ ((always_inline))
+min (unsigned a, unsigned b)
+{
+ return (a < b) ? a : b;
+}
+
+static inline unsigned __attribute__ ((always_inline))
+max (unsigned a, unsigned b)
+{
+ return (a > b) ? a : b;
+}
+
+__attribute__ ((noipa)) unsigned
+umaxadd (unsigned a, unsigned b)
+{
+ return max (a + b, a);
+}
+
+__attribute__ ((noipa)) unsigned
+umaxsub (unsigned a, unsigned b)
+{
+ return max (a - b, a);
+}
+
+__attribute__ ((noipa)) unsigned
+uminadd (unsigned a, unsigned b)
+{
+ return min (a + b, a);
+}
+
+__attribute__ ((noipa)) unsigned
+uminsub (unsigned a, unsigned b)
+{
+ return min (a - b, a);
+}
+
+int
+main ()
+{
+ /* Overflows to 0x30000000. */
+ if (umaxadd (0x90000000, 0xa0000000) != 0x90000000)
+ __builtin_abort ();
+
+ if (uminadd (0x90000000, 0xa0000000) != 0x30000000)
+ __builtin_abort ();
+
+ /* Underflows to 0x60000000. */
+ if (umaxsub (0x00000000, 0xa0000000) != 0x60000000)
+ __builtin_abort ();
+
+ if (uminsub (0x00000000, 0xa0000000) != 0x00000000)
+ __builtin_abort ();
+}
--
2.44.0