On Tue, Oct 28, 2025 at 9:35 AM Alex Coplan <[email protected]> wrote:
>
> Hi Dhruv,
>
> Sorry for the long wait on this one. Comments below ...
>
> On 18/08/2025 21:31, [email protected] wrote:
> > From: Dhruv Chawla <[email protected]>
> >
> > This patch folds the following patterns:
> > - umax (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? sum : a
> > - umin (a, add (a, b)) -> [sum, ovf] = adds (a, b); !ovf ? a : sum
> > - umax (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? diff : a
> > - umin (a, sub (a, b)) -> [diff, ovf] = subs (a, b); ovf ? a : diff
> >
> > Where ovf is the overflow flag. adds and subs are generated by
> > generating a parallel compare+plus/minus which maps to the pattern
> > add<mode>3_compareC. sub<mode>3_compareC is also created to have an
> > equivalent pattern for the subs instruction. In the case of subs, the
> > overflow flag represents underflow.
> >
> > This patch is a respin of the patch posted at
> > https://gcc.gnu.org/pipermail/gcc-patches/2025-May/685021.html as per
> > the suggestion to turn it into a target-specific transform by Richard
> > Biener.
> >
> > Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> >
> > Signed-off-by: Dhruv Chawla <[email protected]>
> >
> > PR middle-end/116815
> >
> > gcc/ChangeLog:
> >
> > * config/aarch64/aarch64.md (sub<mode>3_compareC): New pattern.
> > (*aarch64_plus_within_<optab><mode>3_<ovf_commutate>): Likewise.
> > (*aarch64_minus_within_<optab><mode>3): Likewise.
> > * config/aarch64/iterators.md (ovf_add_cmp): New code attribute.
> > (ovf_sub_cmp): Likewise.
> > (ovf_commutate): New iterator.
> > (ovf_comm_opp): New int attribute.
> >
> > gcc/testsuite/ChangeLog:
> >
> > * gcc.target/aarch64/pr116815-1.c: New test.
> > * gcc.target/aarch64/pr116815-2.c: Likewise.
> > * gcc.target/aarch64/pr116815-3.c: Likewise.
> > * gcc.target/aarch64/pr116815-4.c: Likewise.
> > * gcc.target/aarch64/pr116815-5.c: Likewise.
> > * gcc.target/aarch64/pr116815-6.c: Likewise.
> > ---
> > gcc/config/aarch64/aarch64.md | 76 +++++++++++
> > gcc/config/aarch64/iterators.md | 9 ++
> > gcc/testsuite/gcc.target/aarch64/pr116815-1.c | 120 ++++++++++++++++++
> > gcc/testsuite/gcc.target/aarch64/pr116815-2.c | 93 ++++++++++++++
> > gcc/testsuite/gcc.target/aarch64/pr116815-3.c | 62 +++++++++
> > gcc/testsuite/gcc.target/aarch64/pr116815-4.c | 48 +++++++
> > gcc/testsuite/gcc.target/aarch64/pr116815-5.c | 44 +++++++
> > gcc/testsuite/gcc.target/aarch64/pr116815-6.c | 60 +++++++++
> > 8 files changed, 512 insertions(+)
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-5.c
> > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr116815-6.c
> >
> > diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> > index 8e431a10cb1..a55fe5c85d8 100644
> > --- a/gcc/config/aarch64/aarch64.md
> > +++ b/gcc/config/aarch64/aarch64.md
> > @@ -3715,6 +3715,20 @@
> > [(set_attr "type" "alus_sreg")]
> > )
> >
> > +;; An equivalent to add<mode>3_compareC
> > +(define_insn "sub<mode>3_compareC"
> > + [(set (reg:CC_C CC_REGNUM)
> > + (compare:CC_C
> > + (minus:GPI
> > + (match_operand:GPI 1 "register_operand" "r")
> > + (match_operand:GPI 2 "register_operand" "r"))
> > + (match_dup 1)))
> > + (set (match_operand:GPI 0 "register_operand" "=r")
> > + (minus:GPI (match_dup 1) (match_dup 2)))]
> > + ""
> > + "subs\t%<w>0, %<w>1, %<w>2"
> > +)
> > +
>
> I don't think we want/need this pattern. subs(a,b) underflows precisely
> when a < b, i.e. when a - b < 0 so you can just use a plain CCmode
> comparison. I think you can make use of the existing
> sub<mode>3_compare1 pattern here.
>
> This should simplify the define_insn_and_split for the subs case below.
>
> > (define_peephole2
> > [(set (match_operand:GPI 0 "aarch64_general_reg")
> > (minus:GPI (match_operand:GPI 1 "aarch64_reg_or_zero")
> > @@ -4455,6 +4469,68 @@
> > [(set_attr "type" "<su>div")]
> > )
> >
> > +;; umax (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? sum : a
> > +;; umin (a, add (a, b)) => [sum, ovf] = adds (a, b); !ovf ? a : sum
> > +;; ... along with the commutative version of add (a, b) i.e. add (b, a).
>
> For the sake of saving one comment line, it might be better to actually
> spell out the commutated variant, AIUI e.g. for umax that's:
> umax (b, add (a, b)) => [sum, ovf] = adds (b, a); !ovf ? sum : b
>
> > +(define_insn_and_split
> > "*aarch64_plus_within_<optab><mode>3_<ovf_commutate>"
> > + [(set (match_operand:GPI 0 "register_operand" "=r")
> > + (UMAXMIN:GPI
> > + (plus:GPI (match_operand:GPI 1 "register_operand" "r")
> > + (match_operand:GPI 2 "register_operand" "r"))
> > + (match_dup ovf_commutate)))
> > + (clobber (match_scratch:GPI 3))]
>
> I think you might want an "=r" constraint on the match_scratch here.
In this case since the split is only allowed pre-RA I think not having
"=r" is correct
BUT I don't see anything blocking this pattern from matching post RA.
So I think we want to add the "=r" and remove the !reload_completed
check below just in case. I doubt it will match because it requires
the combining of a few (more than 2) different instructions together
which does not happen post RA.
>
> > + "!TARGET_CSSC"
> > + "#"
> > + "&& !reload_completed"
> > + [(parallel
> > + [(set (reg:CC_C CC_REGNUM)
> > + (compare:CC_C (plus:GPI (match_dup ovf_commutate)
> > + (match_dup <ovf_comm_opp>))
> > + (match_dup ovf_commutate)))
> > + (set (match_dup 3) (plus:GPI (match_dup ovf_commutate)
> > + (match_dup <ovf_comm_opp>)))])
> > + (set (match_dup 0)
> > + (if_then_else:GPI (<ovf_add_cmp> (reg:CC_C CC_REGNUM)
> > + (const_int 0))
> > + (match_dup 3)
> > + (match_dup ovf_commutate)))]
> > + {
> > + if (GET_CODE (operands[3]) == SCRATCH)
> > + operands[3] = gen_reg_rtx (<MODE>mode);
>
> At first I was a bit unsure about this idiom, since md.texi:9358 has
> this to say about define_split (and therefore about
> define_insn_and_split):
>
> The @var{preparation-statements} are similar to those statements that
> are specified for @code{define_expand} (@pxref{Expander Definitions})
> and are executed before the new RTL is generated to prepare for the
> generated code or emit some insns whose pattern is not fixed. Unlike
> those in @code{define_expand}, however, these statements must not
> generate any new pseudo-registers. Once reload has completed, they also
> must not allocate any space in the stack frame.
>
> but I see that we already use this idiom for a few patterns in aarch64-sve.md.
> Would appreciate a second opinion on this. Either the existing SVE patterns
> are
> wrong or (perhaps more likely) the md.texi docs are out of date.
The documentation was not updated many years ago when the support for
this was done.
See https://gcc.gnu.org/legacy-ml/gcc/2007-07/msg00013.html and
https://inbox.sourceware.org/gcc-patches/[email protected]/
(and a few others) forgot to update
the documentation. Yes, it is a long long time ago. But basically
define_split can create new pseudo-registers if pre RA.
Basically pre-DF new pseudos caused issues after expand because things
were more static back then.
Thanks,
Andrew Pinski
>
> > + }
> > +)
> > +
> > +;; In this case the ovf represents an underflow.
> > +;; umax (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? diff : a
> > +;; umin (a, sub (a, b)) => [diff, ovf] = subs (a, b); ovf ? a : diff
> > +(define_insn_and_split "*aarch64_minus_within_<optab><mode>3"
> > + [(set (match_operand:GPI 0 "register_operand" "=r")
> > + (UMAXMIN:GPI
> > + (minus:GPI (match_operand:GPI 1 "register_operand" "r")
> > + (match_operand:GPI 2 "register_operand" "r"))
> > + (match_dup 1)))
> > + (clobber (match_scratch:GPI 3))]
> > + "!TARGET_CSSC"
> > + "#"
> > + "&& !reload_completed"
> > + [(parallel
> > + [(set (reg:CC_C CC_REGNUM)
> > + (compare:CC_C (minus:GPI (match_dup 1) (match_dup 2))
> > + (match_dup 1)))
> > + (set (match_dup 3) (minus:GPI (match_dup 1) (match_dup 2)))])
> > + ;; There is an inverse mapping here from CC_C -> CC as this requires the
> > + ;; inverse of the comparison from the above pattern.
>
> As mentioned above you should be able to do the whole thing in CCmode, which
> should simplify things here.
>
> > + (set (match_dup 0)
> > + (if_then_else:GPI (<ovf_sub_cmp> (reg:CC CC_REGNUM)
> > + (const_int 0))
> > + (match_dup 3)
> > + (match_dup 1)))]
> > + {
> > + if (GET_CODE (operands[3]) == SCRATCH)
> > + operands[3] = gen_reg_rtx (<MODE>mode);
> > + }
> > +)
> > +
> > ;; -------------------------------------------------------------------
> > ;; Comparison insns
> > ;; -------------------------------------------------------------------
> > diff --git a/gcc/config/aarch64/iterators.md
> > b/gcc/config/aarch64/iterators.md
> > index c3771d9402b..a29243247bd 100644
> > --- a/gcc/config/aarch64/iterators.md
> > +++ b/gcc/config/aarch64/iterators.md
> > @@ -2804,6 +2804,8 @@
> >
> > (define_code_iterator FMAXMIN [smax smin])
> >
> > +(define_code_iterator UMAXMIN [umax umin])
> > +
> > ;; Signed and unsigned max operations.
> > (define_code_iterator USMAX [smax umax])
> >
> > @@ -3092,6 +3094,9 @@
> >
> > (define_code_attr maxminand [(smax "bic") (smin "and")])
> >
> > +(define_code_attr ovf_add_cmp [(umax "geu") (umin "ltu")])
> > +(define_code_attr ovf_sub_cmp [(umax "ltu") (umin "geu")])
> > +
> > ;; MLA/MLS attributes.
> > (define_code_attr as [(ss_plus "a") (ss_minus "s")])
> >
> > @@ -5007,3 +5012,7 @@
> > (UNSPEC_F2CVT "f2cvt")
> > (UNSPEC_F1CVTLT "f1cvtlt")
> > (UNSPEC_F2CVTLT "f2cvtlt")])
> > +
> > +;; Operand numbers for commutative operations
> > +(define_int_iterator ovf_commutate [1 2])
>
> Note that this iterator already exists with the name BSL_DUP. It might
> be better to rename it to something more general and re-purpose it here,
> rather than duplicating it.
>
> > +(define_int_attr ovf_comm_opp [(1 "2") (2 "1")])
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > new file mode 100644
> > index 00000000000..f3bdb794318
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-1.c
> > @@ -0,0 +1,120 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Single-use tests. */
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +max (unsigned a, unsigned b)
> > +{
> > + return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +min (unsigned a, unsigned b)
> > +{
> > + return a < b ? a : b;
> > +}
> > +
> > +#define OPERATION(op, type, N, exp1, exp2)
> > \
> > + unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1,
> > exp2); }
> > +
> > +/*
> > +** umaxadd1:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cc
> > +** ret
> > +*/
> > +OPERATION (max, add, 1, a, a + b)
> > +
> > +/*
> > +** umaxadd2:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cc
> > +** ret
> > +*/
> > +OPERATION (max, add, 2, a, b + a)
> > +
> > +/*
> > +** umaxadd3:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cc
> > +** ret
> > +*/
> > +OPERATION (max, add, 3, a + b, a)
> > +
> > +/*
> > +** umaxadd4:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cc
> > +** ret
> > +*/
> > +OPERATION (max, add, 4, b + a, a)
> > +
> > +/*
> > +** uminadd1:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cs
> > +** ret
> > +*/
> > +OPERATION (min, add, 1, a, a + b)
> > +
> > +/*
> > +** uminadd2:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cs
> > +** ret
> > +*/
> > +OPERATION (min, add, 2, a, b + a)
> > +
> > +/*
> > +** uminadd3:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cs
> > +** ret
> > +*/
> > +OPERATION (min, add, 3, a + b, a)
> > +
> > +/*
> > +** uminadd4:
> > +** adds (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cs
> > +** ret
> > +*/
> > +OPERATION (min, add, 4, b + a, a)
> > +
> > +/* sub requires the inverse of the comparison from add. */
> > +
> > +/*
> > +** umaxsub1:
> > +** subs (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cc
> > +** ret
> > +*/
> > +OPERATION (max, sub, 1, a, a - b)
> > +
> > +/*
> > +** umaxsub2:
> > +** subs (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cc
> > +** ret
> > +*/
> > +OPERATION (max, sub, 2, a - b, a)
> > +
> > +/*
> > +** uminsub1:
> > +** subs (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cs
> > +** ret
> > +*/
> > +OPERATION (min, sub, 1, a, a - b)
> > +
> > +/*
> > +** uminsub2:
> > +** subs (w[0-9]+), w0, w1
> > +** csel w0, \1, w0, cs
> > +** ret
> > +*/
> > +OPERATION (min, sub, 2, a - b, a)
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > new file mode 100644
> > index 00000000000..f2dd7b0bbb3
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-2.c
> > @@ -0,0 +1,93 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Negative tests. */
>
> I don't see the benefit in including negative tests here, the tests will
> get optimized out by the middle-end anyway due to C overflow rules.
>
> > +
> > +static inline int __attribute__ ((always_inline))
> > +smax (int a, int b)
> > +{
> > + return a > b ? a : b;
> > +}
> > +
> > +static inline int __attribute__ ((always_inline))
> > +smin (int a, int b)
> > +{
> > + return a < b ? a : b;
> > +}
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +umax (unsigned a, unsigned b)
> > +{
> > + return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +umin (unsigned a, unsigned b)
> > +{
> > + return a < b ? a : b;
> > +}
> > +
> > +#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
> > +
> > +/* This transformation does not trigger on signed types. */
> > +
> > +int
> > +smax_add (int a, int b)
> > +{
> > + ASSUME (b >= 0);
> > + return smax (a, a + b);
> > +}
> > +
> > +int
> > +smin_add (int a, int b)
> > +{
> > + ASSUME (b >= 0);
> > + return smin (a, a + b);
> > +}
> > +
> > +int
> > +smax_sub (int a, int b)
> > +{
> > + ASSUME (b >= 0);
> > + return smax (a, a - b);
> > +}
> > +
> > +int
> > +smin_sub (int a, int b)
> > +{
> > + ASSUME (b >= 0);
> > + return smin (a, a - b);
> > +}
> > +
> > +/* Invalid patterns. */
> > +
> > +/* This can potentially be matched, but the RHS gets factored to
> > + (a + b) * b. */
> > +unsigned
> > +umax_factored (unsigned a, unsigned b)
> > +{
> > + return umax (a * b, a * b + b * b);
> > +}
> > +
> > +unsigned
> > +umin_mult (unsigned a, unsigned b)
> > +{
> > + return umin (a, a * b);
> > +}
> > +
> > +unsigned
> > +umax_sub (unsigned a, unsigned b)
> > +{
> > + return umax (a, b - a);
> > +}
> > +
> > +unsigned
> > +umin_sub (unsigned a, unsigned b)
> > +{
> > + return umin (a, b - a);
> > +}
> > +
> > +/* { dg-final { scan-assembler-not "adds\\t" } } */
> > +/* { dg-final { scan-assembler-not "subs\\t" } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > new file mode 100644
> > index 00000000000..f8050377116
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-3.c
> > @@ -0,0 +1,62 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +/* { dg-final { check-function-bodies "**" "" "" } } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Multi-use tests. */
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +max (unsigned a, unsigned b)
> > +{
> > + return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +min (unsigned a, unsigned b)
> > +{
> > + return a < b ? a : b;
> > +}
> > +
> > +/* FIXME: This should only generate one adds. */
>
> I appreciate you raising the shortcomings of the patch, but perhaps better
> just
> to mention it in the cover letter rather than adding a test that explicitly
> defends the suboptimal codegen. AIUI, check-function-bodies tests are
> intended
> for when we're confident there is one optimal sequence of codegen that we want
> to generate.
>
> You could also raise a missed-optimization PR once this patch lands to track
> the
> opportunity for improvement. That's probably the best way to avoid this
> getting
> forgotten about.
>
> Note another possible opportunity for improvement is perhaps to extend
> this to handle cases where one of the operands is a constant, but that
> can be left for follow-on work.
>
> > +
> > +/*
> > +** umax_add_umin_add:
> > +** adds (w[0-9]+), w0, w1
> > +** csel \1, \1, w0, cc
> > +** adds (w[0-9]+), w1, w0
> > +** csel w0, \2, w1, cs
> > +** add w0, \1, \2
> > +** ret
> > +*/
> > +unsigned
> > +umax_add_umin_add (unsigned a, unsigned b)
> > +{
> > + return max (a, a + b) + min (a + b, b);
> > +}
> > +
> > +/*
> > +** umin_add_umax_add:
> > +** adds (w[0-9]+), w0, w1
> > +** csel \1, \1, w0, cs
> > +** adds (w[0-9]+), w1, w0
> > +** csel \2, \2, w1, cc
> > +** add w0, \1, \2
> > +** ret
> > +*/
> > +unsigned
> > +umin_add_umax_add (unsigned a, unsigned b)
> > +{
> > + return min (a, b + a) + max (b + a, b);
> > +}
> > +
> > +/* FIXME: This pattern does not get optimized. */
> > +
> > +unsigned
> > +multiple_paths (unsigned a, unsigned b)
> > +{
> > + if (a > 5)
> > + return max (a, a + b);
> > + else
> > + return min (a, a + b);
> > +}
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > new file mode 100644
> > index 00000000000..c1be921334b
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-4.c
> > @@ -0,0 +1,48 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Single-use tests with a use of the min-max in an if-condition. */
>
> What exactly is this test trying to achieve? It's not clear from a first
> glance.
>
> Thanks,
> Alex
>
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +max (unsigned a, unsigned b)
> > +{
> > + return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +min (unsigned a, unsigned b)
> > +{
> > + return a < b ? a : b;
> > +}
> > +
> > +#define OPERATION(op, type, N, exp1, exp2)
> > \
> > + unsigned u##op##type##N (unsigned a, unsigned b, unsigned c, unsigned d,
> > \
> > + unsigned e)
> > \
> > + {
> > \
> > + unsigned result = op (exp1, exp2);
> > \
> > + if (result == c || result == c * 2)
> > \
> > + return d;
> > \
> > + else
> > \
> > + return e;
> > \
> > + }
> > +
> > +OPERATION (max, add, 1, a, a + b)
> > +OPERATION (max, add, 2, a, b + a)
> > +OPERATION (max, add, 3, a + b, a)
> > +OPERATION (max, add, 4, b + a, a)
> > +
> > +OPERATION (min, add, 1, a, a + b)
> > +OPERATION (min, add, 2, a, b + a)
> > +OPERATION (min, add, 3, a + b, a)
> > +OPERATION (min, add, 4, b + a, a)
> > +
> > +OPERATION (max, sub, 1, a, a - b)
> > +OPERATION (max, sub, 2, a - b, a)
> > +
> > +OPERATION (min, sub, 1, a, a - b)
> > +OPERATION (min, sub, 2, a - b, a)
> > +
> > +/* { dg-final { scan-assembler-times "adds\\t" 8 } } */
> > +/* { dg-final { scan-assembler-times "subs\\t" 4 } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-5.c
> > b/gcc/testsuite/gcc.target/aarch64/pr116815-5.c
> > new file mode 100644
> > index 00000000000..015c868aec2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-5.c
> > @@ -0,0 +1,44 @@
> > +/* { dg-do compile } */
> > +/* { dg-options "-O2" } */
> > +
> > +#pragma GCC target "+cssc"
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Make sure that umax/umin instructions are generated with CSSC. */
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +max (unsigned a, unsigned b)
> > +{
> > + return a > b ? a : b;
> > +}
> > +
> > +static inline unsigned __attribute__ ((always_inline))
> > +min (unsigned a, unsigned b)
> > +{
> > + return a < b ? a : b;
> > +}
> > +
> > +#define OPERATION(op, type, N, exp1, exp2)
> > \
> > + unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1,
> > exp2); }
> > +
> > +OPERATION (max, add, 1, a, a + b)
> > +OPERATION (max, add, 2, a, b + a)
> > +OPERATION (max, add, 3, a + b, a)
> > +OPERATION (max, add, 4, b + a, a)
> > +
> > +OPERATION (min, add, 1, a, a + b)
> > +OPERATION (min, add, 2, a, b + a)
> > +OPERATION (min, add, 3, a + b, a)
> > +OPERATION (min, add, 4, b + a, a)
> > +
> > +OPERATION (max, sub, 1, a, a - b)
> > +OPERATION (max, sub, 2, a - b, a)
> > +
> > +OPERATION (min, sub, 1, a, a - b)
> > +OPERATION (min, sub, 2, a - b, a)
> > +
> > +/* { dg-final { scan-assembler-times "umax\\t" 6 } } */
> > +/* { dg-final { scan-assembler-times "umin\\t" 6 } } */
> > +/* { dg-final { scan-assembler-not "adds\\t" } } */
> > +/* { dg-final { scan-assembler-not "subs\\t" } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/pr116815-6.c
> > b/gcc/testsuite/gcc.target/aarch64/pr116815-6.c
> > new file mode 100644
> > index 00000000000..2b5407bcfc2
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/pr116815-6.c
> > @@ -0,0 +1,60 @@
> > +/* { dg-do run } */
> > +/* { dg-options "-O2" } */
> > +
> > +/* PR middle-end/116815 */
> > +
> > +/* Verify that the transformation gives correct results */
> > +
> > +[[gnu::always_inline]]
> > +inline unsigned min (unsigned a, unsigned b)
> > +{
> > + return (a < b) ? a : b;
> > +}
> > +
> > +[[gnu::always_inline]]
> > +inline unsigned max (unsigned a, unsigned b)
> > +{
> > + return (a > b) ? a : b;
> > +}
> > +
> > +[[gnu::noipa]] unsigned
> > +umaxadd (unsigned a, unsigned b)
> > +{
> > + return max (a + b, a);
> > +}
> > +
> > +[[gnu::noipa]] unsigned
> > +umaxsub (unsigned a, unsigned b)
> > +{
> > + return max (a - b, a);
> > +}
> > +
> > +[[gnu::noipa]] unsigned
> > +uminadd (unsigned a, unsigned b)
> > +{
> > + return min (a + b, a);
> > +}
> > +
> > +[[gnu::noipa]] unsigned
> > +uminsub (unsigned a, unsigned b)
> > +{
> > + return min (a - b, a);
> > +}
> > +
> > +int
> > +main ()
> > +{
> > + /* Overflows to 0x30000000. */
> > + if (umaxadd (0x90000000, 0xa0000000) != 0x90000000)
> > + __builtin_abort ();
> > +
> > + if (uminadd (0x90000000, 0xa0000000) != 0x30000000)
> > + __builtin_abort ();
> > +
> > + /* Underflows to 0x60000000. */
> > + if (umaxsub (0x00000000, 0xa0000000) != 0x60000000)
> > + __builtin_abort ();
> > +
> > + if (uminsub (0x00000000, 0xa0000000) != 0x00000000)
> > + __builtin_abort ();
> > +}
> > --
> > 2.44.0
> >