> -----Original Message-----
> From: Li, Pan2 <pan2...@intel.com>
> Sent: Thursday, November 7, 2024 12:57 PM
> To: Tamar Christina <tamar.christ...@arm.com>; Richard Biener
> <richard.guent...@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp....@gmail.com
> Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> SAT_ADD into branchless
>
> I see your point that the backend can leverage conditional moves to emit
> the branch code.
>
> > For instance see https://godbolt.org/z/fvrq3aq6K
> >
> > On ISAs with conditional operations the branch version gets ifconverted.
> >
> > On AArch64 we get:
> >
> > sat_add_u_1(unsigned int, unsigned int):
> >         adds    w0, w0, w1
> >         csinv   w0, w0, wzr, cc
> >         ret
> >
> > so just 2 instructions, and also branchless. On x86_64 we get:
> >
> > sat_add_u_1(unsigned int, unsigned int):
> >         add     edi, esi
> >         mov     eax, -1
> >         cmovnc  eax, edi
> >         ret
> >
> > so 3 instructions but a dependency chain of 2, also branchless.
> > This patch would regress both of these.
>
> But the above Godbolt may not be good evidence, because both x86_64 and
> aarch64 already implement usadd, so they both go to usadd<QImode>. For
> example, as below, sat_add_u_1 and sat_add_u_2 are almost the same when
> the backend implements usadd.
>
> #include <stdint.h>
>
> #define T uint32_t
>
> T sat_add_u_1 (T x, T y)
> {
>   return (T)(x + y) < x ? -1 : (x + y);
> }
>
> T sat_add_u_2 (T x, T y)
> {
>   return (x + y) | -((x + y) < x);
> }
>
> It becomes different with gcc 14.2 (which doesn't have the .SAT_ADD GIMPLE
> IR); the x86_64 asm dump at -O3 is as below. There looks to be no obvious
> difference here.
>
> sat_add_u_1(unsigned int, unsigned int):
>         add     edi, esi
>         mov     eax, -1
>         cmovnc  eax, edi
>         ret
>
> sat_add_u_2(unsigned int, unsigned int):
>         add     edi, esi
>         sbb     eax, eax
>         or      eax, edi
>         ret
>
Because CE is able to recognize the idiom back into a conditional move.

Pick a target that doesn't have conditional instructions, like PowerPC:
https://godbolt.org/z/4bTv18WMv

You'll see that this canonicalization has made codegen worse.

After:

.L.sat_add_u_1(unsigned int, unsigned int):
        add 4,3,4
        rldicl 9,4,0,32
        subf 3,3,9
        sradi 3,3,63
        or 3,3,4
        rldicl 3,3,0,32
        blr

and before:

.L.sat_add_u_1(unsigned int, unsigned int):
        add 4,3,4
        cmplw 0,4,3
        bge 0,.L2
        li 4,-1
.L2:
        rldicl 3,4,0,32
        blr

It means it now always has to execute 6 instructions, whereas before it was
4 or 5 depending on the direction of the branch. So for those architectures
it's always slower.

Tamar

> The same holds for the vector mode on x86_64 gcc 14.2, as below.
>
> void vec_sat_add_u_1 (T *a, T *b, T *out, int n)
> {
>   for (int i = 0; i < n; i++) {
>     T x = a[i];
>     T y = b[i];
>     out[i] = (T)(x + y) < x ? -1 : (x + y);
>   }
> }
>
> void vec_sat_add_u_2 (T * __restrict a, T * __restrict b,
>                       T * __restrict out, int n)
> {
>   for (int i = 0; i < n; i++) {
>     T x = a[i];
>     T y = b[i];
>     out[i] = (x + y) | -((x + y) < x);
>   }
> }
>
> vec_sat_add_u_1(unsigned int*, unsigned int*, unsigned int*, int):
> ....
> .L15:
>         movdqu  xmm0, XMMWORD PTR [rdi+rax]
>         movdqu  xmm1, XMMWORD PTR [rsi+rax]
>         paddd   xmm1, xmm0
>         psubd   xmm0, xmm2
>         movdqa  xmm3, xmm1
>         psubd   xmm3, xmm2
>         pcmpgtd xmm0, xmm3
>         por     xmm0, xmm1
>         movups  XMMWORD PTR [rdx+rax], xmm0
>         add     rax, 16
>         cmp     r8, rax
>         jne     .L15
>         mov     eax, ecx
>         and     eax, -4
>         mov     r8d, eax
>         cmp     ecx, eax
>         je      .L11
>         sub     ecx, eax
>         mov     r9d, ecx
>         cmp     ecx, 1
>         je      .L17
> .L14:
>         movq    xmm2, QWORD PTR .LC2[rip]
>         mov     ecx, r8d
>         movq    xmm0, QWORD PTR [rdi+rcx*4]
>         movq    xmm1, QWORD PTR [rsi+rcx*4]
>         paddd   xmm1, xmm0
>         psubd   xmm0, xmm2
>         movdqa  xmm3, xmm1
>         psubd   xmm3, xmm2
>         pcmpgtd xmm0, xmm3
>         movdqa  xmm2, xmm0
>         pandn   xmm2, xmm1
>         por     xmm0, xmm2
>         movq    QWORD PTR [rdx+rcx*4], xmm0
> ....
>
> vec_sat_add_u_2(unsigned int*, unsigned int*, unsigned int*, int):
> ...
> .L50:
>         movdqu  xmm0, XMMWORD PTR [rdi+rax]
>         movdqu  xmm1, XMMWORD PTR [rsi+rax]
>         paddd   xmm1, xmm0
>         psubd   xmm0, xmm2
>         movdqa  xmm3, xmm1
>         psubd   xmm3, xmm2
>         pcmpgtd xmm0, xmm3
>         por     xmm0, xmm1
>         movups  XMMWORD PTR [rdx+rax], xmm0
>         add     rax, 16
>         cmp     rax, r8
>         jne     .L50
>         mov     eax, ecx
>         and     eax, -4
>         mov     r8d, eax
>         cmp     ecx, eax
>         je      .L64
> .L49:
>         sub     ecx, r8d
>         cmp     ecx, 1
>         je      .L52
>         movq    xmm0, QWORD PTR [rdi+r8*4]
>         movq    xmm1, QWORD PTR [rsi+r8*4]
>         movq    xmm2, QWORD PTR .LC2[rip]
>         paddd   xmm1, xmm0
>         psubd   xmm0, xmm2
>         movdqa  xmm3, xmm1
>         psubd   xmm3, xmm2
>         pcmpgtd xmm0, xmm3
>         movdqa  xmm2, xmm0
>         pandn   xmm2, xmm1
>         por     xmm0, xmm2
>         movq    QWORD PTR [rdx+r8*4], xmm0
> ...

Because x86 is not a fully masked architecture.
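
As a side note for anyone reproducing this: the scalar forms under
discussion, together with the __builtin_add_overflow variant that
corresponds to the .ADD_OVERFLOW case (case 5) in match.pd, can be written
as the following self-contained sketch. The function names here are made up
for illustration; only the bodies matter.

#include <stdint.h>

typedef uint32_t T;

/* Branch form 4, the form the patch rewrites:
   SAT_U_ADD = (X + Y) < X ? -1 : (X + Y).  */
T sat_add_branch (T x, T y)
{
  return (T) (x + y) < x ? (T) -1 : (T) (x + y);
}

/* The branchless form the patch canonicalizes to:
   SAT_U_ADD = (X + Y) | -((X + Y) < X).  */
T sat_add_branchless (T x, T y)
{
  return (x + y) | -(T) ((T) (x + y) < x);
}

/* The .ADD_OVERFLOW-based form: the builtin reports whether x + y
   wrapped, and sum holds the wrapped result.  */
T sat_add_overflow (T x, T y)
{
  T sum;
  return __builtin_add_overflow (x, y, &sum) ? (T) -1 : sum;
}

Comparing these at -O3 on a target with usadd, one with only conditional
moves, and one with neither (as in the PowerPC example above) reproduces
the three behaviours discussed in this thread.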
>
> Pan
>
> -----Original Message-----
> From: Tamar Christina <tamar.christ...@arm.com>
> Sent: Thursday, November 7, 2024 7:43 PM
> To: Li, Pan2 <pan2...@intel.com>; Richard Biener <richard.guent...@gmail.com>
> Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> jeffreya...@gmail.com; rdapp....@gmail.com
> Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> SAT_ADD into branchless
>
> > -----Original Message-----
> > From: Li, Pan2 <pan2...@intel.com>
> > Sent: Thursday, November 7, 2024 1:45 AM
> > To: Tamar Christina <tamar.christ...@arm.com>; Richard Biener
> > <richard.guent...@gmail.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> > jeffreya...@gmail.com; rdapp....@gmail.com
> > Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> > SAT_ADD into branchless
> >
> > I see, thanks Tamar for the explanation.
> >
> > > The problem with the rewrite is that it pessimizes the code if the
> > > saturating instructions are not recognized afterwards.
> >
> > The original idea is somewhat independent of whether the backend supports
> > IFN_SAT_* or not. Given we have many forms of IFN_SAT_*, some of them
> > cheap while others heavy from the perspective of gimple, it is possible
> > to do some canonicalization here.
> >
> > > Additionally the branch version will get the benefit of branch
> > > prediction when run inside a loop.
> >
> > Not very sure if my understanding is correct, but the branchless version
> > is preferred in most cases IMO if they have nearly the same stmt count.
> > Not sure if that still holds during vectorization.
>
> I don't think this is true. The branchless version must have either fewer
> instructions or cheaper instructions. The branches, especially in a loop,
> become mostly free due to branch prediction. Modern branch predictors are
> really good at idioms like this.
>
> That said, the other issue with doing this in gimple is that it interferes
> with the RTL conditional execution pass.
>
> For instance see https://godbolt.org/z/fvrq3aq6K
>
> On ISAs with conditional operations the branch version gets ifconverted.
>
> On AArch64 we get:
>
> sat_add_u_1(unsigned int, unsigned int):
>         adds    w0, w0, w1
>         csinv   w0, w0, wzr, cc
>         ret
>
> so just 2 instructions, and also branchless. On x86_64 we get:
>
> sat_add_u_1(unsigned int, unsigned int):
>         add     edi, esi
>         mov     eax, -1
>         cmovnc  eax, edi
>         ret
>
> so 3 instructions but a dependency chain of 2, also branchless.
> This patch would regress both of these.
>
> Thanks,
> Tamar
>
> >
> > Pan
> >
> > -----Original Message-----
> > From: Tamar Christina <tamar.christ...@arm.com>
> > Sent: Thursday, November 7, 2024 12:25 AM
> > To: Li, Pan2 <pan2...@intel.com>; Richard Biener
> > <richard.guent...@gmail.com>
> > Cc: gcc-patches@gcc.gnu.org; juzhe.zh...@rivai.ai; kito.ch...@gmail.com;
> > jeffreya...@gmail.com; rdapp....@gmail.com
> > Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> > SAT_ADD into branchless
> >
> > > -----Original Message-----
> > > From: Li, Pan2 <pan2...@intel.com>
> > > Sent: Wednesday, November 6, 2024 1:31 PM
> > > To: Richard Biener <richard.guent...@gmail.com>
> > > Cc: gcc-patches@gcc.gnu.org; Tamar Christina <tamar.christ...@arm.com>;
> > > juzhe.zh...@rivai.ai; kito.ch...@gmail.com; jeffreya...@gmail.com;
> > > rdapp....@gmail.com
> > > Subject: RE: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> > > SAT_ADD into branchless
> > >
> > > Never mind, and thanks Richard for the comments.
> > >
> > > > Sorry for falling behind in reviewing - it's not exactly clear the
> > > > "cheap" form is cheaper. When I count the number of gimple statements
> > > > (sub-expressions) the original appears as 3 while the result looks to
> > > > have 5.
> > >
> > > I may have a question about how we count the stmts: I am not sure
> > > whether the x <= _1 and the following goto should be counted
> >
> > They should, because they both become instructions that need to be
> > executed in the execution chain.
> >
> > > as 1 or 2 stmts in the below example. I tried to count it myself but am
> > > not very sure my understanding is correct.
> > >
> > > Before this patch:
> > >    1 │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> > >    2 │ {
> > >    3 │   uint8_t D.2809;
> > >    4 │
> > >    5 │   _1 = x + y;                                       // 1
> > >    6 │   if (x <= _1) goto <D.2810>; else goto <D.2811>;   // 2 for x <= _1, 3 for goto ???
> > >    7 │   <D.2810>:
> > >    8 │   D.2809 = x + y;                                   // 4 (taken)
> > >    9 │   goto <D.2812>;                                    // 5 (taken) for goto ???
> > >   10 │   <D.2811>:
> > >   11 │   D.2809 = 255;                                     // 4 (not taken)
> > >   12 │   <D.2812>:
> > >   13 │   return D.2809;                                    // 6 for taken, 5 for not taken
> > >   14 │ }
> > >
> > > After this patch:
> > >    1 │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> > >    2 │ {
> > >    3 │   uint8_t D.2809;
> > >    4 │
> > >    5 │   _1 = x + y;               // 1
> > >    6 │   _2 = x + y;               // 2
> > >    7 │   _3 = x > _2;              // 3
> > >    8 │   _4 = (unsigned char) _3;  // 4
> > >    9 │   _5 = -_4;                 // 5
> > >   10 │   D.2809 = _1 | _5;         // 6
> > >   11 │   return D.2809;            // 7
> > >   12 │ }
> > >
> > > > The catch is of course that the original might involve control flow
> > > > and a PHI.
> >
> > _1 and _2 are counted as one, as they'll be CSE'd.
> > After the patch your dependency chain is 5 instructions, as in,
> > to create the result you must execute 5 sequential instructions.
> >
> >    1 │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> >    2 │ {
> >    3 │   uint8_t D.2809;
> >    4 │
> >    5 │   _1 = x + y;
> >    6 │   _2 = x + y;               // 1
> >    7 │   _3 = x > _2;              // 2
> >    8 │   _4 = (unsigned char) _3;  // 3
> >    9 │   _5 = -_4;                 // 4
> >   10 │   D.2809 = _1 | _5;         // 5
> >   11 │   return D.2809;
> >   12 │ }
> >
> > Before your change you had 3, because you always do
> > compare + branch + set.
> >
> > The problem with the rewrite is that it pessimizes the code if the
> > saturating instructions are not recognized afterwards.
> >
> > Note this is also the case for vector codegen. The original code in
> > vector would be:
> >
> > 1. duplicate 255
> > 1. x + y
> > 2. compare
> > 3. select
> >
> > The first two instructions are independent and execute in parallel, so
> > the original vector codegen also had a dependency chain of 3.
> >
> > Additionally the branch version will get the benefit of branch prediction
> > when run inside a loop.
> >
> > > > The pattern will apply during late PHI-OPT or during GENERIC folding
> > > > done by the frontends or gimplification - you scan the gimplification
> > > > dump below, so having it apply there means it will influence inlining
> > > > heuristics for example. I wonder which variant is considered larger
> > > > by its heuristic.
> > >
> > > > What do others think of the early canonicalization of these to
> > > > straight-line code?
> > >
> > > Makes sense, we could find one form that most of us consider to be the
> > > "cheapest".
> >
> > I would disagree with Richard here, in that this would pessimize codegen
> > for targets that don't later implement the IFN_SAT.
> >
> > Thanks,
> > Tamar
> >
> > >
> > > Pan
> > >
> > > -----Original Message-----
> > > From: Richard Biener <richard.guent...@gmail.com>
> > > Sent: Wednesday, November 6, 2024 8:43 PM
> > > To: Li, Pan2 <pan2...@intel.com>
> > > Cc: gcc-patches@gcc.gnu.org; tamar.christ...@arm.com; juzhe.zh...@rivai.ai;
> > > kito.ch...@gmail.com; jeffreya...@gmail.com; rdapp....@gmail.com
> > > Subject: Re: [PATCH v2 01/10] Match: Simplify branch form 4 of unsigned
> > > SAT_ADD into branchless
> > >
> > > On Thu, Oct 31, 2024 at 7:29 AM <pan2...@intel.com> wrote:
> > > >
> > > > From: Pan Li <pan2...@intel.com>
> > > >
> > > > There are several forms of the unsigned SAT_ADD. Some of them are
> > > > complicated while others are cheap. This patch would like to simplify
> > > > the complicated forms into the cheap ones. For example as below:
> > > >
> > > > From the form 4 (branch):
> > > > SAT_U_ADD = (X + Y) < X ? -1 : (X + Y).
> > > >
> > > > To (branchless):
> > > > SAT_U_ADD = (X + Y) | - ((X + Y) < X).
> > > >
> > > > #define T uint8_t
> > > >
> > > > T sat_add_u_1 (T x, T y)
> > > > {
> > > >   return (T)(x + y) < x ? -1 : (x + y);
> > > > }
> > > >
> > > > Before this patch:
> > > >    1 │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> > > >    2 │ {
> > > >    3 │   uint8_t D.2809;
> > > >    4 │
> > > >    5 │   _1 = x + y;
> > > >    6 │   if (x <= _1) goto <D.2810>; else goto <D.2811>;
> > > >    7 │   <D.2810>:
> > > >    8 │   D.2809 = x + y;
> > > >    9 │   goto <D.2812>;
> > > >   10 │   <D.2811>:
> > > >   11 │   D.2809 = 255;
> > > >   12 │   <D.2812>:
> > > >   13 │   return D.2809;
> > > >   14 │ }
> > > >
> > > > After this patch:
> > > >    1 │ uint8_t sat_add_u_1 (uint8_t x, uint8_t y)
> > > >    2 │ {
> > > >    3 │   uint8_t D.2809;
> > > >    4 │
> > > >    5 │   _1 = x + y;
> > > >    6 │   _2 = x + y;
> > > >    7 │   _3 = x > _2;
> > > >    8 │   _4 = (unsigned char) _3;
> > > >    9 │   _5 = -_4;
> > > >   10 │   D.2809 = _1 | _5;
> > > >   11 │   return D.2809;
> > > >   12 │ }
> > > >
> > > > The below test suites passed for this patch.
> > > > * The rv64gcv fully regression test.
> > > > * The x86 bootstrap test.
> > > > * The x86 fully regression test.
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > >         * match.pd: Remove unsigned branch form 4 for SAT_ADD, and
> > > >         add simplify to branchless instead.
> > > >
> > > > gcc/testsuite/ChangeLog:
> > > >
> > > >         * gcc.dg/sat_arith_simplify.h: New test.
> > > >         * gcc.dg/sat_u_add-simplify-2-u16.c: New test.
> > > >         * gcc.dg/sat_u_add-simplify-2-u32.c: New test.
> > > >         * gcc.dg/sat_u_add-simplify-2-u64.c: New test.
> > > >         * gcc.dg/sat_u_add-simplify-2-u8.c: New test.
> > > >
> > > > Signed-off-by: Pan Li <pan2...@intel.com>
> > > > ---
> > > >  gcc/match.pd                                  | 23 +++++++++----------
> > > >  gcc/testsuite/gcc.dg/sat_arith_simplify.h     | 10 ++++++++
> > > >  .../gcc.dg/sat_u_add-simplify-2-u16.c         | 11 +++++++++
> > > >  .../gcc.dg/sat_u_add-simplify-2-u32.c         | 11 +++++++++
> > > >  .../gcc.dg/sat_u_add-simplify-2-u64.c         | 11 +++++++++
> > > >  .../gcc.dg/sat_u_add-simplify-2-u8.c          | 11 +++++++++
> > > >  6 files changed, 65 insertions(+), 12 deletions(-)
> > > >  create mode 100644 gcc/testsuite/gcc.dg/sat_arith_simplify.h
> > > >  create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u16.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u32.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u64.c
> > > >  create mode 100644 gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u8.c
> > > >
> > > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > > index c851ac56e37..8425d7c4f20 100644
> > > > --- a/gcc/match.pd
> > > > +++ b/gcc/match.pd
> > > > @@ -3146,18 +3146,17 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > > >  (match (unsigned_integer_sat_add @0 @1)
> > > >   (bit_ior:c (usadd_left_part_2 @0 @1) (usadd_right_part_2 @0 @1)))
> > > >
> > > > -/* Simplify SAT_U_ADD to the cheap form
> > > > -   From: SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1.
> > > > -   To: SAT_U_ADD = (X + Y) | - ((X + Y) < X).  */
> > > > -(simplify (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > > > - (if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type)
> > > > -      && types_match (type, @0, @1))
> > > > -  (bit_ior @2 (negate (convert (lt @2 @0))))))
> > > > -
> > > > -/* Unsigned saturation add, case 4 (branch with lt):
> > > > -   SAT_U_ADD = (X + Y) < x ? -1 : (X + Y).  */
> > > > -(match (unsigned_integer_sat_add @0 @1)
> > > > - (cond^ (lt (usadd_left_part_1@2 @0 @1) @0) integer_minus_onep @2))
> > > > +/* Simplify sorts of SAT_U_ADD forms to the cheap one.
> > > > +   The cheap form: SAT_U_ADD = (X + Y) | - ((X + Y) < X).  */
> > > > +(if (INTEGRAL_TYPE_P (type) && TYPE_UNSIGNED (type))
> > > > + /* From SAT_U_ADD = (X + Y) >= x ? (X + Y) : -1.  */
> > > > + (simplify (cond (ge (plus:c@2 @0 @1) @0) @2 integer_minus_onep)
> > > > +  (if (types_match (type, @0, @1))
> > > > +   (bit_ior (plus@2 @0 @1) (negate (convert (lt @2 @0))))))
> > > > + /* From SAT_U_ADD = (X + Y) < x ? -1 : (X + Y).  */
> > > > + (simplify (cond (lt (plus:c@2 @0 @1) @0) integer_minus_onep @2)
> > > > +  (if (types_match (type, @0, @1))
> > > > +   (bit_ior (plus@2 @0 @1) (negate (convert (lt @2 @0)))))))
> > >
> > > Sorry for falling behind in reviewing - it's not exactly clear the
> > > "cheap" form is cheaper. When I count the number of gimple statements
> > > (sub-expressions) the original appears as 3 while the result looks to
> > > have 5. The catch is of course that the original might involve control
> > > flow and a PHI.
> > >
> > > The pattern will apply during late PHI-OPT or during GENERIC folding
> > > done by the frontends or gimplification - you scan the gimplification
> > > dump below, so having it apply there means it will influence inlining
> > > heuristics for example. I wonder which variant is considered larger by
> > > its heuristic.
> > >
> > > What do others think of the early canonicalization of these to
> > > straight-line code?
> > >
> > > Thanks,
> > > Richard.
> > >
> > > >  /* Unsigned saturation add, case 5 (branch with eq .ADD_OVERFLOW):
> > > >     SAT_U_ADD = REALPART_EXPR <.ADD_OVERFLOW> == 0 ? .ADD_OVERFLOW : -1.  */
> > > > diff --git a/gcc/testsuite/gcc.dg/sat_arith_simplify.h b/gcc/testsuite/gcc.dg/sat_arith_simplify.h
> > > > new file mode 100644
> > > > index 00000000000..46ac00426b2
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/sat_arith_simplify.h
> > > > @@ -0,0 +1,10 @@
> > > > +#ifndef HAVE_DEFINED_SAT_ARITH_SIMPLIFY_H
> > > > +#define HAVE_DEFINED_SAT_ARITH_SIMPLIFY_H
> > > > +
> > > > +#define DEF_SAT_U_ADD_2(T)              \
> > > > +T sat_u_add_##T##_2 (T x, T y)          \
> > > > +{                                       \
> > > > +  return (T)(x + y) < x ? -1 : (x + y); \
> > > > +}
> > > > +
> > > > +#endif
> > > > diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u16.c b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u16.c
> > > > new file mode 100644
> > > > index 00000000000..b170b35096c
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u16.c
> > > > @@ -0,0 +1,11 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-gimple-details" } */
> > > > +
> > > > +#include <stdint.h>
> > > > +#include "sat_arith_simplify.h"
> > > > +
> > > > +DEF_SAT_U_ADD_2 (uint16_t)
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not " if " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " else " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " goto " "gimple" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u32.c b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u32.c
> > > > new file mode 100644
> > > > index 00000000000..8830ed7b878
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u32.c
> > > > @@ -0,0 +1,11 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-gimple-details" } */
> > > > +
> > > > +#include <stdint.h>
> > > > +#include "sat_arith_simplify.h"
> > > > +
> > > > +DEF_SAT_U_ADD_2 (uint32_t)
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not " if " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " else " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " goto " "gimple" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u64.c b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u64.c
> > > > new file mode 100644
> > > > index 00000000000..76c4d4bddaa
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u64.c
> > > > @@ -0,0 +1,11 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-gimple-details" } */
> > > > +
> > > > +#include <stdint.h>
> > > > +#include "sat_arith_simplify.h"
> > > > +
> > > > +DEF_SAT_U_ADD_2 (uint64_t)
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not " if " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " else " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " goto " "gimple" } } */
> > > > diff --git a/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u8.c b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u8.c
> > > > new file mode 100644
> > > > index 00000000000..b034b8eedb1
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.dg/sat_u_add-simplify-2-u8.c
> > > > @@ -0,0 +1,11 @@
> > > > +/* { dg-do compile } */
> > > > +/* { dg-options "-O2 -fdump-tree-gimple-details" } */
> > > > +
> > > > +#include <stdint.h>
> > > > +#include "sat_arith_simplify.h"
> > > > +
> > > > +DEF_SAT_U_ADD_2 (uint8_t)
> > > > +
> > > > +/* { dg-final { scan-tree-dump-not " if " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " else " "gimple" } } */
> > > > +/* { dg-final { scan-tree-dump-not " goto " "gimple" } } */
> > > > --
> > > > 2.43.0
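
For completeness, a quick exhaustive check (my own sanity sketch, not part
of the patch or its testsuite) confirming that the branch and branchless
forms agree for every uint8_t input pair, so the discussion above is purely
about cost, not correctness:

#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Branch form 4: (x + y) < x ? -1 : (x + y).  */
static uint8_t sat_add_branch (uint8_t x, uint8_t y)
{
  return (uint8_t) (x + y) < x ? 0xff : (uint8_t) (x + y);
}

/* Branchless form: (x + y) | -((x + y) < x).  */
static uint8_t sat_add_branchless (uint8_t x, uint8_t y)
{
  uint8_t sum = x + y;
  return sum | -(uint8_t) (sum < x);
}

int main (void)
{
  /* Enumerate all 65536 input pairs and compare the two forms.  */
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      assert (sat_add_branch (x, y) == sat_add_branchless (x, y));
  puts ("all 65536 input pairs agree");
  return 0;
}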