sub(a, b))

Richard Biener Thu, 05 Jun 2025 23:52:12 -0700

On Fri, Jun 6, 2025 at 6:09 AM Dhruv Chawla <dhr...@nvidia.com> wrote:
>
> On 05/06/25 12:01, Richard Biener wrote:
> > On Wed, Jun 4, 2025 at 7:44 PM Andrew Pinski <pins...@gmail.com> wrote:
> >>
> >> On Wed, Jun 4, 2025 at 6:27 AM Richard Biener
> >> <richard.guent...@gmail.com> wrote:
> >>>
> >>> On Thu, May 29, 2025 at 10:04 AM <dhr...@nvidia.com> wrote:
> >>>>
> >>>> From: Dhruv Chawla <dhr...@nvidia.com>
> >>>>
> >>>> This patch folds the following patterns:
> >>>> - max (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? sum : a
> >>>> - max (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? a : sum
> >>>> - min (a, add (a, b)) -> [sum, ovf] = addo (a, b); !ovf ? a : sum
> >>>> - min (a, sub (a, b)) -> [sum, ovf] = subo (a, b); !ovf ? sum : a
> >>>
> >>> I wonder whether this is really beneficial without considering the
> >>> target.  IMO a COND_EXPR is always less "canonical", the original
> >>> form should better optimize with surrounding code.
> >>
> >> This happens very late in the gimple optimization.
> >>
> >>>
> >>> I suppose you are after improved code generation for aarch64?  Can
> >>> this not be achieved by RTL level simplification / instruction combining?
> >>
> >> So the RTL level combine gets us:
> >> ```
> >> (set (reg:SI 105 [ _5 ])
> >>      (umax:SI (plus:SI (reg/v:SI 103 [ a ])
> >>              (reg:SI 108 [ b ]))
> >>          (reg/v:SI 103 [ a ])))
> >> ```
> >> the aarch64 backend could match this I suspect but it looks like this
> >> transformation also helps x86 and other targets which don't have umax
> >> patterns/optabs but have add with overflow optabs.
> >
> > On x86 STV can decide to use SSE regs for this where min/max are
> > available.  Turning it into cmov on oflag would be premature.  STV1
> > is before combine, STV2 after it.
> >
> > IMO the above shows it's perfect for a target specific combine?
> >
> > Richard.
>
> I'm not too familiar with x86 so I am not sure how to make it use SSE regs,
> but I am seeing better (?) or equivalent codegen with the patch, at least
> with the number of instructions - for example, with pr116815-1.c, these are
> the two patterns I'm seeing:
>
> @@ -97,9 +97,9 @@ uminadd3:
>   uminadd4:
>   .LFB21:
>          .cfi_startproc
> -       leal    (%rdi,%rsi), %eax
> -       cmpl    %edi, %eax
> -       cmova   %edi, %eax
> +       addl    %edi, %esi
> +       movl    %esi, %eax
> +       cmovnc  %edi, %eax
>          ret
>          .cfi_endproc
>   .LFE21:
> @@ -112,8 +112,7 @@ umaxsub1:
>          .cfi_startproc
>          movl    %edi, %eax
>          subl    %esi, %eax
> -       cmpl    %edi, %eax
> -       cmovb   %edi, %eax
> +       cmovnb  %edi, %eax
>          ret
>          .cfi_endproc
>
> Either a slightly different sequence or one instruction less. But I am
> compiling with a cross-compiler (plus -O3 -march=znver5), so I may be
> doing something wrong.


For example

static inline int
smax (int a, int b)
{
  return a > b ? a : b;
}

void
smax_add (int *a, int *b, int *r)
{
  int res = smax (*a, *a + *b);
  *r = res;
}

with -O2 -march=znver2 produces

smax_add:
.LFB1:
        .cfi_startproc
        vmovd   (%rdi), %xmm1
        vmovd   (%rsi), %xmm0
        vpaddd  %xmm0, %xmm1, %xmm0
        vpmaxsd %xmm1, %xmm0, %xmm0
        vmovd   %xmm0, (%rdx)

The cost considerations of STV are a bit iffy: different targets have
different GPR <-> XMM move costs, but using memory sources helps,
as in the example above.

It really depends on micro-architectural details whether the
cmov sequence or the XMM sequence is better.  Sure, when
we can use the oflag instead of an extra compare that's better,
but as said this looks to be recoverable on RTL.
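
To make the equivalence under discussion concrete, here is a minimal C
sketch of the four unsigned folds from the patch description, written
with the GCC/Clang builtins __builtin_add_overflow and
__builtin_sub_overflow.  The function names (umax_add_ovf, folds_agree,
check_minmax_folds and so on) are hypothetical and exist only for this
illustration; this is not the code the patch generates, only the
source-level identity it relies on.

```c
#include <assert.h>
#include <limits.h>

/* Reference forms, as the source would write them.  */
static unsigned umax (unsigned a, unsigned b) { return a > b ? a : b; }
static unsigned umin (unsigned a, unsigned b) { return a < b ? a : b; }

/* Folded forms: one overflowing operation plus a select on the
   overflow flag.  All names below are hypothetical.  */
static unsigned
umax_add_ovf (unsigned a, unsigned b)
{
  unsigned sum;
  int ovf = __builtin_add_overflow (a, b, &sum);
  return !ovf ? sum : a;
}

static unsigned
umin_add_ovf (unsigned a, unsigned b)
{
  unsigned sum;
  int ovf = __builtin_add_overflow (a, b, &sum);
  return !ovf ? a : sum;
}

static unsigned
umax_sub_ovf (unsigned a, unsigned b)
{
  unsigned diff;
  int ovf = __builtin_sub_overflow (a, b, &diff);
  return !ovf ? a : diff;
}

static unsigned
umin_sub_ovf (unsigned a, unsigned b)
{
  unsigned diff;
  int ovf = __builtin_sub_overflow (a, b, &diff);
  return !ovf ? diff : a;
}

/* Return 1 if all four folded forms match the min/max forms.  */
static int
folds_agree (unsigned a, unsigned b)
{
  return umax (a, a + b) == umax_add_ovf (a, b)
	 && umin (a, a + b) == umin_add_ovf (a, b)
	 && umax (a, a - b) == umax_sub_ovf (a, b)
	 && umin (a, a - b) == umin_sub_ovf (a, b);
}

/* Check a few values around the wrap-around boundaries.  */
void
check_minmax_folds (void)
{
  static const unsigned vals[] = { 0u, 1u, 2u, 5u, UINT_MAX - 1, UINT_MAX };
  const unsigned n = sizeof vals / sizeof vals[0];
  for (unsigned i = 0; i < n; i++)
    for (unsigned j = 0; j < n; j++)
      assert (folds_agree (vals[i], vals[j]));
}
```

Calling check_minmax_folds () from a test driver exercises both the
wrapping and the non-wrapping cases.  The identity only holds for
unsigned operands, which matches the patch's negative tests for signed
types.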

As proposed, the transform is also prone to happen early, while
abstraction is still being unfolded during early inlining and
optimization, so the context will likely not be visible in practice.
That means possible regressions when the surrounding code is optimized
by passes that are not aware of add-with-overflow (SCEV analysis, for
example).

Richard.

>
> >
> >> In fact looking at the code gen between the two versions, with
> >> aarch64's cssc, using umax might be better.
> >> ```
> >>          add     w1, w0, w1        // c_3, a, b
> >>          umax    w0, w1, w0      //, c_3, a
> >> ```
> >> vs:
> >> ```
> >>          adds    w8, w0, w1
> >>          csel    w0, w0, w8, hs
> >> ```
> >>
> >> Because we don't clobber CC/flags.
> >>
> >>>
> >>> Richard.
> >>>
> >>>> Where ovf is the overflow flag, addo and subo are overflowing addition 
> >>>> and
> >>>> subtraction, respectively. The folded patterns can normally be 
> >>>> implemented as
> >>>> an overflowing operation combined with a conditional move/select 
> >>>> instruction.
> >>>>
> >>>> Explanation for the conditions handled in arith_overflow_check_p:
> >>>>
> >>>> Case 1/2: r = a + b; max/min (a, r) or max/min (r, a)
> >>>>    lhs (r)
> >>>>      if crhs1 (a) and crhs2 (r)
> >>>>        => lhs (r) == crhs2 (r) &&
> >>>>           (rhs1 (a or b) == crhs1 (a) || rhs2 (a or b) == crhs1 (a))
> >>>>      if crhs1 (r) and crhs2 (a)
> >>>>        => lhs (r) == crhs1 (r) &&
> >>>>           (rhs1 (a or b) == crhs2 (a) || rhs2 (a or b) == crhs2 (a))
> >>>>
> >>>> Both rhs1 and rhs2 are checked in (rhs<n> == crhs<n>) as addition is
> >>>> commutative.
> >>>>
> >>>> Case 3/4: r = a - b; max/min (a, r) or max/min (r, a)
> >>>>    lhs (r)
> >>>>      if crhs1 (a) and crhs2 (r)
> >>>>        => lhs (r) == crhs2 (r) && rhs1 (a) == crhs1 (a)
> >>>>      if crhs1 (r) and crhs2 (a)
> >>>>        => lhs (r) == crhs1 (r) && rhs1 (a) == crhs2 (a)
> >>>>
> >>>> Bootstrapped and regtested on aarch64-unknown-linux-gnu.
> >>>>
> >>>> Signed-off-by: Dhruv Chawla <dhr...@nvidia.com>
> >>>>
> >>>> gcc/ChangeLog:
> >>>>
> >>>>          PR middle-end/116815
> >>>>          * tree-ssa-math-opts.cc (arith_overflow_check_p): Match min/max
> >>>>          patterns.
> >>>>          (build_minmax_replacement_statements): New function.
> >>>>          (match_arith_overflow): Update to handle min/max patterns.
> >>>>
> >>>> gcc/testsuite/ChangeLog:
> >>>>
> >>>>          * gcc.dg/tree-ssa/pr116815-1.c: New test.
> >>>>          * gcc.dg/tree-ssa/pr116815-2.c: Likewise.
> >>>>          * gcc.dg/tree-ssa/pr116815-3.c: Likewise.
> >>>> ---
> >>>>   gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c |  42 ++++++
> >>>>   gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c |  93 +++++++++++++
> >>>>   gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c |  43 ++++++
> >>>>   gcc/tree-ssa-math-opts.cc                  | 151 +++++++++++++++++++--
> >>>>   4 files changed, 318 insertions(+), 11 deletions(-)
> >>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
> >>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
> >>>>   create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c
> >>>>
> >>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c 
> >>>> b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
> >>>> new file mode 100644
> >>>> index 00000000000..5d62843d63c
> >>>> --- /dev/null
> >>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-1.c
> >>>> @@ -0,0 +1,42 @@
> >>>> +/* { dg-do compile } */
> >>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >>>> +
> >>>> +/* PR middle-end/116815 */
> >>>> +
> >>>> +/* Single-use tests.  */
> >>>> +
> >>>> +static inline unsigned
> >>>> +max (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return a > b ? a : b;
> >>>> +}
> >>>> +
> >>>> +static inline unsigned
> >>>> +min (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return a < b ? a : b;
> >>>> +}
> >>>> +
> >>>> +#define OPERATION(op, type, N, exp1, exp2)                              
> >>>>        \
> >>>> +  unsigned u##op##type##N (unsigned a, unsigned b) { return op (exp1, 
> >>>> exp2); }
> >>>> +
> >>>> +OPERATION (max, add, 1, a, a + b)
> >>>> +OPERATION (max, add, 2, a, b + a)
> >>>> +OPERATION (max, add, 3, a + b, a)
> >>>> +OPERATION (max, add, 4, b + a, a)
> >>>> +
> >>>> +OPERATION (min, add, 1, a, a + b)
> >>>> +OPERATION (min, add, 2, a, b + a)
> >>>> +OPERATION (min, add, 3, a + b, a)
> >>>> +OPERATION (min, add, 4, b + a, a)
> >>>> +
> >>>> +OPERATION (max, sub, 1, a, a - b)
> >>>> +OPERATION (max, sub, 2, a - b, a)
> >>>> +
> >>>> +OPERATION (min, sub, 1, a, a - b)
> >>>> +OPERATION (min, sub, 2, a - b, a)
> >>>> +
> >>>> +/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */
> >>>> +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */
> >>>> +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 8 "optimized" } } */
> >>>> +/* { dg-final { scan-tree-dump-times "SUB_OVERFLOW" 4 "optimized" } } */
> >>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c 
> >>>> b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
> >>>> new file mode 100644
> >>>> index 00000000000..56e8038ef82
> >>>> --- /dev/null
> >>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-2.c
> >>>> @@ -0,0 +1,93 @@
> >>>> +/* { dg-do compile } */
> >>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >>>> +
> >>>> +/* PR middle-end/116815 */
> >>>> +
> >>>> +/* Negative tests.  */
> >>>> +
> >>>> +static inline int
> >>>> +smax (int a, int b)
> >>>> +{
> >>>> +  return a > b ? a : b;
> >>>> +}
> >>>> +
> >>>> +static inline int
> >>>> +smin (int a, int b)
> >>>> +{
> >>>> +  return a < b ? a : b;
> >>>> +}
> >>>> +
> >>>> +static inline unsigned
> >>>> +umax (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return a > b ? a : b;
> >>>> +}
> >>>> +
> >>>> +static inline unsigned
> >>>> +umin (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return a < b ? a : b;
> >>>> +}
> >>>> +
> >>>> +#define ASSUME(cond) if (!(cond)) __builtin_unreachable ();
> >>>> +
> >>>> +/* This transformation does not trigger on signed types.  */
> >>>> +
> >>>> +int
> >>>> +smax_add (int a, int b)
> >>>> +{
> >>>> +  ASSUME (b >= 0);
> >>>> +  return smax (a, a + b);
> >>>> +}
> >>>> +
> >>>> +int
> >>>> +smin_add (int a, int b)
> >>>> +{
> >>>> +  ASSUME (b >= 0);
> >>>> +  return smin (a, a + b);
> >>>> +}
> >>>> +
> >>>> +int
> >>>> +smax_sub (int a, int b)
> >>>> +{
> >>>> +  ASSUME (b >= 0);
> >>>> +  return smax (a, a - b);
> >>>> +}
> >>>> +
> >>>> +int
> >>>> +smin_sub (int a, int b)
> >>>> +{
> >>>> +  ASSUME (b >= 0);
> >>>> +  return smin (a, a - b);
> >>>> +}
> >>>> +
> >>>> +/* Invalid patterns.  */
> >>>> +
> >>>> +/* This can potentially be matched, but the RHS gets factored to
> >>>> +   (a + b) * b.  */
> >>>> +unsigned
> >>>> +umax_factored (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return umax (a * b, a * b + b * b);
> >>>> +}
> >>>> +
> >>>> +unsigned
> >>>> +umin_mult (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return umin (a, a * b);
> >>>> +}
> >>>> +
> >>>> +unsigned
> >>>> +umax_sub (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return umax (a, b - a);
> >>>> +}
> >>>> +
> >>>> +unsigned
> >>>> +umin_sub (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return umin (a, b - a);
> >>>> +}
> >>>> +
> >>>> +/* { dg-final { scan-tree-dump-not "ADD_OVERFLOW" "optimized" } } */
> >>>> +/* { dg-final { scan-tree-dump-not "SUB_OVERFLOW" "optimized" } } */
> >>>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c 
> >>>> b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c
> >>>> new file mode 100644
> >>>> index 00000000000..af1fe18d50a
> >>>> --- /dev/null
> >>>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr116815-3.c
> >>>> @@ -0,0 +1,43 @@
> >>>> +/* { dg-do compile } */
> >>>> +/* { dg-options "-O2 -fdump-tree-optimized" } */
> >>>> +
> >>>> +/* PR middle-end/116815 */
> >>>> +
> >>>> +/* Multi-use tests.  */
> >>>> +
> >>>> +static inline unsigned
> >>>> +max (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return a > b ? a : b;
> >>>> +}
> >>>> +
> >>>> +static inline unsigned
> >>>> +min (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return a < b ? a : b;
> >>>> +}
> >>>> +
> >>>> +unsigned
> >>>> +umax_add_umin_add (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return max (a, a + b) + min (a + b, b);
> >>>> +}
> >>>> +
> >>>> +unsigned
> >>>> +umin_add_umax_add (unsigned a, unsigned b)
> >>>> +{
> >>>> +  return min (a, b + a) + max (b + a, b);
> >>>> +}
> >>>> +
> >>>> +unsigned
> >>>> +multiple_paths (unsigned a, unsigned b)
> >>>> +{
> >>>> +  if (a > 5)
> >>>> +    return max (a, a + b);
> >>>> +  else
> >>>> +    return min (a, a + b);
> >>>> +}
> >>>> +
> >>>> +/* { dg-final { scan-tree-dump-not "MAX_EXPR" "optimized" } } */
> >>>> +/* { dg-final { scan-tree-dump-not "MIN_EXPR" "optimized" } } */
> >>>> +/* { dg-final { scan-tree-dump-times "ADD_OVERFLOW" 3 "optimized" } } */
> >>>> diff --git a/gcc/tree-ssa-math-opts.cc b/gcc/tree-ssa-math-opts.cc
> >>>> index 7e819f37446..f08cac68ca7 100644
> >>>> --- a/gcc/tree-ssa-math-opts.cc
> >>>> +++ b/gcc/tree-ssa-math-opts.cc
> >>>> @@ -3981,11 +3981,26 @@ arith_overflow_check_p (gimple *stmt, gimple 
> >>>> *cast_stmt, gimple *&use_stmt,
> >>>>         return 1;
> >>>>       }
> >>>>
> >>>> -  if (TREE_CODE_CLASS (ccode) != tcc_comparison)
> >>>> +  if (TREE_CODE_CLASS (ccode) != tcc_comparison
> >>>> +      && TREE_CODE_CLASS (ccode) != tcc_binary)
> >>>>       return 0;
> >>>>
> >>>>     switch (ccode)
> >>>>       {
> >>>> +    case MAX_EXPR:
> >>>> +    case MIN_EXPR:
> >>>> +      /* 1. r = a + b; max (a, r) or max (r, a)
> >>>> +        2. r = a + b; min (a, r) or min (r, a)
> >>>> +        3. r = a - b; max (a, r) or max (r, a)
> >>>> +        4. r = a - b; min (a, r) or min (r, a)  */
> >>>> +      if ((code == PLUS_EXPR
> >>>> +          && ((lhs == crhs1 && (rhs1 == crhs2 || rhs2 == crhs2))
> >>>> +              || (lhs == crhs2 && (rhs1 == crhs1 || rhs2 == crhs1))))
> >>>> +         || (code == MINUS_EXPR
> >>>> +             && ((lhs == crhs1 && rhs1 == crhs2)
> >>>> +                 || (lhs == crhs2 && rhs1 == crhs1))))
> >>>> +       return 1;
> >>>> +      break;
> >>>>       case GT_EXPR:
> >>>>       case LE_EXPR:
> >>>>         if (maxval)
> >>>> @@ -4339,6 +4354,73 @@ match_saturation_trunc (gimple_stmt_iterator 
> >>>> *gsi, gphi *phi)
> >>>>     return true;
> >>>>   }
> >>>>
> >>>> +/* Assume that ovf = overflow_flag (add/sub (...)).
> >>>> +   The replacement forms are:
> >>>> +     max (add) -> ovf ? a : a + b
> >>>> +     min (sub) -> ovf ? a : a - b
> >>>> +     max (sub) -> ovf ? a - b : a
> >>>> +     min (add) -> ovf ? a + b : a.  */
> >>>> +
> >>>> +static tree
> >>>> +build_minmax_replacement_statements (gimple *def_stmt, tree ovf, tree 
> >>>> new_lhs,
> >>>> +                                    tree type, gimple *minmax_stmt)
> >>>> +{
> >>>> +  enum tree_code code = gimple_assign_rhs_code (def_stmt);
> >>>> +  enum tree_code rhs_code = gimple_assign_rhs_code (minmax_stmt);
> >>>> +  gcc_checking_assert (code == PLUS_EXPR || code == MINUS_EXPR);
> >>>> +
> >>>> +  tree lhs = gimple_assign_lhs (def_stmt);
> >>>> +  tree rhs1 = gimple_assign_rhs1 (def_stmt);
> >>>> +  tree rhs2 = gimple_assign_rhs2 (def_stmt);
> >>>> +
> >>>> +  tree use_rhs1 = gimple_assign_rhs1 (minmax_stmt);
> >>>> +  tree use_rhs2 = gimple_assign_rhs2 (minmax_stmt);
> >>>> +
> >>>> +  /* First figure out which variable from def_stmt will be used in the
> >>>> +     COND_EXPR.  */
> >>>> +  tree minmax_var = NULL_TREE;
> >>>> +  /* Either max/min (a, add/sub (a, b)) or
> >>>> +           max/min (add/sub (a, b), a).  */
> >>>> +  if ((lhs == use_rhs2 && use_rhs1 == rhs1)
> >>>> +      || (lhs == use_rhs1 && use_rhs2 == rhs1))
> >>>> +    minmax_var = rhs1;
> >>>> +  /* Either max/min (a, add (b, a)) or
> >>>> +           max/min (add (b, a), a).  */
> >>>> +  else if (code == PLUS_EXPR)
> >>>> +    minmax_var = rhs2;
> >>>> +
> >>>> +  /* The above should always match rhs1 for MINUS_EXPR.  */
> >>>> +  gcc_checking_assert (
> >>>> +    minmax_var != NULL_TREE
> >>>> +    && (code == PLUS_EXPR || (use_rhs1 != rhs2 && use_rhs2 != rhs2)));
> >>>> +
> >>>> +  /* Figure out if we have to generate:
> >>>> +       (ovf != 0 ? minmax_var : new_lhs) or
> >>>> +       (ovf != 0 ? new_lhs : minmax_var) i.e.
> >>>> +       (ovf == 0 ? minmax_var : new_lhs).
> >>>> +     The default case is assumed to be the first one.  */
> >>>> +  bool flip = false;
> >>>> +  if ((rhs_code == MIN_EXPR && code == PLUS_EXPR)
> >>>> +      || (rhs_code == MAX_EXPR && code == MINUS_EXPR))
> >>>> +    flip = true;
> >>>> +
> >>>> +  /* Generate the actual code.  */
> >>>> +  tree minmax = make_ssa_name (type);
> >>>> +  tree comparison_result = make_ssa_name (boolean_type_node);
> >>>> +  tree comparison_expr = build2 (flip ? EQ_EXPR : NE_EXPR, 
> >>>> boolean_type_node,
> >>>> +                                ovf, build_int_cst (type, 0));
> >>>> +  gimple *comparison_stmt
> >>>> +    = gimple_build_assign (comparison_result, comparison_expr);
> >>>> +
> >>>> +  tree conditional
> >>>> +    = build3 (COND_EXPR, type, comparison_result, minmax_var, new_lhs);
> >>>> +  gimple *new_minmax_stmt = gimple_build_assign (minmax, conditional);
> >>>> +  gimple_stmt_iterator gsi = gsi_for_stmt (minmax_stmt);
> >>>> +  gsi_insert_before (&gsi, comparison_stmt, GSI_NEW_STMT);
> >>>> +  gsi_insert_after (&gsi, new_minmax_stmt, GSI_NEW_STMT);
> >>>> +
> >>>> +  return minmax;
> >>>> +}
> >>>> +
> >>>>   /* Recognize for unsigned x
> >>>>      x = y - z;
> >>>>      if (x > y)
> >>>> @@ -4391,7 +4473,19 @@ match_saturation_trunc (gimple_stmt_iterator 
> >>>> *gsi, gphi *phi)
> >>>>      z = IMAGPART_EXPR <_7>;
> >>>>      _8 = IMAGPART_EXPR <_7>;
> >>>>      _9 = _8 != 0;
> >>>> -   iftmp.0_3 = (int) _9;  */
> >>>> +   iftmp.0_3 = (int) _9;
> >>>> +
> >>>> +   And also recognize:
> >>>> +   c = max/min (a, add/sub (a, b))
> >>>> +   and replace it with
> >>>> +   _7 = .(ADD|SUB)_OVERFLOW (a, b);
> >>>> +   _8 = REALPART_EXPR <_7>;
> >>>> +   _9 = IMAGPART_EXPR <_7>;
> >>>> +   _10 = _9 != 0; (or _9 == 0)
> >>>> +   _11 = _10 ? _8 : a;
> >>>> +   c = _11;
> >>>> +   This can be optimized to a single conditional select instruction 
> >>>> with an
> >>>> +   overflowing arithmetic instruction.  */
> >>>>
> >>>>   static bool
> >>>>   match_arith_overflow (gimple_stmt_iterator *gsi, gimple *stmt,
> >>>> @@ -4425,6 +4519,7 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>
> >>>>     tree rhs1 = gimple_assign_rhs1 (stmt);
> >>>>     tree rhs2 = gimple_assign_rhs2 (stmt);
> >>>> +  bool minmax_use_seen = false;
> >>>>     FOR_EACH_IMM_USE_FAST (use_p, iter, lhs)
> >>>>       {
> >>>>         use_stmt = USE_STMT (use_p);
> >>>> @@ -4445,6 +4540,13 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>                  return false;
> >>>>                cond_stmt = use_stmt;
> >>>>              }
> >>>> +         if (gimple_code (use_stmt) == GIMPLE_ASSIGN
> >>>> +             && gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
> >>>> +           {
> >>>> +             tree_code rhs_code = gimple_assign_rhs_code (use_stmt);
> >>>> +             if (rhs_code == MAX_EXPR || rhs_code == MIN_EXPR)
> >>>> +               minmax_use_seen = true;
> >>>> +           }
> >>>>            ovf_use_seen = true;
> >>>>          }
> >>>>         else
> >>>> @@ -4494,7 +4596,10 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>
> >>>>     tree maxval = NULL_TREE;
> >>>>     if (!ovf_use_seen
> >>>> -      || (code != MULT_EXPR && (code == BIT_NOT_EXPR ? use_seen : 
> >>>> !use_seen))
> >>>> +      || (code != MULT_EXPR
> >>>> +         && (code == BIT_NOT_EXPR
> >>>> +               ? use_seen
> >>>> +               : !minmax_use_seen && !use_seen))
> >>>>         || (code == PLUS_EXPR
> >>>>            && optab_handler (uaddv4_optab,
> >>>>                              TYPE_MODE (type)) == CODE_FOR_nothing)
> >>>> @@ -4758,6 +4863,7 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>       gsi_insert_after (gsi, g2, GSI_NEW_STMT);
> >>>>     else
> >>>>       gsi_insert_before (gsi, g2, GSI_SAME_STMT);
> >>>> +
> >>>>     if (code == MULT_EXPR)
> >>>>       mul_stmts.quick_push (g2);
> >>>>
> >>>> @@ -4786,15 +4892,25 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>         if (gimple_code (use_stmt) == GIMPLE_COND)
> >>>>          {
> >>>>            gcond *cond_stmt = as_a <gcond *> (use_stmt);
> >>>> -         gimple_cond_set_lhs (cond_stmt, ovf);
> >>>> -         gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0));
> >>>> -         gimple_cond_set_code (cond_stmt, ovf_use == 1 ? NE_EXPR : 
> >>>> EQ_EXPR);
> >>>> +         tree rhs = gimple_cond_rhs (cond_stmt);
> >>>> +         if (TREE_CODE (rhs) == MIN_EXPR || TREE_CODE (rhs) == MAX_EXPR)
> >>>> +           gimple_cond_set_rhs (cond_stmt,
> >>>> +                                build_minmax_replacement_statements (
> >>>> +                                  stmt, ovf, new_lhs, type, use_stmt));
> >>>> +         else
> >>>> +           {
> >>>> +             gimple_cond_set_lhs (cond_stmt, ovf);
> >>>> +             gimple_cond_set_rhs (cond_stmt, build_int_cst (type, 0));
> >>>> +             gimple_cond_set_code (cond_stmt,
> >>>> +                                   ovf_use == 1 ? NE_EXPR : EQ_EXPR);
> >>>> +           }
> >>>>          }
> >>>>         else
> >>>>          {
> >>>>            gcc_checking_assert (is_gimple_assign (use_stmt));
> >>>>            if (gimple_assign_rhs_class (use_stmt) == GIMPLE_BINARY_RHS)
> >>>>              {
> >>>> +             tree_code rhs_code = gimple_assign_rhs_code (use_stmt);
> >>>>                if (gimple_assign_rhs_code (use_stmt) == RSHIFT_EXPR)
> >>>>                  {
> >>>>                    g2 = gimple_build_assign (make_ssa_name 
> >>>> (boolean_type_node),
> >>>> @@ -4843,6 +4959,14 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>                    gsi_remove (&gsiu, true);
> >>>>                    continue;
> >>>>                  }
> >>>> +             else if (rhs_code == MIN_EXPR || rhs_code == MAX_EXPR)
> >>>> +               {
> >>>> +                 gimple_stmt_iterator gsi = gsi_for_stmt (use_stmt);
> >>>> +                 gimple_assign_set_rhs_from_tree (
> >>>> +                   &gsi,
> >>>> +                   build_minmax_replacement_statements (stmt, ovf, 
> >>>> new_lhs,
> >>>> +                                                        type, 
> >>>> use_stmt));
> >>>> +               }
> >>>>                else
> >>>>                  {
> >>>>                    gimple_assign_set_rhs1 (use_stmt, ovf);
> >>>> @@ -4854,11 +4978,16 @@ match_arith_overflow (gimple_stmt_iterator *gsi, 
> >>>> gimple *stmt,
> >>>>              }
> >>>>            else
> >>>>              {
> >>>> -             gcc_checking_assert (gimple_assign_rhs_code (use_stmt)
> >>>> -                                  == COND_EXPR);
> >>>> -             tree cond = build2 (ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> >>>> -                                 boolean_type_node, ovf,
> >>>> -                                 build_int_cst (type, 0));
> >>>> +             tree_code rhs_code = gimple_assign_rhs_code (use_stmt);
> >>>> +             gcc_checking_assert (rhs_code == COND_EXPR || rhs_code == 
> >>>> MAX_EXPR
> >>>> +                                  || rhs_code == MIN_EXPR);
> >>>> +             tree cond = NULL_TREE;
> >>>> +             if (rhs_code != COND_EXPR)
> >>>> +               cond = build_minmax_replacement_statements (stmt, ovf, 
> >>>> new_lhs,
> >>>> +                                                           type, 
> >>>> use_stmt);
> >>>> +             else
> >>>> +               cond = build2 (ovf_use == 1 ? NE_EXPR : EQ_EXPR,
> >>>> +                              boolean_type_node, ovf, build_int_cst 
> >>>> (type, 0));
> >>>>                gimple_assign_set_rhs1 (use_stmt, cond);
> >>>>              }
> >>>>          }
> >>>> --
> >>>> 2.44.0
> >>>>
>
>
> --
> Regards,
> Dhruv

Re: [PATCH] widening_mul: Make better use of overflowing operations in codegen of min/max(a, add/sub(a, b))
