On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford
<richard.sandif...@arm.com> wrote:
>
> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
> added code to treat:
>
>   (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))
>
> as a nop.  This PR shows that that isn't always correct.
> The compare in the set above is between two 0/1 booleans (at least
> on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
> produced the incoming (reg:CC cc) is unconstrained; it could be between
> arbitrary integers, or even floats.  The fold is therefore replacing a
> cc that is valid for both signed and unsigned comparisons with one that
> is only known to be valid for signed comparisons.
>
>   (gt (compare (gt cc 0) (lt cc 0) 0)
>
> does simplify to:
>
>   (gt cc 0)
>
> but:
>
>   (gtu (compare (gt cc 0) (lt cc 0) 0)
>
> does not simplify to:
>
>   (gtu cc 0)
>
> The optimisation didn't come with a testcase, but it was added for
> i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
> as it once did, since it's now conditional on -minline-all-stringops.
> But the patch is almost 25 years old, so whatever the original
> motivation was, it seems likely that other things now rely on it.
>
> It therefore seems better to try to preserve the optimisation on rtl
> rather than get rid of it.  To do that, we need to look at how the
> result of the outer compare is used.  We'd therefore be looking at four
> instructions (the gt, the lt, the compare, and the use of the compare),
> but combine already allows that for 3-instruction combinations thanks
> to:
>
>   /* If the source is a COMPARE, look for the use of the comparison result
>      and try to simplify it unless we already have used undobuf.other_insn.  
> */
>
> When applied to boolean inputs, a comparison operator is
> effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
> simplify_logical_relational_operation already had code to simplify
> logical operators between two comparison results, but:
>
> * It only handled IOR, which doesn't cover all the cases needed here.
>   The others are easily added.
>
> * It treated comparisons of integers as having an ORDERED/UNORDERED result.
>   Therefore:
>
>   * it would not treat "true for LT + EQ + GT" as "always true" for
>     comparisons between integers, because the mask excluded the UNORDERED
>     condition.
>
>   * it would try to convert "true for LT + GT" into LTGT even for comparisons
>     between integers.  To prevent an ICE later, the code used:
>
>        /* Many comparison codes are only valid for certain mode classes.  */
>        if (!comparison_code_valid_for_mode (code, mode))
>          return 0;
>
>     However, this used the wrong mode, since "mode" is here the integer
>     result of the comparisons (and the mode of the IOR), not the mode of
>     the things being compared.  Thus the effect was to reject all
>     floating-point-only codes, even when comparing floats.
>
>   I think instead the code should detect whether the comparison is between
>   integer values and remove UNORDERED from consideration if so.  It then
>   always produces a valid comparison (or an always true/false result),
>   and so comparison_code_valid_for_mode is not needed.  In particular,
>   "true for LT + GT" becomes NE for comparisons between integers but
>   remains LTGT for comparisons between floats.
>
> * There was a missing check for whether the comparison inputs had
>   side effects.
>
> While there, it also seemed worth extending
> simplify_logical_relational_operation to unsigned comparisons, since
> that makes the testing easier.
>
> As far as that testing goes: the patch exhaustively tests all
> combinations of integer comparisons in:
>
>   (cmp1 (cmp2 X Y) (cmp3 X Y))
>
> for the 10 integer comparisons, giving 1000 fold attempts in total.
> It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
> on the result of the fold, giving 9 checks per fold, or 9000 in total.
> That's probably more than is typical for self-tests, but it seems to
> complete in neglible time, even for -O0 builds.
>
> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?

OK.

> The patch isn't exactly a spot fix, and the bug is ancient, so I suppose
> the patch probably isn't suitable for backports.

Maybe for GCC 14, but not without some soaking time of course.

Thanks,
Richard.

> Richard
>
>
> gcc/
>         PR rtl-optimization/117186
>         * rtl.h (simplify_context::simplify_logical_relational_operation): Add
>         an invert0_p parameter.
>         * simplify-rtx.cc (unsigned_comparison_to_mask): New function.
>         (mask_to_unsigned_comparison): Likewise.
>         (comparison_code_valid_for_mode): Delete.
>         (simplify_context::simplify_logical_relational_operation): Add
>         an invert0_p parameter.  Handle AND and XOR.  Handle unsigned
>         comparisons.  Handle always-false results.  Ignore the low bit
>         of the mask if the operands are always ordered and remove the
>         then-redundant check of comparison_code_valid_for_mode.  Check
>         for side-effects in the operands before simplifying them away.
>         (simplify_context::simplify_binary_operation_1): Remove
>         simplification of (compare (gt ...) (lt ...)) and instead...
>         (simplify_context::simplify_relational_operation_1): ...handle
>         comparisons of comparisons here.
>         (test_comparisons): New function.
>         (test_scalar_ops): Call it.
>
> gcc/testsuite/
>         PR rtl-optimization/117186
>         * gcc.dg/torture/pr117186.c: New test.
>         * gcc.target/aarch64/pr117186.c: Likewise.
> ---
>  gcc/rtl.h                                   |   3 +-
>  gcc/simplify-rtx.cc                         | 276 ++++++++++++++------
>  gcc/testsuite/gcc.dg/torture/pr117186.c     |  15 ++
>  gcc/testsuite/gcc.target/aarch64/pr117186.c | 128 +++++++++
>  4 files changed, 339 insertions(+), 83 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.dg/torture/pr117186.c
>  create mode 100644 gcc/testsuite/gcc.target/aarch64/pr117186.c
>
> diff --git a/gcc/rtl.h b/gcc/rtl.h
> index 3d211953b7d..3b676c46880 100644
> --- a/gcc/rtl.h
> +++ b/gcc/rtl.h
> @@ -3478,7 +3478,8 @@ private:
>    rtx simplify_byte_swapping_operation (rtx_code, machine_mode, rtx, rtx);
>    rtx simplify_associative_operation (rtx_code, machine_mode, rtx, rtx);
>    rtx simplify_distributive_operation (rtx_code, machine_mode, rtx, rtx);
> -  rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx, 
> rtx);
> +  rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx, 
> rtx,
> +                                            bool = false);
>    rtx simplify_binary_operation_series (rtx_code, machine_mode, rtx, rtx);
>    rtx simplify_distribute_over_subregs (rtx_code, machine_mode, rtx, rtx);
>    rtx simplify_shift_const_int (rtx_code, machine_mode, rtx, unsigned int);
> diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc
> index c77ce956332..71c5d3c1b1b 100644
> --- a/gcc/simplify-rtx.cc
> +++ b/gcc/simplify-rtx.cc
> @@ -2446,6 +2446,61 @@ simplify_context::simplify_associative_operation 
> (rtx_code code,
>    return 0;
>  }
>
> +/* If COMPARISON can be treated as an unsigned comparison, return a mask
> +   that represents it (8 if it includes <, 4 if it includes > and 2
> +   if it includes ==).  Return 0 otherwise.  */
> +static int
> +unsigned_comparison_to_mask (rtx_code comparison)
> +{
> +  switch (comparison)
> +    {
> +    case LTU:
> +      return 8;
> +    case GTU:
> +      return 4;
> +    case EQ:
> +      return 2;
> +
> +    case LEU:
> +      return 10;
> +    case GEU:
> +      return 6;
> +
> +    case NE:
> +      return 12;
> +
> +    default:
> +      return 0;
> +    }
> +}
> +
> +/* Reverse the mapping in unsigned_comparison_to_mask, going from masks
> +   to comparisons.  */
> +static rtx_code
> +mask_to_unsigned_comparison (int mask)
> +{
> +  switch (mask)
> +    {
> +    case 8:
> +      return LTU;
> +    case 4:
> +      return GTU;
> +    case 2:
> +      return EQ;
> +
> +    case 10:
> +      return LEU;
> +    case 6:
> +      return GEU;
> +
> +    case 12:
> +      return NE;
> +
> +    default:
> +      gcc_unreachable ();
> +    }
> +}
> +
>  /* Return a mask describing the COMPARISON.  */
>  static int
>  comparison_to_mask (enum rtx_code comparison)
> @@ -2530,53 +2585,6 @@ mask_to_comparison (int mask)
>      }
>  }
>
> -/* Return true if CODE is valid for comparisons of mode MODE, false
> -   otherwise.
> -
> -   It is always safe to return false, even if the code was valid for the
> -   given mode as that will merely suppress optimizations.  */
> -
> -static bool
> -comparison_code_valid_for_mode (enum rtx_code code, enum machine_mode mode)
> -{
> -  switch (code)
> -    {
> -      /* These are valid for integral, floating and vector modes.  */
> -      case NE:
> -      case EQ:
> -      case GE:
> -      case GT:
> -      case LE:
> -      case LT:
> -       return (INTEGRAL_MODE_P (mode)
> -               || FLOAT_MODE_P (mode)
> -               || VECTOR_MODE_P (mode));
> -
> -      /* These are valid for floating point modes.  */
> -      case LTGT:
> -      case UNORDERED:
> -      case ORDERED:
> -      case UNEQ:
> -      case UNGE:
> -      case UNGT:
> -      case UNLE:
> -      case UNLT:
> -       return FLOAT_MODE_P (mode);
> -
> -      /* These are filtered out in simplify_logical_operation, but
> -        we check for them too as a matter of safety.   They are valid
> -        for integral and vector modes.  */
> -      case GEU:
> -      case GTU:
> -      case LEU:
> -      case LTU:
> -       return INTEGRAL_MODE_P (mode) || VECTOR_MODE_P (mode);
> -
> -      default:
> -       gcc_unreachable ();
> -    }
> -}
> -
>  /* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right
>     false/true value of comparison with MODE where comparison operands
>     have CMP_MODE.  */
> @@ -2625,17 +2633,16 @@ relational_result (machine_mode mode, machine_mode 
> cmp_mode, rtx res)
>  }
>
>  /* Simplify a logical operation CODE with result mode MODE, operating on OP0
> -   and OP1, which should be both relational operations.  Return 0 if no such
> -   simplification is possible.  */
> +   and OP1, in the case where both are relational operations.  Assume that
> +   OP0 is inverted if INVERT0_P is true.
> +
> +   Return 0 if no such simplification is possible.  */
>  rtx
>  simplify_context::simplify_logical_relational_operation (rtx_code code,
>                                                          machine_mode mode,
> -                                                        rtx op0, rtx op1)
> +                                                        rtx op0, rtx op1,
> +                                                        bool invert0_p)
>  {
> -  /* We only handle IOR of two relational operations.  */
> -  if (code != IOR)
> -    return 0;
> -
>    if (!(COMPARISON_P (op0) && COMPARISON_P (op1)))
>      return 0;
>
> @@ -2643,28 +2650,64 @@ 
> simplify_context::simplify_logical_relational_operation (rtx_code code,
>         && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1))))
>      return 0;
>
> +  if (side_effects_p (op0))
> +    return 0;
> +
>    enum rtx_code code0 = GET_CODE (op0);
>    enum rtx_code code1 = GET_CODE (op1);
>
> -  /* We don't handle unsigned comparisons currently.  */
> -  if (code0 == LTU || code0 == GTU || code0 == LEU || code0 == GEU)
> -    return 0;
> -  if (code1 == LTU || code1 == GTU || code1 == LEU || code1 == GEU)
> -    return 0;
> +  /* Assume at first that the comparisons are on integers, and that the
> +     operands are therefore ordered.  */
> +  int all = 14;
> +  int mask0 = unsigned_comparison_to_mask (code0);
> +  int mask1 = unsigned_comparison_to_mask (code1);
> +  bool unsigned_p = (IN_RANGE (mask0 & 12, 4, 8)
> +                    || IN_RANGE (mask1 & 12, 4, 8));
> +  if (unsigned_p)
> +    {
> +      /* We only reach here when comparing integers.  Reject mixtures of 
> signed
> +        and unsigned comparisons.  */
> +      if (mask0 == 0 || mask1 == 0)
> +       return 0;
> +    }
> +  else
> +    {
> +      /* See whether the operands might be unordered.  */
> +      if (HONOR_NANS (GET_MODE (XEXP (op0, 0))))
> +       all = 15;
> +      mask0 = comparison_to_mask (code0) & all;
> +      mask1 = comparison_to_mask (code1) & all;
> +    }
>
> -  int mask0 = comparison_to_mask (code0);
> -  int mask1 = comparison_to_mask (code1);
> +  if (invert0_p)
> +    mask0 = mask0 ^ all;
>
> -  int mask = mask0 | mask1;
> +  int mask;
> +  if (code == AND)
> +    mask = mask0 & mask1;
> +  else if (code == IOR)
> +    mask = mask0 | mask1;
> +  else if (code == XOR)
> +    mask = mask0 ^ mask1;
> +  else
> +    return 0;
>
> -  if (mask == 15)
> +  if (mask == all)
>      return relational_result (mode, GET_MODE (op0), const_true_rtx);
>
> -  code = mask_to_comparison (mask);
> +  if (mask == 0)
> +    return relational_result (mode, GET_MODE (op0), const0_rtx);
>
> -  /* Many comparison codes are only valid for certain mode classes.  */
> -  if (!comparison_code_valid_for_mode (code, mode))
> -    return 0;
> +  if (unsigned_p)
> +    code = mask_to_unsigned_comparison (mask);
> +  else
> +    {
> +      code = mask_to_comparison (mask);
> +      /* LTGT and NE are arithmetically equivalent for ordered operands,
> +        with NE being the canonical choice.  */
> +      if (code == LTGT && all == 14)
> +       code = NE;
> +    }
>
>    op0 = XEXP (op1, 0);
>    op1 = XEXP (op1, 1);
> @@ -3217,21 +3260,6 @@ simplify_context::simplify_binary_operation_1 
> (rtx_code code,
>        break;
>
>      case COMPARE:
> -      /* Convert (compare (gt (flags) 0) (lt (flags) 0)) to (flags).  */
> -      if (((GET_CODE (op0) == GT && GET_CODE (op1) == LT)
> -          || (GET_CODE (op0) == GTU && GET_CODE (op1) == LTU))
> -         && XEXP (op0, 1) == const0_rtx && XEXP (op1, 1) == const0_rtx)
> -       {
> -         rtx xop00 = XEXP (op0, 0);
> -         rtx xop10 = XEXP (op1, 0);
> -
> -         if (REG_P (xop00) && REG_P (xop10)
> -             && REGNO (xop00) == REGNO (xop10)
> -             && GET_MODE (xop00) == mode
> -             && GET_MODE (xop10) == mode
> -             && GET_MODE_CLASS (mode) == MODE_CC)
> -           return xop00;
> -       }
>        break;
>
>      case MINUS:
> @@ -6405,6 +6433,31 @@ simplify_context::simplify_relational_operation_1 
> (rtx_code code,
>           && INTVAL (XEXP (tmp, 1)) == GET_MODE_PRECISION (int_mode) - 1)
>         return simplify_gen_binary (AND, mode, XEXP (tmp, 0), const1_rtx);
>      }
> +
> +  /* For two booleans A and B:
> +
> +     A >  B == ~B & A
> +     A >= B == ~B | A
> +     A <  B == ~A & B
> +     A <= B == ~A | B
> +     A == B == ~A ^ B (== ~B ^ A)
> +     A != B ==  A ^ B
> +
> +     simplify_logical_relational_operation checks whether A and B
> +     are booleans.  */
> +  if (code == GTU || code == GT)
> +    return simplify_logical_relational_operation (AND, mode, op1, op0, true);
> +  if (code == GEU || code == GE)
> +    return simplify_logical_relational_operation (IOR, mode, op1, op0, true);
> +  if (code == LTU || code == LT)
> +    return simplify_logical_relational_operation (AND, mode, op0, op1, true);
> +  if (code == LEU || code == LE)
> +    return simplify_logical_relational_operation (IOR, mode, op0, op1, true);
> +  if (code == EQ)
> +    return simplify_logical_relational_operation (XOR, mode, op0, op1, true);
> +  if (code == NE)
> +    return simplify_logical_relational_operation (XOR, mode, op0, op1);
> +
>    return NULL_RTX;
>  }
>
> @@ -8493,6 +8546,63 @@ test_scalar_int_ext_ops2 (machine_mode bmode, 
> machine_mode mmode,
>                  simplify_gen_unary (TRUNCATE, smode, mreg, mmode));
>  }
>
> +/* Test comparisons of comparisons, with the inner comparisons being
> +   between values of mode MODE2 and producing results of mode MODE1,
> +   and with the outer comparisons producing results of mode MODE0.  */
> +
> +static void
> +test_comparisons (machine_mode mode0, machine_mode mode1, machine_mode mode2)
> +{
> +  rtx reg0 = make_test_reg (mode2);
> +  rtx reg1 = make_test_reg (mode2);
> +
> +  static const rtx_code codes[] = {
> +    EQ, NE, LT, LTU, LE, LEU, GE, GEU, GT, GTU
> +  };
> +  constexpr auto num_codes = ARRAY_SIZE (codes);
> +  rtx cmps[num_codes];
> +  rtx vals[] = { constm1_rtx, const0_rtx, const1_rtx };
> +
> +  for (unsigned int i = 0; i < num_codes; ++i)
> +    cmps[i] = gen_rtx_fmt_ee (codes[i], mode1, reg0, reg1);
> +
> +  for (auto code : codes)
> +    for (unsigned int i0 = 0; i0 < num_codes; ++i0)
> +      for (unsigned int i1 = 0; i1 < num_codes; ++i1)
> +       {
> +         rtx cmp_res = simplify_relational_operation (code, mode0, mode1,
> +                                                      cmps[i0], cmps[i1]);
> +         if (i0 >= 2 && i1 >= 2 && (i0 ^ i1) & 1)
> +           ASSERT_TRUE (cmp_res == NULL_RTX);
> +         else
> +           {
> +             ASSERT_TRUE (cmp_res != NULL_RTX
> +                          && (CONSTANT_P (cmp_res)
> +                              || (COMPARISON_P (cmp_res)
> +                                  && GET_MODE (cmp_res) == mode0
> +                                  && REG_P (XEXP (cmp_res, 0))
> +                                  && REG_P (XEXP (cmp_res, 1)))));
> +             for (rtx reg0_val : vals)
> +               for (rtx reg1_val : vals)
> +                 {
> +                   rtx val0 = simplify_const_relational_operation
> +                     (codes[i0], mode1, reg0_val, reg1_val);
> +                   rtx val1 = simplify_const_relational_operation
> +                     (codes[i1], mode1, reg0_val, reg1_val);
> +                   rtx val = simplify_const_relational_operation
> +                     (code, mode0, val0, val1);
> +                   rtx folded = cmp_res;
> +                   if (COMPARISON_P (cmp_res))
> +                     folded = simplify_const_relational_operation
> +                       (GET_CODE (cmp_res), mode0,
> +                        XEXP (cmp_res, 0) == reg0 ? reg0_val : reg1_val,
> +                        XEXP (cmp_res, 1) == reg0 ? reg0_val : reg1_val);
> +                   ASSERT_RTX_EQ (val, folded);
> +                 }
> +           }
> +       }
> +}
> +
>
>  /* Verify some simplifications involving scalar expressions.  */
>
> @@ -8517,6 +8627,8 @@ test_scalar_ops ()
>    test_scalar_int_ext_ops2 (DImode, HImode, QImode);
>    test_scalar_int_ext_ops2 (DImode, SImode, QImode);
>    test_scalar_int_ext_ops2 (DImode, SImode, HImode);
> +
> +  test_comparisons (QImode, HImode, SImode);
>  }
>
>  /* Test vector simplifications involving VEC_DUPLICATE in which the
> diff --git a/gcc/testsuite/gcc.dg/torture/pr117186.c 
> b/gcc/testsuite/gcc.dg/torture/pr117186.c
> new file mode 100644
> index 00000000000..83bcb650102
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/torture/pr117186.c
> @@ -0,0 +1,15 @@
> +/* { dg-do run } */
> +
> +[[gnu::noipa]] int
> +f1 (int x, int y)
> +{
> +  return (x < y) < (y < x);
> +}
> +
> +int
> +main (void)
> +{
> +  if (f1 (-1, 0))
> +    __builtin_abort ();
> +  return 0;
> +}
> diff --git a/gcc/testsuite/gcc.target/aarch64/pr117186.c 
> b/gcc/testsuite/gcc.target/aarch64/pr117186.c
> new file mode 100644
> index 00000000000..d6e2a4af58c
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr117186.c
> @@ -0,0 +1,128 @@
> +/* { dg-options "-O" } */
> +/* { dg-final { check-function-bodies "**" "" "" } } */
> +
> +/*
> +** f1:
> +** (
> +**     cmp     w0, w1
> +**     cset    w0, gt
> +** |
> +**     cmp     w1, w0
> +**     cset    w0, lt
> +** )
> +**     ret
> +*/
> +int
> +f1 (int x, int y)
> +{
> +  return (x < y) < (y < x);
> +}
> +
> +/*
> +** f2:
> +**     mov     w0, 0
> +**     ret
> +*/
> +int
> +f2 (int x, int y)
> +{
> +  return (x <= y) < (x < y);
> +}
> +
> +/*
> +** f3:
> +**     cmp     (w0, w1|w1, w0)
> +**     cset    w0, ne
> +**     ret
> +*/
> +int
> +f3 (int x, int y)
> +{
> +  return (x <= y) == (x < y);
> +}
> +
> +/*
> +** f4:
> +**     cmp     (w0, w1|w1, w0)
> +**     cset    w0, ne
> +**     ret
> +*/
> +int
> +f4 (int x, int y)
> +{
> +  return (y < x) != (x < y);
> +}
> +
> +/*
> +** f5:
> +**     cmp     (w0, w1|w1, w0)
> +**     cset    w0, eq
> +**     ret
> +*/
> +int
> +f5 (int x, int y)
> +{
> +  return (y >= x) > (y > x);
> +}
> +
> +/*
> +** f6:
> +** (
> +**     cmp     w0, w1
> +**     cset    w0, ge
> +** |
> +**     cmp     w1, w0
> +**     cset    w0, le
> +** )
> +**     ret
> +*/
> +int
> +f6 (int x, int y)
> +{
> +  return (x < y) < (y <= x);
> +}
> +
> +/*
> +** f7:
> +**     mov     w0, 1
> +**     ret
> +*/
> +int
> +f7 (int x, int y)
> +{
> +  return (x < y) <= (x <= y);
> +}
> +
> +/*
> +** f8:
> +** (
> +**     cmp     w0, w1
> +**     cset    w0, hi
> +** |
> +**     cmp     w1, w0
> +**     cset    w0, ls
> +** )
> +**     ret
> +*/
> +int
> +f8 (unsigned int x, unsigned int y)
> +{
> +  return (x < y) < (y < x);
> +}
> +
> +/*
> +** f9:
> +** (
> +**     cmp     w0, w1
> +**     cset    w0, (hs|cs)
> +** |
> +**     cmp     w1, w0
> +**     cset    w0, (lo|cc)
> +** )
> +**     ret
> +*/
> +int
> +f9 (unsigned int x, unsigned int y)
> +{
> +  return (x < y) < (y <= x);
> +}
> --
> 2.25.1
>

Reply via email to