On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford <richard.sandif...@arm.com> wrote: > > g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at > https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html , > added code to treat: > > (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0))) > > as a nop. This PR shows that that isn't always correct. > The compare in the set above is between two 0/1 booleans (at least > on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that > produced the incoming (reg:CC cc) is unconstrained; it could be between > arbitrary integers, or even floats. The fold is therefore replacing a > cc that is valid for both signed and unsigned comparisons with one that > is only known to be valid for signed comparisons. > > (gt (compare (gt cc 0) (lt cc 0) 0) > > does simplify to: > > (gt cc 0) > > but: > > (gtu (compare (gt cc 0) (lt cc 0) 0) > > does not simplify to: > > (gtu cc 0) > > The optimisation didn't come with a testcase, but it was added for > i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much > as it once did, since it's now conditional on -minline-all-stringops. > But the patch is almost 25 years old, so whatever the original > motivation was, it seems likely that other things now rely on it. > > It therefore seems better to try to preserve the optimisation on rtl > rather than get rid of it. To do that, we need to look at how the > result of the outer compare is used. We'd therefore be looking at four > instructions (the gt, the lt, the compare, and the use of the compare), > but combine already allows that for 3-instruction combinations thanks > to: > > /* If the source is a COMPARE, look for the use of the comparison result > and try to simplify it unless we already have used undobuf.other_insn. > */ > > When applied to boolean inputs, a comparison operator is > effectively a boolean logical operator (AND, ANDNOT, XOR, etc.). > simplify_logical_relational_operation already had code to simplify > logical operators between two comparison results, but: > > * It only handled IOR, which doesn't cover all the cases needed here. > The others are easily added. > > * It treated comparisons of integers as having an ORDERED/UNORDERED result. > Therefore: > > * it would not treat "true for LT + EQ + GT" as "always true" for > comparisons between integers, because the mask excluded the UNORDERED > condition. > > * it would try to convert "true for LT + GT" into LTGT even for comparisons > between integers. To prevent an ICE later, the code used: > > /* Many comparison codes are only valid for certain mode classes. */ > if (!comparison_code_valid_for_mode (code, mode)) > return 0; > > However, this used the wrong mode, since "mode" is here the integer > result of the comparisons (and the mode of the IOR), not the mode of > the things being compared. Thus the effect was to reject all > floating-point-only codes, even when comparing floats. > > I think instead the code should detect whether the comparison is between > integer values and remove UNORDERED from consideration if so. It then > always produces a valid comparison (or an always true/false result), > and so comparison_code_valid_for_mode is not needed. In particular, > "true for LT + GT" becomes NE for comparisons between integers but > remains LTGT for comparisons between floats. > > * There was a missing check for whether the comparison inputs had > side effects. > > While there, it also seemed worth extending > simplify_logical_relational_operation to unsigned comparisons, since > that makes the testing easier. > > As far as that testing goes: the patch exhaustively tests all > combinations of integer comparisons in: > > (cmp1 (cmp2 X Y) (cmp3 X Y)) > > for the 10 integer comparisons, giving 1000 fold attempts in total. > It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1}) > on the result of the fold, giving 9 checks per fold, or 9000 in total. > That's probably more than is typical for self-tests, but it seems to > complete in neglible time, even for -O0 builds. > > Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install?
OK. > The patch isn't exactly a spot fix, and the bug is ancient, so I suppose > the patch probably isn't suitable for backports. Maybe for GCC 14, but not without some soaking time of course. Thanks, Richard. > Richard > > > gcc/ > PR rtl-optimization/117186 > * rtl.h (simplify_context::simplify_logical_relational_operation): Add > an invert0_p parameter. > * simplify-rtx.cc (unsigned_comparison_to_mask): New function. > (mask_to_unsigned_comparison): Likewise. > (comparison_code_valid_for_mode): Delete. > (simplify_context::simplify_logical_relational_operation): Add > an invert0_p parameter. Handle AND and XOR. Handle unsigned > comparisons. Handle always-false results. Ignore the low bit > of the mask if the operands are always ordered and remove the > then-redundant check of comparison_code_valid_for_mode. Check > for side-effects in the operands before simplifying them away. > (simplify_context::simplify_binary_operation_1): Remove > simplification of (compare (gt ...) (lt ...)) and instead... > (simplify_context::simplify_relational_operation_1): ...handle > comparisons of comparisons here. > (test_comparisons): New function. > (test_scalar_ops): Call it. > > gcc/testsuite/ > PR rtl-optimization/117186 > * gcc.dg/torture/pr117186.c: New test. > * gcc.target/aarch64/pr117186.c: Likewise. > --- > gcc/rtl.h | 3 +- > gcc/simplify-rtx.cc | 276 ++++++++++++++------ > gcc/testsuite/gcc.dg/torture/pr117186.c | 15 ++ > gcc/testsuite/gcc.target/aarch64/pr117186.c | 128 +++++++++ > 4 files changed, 339 insertions(+), 83 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/torture/pr117186.c > create mode 100644 gcc/testsuite/gcc.target/aarch64/pr117186.c > > diff --git a/gcc/rtl.h b/gcc/rtl.h > index 3d211953b7d..3b676c46880 100644 > --- a/gcc/rtl.h > +++ b/gcc/rtl.h > @@ -3478,7 +3478,8 @@ private: > rtx simplify_byte_swapping_operation (rtx_code, machine_mode, rtx, rtx); > rtx simplify_associative_operation (rtx_code, machine_mode, rtx, rtx); > rtx simplify_distributive_operation (rtx_code, machine_mode, rtx, rtx); > - rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx, > rtx); > + rtx simplify_logical_relational_operation (rtx_code, machine_mode, rtx, > rtx, > + bool = false); > rtx simplify_binary_operation_series (rtx_code, machine_mode, rtx, rtx); > rtx simplify_distribute_over_subregs (rtx_code, machine_mode, rtx, rtx); > rtx simplify_shift_const_int (rtx_code, machine_mode, rtx, unsigned int); > diff --git a/gcc/simplify-rtx.cc b/gcc/simplify-rtx.cc > index c77ce956332..71c5d3c1b1b 100644 > --- a/gcc/simplify-rtx.cc > +++ b/gcc/simplify-rtx.cc > @@ -2446,6 +2446,61 @@ simplify_context::simplify_associative_operation > (rtx_code code, > return 0; > } > > +/* If COMPARISON can be treated as an unsigned comparison, return a mask > + that represents it (8 if it includes <, 4 if it includes > and 2 > + if it includes ==). Return 0 otherwise. */ > +static int > +unsigned_comparison_to_mask (rtx_code comparison) > +{ > + switch (comparison) > + { > + case LTU: > + return 8; > + case GTU: > + return 4; > + case EQ: > + return 2; > + > + case LEU: > + return 10; > + case GEU: > + return 6; > + > + case NE: > + return 12; > + > + default: > + return 0; > + } > +} > + > +/* Reverse the mapping in unsigned_comparison_to_mask, going from masks > + to comparisons. */ > +static rtx_code > +mask_to_unsigned_comparison (int mask) > +{ > + switch (mask) > + { > + case 8: > + return LTU; > + case 4: > + return GTU; > + case 2: > + return EQ; > + > + case 10: > + return LEU; > + case 6: > + return GEU; > + > + case 12: > + return NE; > + > + default: > + gcc_unreachable (); > + } > +} > + > /* Return a mask describing the COMPARISON. */ > static int > comparison_to_mask (enum rtx_code comparison) > @@ -2530,53 +2585,6 @@ mask_to_comparison (int mask) > } > } > > -/* Return true if CODE is valid for comparisons of mode MODE, false > - otherwise. > - > - It is always safe to return false, even if the code was valid for the > - given mode as that will merely suppress optimizations. */ > - > -static bool > -comparison_code_valid_for_mode (enum rtx_code code, enum machine_mode mode) > -{ > - switch (code) > - { > - /* These are valid for integral, floating and vector modes. */ > - case NE: > - case EQ: > - case GE: > - case GT: > - case LE: > - case LT: > - return (INTEGRAL_MODE_P (mode) > - || FLOAT_MODE_P (mode) > - || VECTOR_MODE_P (mode)); > - > - /* These are valid for floating point modes. */ > - case LTGT: > - case UNORDERED: > - case ORDERED: > - case UNEQ: > - case UNGE: > - case UNGT: > - case UNLE: > - case UNLT: > - return FLOAT_MODE_P (mode); > - > - /* These are filtered out in simplify_logical_operation, but > - we check for them too as a matter of safety. They are valid > - for integral and vector modes. */ > - case GEU: > - case GTU: > - case LEU: > - case LTU: > - return INTEGRAL_MODE_P (mode) || VECTOR_MODE_P (mode); > - > - default: > - gcc_unreachable (); > - } > -} > - > /* Canonicalize RES, a scalar const0_rtx/const_true_rtx to the right > false/true value of comparison with MODE where comparison operands > have CMP_MODE. */ > @@ -2625,17 +2633,16 @@ relational_result (machine_mode mode, machine_mode > cmp_mode, rtx res) > } > > /* Simplify a logical operation CODE with result mode MODE, operating on OP0 > - and OP1, which should be both relational operations. Return 0 if no such > - simplification is possible. */ > + and OP1, in the case where both are relational operations. Assume that > + OP0 is inverted if INVERT0_P is true. > + > + Return 0 if no such simplification is possible. */ > rtx > simplify_context::simplify_logical_relational_operation (rtx_code code, > machine_mode mode, > - rtx op0, rtx op1) > + rtx op0, rtx op1, > + bool invert0_p) > { > - /* We only handle IOR of two relational operations. */ > - if (code != IOR) > - return 0; > - > if (!(COMPARISON_P (op0) && COMPARISON_P (op1))) > return 0; > > @@ -2643,28 +2650,64 @@ > simplify_context::simplify_logical_relational_operation (rtx_code code, > && rtx_equal_p (XEXP (op0, 1), XEXP (op1, 1)))) > return 0; > > + if (side_effects_p (op0)) > + return 0; > + > enum rtx_code code0 = GET_CODE (op0); > enum rtx_code code1 = GET_CODE (op1); > > - /* We don't handle unsigned comparisons currently. */ > - if (code0 == LTU || code0 == GTU || code0 == LEU || code0 == GEU) > - return 0; > - if (code1 == LTU || code1 == GTU || code1 == LEU || code1 == GEU) > - return 0; > + /* Assume at first that the comparisons are on integers, and that the > + operands are therefore ordered. */ > + int all = 14; > + int mask0 = unsigned_comparison_to_mask (code0); > + int mask1 = unsigned_comparison_to_mask (code1); > + bool unsigned_p = (IN_RANGE (mask0 & 12, 4, 8) > + || IN_RANGE (mask1 & 12, 4, 8)); > + if (unsigned_p) > + { > + /* We only reach here when comparing integers. Reject mixtures of > signed > + and unsigned comparisons. */ > + if (mask0 == 0 || mask1 == 0) > + return 0; > + } > + else > + { > + /* See whether the operands might be unordered. */ > + if (HONOR_NANS (GET_MODE (XEXP (op0, 0)))) > + all = 15; > + mask0 = comparison_to_mask (code0) & all; > + mask1 = comparison_to_mask (code1) & all; > + } > > - int mask0 = comparison_to_mask (code0); > - int mask1 = comparison_to_mask (code1); > + if (invert0_p) > + mask0 = mask0 ^ all; > > - int mask = mask0 | mask1; > + int mask; > + if (code == AND) > + mask = mask0 & mask1; > + else if (code == IOR) > + mask = mask0 | mask1; > + else if (code == XOR) > + mask = mask0 ^ mask1; > + else > + return 0; > > - if (mask == 15) > + if (mask == all) > return relational_result (mode, GET_MODE (op0), const_true_rtx); > > - code = mask_to_comparison (mask); > + if (mask == 0) > + return relational_result (mode, GET_MODE (op0), const0_rtx); > > - /* Many comparison codes are only valid for certain mode classes. */ > - if (!comparison_code_valid_for_mode (code, mode)) > - return 0; > + if (unsigned_p) > + code = mask_to_unsigned_comparison (mask); > + else > + { > + code = mask_to_comparison (mask); > + /* LTGT and NE are arithmetically equivalent for ordered operands, > + with NE being the canonical choice. */ > + if (code == LTGT && all == 14) > + code = NE; > + } > > op0 = XEXP (op1, 0); > op1 = XEXP (op1, 1); > @@ -3217,21 +3260,6 @@ simplify_context::simplify_binary_operation_1 > (rtx_code code, > break; > > case COMPARE: > - /* Convert (compare (gt (flags) 0) (lt (flags) 0)) to (flags). */ > - if (((GET_CODE (op0) == GT && GET_CODE (op1) == LT) > - || (GET_CODE (op0) == GTU && GET_CODE (op1) == LTU)) > - && XEXP (op0, 1) == const0_rtx && XEXP (op1, 1) == const0_rtx) > - { > - rtx xop00 = XEXP (op0, 0); > - rtx xop10 = XEXP (op1, 0); > - > - if (REG_P (xop00) && REG_P (xop10) > - && REGNO (xop00) == REGNO (xop10) > - && GET_MODE (xop00) == mode > - && GET_MODE (xop10) == mode > - && GET_MODE_CLASS (mode) == MODE_CC) > - return xop00; > - } > break; > > case MINUS: > @@ -6405,6 +6433,31 @@ simplify_context::simplify_relational_operation_1 > (rtx_code code, > && INTVAL (XEXP (tmp, 1)) == GET_MODE_PRECISION (int_mode) - 1) > return simplify_gen_binary (AND, mode, XEXP (tmp, 0), const1_rtx); > } > + > + /* For two booleans A and B: > + > + A > B == ~B & A > + A >= B == ~B | A > + A < B == ~A & B > + A <= B == ~A | B > + A == B == ~A ^ B (== ~B ^ A) > + A != B == A ^ B > + > + simplify_logical_relational_operation checks whether A and B > + are booleans. */ > + if (code == GTU || code == GT) > + return simplify_logical_relational_operation (AND, mode, op1, op0, true); > + if (code == GEU || code == GE) > + return simplify_logical_relational_operation (IOR, mode, op1, op0, true); > + if (code == LTU || code == LT) > + return simplify_logical_relational_operation (AND, mode, op0, op1, true); > + if (code == LEU || code == LE) > + return simplify_logical_relational_operation (IOR, mode, op0, op1, true); > + if (code == EQ) > + return simplify_logical_relational_operation (XOR, mode, op0, op1, true); > + if (code == NE) > + return simplify_logical_relational_operation (XOR, mode, op0, op1); > + > return NULL_RTX; > } > > @@ -8493,6 +8546,63 @@ test_scalar_int_ext_ops2 (machine_mode bmode, > machine_mode mmode, > simplify_gen_unary (TRUNCATE, smode, mreg, mmode)); > } > > +/* Test comparisons of comparisons, with the inner comparisons being > + between values of mode MODE2 and producing results of mode MODE1, > + and with the outer comparisons producing results of mode MODE0. */ > + > +static void > +test_comparisons (machine_mode mode0, machine_mode mode1, machine_mode mode2) > +{ > + rtx reg0 = make_test_reg (mode2); > + rtx reg1 = make_test_reg (mode2); > + > + static const rtx_code codes[] = { > + EQ, NE, LT, LTU, LE, LEU, GE, GEU, GT, GTU > + }; > + constexpr auto num_codes = ARRAY_SIZE (codes); > + rtx cmps[num_codes]; > + rtx vals[] = { constm1_rtx, const0_rtx, const1_rtx }; > + > + for (unsigned int i = 0; i < num_codes; ++i) > + cmps[i] = gen_rtx_fmt_ee (codes[i], mode1, reg0, reg1); > + > + for (auto code : codes) > + for (unsigned int i0 = 0; i0 < num_codes; ++i0) > + for (unsigned int i1 = 0; i1 < num_codes; ++i1) > + { > + rtx cmp_res = simplify_relational_operation (code, mode0, mode1, > + cmps[i0], cmps[i1]); > + if (i0 >= 2 && i1 >= 2 && (i0 ^ i1) & 1) > + ASSERT_TRUE (cmp_res == NULL_RTX); > + else > + { > + ASSERT_TRUE (cmp_res != NULL_RTX > + && (CONSTANT_P (cmp_res) > + || (COMPARISON_P (cmp_res) > + && GET_MODE (cmp_res) == mode0 > + && REG_P (XEXP (cmp_res, 0)) > + && REG_P (XEXP (cmp_res, 1))))); > + for (rtx reg0_val : vals) > + for (rtx reg1_val : vals) > + { > + rtx val0 = simplify_const_relational_operation > + (codes[i0], mode1, reg0_val, reg1_val); > + rtx val1 = simplify_const_relational_operation > + (codes[i1], mode1, reg0_val, reg1_val); > + rtx val = simplify_const_relational_operation > + (code, mode0, val0, val1); > + rtx folded = cmp_res; > + if (COMPARISON_P (cmp_res)) > + folded = simplify_const_relational_operation > + (GET_CODE (cmp_res), mode0, > + XEXP (cmp_res, 0) == reg0 ? reg0_val : reg1_val, > + XEXP (cmp_res, 1) == reg0 ? reg0_val : reg1_val); > + ASSERT_RTX_EQ (val, folded); > + } > + } > + } > +} > + > > /* Verify some simplifications involving scalar expressions. */ > > @@ -8517,6 +8627,8 @@ test_scalar_ops () > test_scalar_int_ext_ops2 (DImode, HImode, QImode); > test_scalar_int_ext_ops2 (DImode, SImode, QImode); > test_scalar_int_ext_ops2 (DImode, SImode, HImode); > + > + test_comparisons (QImode, HImode, SImode); > } > > /* Test vector simplifications involving VEC_DUPLICATE in which the > diff --git a/gcc/testsuite/gcc.dg/torture/pr117186.c > b/gcc/testsuite/gcc.dg/torture/pr117186.c > new file mode 100644 > index 00000000000..83bcb650102 > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/torture/pr117186.c > @@ -0,0 +1,15 @@ > +/* { dg-do run } */ > + > +[[gnu::noipa]] int > +f1 (int x, int y) > +{ > + return (x < y) < (y < x); > +} > + > +int > +main (void) > +{ > + if (f1 (-1, 0)) > + __builtin_abort (); > + return 0; > +} > diff --git a/gcc/testsuite/gcc.target/aarch64/pr117186.c > b/gcc/testsuite/gcc.target/aarch64/pr117186.c > new file mode 100644 > index 00000000000..d6e2a4af58c > --- /dev/null > +++ b/gcc/testsuite/gcc.target/aarch64/pr117186.c > @@ -0,0 +1,128 @@ > +/* { dg-options "-O" } */ > +/* { dg-final { check-function-bodies "**" "" "" } } */ > + > +/* > +** f1: > +** ( > +** cmp w0, w1 > +** cset w0, gt > +** | > +** cmp w1, w0 > +** cset w0, lt > +** ) > +** ret > +*/ > +int > +f1 (int x, int y) > +{ > + return (x < y) < (y < x); > +} > + > +/* > +** f2: > +** mov w0, 0 > +** ret > +*/ > +int > +f2 (int x, int y) > +{ > + return (x <= y) < (x < y); > +} > + > +/* > +** f3: > +** cmp (w0, w1|w1, w0) > +** cset w0, ne > +** ret > +*/ > +int > +f3 (int x, int y) > +{ > + return (x <= y) == (x < y); > +} > + > +/* > +** f4: > +** cmp (w0, w1|w1, w0) > +** cset w0, ne > +** ret > +*/ > +int > +f4 (int x, int y) > +{ > + return (y < x) != (x < y); > +} > + > +/* > +** f5: > +** cmp (w0, w1|w1, w0) > +** cset w0, eq > +** ret > +*/ > +int > +f5 (int x, int y) > +{ > + return (y >= x) > (y > x); > +} > + > +/* > +** f6: > +** ( > +** cmp w0, w1 > +** cset w0, ge > +** | > +** cmp w1, w0 > +** cset w0, le > +** ) > +** ret > +*/ > +int > +f6 (int x, int y) > +{ > + return (x < y) < (y <= x); > +} > + > +/* > +** f7: > +** mov w0, 1 > +** ret > +*/ > +int > +f7 (int x, int y) > +{ > + return (x < y) <= (x <= y); > +} > + > +/* > +** f8: > +** ( > +** cmp w0, w1 > +** cset w0, hi > +** | > +** cmp w1, w0 > +** cset w0, ls > +** ) > +** ret > +*/ > +int > +f8 (unsigned int x, unsigned int y) > +{ > + return (x < y) < (y < x); > +} > + > +/* > +** f9: > +** ( > +** cmp w0, w1 > +** cset w0, (hs|cs) > +** | > +** cmp w1, w0 > +** cset w0, (lo|cc) > +** ) > +** ret > +*/ > +int > +f9 (unsigned int x, unsigned int y) > +{ > + return (x < y) < (y <= x); > +} > -- > 2.25.1 >