https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117186
--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> --- The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>: https://gcc.gnu.org/g:06c4cf398947b53b4bfc65752f9f879bb2d07924 commit r15-6777-g06c4cf398947b53b4bfc65752f9f879bb2d07924 Author: Richard Sandiford <richard.sandif...@arm.com> Date: Fri Jan 10 12:51:15 2025 +0000 rtl: Remove invalid compare simplification [PR117186] g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html , added code to treat: (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0))) as a nop. This PR shows that that isn't always correct. The compare in the set above is between two 0/1 booleans (at least on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that produced the incoming (reg:CC cc) is unconstrained; it could be between arbitrary integers, or even floats. The fold is therefore replacing a cc that is valid for both signed and unsigned comparisons with one that is only known to be valid for signed comparisons. (gt (compare (gt cc 0) (lt cc 0) 0) does simplify to: (gt cc 0) but: (gtu (compare (gt cc 0) (lt cc 0) 0) does not simplify to: (gtu cc 0) The optimisation didn't come with a testcase, but it was added for i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much as it once did, since it's now conditional on -minline-all-stringops. But the patch is almost 25 years old, so whatever the original motivation was, it seems likely that other things now rely on it. It therefore seems better to try to preserve the optimisation on rtl rather than get rid of it. To do that, we need to look at how the result of the outer compare is used. We'd therefore be looking at four instructions (the gt, the lt, the compare, and the use of the compare), but combine already allows that for 3-instruction combinations thanks to: /* If the source is a COMPARE, look for the use of the comparison result and try to simplify it unless we already have used undobuf.other_insn. */ When applied to boolean inputs, a comparison operator is effectively a boolean logical operator (AND, ANDNOT, XOR, etc.). simplify_logical_relational_operation already had code to simplify logical operators between two comparison results, but: * It only handled IOR, which doesn't cover all the cases needed here. The others are easily added. * It treated comparisons of integers as having an ORDERED/UNORDERED result. Therefore: * it would not treat "true for LT + EQ + GT" as "always true" for comparisons between integers, because the mask excluded the UNORDERED condition. * it would try to convert "true for LT + GT" into LTGT even for comparisons between integers. To prevent an ICE later, the code used: /* Many comparison codes are only valid for certain mode classes. */ if (!comparison_code_valid_for_mode (code, mode)) return 0; However, this used the wrong mode, since "mode" is here the integer result of the comparisons (and the mode of the IOR), not the mode of the things being compared. Thus the effect was to reject all floating-point-only codes, even when comparing floats. I think instead the code should detect whether the comparison is between integer values and remove UNORDERED from consideration if so. It then always produces a valid comparison (or an always true/false result), and so comparison_code_valid_for_mode is not needed. In particular, "true for LT + GT" becomes NE for comparisons between integers but remains LTGT for comparisons between floats. * There was a missing check for whether the comparison inputs had side effects. While there, it also seemed worth extending simplify_logical_relational_operation to unsigned comparisons, since that makes the testing easier. As far as that testing goes: the patch exhaustively tests all combinations of integer comparisons in: (cmp1 (cmp2 X Y) (cmp3 X Y)) for the 10 integer comparisons, giving 1000 fold attempts in total. It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1}) on the result of the fold, giving 9 checks per fold, or 9000 in total. That's probably more than is typical for self-tests, but it seems to complete in neglible time, even for -O0 builds. gcc/ PR rtl-optimization/117186 * rtl.h (simplify_context::simplify_logical_relational_operation): Add an invert0_p parameter. * simplify-rtx.cc (unsigned_comparison_to_mask): New function. (mask_to_unsigned_comparison): Likewise. (comparison_code_valid_for_mode): Delete. (simplify_context::simplify_logical_relational_operation): Add an invert0_p parameter. Handle AND and XOR. Handle unsigned comparisons. Handle always-false results. Ignore the low bit of the mask if the operands are always ordered and remove the then-redundant check of comparison_code_valid_for_mode. Check for side-effects in the operands before simplifying them away. (simplify_context::simplify_binary_operation_1): Remove simplification of (compare (gt ...) (lt ...)) and instead... (simplify_context::simplify_relational_operation_1): ...handle comparisons of comparisons here. (test_comparisons): New function. (test_scalar_ops): Call it. gcc/testsuite/ PR rtl-optimization/117186 * gcc.dg/torture/pr117186.c: New test. * gcc.target/aarch64/pr117186.c: Likewise.