[Bug rtl-optimization/117186] [12/13/14/15 Regression] aarch64 wrong code for (a < b) < (b < a)

cvs-commit at gcc dot gnu.org via Gcc-bugs Fri, 10 Jan 2025 04:52:05 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117186


--- Comment #7 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The trunk branch has been updated by Richard Sandiford <rsand...@gcc.gnu.org>:

https://gcc.gnu.org/g:06c4cf398947b53b4bfc65752f9f879bb2d07924

commit r15-6777-g06c4cf398947b53b4bfc65752f9f879bb2d07924
Author: Richard Sandiford <richard.sandif...@arm.com>
Date:   Fri Jan 10 12:51:15 2025 +0000

    rtl: Remove invalid compare simplification [PR117186]

    g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
    https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
    added code to treat:

      (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))

    as a nop.  This PR shows that that isn't always correct.
    The compare in the set above is between two 0/1 booleans (at least
    on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
    produced the incoming (reg:CC cc) is unconstrained; it could be between
    arbitrary integers, or even floats.  The fold is therefore replacing a
    cc that is valid for both signed and unsigned comparisons with one that
    is only known to be valid for signed comparisons.

      (gt (compare (gt cc 0) (lt cc 0)) 0)

    does simplify to:

      (gt cc 0)

    but:

      (gtu (compare (gt cc 0) (lt cc 0)) 0)

    does not simplify to:

      (gtu cc 0)
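
The difference between the two cases can be illustrated in plain C. This is only a sketch: it assumes the incoming cc came from a signed comparison of two ints, and all function names are hypothetical rather than taken from the patch or its testcases.

```c
/* Signed GT applied to the two 0/1 flags:
   (gt (compare (gt cc 0) (lt cc 0)) 0).  */
static int signed_gt_of_flags (int a, int b)
{
  int gt = a > b, lt = a < b;
  return gt > lt;
}

/* Unsigned GTU applied to the same two flags:
   (gtu (compare (gt cc 0) (lt cc 0)) 0).  */
static int unsigned_gt_of_flags (int a, int b)
{
  unsigned int gt = a > b, lt = a < b;
  return gt > lt;
}

/* What the valid fold computes: (gt cc 0).  */
static int plain_gt (int a, int b)
{
  return a > b;
}

/* What the invalid fold would compute: (gtu cc 0).  */
static int plain_gtu (int a, int b)
{
  return (unsigned int) a > (unsigned int) b;
}

/* Count the inputs in {-1,0,1} x {-1,0,1} on which F and G disagree.  */
static int count_mismatches (int (*f) (int, int), int (*g) (int, int))
{
  int n = 0;
  for (int a = -1; a <= 1; a++)
    for (int b = -1; b <= 1; b++)
      n += f (a, b) != g (a, b);
  return n;
}
```

Under these assumptions, count_mismatches (signed_gt_of_flags, plain_gt) is 0, whereas count_mismatches (unsigned_gt_of_flags, plain_gtu) is 4: the two disagree whenever exactly one of the inputs is -1.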

    The optimisation didn't come with a testcase, but it was added for
    i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
    as it once did, since it's now conditional on -minline-all-stringops.
    But the patch is almost 25 years old, so whatever the original
    motivation was, it seems likely that other things now rely on it.

    It therefore seems better to try to preserve the optimisation on rtl
    rather than get rid of it.  To do that, we need to look at how the
    result of the outer compare is used.  We'd therefore be looking at four
    instructions (the gt, the lt, the compare, and the use of the compare),
    but combine already allows that for 3-instruction combinations thanks
    to:

      /* If the source is a COMPARE, look for the use of the comparison result
         and try to simplify it unless we already have used undobuf.other_insn.  */

    When applied to boolean inputs, a comparison operator is
    effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
    simplify_logical_relational_operation already had code to simplify
    logical operators between two comparison results, but:

    * It only handled IOR, which doesn't cover all the cases needed here.
      The others are easily added.

    * It treated comparisons of integers as having an ORDERED/UNORDERED result.
      Therefore:

      * it would not treat "true for LT + EQ + GT" as "always true" for
        comparisons between integers, because the mask excluded the UNORDERED
        condition.

      * it would try to convert "true for LT + GT" into LTGT even for
        comparisons between integers.  To prevent an ICE later, the code used:

           /* Many comparison codes are only valid for certain mode classes.  */
           if (!comparison_code_valid_for_mode (code, mode))
             return 0;

        However, this used the wrong mode, since "mode" is here the integer
        result of the comparisons (and the mode of the IOR), not the mode of
        the things being compared.  Thus the effect was to reject all
        floating-point-only codes, even when comparing floats.

      I think instead the code should detect whether the comparison is between
      integer values and remove UNORDERED from consideration if so.  It then
      always produces a valid comparison (or an always true/false result),
      and so comparison_code_valid_for_mode is not needed.  In particular,
      "true for LT + GT" becomes NE for comparisons between integers but
      remains LTGT for comparisons between floats.

    * There was a missing check for whether the comparison inputs had
      side effects.
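
The mask-based reasoning in the points above can be sketched as follows. This is not the code in simplify-rtx.cc: the bit values, enum names, and function names are assumptions made for illustration. The only detail taken from the ChangeLog is that the "unordered" outcome sits in the low bit of the mask.

```c
#include <stdbool.h>

/* Hypothetical encoding: one bit per possible outcome of comparing X with Y.
   Only the position of the "unordered" bit follows the ChangeLog wording.  */
enum { M_UNORDERED = 1, M_EQ = 2, M_GT = 4, M_LT = 8, M_ALL = 15 };

typedef enum { OP_IOR, OP_AND, OP_XOR } logic_op;

typedef enum {
  R_FALSE, R_TRUE, R_EQ, R_NE, R_LT, R_LE, R_GT, R_GE,
  R_LTGT, R_ORDERED, R_UNORDERED, R_OTHER
} result;

/* Combine the outcome masks of two comparisons of the same X and Y.  */
static unsigned combine_masks (logic_op op, unsigned m0, unsigned m1)
{
  switch (op)
    {
    case OP_IOR: return m0 | m1;
    case OP_AND: return m0 & m1;
    default:     return m0 ^ m1;
    }
}

/* Map MASK back to a single comparison.  ALWAYS_ORDERED is true when the
   values being compared are integers, so "unordered" cannot happen.  */
static result mask_to_result (unsigned mask, bool always_ordered)
{
  unsigned all = M_ALL;
  if (always_ordered)
    {
      /* Ignore the low bit: it describes an outcome that cannot occur.  */
      mask &= ~(unsigned) M_UNORDERED;
      all &= ~(unsigned) M_UNORDERED;
    }
  if (mask == 0)
    return R_FALSE;
  if (mask == all)
    return R_TRUE;
  switch (mask)
    {
    case M_EQ:                      return R_EQ;
    case M_LT:                      return R_LT;
    case M_GT:                      return R_GT;
    case M_LT | M_EQ:               return R_LE;
    case M_GT | M_EQ:               return R_GE;
    case M_LT | M_GT:               return always_ordered ? R_NE : R_LTGT;
    case M_LT | M_GT | M_UNORDERED: return R_NE;
    case M_LT | M_EQ | M_GT:        return R_ORDERED;
    case M_UNORDERED:               return R_UNORDERED;
    default:                        return R_OTHER; /* UNLT etc., omitted.  */
    }
}
```

With this encoding, the IOR of LT and GT maps to R_NE for integers but R_LTGT for floats, and "LT + EQ + GT" maps to R_TRUE for integers but R_ORDERED for floats. No separate validity check on the resulting code is needed.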

    While there, it also seemed worth extending
    simplify_logical_relational_operation to unsigned comparisons, since
    that makes the testing easier.

    As far as that testing goes: the patch exhaustively tests all
    combinations of integer comparisons in:

      (cmp1 (cmp2 X Y) (cmp3 X Y))

    for the 10 integer comparisons, giving 1000 fold attempts in total.
    It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
    on the result of the fold, giving 9 checks per fold, or 9000 in total.
    That's probably more than is typical for self-tests, but it seems to
    complete in negligible time, even for -O0 builds.
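
The shape of that enumeration can be sketched in standalone C. This is not the test_comparisons self-test, which checks the output of the RTL simplifier. As a stand-in, the sketch checks the simpler property that an outer comparison of two 0/1 values behaves like a boolean logical operator. All names are hypothetical.

```c
typedef enum {
  CMP_EQ, CMP_NE, CMP_LT, CMP_LE, CMP_GT, CMP_GE,
  CMP_LTU, CMP_LEU, CMP_GTU, CMP_GEU, NUM_CODES
} cmp_code;

/* Evaluate (CODE X Y), giving 0 or 1.  */
static int eval_cmp (cmp_code code, int x, int y)
{
  unsigned int ux = (unsigned int) x, uy = (unsigned int) y;
  switch (code)
    {
    case CMP_EQ:  return x == y;
    case CMP_NE:  return x != y;
    case CMP_LT:  return x < y;
    case CMP_LE:  return x <= y;
    case CMP_GT:  return x > y;
    case CMP_GE:  return x >= y;
    case CMP_LTU: return ux < uy;
    case CMP_LEU: return ux <= uy;
    case CMP_GTU: return ux > uy;
    case CMP_GEU: return ux >= uy;
    default:      return 0;
    }
}

/* The logical operator that CODE reduces to when P and Q are both 0/1.  */
static int eval_as_logic (cmp_code code, int p, int q)
{
  switch (code)
    {
    case CMP_EQ:                return !(p ^ q);  /* XNOR */
    case CMP_NE:                return p ^ q;     /* XOR */
    case CMP_LT: case CMP_LTU:  return (!p) & q;  /* ANDNOT */
    case CMP_LE: case CMP_LEU:  return (!p) | q;
    case CMP_GT: case CMP_GTU:  return p & (!q);  /* ANDNOT */
    case CMP_GE: case CMP_GEU:  return p | (!q);
    default:                    return 0;
    }
}

struct tally { int folds; int checks; int failures; };

/* Walk every (cmp1 (cmp2 X Y) (cmp3 X Y)) with X, Y in {-1,0,1}.  */
static struct tally run_exhaustive (void)
{
  struct tally t = { 0, 0, 0 };
  for (int c1 = 0; c1 < NUM_CODES; c1++)
    for (int c2 = 0; c2 < NUM_CODES; c2++)
      for (int c3 = 0; c3 < NUM_CODES; c3++)
        {
          t.folds++;
          for (int x = -1; x <= 1; x++)
            for (int y = -1; y <= 1; y++)
              {
                int p = eval_cmp ((cmp_code) c2, x, y);
                int q = eval_cmp ((cmp_code) c3, x, y);
                t.checks++;
                if (eval_cmp ((cmp_code) c1, p, q)
                    != eval_as_logic ((cmp_code) c1, p, q))
                  t.failures++;
              }
        }
  return t;
}
```

Calling run_exhaustive () visits 1000 code combinations and performs 9000 checks, matching the counts given above, and reports no failures for this stand-in property.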

    gcc/
            PR rtl-optimization/117186
            * rtl.h (simplify_context::simplify_logical_relational_operation):
            Add an invert0_p parameter.
            * simplify-rtx.cc (unsigned_comparison_to_mask): New function.
            (mask_to_unsigned_comparison): Likewise.
            (comparison_code_valid_for_mode): Delete.
            (simplify_context::simplify_logical_relational_operation): Add
            an invert0_p parameter.  Handle AND and XOR.  Handle unsigned
            comparisons.  Handle always-false results.  Ignore the low bit
            of the mask if the operands are always ordered and remove the
            then-redundant check of comparison_code_valid_for_mode.  Check
            for side-effects in the operands before simplifying them away.
            (simplify_context::simplify_binary_operation_1): Remove
            simplification of (compare (gt ...) (lt ...)) and instead...
            (simplify_context::simplify_relational_operation_1): ...handle
            comparisons of comparisons here.
            (test_comparisons): New function.
            (test_scalar_ops): Call it.

    gcc/testsuite/
            PR rtl-optimization/117186
            * gcc.dg/torture/pr117186.c: New test.
            * gcc.target/aarch64/pr117186.c: Likewise.

[Bug rtl-optimization/117186] [12/13/14/15 Regression] aarch64 wrong code for (a < b) < (b < a)
