Richard Biener <richard.guent...@gmail.com> writes:
> On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford
> <richard.sandif...@arm.com> wrote:
>>
>> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at
>> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html ,
>> added code to treat:
>>
>>   (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0)))
>>
>> as a nop.  This PR shows that that isn't always correct.
>> The compare in the set above is between two 0/1 booleans (at least
>> on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that
>> produced the incoming (reg:CC cc) is unconstrained; it could be between
>> arbitrary integers, or even floats.  The fold is therefore replacing a
>> cc that is valid for both signed and unsigned comparisons with one that
>> is only known to be valid for signed comparisons.
>>
>>   (gt (compare (gt cc 0) (lt cc 0) 0)
>>
>> does simplify to:
>>
>>   (gt cc 0)
>>
>> but:
>>
>>   (gtu (compare (gt cc 0) (lt cc 0) 0)
>>
>> does not simplify to:
>>
>>   (gtu cc 0)
>>
>> The optimisation didn't come with a testcase, but it was added for
>> i386's cmpstrsi, now cmpstrnsi.  That probably doesn't matter as much
>> as it once did, since it's now conditional on -minline-all-stringops.
>> But the patch is almost 25 years old, so whatever the original
>> motivation was, it seems likely that other things now rely on it.
>>
>> It therefore seems better to try to preserve the optimisation on rtl
>> rather than get rid of it.  To do that, we need to look at how the
>> result of the outer compare is used.  We'd therefore be looking at four
>> instructions (the gt, the lt, the compare, and the use of the compare),
>> but combine already allows that for 3-instruction combinations thanks
>> to:
>>
>>   /* If the source is a COMPARE, look for the use of the comparison result
>>      and try to simplify it unless we already have used undobuf.other_insn.  
>> */
>>
>> When applied to boolean inputs, a comparison operator is
>> effectively a boolean logical operator (AND, ANDNOT, XOR, etc.).
>> simplify_logical_relational_operation already had code to simplify
>> logical operators between two comparison results, but:
>>
>> * It only handled IOR, which doesn't cover all the cases needed here.
>>   The others are easily added.
>>
>> * It treated comparisons of integers as having an ORDERED/UNORDERED result.
>>   Therefore:
>>
>>   * it would not treat "true for LT + EQ + GT" as "always true" for
>>     comparisons between integers, because the mask excluded the UNORDERED
>>     condition.
>>
>>   * it would try to convert "true for LT + GT" into LTGT even for comparisons
>>     between integers.  To prevent an ICE later, the code used:
>>
>>        /* Many comparison codes are only valid for certain mode classes.  */
>>        if (!comparison_code_valid_for_mode (code, mode))
>>          return 0;
>>
>>     However, this used the wrong mode, since "mode" is here the integer
>>     result of the comparisons (and the mode of the IOR), not the mode of
>>     the things being compared.  Thus the effect was to reject all
>>     floating-point-only codes, even when comparing floats.
>>
>>   I think instead the code should detect whether the comparison is between
>>   integer values and remove UNORDERED from consideration if so.  It then
>>   always produces a valid comparison (or an always true/false result),
>>   and so comparison_code_valid_for_mode is not needed.  In particular,
>>   "true for LT + GT" becomes NE for comparisons between integers but
>>   remains LTGT for comparisons between floats.
>>
>> * There was a missing check for whether the comparison inputs had
>>   side effects.
>>
>> While there, it also seemed worth extending
>> simplify_logical_relational_operation to unsigned comparisons, since
>> that makes the testing easier.
>>
>> As far as that testing goes: the patch exhaustively tests all
>> combinations of integer comparisons in:
>>
>>   (cmp1 (cmp2 X Y) (cmp3 X Y))
>>
>> for the 10 integer comparisons, giving 1000 fold attempts in total.
>> It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1})
>> on the result of the fold, giving 9 checks per fold, or 9000 in total.
>> That's probably more than is typical for self-tests, but it seems to
>> complete in neglible time, even for -O0 builds.
>>
>> Tested on aarch64-linux-gnu & x86_64-linux-gnu.  OK to install?
>
> OK.

Thanks!

>> The patch isn't exactly a spot fix, and the bug is ancient, so I suppose
>> the patch probably isn't suitable for backports.
>
> Maybe for GCC 14, but not without some soaking time of course.

OK, I'll leave the PR open for that.

JTFR, I noticed while reading the patch back that I'd fluffed the
attempt to future-proof the aarch64 test.  The ls and lo conditions in:

>> +/*
>> +** f8:
>> +** (
>> +**     cmp     w0, w1
>> +**     cset    w0, hi
>> +** |
>> +**     cmp     w1, w0
>> +**     cset    w0, ls
>> +** )
>> +**     ret
>> +*/
>> +int
>> +f8 (unsigned int x, unsigned int y)
>> +{
>> +  return (x < y) < (y < x);
>> +}
>> +
>> +/*
>> +** f9:
>> +** (
>> +**     cmp     w0, w1
>> +**     cset    w0, (hs|cs)
>> +** |
>> +**     cmp     w1, w0
>> +**     cset    w0, (lo|cc)
>> +** )
>> +**     ret
>> +*/
>> +int
>> +f9 (unsigned int x, unsigned int y)
>> +{
>> +  return (x < y) < (y <= x);
>> +}
>> --
>> 2.25.1
>>

were the wrong way around.  I've pushed the patch with that change.

Richard

Reply via email to