Richard Biener <richard.guent...@gmail.com> writes: > On Mon, Jan 6, 2025 at 2:12 PM Richard Sandiford > <richard.sandif...@arm.com> wrote: >> >> g:d882fe5150fbbeb4e44d007bb4964e5b22373021, posted at >> https://gcc.gnu.org/pipermail/gcc-patches/2000-July/033786.html , >> added code to treat: >> >> (set (reg:CC cc) (compare:CC (gt:M (reg:CC cc) 0) (lt:M (reg:CC cc) 0))) >> >> as a nop. This PR shows that that isn't always correct. >> The compare in the set above is between two 0/1 booleans (at least >> on STORE_FLAG_VALUE==1 targets), whereas the unknown comparison that >> produced the incoming (reg:CC cc) is unconstrained; it could be between >> arbitrary integers, or even floats. The fold is therefore replacing a >> cc that is valid for both signed and unsigned comparisons with one that >> is only known to be valid for signed comparisons. >> >> (gt (compare (gt cc 0) (lt cc 0) 0) >> >> does simplify to: >> >> (gt cc 0) >> >> but: >> >> (gtu (compare (gt cc 0) (lt cc 0) 0) >> >> does not simplify to: >> >> (gtu cc 0) >> >> The optimisation didn't come with a testcase, but it was added for >> i386's cmpstrsi, now cmpstrnsi. That probably doesn't matter as much >> as it once did, since it's now conditional on -minline-all-stringops. >> But the patch is almost 25 years old, so whatever the original >> motivation was, it seems likely that other things now rely on it. >> >> It therefore seems better to try to preserve the optimisation on rtl >> rather than get rid of it. To do that, we need to look at how the >> result of the outer compare is used. We'd therefore be looking at four >> instructions (the gt, the lt, the compare, and the use of the compare), >> but combine already allows that for 3-instruction combinations thanks >> to: >> >> /* If the source is a COMPARE, look for the use of the comparison result >> and try to simplify it unless we already have used undobuf.other_insn. >> */ >> >> When applied to boolean inputs, a comparison operator is >> effectively a boolean logical operator (AND, ANDNOT, XOR, etc.). >> simplify_logical_relational_operation already had code to simplify >> logical operators between two comparison results, but: >> >> * It only handled IOR, which doesn't cover all the cases needed here. >> The others are easily added. >> >> * It treated comparisons of integers as having an ORDERED/UNORDERED result. >> Therefore: >> >> * it would not treat "true for LT + EQ + GT" as "always true" for >> comparisons between integers, because the mask excluded the UNORDERED >> condition. >> >> * it would try to convert "true for LT + GT" into LTGT even for comparisons >> between integers. To prevent an ICE later, the code used: >> >> /* Many comparison codes are only valid for certain mode classes. */ >> if (!comparison_code_valid_for_mode (code, mode)) >> return 0; >> >> However, this used the wrong mode, since "mode" is here the integer >> result of the comparisons (and the mode of the IOR), not the mode of >> the things being compared. Thus the effect was to reject all >> floating-point-only codes, even when comparing floats. >> >> I think instead the code should detect whether the comparison is between >> integer values and remove UNORDERED from consideration if so. It then >> always produces a valid comparison (or an always true/false result), >> and so comparison_code_valid_for_mode is not needed. In particular, >> "true for LT + GT" becomes NE for comparisons between integers but >> remains LTGT for comparisons between floats. >> >> * There was a missing check for whether the comparison inputs had >> side effects. >> >> While there, it also seemed worth extending >> simplify_logical_relational_operation to unsigned comparisons, since >> that makes the testing easier. >> >> As far as that testing goes: the patch exhaustively tests all >> combinations of integer comparisons in: >> >> (cmp1 (cmp2 X Y) (cmp3 X Y)) >> >> for the 10 integer comparisons, giving 1000 fold attempts in total. >> It then tries all combinations of (X in {-1,0,1} x Y in {-1,0,1}) >> on the result of the fold, giving 9 checks per fold, or 9000 in total. >> That's probably more than is typical for self-tests, but it seems to >> complete in neglible time, even for -O0 builds. >> >> Tested on aarch64-linux-gnu & x86_64-linux-gnu. OK to install? > > OK.
Thanks! >> The patch isn't exactly a spot fix, and the bug is ancient, so I suppose >> the patch probably isn't suitable for backports. > > Maybe for GCC 14, but not without some soaking time of course. OK, I'll leave the PR open for that. JTFR, I noticed while reading the patch back that I'd fluffed the attempt to future-proof the aarch64 test. The ls and lo conditions in: >> +/* >> +** f8: >> +** ( >> +** cmp w0, w1 >> +** cset w0, hi >> +** | >> +** cmp w1, w0 >> +** cset w0, ls >> +** ) >> +** ret >> +*/ >> +int >> +f8 (unsigned int x, unsigned int y) >> +{ >> + return (x < y) < (y < x); >> +} >> + >> +/* >> +** f9: >> +** ( >> +** cmp w0, w1 >> +** cset w0, (hs|cs) >> +** | >> +** cmp w1, w0 >> +** cset w0, (lo|cc) >> +** ) >> +** ret >> +*/ >> +int >> +f9 (unsigned int x, unsigned int y) >> +{ >> + return (x < y) < (y <= x); >> +} >> -- >> 2.25.1 >> were the wrong way around. I've pushed the patch with that change. Richard