Eric Botcazou <ebotca...@adacore.com> writes: >> For code: >> >> unsigned char foo (unsigned char c) >> { >> return (c >= '0') && (c <= '9'); >> } >> >> we end up generating: >> >> sub r0, r0, #48 >> uxtb r0, r0 >> cmp r0, #9 >> movhi r0, #0 >> movls r0, #1 >> bx lr >> >> The extra uxtb (extend) is causing the test to fail. We started generating >> the extra extend when a particular optimisation went in with (revision >> r191928). > > That's PR rtl-optimization/58295. We have pessimizations on the SPARC too. > >> The comment in simplify-rtx.c says it transforms >> (truncate:SI (op:DI (x:DI) (y:DI))) >> >> into >> >> (op:SI (truncate:SI (x:DI)) (truncate:SI (x:DI))) >> >> but from what I can see it also transforms >> >> (truncate:QI (op:SI (x:SI) (y:SI))) >> >> into >> >> (op:QI (truncate:QI (x:SI)) (truncate:QI (x:SI))) >> >> From the combine dump I see that the sub and extend operations come from >> the RTL: >> >> (insn 6 3 7 2 (set (reg:SI 116) >> (plus:SI (reg:SI 0 r0 [ c ]) >> (const_int -48 [0xffffffffffffffd0]))) >> >> (insn 7 6 8 2 (set (reg:SI 117) >> (zero_extend:SI (subreg:QI (reg:SI 116) 0))) >> >> >> If I add a QImode compare pattern to the arm backend it gets matched and the >> extra extend goes away, but it seems to me that that's not the correct >> solution. Ideally, if a QImode operation is performed as an SImode >> operation on a target (like the sub and compare operations on arm) then we >> should not be doing this optimisation? > > Yes, that's my opinion as well, but Richard (CCed) seemed to disagree. In > any > case, that's certainly not a simplification. > >> My question is, how does one express that information in the simplify-rtx.c >> code? It seems that the PR that optimisation fixed (54457) only cared about >> DI -> SI truncations, so perhaps we should disable it for conversions >> between other modes where it's not beneficial altogether? > > I can think of 3 possible solutions: > - WORD_REGISTER_OPERATIONS, > - promote_mode, > - optabs. > > The 3rd solution seems to be the most straightforward, but this would be the > first time that we test optabs from simplify-rtx.c.
I don't think this is the way to go. AIUI the problem here isn't that RISC architectures don't have QImode adds as such. If we were going to combine insn 6 and insn 7 _in isolation_ then we would have either: (zero_extend:SI (subreg:QI (plus:SI (subreg:QI (reg:SI R)) (const_int X)))) before the patch or: (zero_extend:SI (plus:QI (reg:QI R) (const_int X))) after the patch. And IMO the second form is a simplification over the first. (Despite being a RISC port, MIPS Octeon does have a pattern for zero-extending byte addition, so this isn't entirely academic.) It sounds like the problem is instead that we're relying on combine to remove redundant zero extensions and that combine's no longer able to do that when folding insns 6 and 7 into the comparison. Is that right? I think it would better to use a dedicated global pass to remove redundant extensions instead. IIRC there were various attempts to do that. IMO combine should just be about instruction selection. The patch still seems good to me from that POV. Thanks, Richard