https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124766
--- Comment #1 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:13040879a85435edc05bef860cc8530f51a133b5

commit r17-293-g13040879a85435edc05bef860cc8530f51a133b5
Author: Jeff Law <[email protected]>
Date:   Sun May 3 22:10:59 2026 -0600

    [V2][RISC-V][PR rtl-optimization/124766] Simplify x + y == y into x == 0

    So Richard S. noticed 3 issues in the V1 patch.  First, it should have been
    using rtx_equal_p rather than just testing pointer equality.  That's not a
    correctness issue, but could potentially allow the pattern to apply more
    often.

    Second, we should be checking for !side_effects_p on the operand we're
    dropping.  Easy to fix.

    Finally, there was a const0_rtx use that should have been CONST0_RTX.  Given
    how often I mention that one to others, I'm embarrassed I missed it.

    Bootstrapped on x86 and retested on the various embedded platforms.
    Bootstraps on riscv platforms, aarch64, armv7 and sh4eb are in flight.

    --

    So this is derived from S_regmatch in spec2017, so fairly hot.

    long
    frob (unsigned short *y, long z)
    {
      long ret = (*y << 2) + z;
      if (ret != z)
        return 0;
      return ret;
    }

    It generates this code on riscv:

            lhu     a5,0(a0)
            sh2add  a5,a5,a1
            sub     a1,a1,a5
            czero.nez       a0,a5,a1
            ret

    That's not bad, but the sh2add and sub are not actually needed.  This may
    look similar to a case Daniel was recently discussing; the major difference
    is the types of the function args, which I got wrong the first time I
    reduced this case.

    czero instructions check their condition for zero/nonzero status.  So we
    just need to know if a1 has a zero/nonzero value at the czero instruction.
    So working backwards:

    a1 = a1 - a5                  // sub instruction
    a1 = a1 - ((a5 << 2) + a1)    // substitute from sh2add
    a1 = -(a5 << 2)               // a1 terms cancel out

    Negation does not change whether a value is zero, so we just need the
    nonzero state of a5 << 2.  Now since a5 was set by the lhu instruction, the
    upper 48 bits are already known zero, so critically we know the upper 2 bits
    are zero.  That means the shift cannot discard any set bits, and we can just
    test a5 as set by the lhu instruction for zero/nonzero.  The net is we can
    generate this code instead:

            lhu     a0,0(a0)
            czero.nez       a0,a1,a0
            ret

    It's a small but visible instruction count saving and likely a small
    performance improvement on most designs.

    So the trick to get there is a small simplify-rtx improvement.  We just need
    to simplify

    (eq/ne (plus (x) (y)) (y)) -> (eq/ne (x) (0))

    And all the right things just happen.

    Bootstrapped and regression tested on a variety of native platforms
    including x86, aarch64, riscv and tested across the various embedded targets
    in my tester.  I'll wait for the RISC-V pre-commit CI tester to render a
    verdict before going forward.

            PR rtl-optimization/124766

    gcc/
            * simplify-rtx.cc
            (simplify_context::simplify_relational_operation_1): Simplify
            x + y == y constructs.

    gcc/testsuite/
            * gcc.target/riscv/pr124766.c: New test.
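
The reasoning in the commit message can be sanity-checked with a small C sketch. This is only an illustration under stated assumptions, not the patch or its testcase: `frob` is copied from the commit message, while `frob_simplified`, `count_identity_mismatches`, `frob_variants_agree` and `run_checks` are hypothetical names introduced here. The identity check uses 8-bit wraparound arithmetic as a stand-in for wider modes, and the comparison assumes a 16-bit `unsigned short` and uses only `z` values that cannot overflow `long`.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* The function from the commit message, unchanged.  */
long
frob (unsigned short *y, long z)
{
  long ret = (*y << 2) + z;
  if (ret != z)
    return 0;
  return ret;
}

/* Hand-simplified equivalent (hypothetical, for illustration only).
   Since *y is zero-extended from 16 bits, (*y << 2) is zero exactly
   when *y is zero, so the addition and comparison are not needed.  */
long
frob_simplified (unsigned short *y, long z)
{
  return *y != 0 ? 0 : z;
}

/* Exhaustively check (x + y == y) <=> (x == 0) in 8-bit modular
   arithmetic.  Returns the number of (x, y) pairs where the two
   sides disagree.  */
unsigned
count_identity_mismatches (void)
{
  unsigned mismatches = 0;
  for (unsigned x = 0; x < 256; x++)
    for (unsigned y = 0; y < 256; y++)
      {
        uint8_t sum = (uint8_t) (x + y);   /* wraps modulo 256 */
        if ((sum == (uint8_t) y) != (x == 0))
          mismatches++;
      }
  return mismatches;
}

/* Compare both frob variants over every 16-bit *y value and a few
   small z values.  Returns 1 if they always agree, 0 otherwise.  */
int
frob_variants_agree (void)
{
  static const long zs[] = { 0, 1, -1, 42, -1000000 };
  for (unsigned v = 0; v <= USHRT_MAX; v++)
    {
      unsigned short y = (unsigned short) v;
      for (unsigned i = 0; i < sizeof zs / sizeof zs[0]; i++)
        if (frob (&y, zs[i]) != frob_simplified (&y, zs[i]))
          return 0;
    }
  return 1;
}

/* Convenience wrapper that asserts both properties.  */
void
run_checks (void)
{
  assert (count_identity_mismatches () == 0);
  assert (frob_variants_agree ());
}
```

Calling `run_checks ()` from a test driver exercises both properties. The sketch only checks the arithmetic identity; it says nothing about the RTL-level conditions the patch also has to respect, such as `!side_effects_p` on the dropped operand.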
