https://gcc.gnu.org/bugzilla/show_bug.cgi?id=124766

--- Comment #1 from GCC Commits <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Jeff Law <[email protected]>:

https://gcc.gnu.org/g:13040879a85435edc05bef860cc8530f51a133b5

commit r17-293-g13040879a85435edc05bef860cc8530f51a133b5
Author: Jeff Law <[email protected]>
Date:   Sun May 3 22:10:59 2026 -0600

    [V2][RISC-V][PR rtl-optimization/124766] Simplify x + y == y into x == 0

    So Richard S. noticed 3 issues in the V1 patch.  Specifically it should
    have been using rtx_equal_p rather than just testing pointer equality.
    That's not a correctness issue, but could potentially allow the pattern
    to apply more often.

    Second we should be checking for !side_effects_p on the operand we're
    dropping.  Easy to fix.

    Finally there was a const0_rtx use that should have been CONST0_RTX.
    Given how often I mention that one to others, I'm embarrassed I missed
    it.

    Bootstrapped on x86 and retested on the various embedded platforms.
    Bootstraps on riscv platforms, aarch64, armv7 and sh4eb are in flight.

    --

    So this is derived from S_regmatch in spec2017, so fairly hot.

    long
    frob (unsigned short *y, long z)
    {
      long ret = (*y << 2) + z;
      if (ret != z)
        return 0;
      return ret;
    }

    It generates this code on riscv:

            lhu     a5,0(a0)
            sh2add  a5,a5,a1
            sub     a1,a1,a5
            czero.nez       a0,a5,a1
            ret

    That's not bad, but the sh2add and sub are not actually needed.  This
    may look familiar from a case Daniel was recently discussing; the major
    difference is the types of the function args, which I got wrong the
    first time I reduced this case.

    czero instructions check their condition for zero/nonzero status.  So we
    just need to know if a1 has a zero/nonzero value at the czero
    instruction.  So working backwards:

    a1 = a1 - a5                // sub instruction
    a1 = a1 - ((a5 << 2) + a1)  // substitute from sh2add
    a1 = -(a5 << 2)             // a1 terms cancel out; the sign does not
                                // change zero/nonzero status

    So we just need the nonzero state of a5 << 2.  Now since a5 was set by
    the lhu instruction, the upper 48 bits are already known zero, so
    critically we know the upper 2 bits are zero.  Meaning that we can just
    test a5 as set by the lhu instruction for zero/nonzero.  The net is we
    can generate this code instead:

            lhu     a0,0(a0)
            czero.nez       a0,a1,a0
            ret

    It's a small, but visible instruction count savings and likely a small
    performance improvement on most designs.
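
As a rough cross-check of the two sequences above, here is a minimal C
model of the instructions involved.  It is only a sketch: the function
names (czero_nez, seq_before, seq_after, count_seq_mismatches) are
hypothetical, registers are modeled as 64-bit unsigned values, and the
halfword loaded by lhu is passed in directly rather than read from
memory.

```c
#include <stdint.h>

/* Model of czero.nez rd, rs1, rs2: rd = (rs2 != 0) ? 0 : rs1.  */
uint64_t
czero_nez (uint64_t rs1, uint64_t rs2)
{
  return rs2 != 0 ? 0 : rs1;
}

/* Original sequence: lhu, sh2add, sub, czero.nez.
   "mem" stands in for the halfword at 0(a0).  */
uint64_t
seq_before (uint16_t mem, uint64_t a1)
{
  uint64_t a5 = mem;            /* lhu a5,0(a0): zero-extended load */
  a5 = (a5 << 2) + a1;          /* sh2add a5,a5,a1 */
  a1 = a1 - a5;                 /* sub a1,a1,a5 */
  return czero_nez (a5, a1);    /* czero.nez a0,a5,a1 */
}

/* Improved sequence: lhu, czero.nez.  */
uint64_t
seq_after (uint16_t mem, uint64_t a1)
{
  uint64_t a0 = mem;            /* lhu a0,0(a0) */
  return czero_nez (a1, a0);    /* czero.nez a0,a1,a0 */
}

/* Hypothetical helper: count inputs on which the two models disagree,
   covering every possible halfword and a few values of a1.  */
int
count_seq_mismatches (void)
{
  static const uint64_t a1_samples[] = { 0, 1, 4, 0xffff, 1ull << 63,
                                         UINT64_MAX };
  const int n = sizeof a1_samples / sizeof a1_samples[0];
  int mismatches = 0;
  for (uint32_t mem = 0; mem <= 0xffff; mem++)
    for (int i = 0; i < n; i++)
      if (seq_before ((uint16_t) mem, a1_samples[i])
          != seq_after ((uint16_t) mem, a1_samples[i]))
        mismatches++;
  return mismatches;
}
```

Both models return a1 when the loaded halfword is zero and 0 otherwise,
which matches frob returning ret (equal to z) only when *y is zero.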

    So the trick to get there is a small simplify-rtx improvement.  We just
    need to simplify
    (eq/ne (plus (x) (y)) (y)) ->  (eq/ne (x) (0))

    And all the right things just happen.  Bootstrapped and regression
    tested on a variety of native platforms including x86, aarch64, riscv
    and tested across the various embedded targets in my tester.  I'll wait
    for the RISC-V pre-commit CI tester to render a verdict before going
    forward.
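
The (eq/ne (plus (x) (y)) (y)) -> (eq/ne (x) (0)) rewrite holds because
integer addition in a fixed-width mode wraps, so x + y == y exactly when
x == 0, even if the sum overflows.  The following is a small sketch of
that identity in plain C, not the simplify-rtx code itself; it uses
uint64_t as a stand-in for a word-sized integer mode, and the function
names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Left-hand side of the rule: (eq (plus x y) y), with wrapping
   unsigned arithmetic.  */
bool
plus_eq_before (uint64_t x, uint64_t y)
{
  return x + y == y;
}

/* Right-hand side of the rule: (eq x 0).  y is dropped entirely,
   which is why the real transformation has to check that the dropped
   operand has no side effects.  */
bool
plus_eq_after (uint64_t x)
{
  return x == 0;
}

/* Hypothetical helper: count disagreements over sample values,
   including pairs where x + y wraps around.  */
int
count_rule_mismatches (void)
{
  static const uint64_t samples[] = { 0, 1, 2, 0xffff, 1ull << 63,
                                      UINT64_MAX };
  const int n = sizeof samples / sizeof samples[0];
  int mismatches = 0;
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      if (plus_eq_before (samples[i], samples[j])
          != plus_eq_after (samples[i]))
        mismatches++;
  return mismatches;
}
```

The ne form follows by negating both sides.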

            PR rtl-optimization/124766

    gcc/

            * simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
            Simplify x + y == y constructs.

    gcc/testsuite/

            * gcc.target/riscv/pr124766.c: New test.