On 5/28/25 9:05 PM, Jiawei wrote:


This seems like it would be much better as a combine pattern.   In fact, I'm a bit surprised that combine didn't simplify this series of operations into a IOR.  So I'd really like to see the .combine dump with and without this hunk for the relevant testcase.

Here is the dump log, using trunk(7fca794e0199baff8f07140a950ba3374c6aa634), more details please see https://godbolt.org/z/3hfzdz3Ks

===========================================================================================

~/rv/bin/riscv64-unknown-linux-gnu-g++ -march=rv64gc_zba_zbb_zbs -O2  -S -fdump-rtl-all redundant-bitmap-2.C

before combine in .ext_dce
Thanks!   That was helpful.

I was looking for the full .combine dump -- the full dump includes information about patterns that were tried and failed. That often will point the way to a better solution.

In particular in the .combine dump we have this nugget:

Trying 15 -> 18:
   15: r151:DI=0xfffffffffffffffe<-<r156:DI#0&r149:DI
      REG_DEAD r149:DI
   18: r154:DI=0x1<<r156:DI#0^r151:DI
      REG_DEAD r156:DI
      REG_DEAD r151:DI
Failed to match this instruction:
(set (reg:DI 154)
    (xor:DI (and:DI (rotate:DI (const_int -2 [0xfffffffffffffffe])
                (subreg:QI (reg:DI 156 [ b ]) 0))
            (reg:DI 149 [ *a_10(D) ]))
        (ashift:DI (const_int 1 [0x1])
            (subreg:QI (reg:DI 156 [ b ]) 0))))

I think combine really should have simplified that before querying the target. That really should have been simpified to a bit insertion idiom or perhaps an simpler ior.

More generally, the question we should first ask is whether or not the source should have simplified independent of the target. I think the answer is yes in this case, which means we should try to fix that problem first since it'll improve every target rather than just RISC-V.

When we do find ourselves needing to write new target patterns, a define_insn will generally be preferable to a define_peephole.

The define_insn will match when there's a data dependency within a basic block. A define_peephole requires the insns to be consecutive in the IL. Thus the define_insn will tend to match more often and is those preferable to a define_peephole.

Anyway, to recap, I think the better solution is to improve simplify_binary_operation or one of its children or perhaps simplify_compound_operation its related functions.

jeff


Reply via email to