On 10/14/22 09:56, Vineet Gupta wrote:
Hi,

When analyzing coremark build for RISC-V, noticed redundant constants not being eliminated. While this is a recurrent issue with RV, this specific instance is not unique to RV as I can trigger similar output on aarch64 with -fno-if-conversion, hence something which could be addressed in common passes.

-O3 -march=rv64gc_zba_zbb_zbc_zbs

crcu8:
    xor    a3,a0,a1
    andi    a3,a3,1
    srli    a4,a0,1
    srli    a5,a1,1
    beq    a3,zero,.L2

    li    a3,-24576    # 0xFFFF_A000
    addi    a3,a3,1        # 0xFFFF_A001
    xor    a5,a5,a3
    zext.h    a5,a5

.L2:
    xor    a4,a4,a5
    andi    a4,a4,1
    srli    a3,a0,2
    srli    a5,a5,1
    beq    a4,zero,.L3

    li    a4,-24576    # 0xFFFF_A000
    addi    a4,a4,1        # 0xFFFF_A001
    xor    a5,a5,a4
    zext.h    a5,a5

.L3:
    xor    a3,a3,a5
    andi    a3,a3,1
    srli    a4,a0,3
    srli    a5,a5,1
    beq    a3,zero,.L4

    li    a3,-24576    # 0xFFFF_A000
    addi    a3,a3,1        # 0xFFFF_A001
...
...

I see that with small tests cse1 is able to substitute redundant constant reg with equivalent old reg.

I find it easier to reason about this stuff with a graphical CFG, so a bit of ascii art...


          2
        /    \
     3 ---> 4
             /    \
         5 --->  6


Where BB4 corresponds to .L2 and BB6 corresponds to .L3. Evaluation of the constants occurs in BB3 and BB5.

CSE isn't going to catch this.  The best way to think about CSE's capabilities is that it can work on extended basic blocks.     An extended basic block can have jumps out, but not jumps in.  There are 3 EBBs in this code.  (1,2), (4,5) and 6.    So BB4 is in a different EBB than BB3.  So the evaluation in BB3 can't be used by CSE in the EBB containing BB4, BB5.


PRE/GCSE is better suited for this scenario, but it has a critical constraint.  In particular our PRE formulation is never allowed to put an evaluation of an expression on a path that didn't have one before.  So while there clearly a redundancy on the path 2->3->4->5 (BB3 and BB5), there is nowhere we could put an evaluation that would reduce the number of evaluation on that path without introducing an evaluation on paths that didn't have one.  So consider 2->4->6.  On that path there are zero evaluations.  So we can't place an eval in BB2 because that will cause evaluations on 2->4->6 which didn't have any evaluations.

There isn't a great place in GCC to handle this right now.  If the constraints were relaxed in PRE, then we'd have a chance, but getting the cost model right is going to be tough.


Jeff


Reply via email to