[Bug target/52898] SH Target: Inefficient DImode comparisons

olegendo at gcc dot gnu.org Sun, 01 May 2016 05:26:30 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52898


--- Comment #10 from Oleg Endo <olegendo at gcc dot gnu.org> ---
For a case like 

int test3 (long long a)
{
  return a == 40;
}

this is what happens on SH2+ during RTL expansion:

cstoredi4
 -> sh_emit_compare_and_set
     -> sh_emit_scc_to_t
          -> force operands to regs
          -> emit cmpeqdi_t insn

Then combine tries e.g.

Trying 6 -> 7:
Failed to match this instruction:
(set (reg:SI 147 t)
    (eq:SI (reg:DI 4 r4 [ a ])
        (const_int 40 [0x28])))

and in split1 this pattern

(define_split
  [(set (reg:SI T_REG)
        (eq:SI (match_operand:DI 0 "arith_reg_operand" "")
               (match_operand:DI 1 "arith_reg_or_0_operand" "")))]

splits everything up and the resulting code becomes:

        mov     #0,r3
        cmp/eq  r3,r5
        bt.s    .L5
        mov     #40,r2
        rts
        movt    r0
        .align 1
.L5:
        cmp/eq  r2,r4
        rts
        movt    r0

If the split pattern is disabled, the cmpeqdi_t pattern survives until the end:

        mov     #40,r2
        mov     #0,r3
        cmp/eq  r3,r5
        bf      0f
        cmp/eq  r2,r4
0:
        rts
        movt    r0

which is obviously less code, but has one more branch in the execution path.
This pattern should probably be used when optimizing for size or when
zero-displacement branches are fast.
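
For reference, a rough C-level sketch of what the unsplit cmpeqdi_t sequence
computes. It assumes r4 holds the low word and r5 the high word of 'a', which
is what the listings above suggest. The name test3_wordwise is made up for
illustration and is not something the compiler generates.

```c
#include <stdint.h>

/* Hypothetical helper: compare a 64-bit value against 40 one 32-bit
   word at a time, mirroring the unsplit cmpeqdi_t sequence.  */
int test3_wordwise (long long a)
{
  uint32_t lo = (uint32_t) (uint64_t) a;          /* assumed to be r4 */
  uint32_t hi = (uint32_t) ((uint64_t) a >> 32);  /* assumed to be r5 */

  if (hi != 0)      /* cmp/eq r3,r5 ; bf 0f  (T stays 0) */
    return 0;
  return lo == 40;  /* cmp/eq r2,r4 ; movt r0 */
}
```

The low word is only compared when the high word already matches, which is
where the conditional branch comes from.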


On SH1 the cstoredi4 pattern is disabled because it might result in e.g.
cmpgtdi_t, which needs branches with delay slots.  Because of that, the middle
end expands some target-independent code like:

        mov     #40,r1
        xor     r1,r4
        or      r4,r5
        tst     r5,r5
        rts
        movt    r0

which is actually a good branch-less alternative.
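
Written out in C, the branch-less sequence amounts to something like the
sketch below. As before, it assumes r4 is the low word and r5 the high word;
test3_branchless is a made-up name used only for illustration.

```c
#include <stdint.h>

/* Hypothetical helper: branch-less 64-bit equality test against 40.
   The XOR is zero only if the low word equals 40; OR-ing in the high
   word gives zero only if the high word is zero as well.  */
int test3_branchless (long long a)
{
  uint32_t lo = (uint32_t) (uint64_t) a;          /* assumed to be r4 */
  uint32_t hi = (uint32_t) ((uint64_t) a >> 32);  /* assumed to be r5 */

  return ((lo ^ 40u) | hi) == 0;  /* xor ; or ; tst ; movt */
}
```

This trick only works for equality tests.  Ordered comparisons such as
cmpgtdi_t can't be folded into a single zero test like this.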
