https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52898
--- Comment #10 from Oleg Endo <olegendo at gcc dot gnu.org> ---
For a case like
int test3 (long long a)
{
return a == 40;
}
what happens on SH2+ during RTL expansion:
cstoredi4
-> sh_emit_compare_and_set
-> sh_emit_scc_to_t
-> force operands to regs
-> emit cmpeqdi_t insn
Then combine tries e.g.
Trying 6 -> 7:
Failed to match this instruction:
(set (reg:SI 147 t)
(eq:SI (reg:DI 4 r4 [ a ])
(const_int 40 [0x28])))
and in split1 this pattern
(define_split
[(set (reg:SI T_REG)
(eq:SI (match_operand:DI 0 "arith_reg_operand" "")
(match_operand:DI 1 "arith_reg_or_0_operand" "")))]
splits everything up and the resulting code becomes:
mov #0,r3
cmp/eq r3,r5
bt.s .L5
mov #40,r2
rts
movt r0
.align 1
.L5:
cmp/eq r2,r4
rts
movt r0
If the split pattern is disabled, the cmpeqdi_t pattern survives until the end:
mov #40,r2
mov #0,r3
cmp/eq r3,r5
bf 0f
cmp/eq r2,r4
0:
rts
movt r0
which is obviously less code, but has one more branch in the execution path.
The unsplit cmpeqdi_t pattern should probably be used when optimizing for size
or when zero-displacement branches are fast.
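
For reference, both SH2+ sequences above compute the same word-level
comparison and differ only in how the branch is laid out. A rough C sketch of
that logic, assuming a 64-bit long long; the function name and the lo/hi
variables are made up for illustration and are not taken from the SH backend:

```c
#include <stdint.h>

/* Word-level view of "a == 40" for a 64-bit value.
   Hypothetical helper, only meant to illustrate the two-compare shape.  */
static int eq40_two_compares (long long a)
{
  uint32_t lo = (uint32_t) ((unsigned long long) a);
  uint32_t hi = (uint32_t) ((unsigned long long) a >> 32);

  if (hi != 0)          /* high word differs from 0: result is already 0 */
    return 0;
  return lo == 40u;     /* otherwise the low-word compare decides */
}
```

The first compare corresponds to cmp/eq r3,r5 and the second one to
cmp/eq r2,r4 in the listings above.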
On SH1 the cstoredi4 pattern is disabled because it might result in e.g.
cmpgtdi_t, which needs branches with delay slots. Because of that, the middle
end expands some target-independent code like:
mov #40,r1
xor r1,r4
or r4,r5
tst r5,r5
rts
movt r0
which is actually a good branch-less alternative.
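
Written out in C, the SH1 sequence above amounts to roughly the following.
This is only a sketch, assuming a 64-bit long long; the function name and the
lo/hi variables are invented for illustration:

```c
#include <stdint.h>

/* Branch-less "a == 40": xor the low word with the constant, or in the
   high word, and test the result for zero.
   Hypothetical helper, not part of GCC.  */
static int eq40_branchless (long long a)
{
  uint32_t lo = (uint32_t) ((unsigned long long) a);
  uint32_t hi = (uint32_t) ((unsigned long long) a >> 32);

  uint32_t diff = (lo ^ 40u) | hi;   /* xor r1,r4 ; or r4,r5 */
  return diff == 0;                  /* tst r5,r5 ; movt r0 */
}
```

The xor is zero only if the low word equals 40, and the or is zero only if
the high word is zero as well, so a single tst yields the T bit.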