Hi all,

I'm looking at a case on aarch64 that's not if-converted to use conditional 
moves:


typedef unsigned char uint8_t;
typedef unsigned int uint16_t;

uint8_t foo(const uint8_t byte, const uint16_t generator)
{
  if (byte & 0x80) {
    return (byte << 1) ^ (generator & 0xff);
  } else {
    return byte << 1;
  }
}

For aarch64 we fail to if-convert and generate:
foo:
        uxtb    w2, w0
        lsl     w3, w2, 1
        uxtb    w0, w3
        tbnz    x2, 7, .L5
        ret
        .p2align 3
.L5:
        eor     w0, w3, w1
        uxtb    w0, w0
        ret


whereas on x86 we if-convert successfully and use a conditional move/select:
        leal    (%rdi,%rdi), %eax
        xorl    %eax, %esi
        testb   %dil, %dil
        cmovs   %esi, %eax
        ret
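
In C terms, the x86 sequence computes both arms unconditionally and then
selects one of them on the sign bit of byte. Below is a minimal sketch of that
shape next to the original branchy form. The names foo_branch, foo_select and
check_forms_agree are hypothetical and only for illustration, and the sketch
assumes the standard <stdint.h> types rather than the typedefs in the testcase
(the result is the same because generator is masked with 0xff).

```c
#include <assert.h>
#include <stdint.h>

/* The testcase as written: one arm is chosen by a branch.  */
static uint8_t
foo_branch (const uint8_t byte, const uint16_t generator)
{
  if (byte & 0x80)
    return (byte << 1) ^ (generator & 0xff);
  else
    return byte << 1;
}

/* Hypothetical hand-if-converted form: both arms are computed, then one
   is selected.  This is the shape a conditional move/select implements.  */
static uint8_t
foo_select (const uint8_t byte, const uint16_t generator)
{
  const uint8_t shifted = byte << 1;                   /* ELSE value */
  const uint8_t xored = shifted ^ (generator & 0xff);  /* THEN value */
  return (byte & 0x80) ? xored : shifted;
}

/* Exhaustively check that the two forms agree on every input.  */
static void
check_forms_agree (void)
{
  for (unsigned b = 0; b < 256; b++)
    for (unsigned g = 0; g < 0x10000; g++)
      assert (foo_select ((uint8_t) b, (uint16_t) g)
              == foo_branch ((uint8_t) b, (uint16_t) g));
}
```

Whether the compiler actually emits a conditional select for foo_select still
depends on the target's costs, so this only illustrates the intended result.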



After fixing some of the branch costs in aarch64 and a bogus cost calculation
in cheap_bb_rtx_cost_p, I'm stuck on noce_process_if_block (in ifcvt.c) and
what I think is a restriction that the THEN block must contain only a single
set insn. This fails on aarch64 because the THEN block gets an extra
zero_extend (the uxtb after the eor in the code above).

In particular, the following check in noce_process_if_block triggers:
  insn_a = first_active_insn (then_bb);
  if (! insn_a
      || insn_a != last_active_insn (then_bb, FALSE)
      || (set_a = single_set (insn_a)) == NULL_RTX)
    return FALSE;

Is there any particular reason why the code shouldn't be able to handle
more than one insn in then_bb (within a sane limit)?
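
To make the question concrete, here is a toy model of what a relaxed check
might look like: accept a THEN block when every active insn is a single set
and the number of active insns stays under a cap. This is only a sketch under
stated assumptions. struct toy_insn, TOY_MAX_THEN_INSNS and toy_then_block_ok
are hypothetical stand-ins and not GCC types or functions, and the cap of 4 is
an arbitrary placeholder. A real change in ifcvt.c would also have to deal with
costs and with dependencies between the insns, which this model ignores.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for an insn; this is NOT GCC's insn representation.  */
struct toy_insn
{
  bool is_active;      /* false for notes, labels and the like */
  bool is_single_set;  /* true if the insn is one simple SET */
};

/* Hypothetical "sane limit" on the number of active insns.  */
#define TOY_MAX_THEN_INSNS 4

/* Return true if the block is non-empty, every active insn is a single
   set, and there are at most TOY_MAX_THEN_INSNS active insns.  */
static bool
toy_then_block_ok (const struct toy_insn *insns, size_t n)
{
  size_t active = 0;

  assert (insns != NULL || n == 0);

  for (size_t i = 0; i < n; i++)
    {
      if (!insns[i].is_active)
        continue;                 /* skip non-active insns */
      if (!insns[i].is_single_set)
        return false;             /* anything but a simple SET is rejected */
      if (++active > TOY_MAX_THEN_INSNS)
        return false;             /* block is too large */
    }

  return active > 0;              /* an empty block is rejected */
}
```

Under this model the aarch64 THEN block from the example (an xor followed by a
zero_extend, both single sets) would be accepted, while the current check
rejects it for having two active insns.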

Thanks,
Kyrill
