https://gcc.gnu.org/bugzilla/show_bug.cgi?id=45215
Andrew Pinski <pinskia at gcc dot gnu.org> changed:

           What    |Removed             |Added
----------------------------------------------------------------------------
          Component|tree-optimization   |rtl-optimization

--- Comment #2 from Andrew Pinski <pinskia at gcc dot gnu.org> ---
Note that on the trunk I have changed the code slightly to get a cmove. With
the cmove we could simplify the following RTL:

Trying 27, 28 -> 29:
   27: {flags:CCZ=cmp(r86:SI&0x100,0);r82:SI=r86:SI&0x100;}
      REG_DEAD r86:SI
   28: r85:SI=0xffffffffffffffe6
   29: r82:SI={(flags:CCZ==0)?r82:SI:r85:SI}
      REG_DEAD r85:SI
      REG_DEAD flags:CCZ
Failed to match this instruction:
(set (reg/v:SI 82 [ tt ])
    (if_then_else:SI (eq (zero_extract:SI (reg:SI 86)
                (const_int 1 [0x1])
                (const_int 8 [0x8]))
            (const_int 0 [0]))
        (and:SI (reg:SI 86)
            (const_int 256 [0x100]))
        (const_int -26 [0xffffffffffffffe6])))

But that would be a 3->3 combine, which I don't know whether combine does; I
know it does 3->1 and 3->2. The cmove code currently generated is:

        andl    $256, %edi
        movl    $-26, %eax
        cmovne  %eax, %edi

I also don't know what the cost of doing a cmov vs. the shifts is here,
though. I know that for aarch64 it is worse, but that should have been
modeled already.
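
For reference, a minimal C sketch of what the unmatched if_then_else
computes (the variable tt is taken from the RTL annotation on reg 82; the
function name and parameter are hypothetical, and this is not necessarily
the PR's actual testcase):

int
f (int x)               /* x plays the role of r86 */
{
  int tt = x & 0x100;   /* insn 27: r82 = r86 & 0x100, also sets CCZ */
  if (tt != 0)          /* insn 29: the (flags:CCZ==0) if_then_else  */
    tt = -26;           /* insn 28: r85 = 0xffffffffffffffe6 (-26)   */
  return tt;
}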
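
And a minimal C sketch of a shift/mask-based branchless alternative, to
make the cmov-vs-shifts cost comparison concrete (hypothetical; not
necessarily the sequence GCC picks when no cmove is available):

int
f_shifts (int x)
{
  /* -1 if bit 8 of x is set, 0 otherwise; avoids the flags register.  */
  int m = -(int) (((unsigned int) x >> 8) & 1);
  return m & -26;       /* bit 8 set ? -26 : 0 */
}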