https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118012
--- Comment #4 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
It's even crazier when the device doesn't have a MUL instruction. In that
case, a libgcc function is used. With -Os, the call takes less code than the
bit-extract + extend + neg + and sequence, so a library call is emitted:

$ avr-gcc -S -Os gcc.dg/tree-ssa/branchless-cond.c -dp

f1:
/* prologue: function */
        mov r18,r22      ;  32  [c=4 l=2]  *movhi/0
        mov r19,r23
        mov r22,r20      ;  33  [c=4 l=1]  movqi_insn/0
        mov r23,r21      ;  34  [c=4 l=1]  movqi_insn/0
        andi r24,1       ;  35  [c=8 l=2]  *andhi3/2
        clr r25
        rcall __mulhi3   ;  36  [c=4 l=1]  *mulhi3_call
        eor r24,r18      ;  40  [c=4 l=1]  *xorqi3
        eor r25,r19      ;  41  [c=4 l=1]  *xorqi3
/* epilogue start */
        ret              ;  44  [c=0 l=1]  return

The moves needed to accommodate the ABI eat up all the size gains, and the
call introduces more register pressure / clobbers.
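
For reference, the forms being traded off can be sketched in C. This is an assumption reconstructed from the assembly above (which computes (x & 1) * z ^ y with the first argument in r24/r25), not a copy of the test case. The function names are made up for illustration, and uint16_t stands in for the 16-bit int of the AVR target.

```c
#include <assert.h>
#include <stdint.h>

/* Branching form: pick y or z ^ y depending on the low bit of x. */
static uint16_t f1_branch(uint16_t x, uint16_t y, uint16_t z)
{
    return ((x & 1u) == 0) ? y : (uint16_t)(z ^ y);
}

/* Mask form (bit-extract + extend + neg + and):
   -(x & 1) is either 0x0000 or 0xFFFF, so the AND keeps or drops z. */
static uint16_t f1_mask(uint16_t x, uint16_t y, uint16_t z)
{
    uint16_t mask = (uint16_t)(0u - (x & 1u));
    return (uint16_t)((mask & z) ^ y);
}

/* Multiply form: (x & 1) is 0 or 1, so the product is 0 or z.
   Without a hardware MUL this becomes a call such as __mulhi3. */
static uint16_t f1_mul(uint16_t x, uint16_t y, uint16_t z)
{
    return (uint16_t)(((x & 1u) * z) ^ y);
}

/* Hypothetical helper: assert that all three forms agree for one input. */
static void check_forms_agree(uint16_t x, uint16_t y, uint16_t z)
{
    assert(f1_branch(x, y, z) == f1_mask(x, y, z));
    assert(f1_branch(x, y, z) == f1_mul(x, y, z));
}
```

All three return the same value for every input, so the choice between them is purely a cost question. On a device without MUL, the multiply form only looks cheap if the argument shuffling around the call is left out of the estimate.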