https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118012

--- Comment #4 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
It's even crazier when the device doesn't have a MUL instruction.  In that
case, a libgcc function is used.  With -Os, the call takes less code than the
bit-extract + extend + neg + and sequence, so a library call is emitted:

$ avr-gcc -S -Os gcc.dg/tree-ssa/branchless-cond.c -dp

f1:
/* prologue: function */
        mov r18,r22      ;  32  [c=4 l=2]  *movhi/0
        mov r19,r23
        mov r22,r20      ;  33  [c=4 l=1]  movqi_insn/0
        mov r23,r21      ;  34  [c=4 l=1]  movqi_insn/0
        andi r24,1       ;  35  [c=8 l=2]  *andhi3/2
        clr r25 
        rcall __mulhi3   ;  36  [c=4 l=1]  *mulhi3_call
        eor r24,r18      ;  40  [c=4 l=1]  *xorqi3
        eor r25,r19      ;  41  [c=4 l=1]  *xorqi3
/* epilogue start */
        ret              ;  44  [c=0 l=1]  return

The moves needed to satisfy the ABI eat up all the size gains, and the call
introduces more register pressure / clobbers.
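
For reference, here is a rough C sketch of the three forms in play.  The
function names are made up for illustration, and the expressions are inferred
from the listing above (result = ((x & 1) * z) ^ y), not copied from
branchless-cond.c, so treat them as an assumption about what f1 computes:

```c
#include <assert.h>

/* Hypothetical names; shapes inferred from the assembly, not from the test. */

/* Branchy form: xor z into y only when bit 0 of x is set. */
static unsigned select_branchy(unsigned x, unsigned y, unsigned z)
{
    return (x & 1u) ? (y ^ z) : y;
}

/* Mask form: bit-extract + neg + and.  -(x & 1) is 0 or all-ones. */
static unsigned select_mask(unsigned x, unsigned y, unsigned z)
{
    return (-(x & 1u) & z) ^ y;
}

/* Multiply form: the shape that ends up as a __mulhi3 call on
   devices without MUL.  (x & 1) * z is either 0 or z. */
static unsigned select_mul(unsigned x, unsigned y, unsigned z)
{
    return ((x & 1u) * z) ^ y;
}

/* Check that all three forms agree on a small range of inputs. */
static void check_forms_agree(void)
{
    for (unsigned x = 0; x < 4; x++)
        for (unsigned y = 0; y < 64; y++)
            for (unsigned z = 0; z < 64; z++) {
                unsigned want = select_branchy(x, y, z);
                assert(select_mask(x, y, z) == want);
                assert(select_mul(x, y, z) == want);
            }
}
```

Calling check_forms_agree() from a test harness confirms the forms are
interchangeable for these inputs, so the choice between them is purely a
code-size / cost question.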
