https://gcc.gnu.org/bugzilla/show_bug.cgi?id=60145

Georg-Johann Lay <gjl at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           See Also|                            |https://gcc.gnu.org/bugzill
                   |                            |a/show_bug.cgi?id=84211

--- Comment #5 from Georg-Johann Lay <gjl at gcc dot gnu.org> ---
(In reply to Matthijs Kooijman from comment #0)
> uint32_t join4 (uint8_t a, uint8_t b, uint8_t c, uint8_t d)
> {
>     return ((uint32_t)a << 24) | ((uint32_t)b << 16)
>            | ((uint32_t)c << 8) | d;
> }

With PR84211 in place, this test case improves a bit in v15:

Previously (without PR84211, or with -O3 -mno-fuse-move):

join4:
        push r14                 ;  50  [c=4 l=1]  pushqi1/0
        push r15                 ;  51  [c=4 l=1]  pushqi1/0
        push r16                 ;  52  [c=4 l=1]  pushqi1/0
        push r17                 ;  53  [c=4 l=1]  pushqi1/0
/* prologue: function */
        mov r25,r20      ;  34  [c=4 l=1]  movqi_insn/0
        ldi r19,0                ;  40  [c=4 l=1]  movqi_insn/0
        ldi r20,0                ;  41  [c=4 l=2]  *movhi/1
        ldi r21,0       
        or r19,r25               ;  43  [c=4 l=1]  *iorqi3/0
        or r20,r22               ;  45  [c=4 l=1]  *iorqi3/0
        movw r14,r18     ;  46  [c=4 l=2]  *movsi/0
        movw r16,r20
        or r17,r24               ;  48  [c=4 l=1]  *iorqi3/0
        movw r24,r16     ;  49  [c=4 l=2]  *movsi/0
        movw r22,r14
/* epilogue start */
        pop r17          ;  56  [c=4 l=1]  popqi
        pop r16          ;  57  [c=4 l=1]  popqi
        pop r15          ;  58  [c=4 l=1]  popqi
        pop r14          ;  59  [c=4 l=1]  popqi
        ret              ;  60  [c=0 l=1]  return_from_epilogue

With PR84211 / -mfuse-move:

join4:
        push r14                 ;  50  [c=4 l=1]  pushqi1/0
        push r15                 ;  51  [c=4 l=1]  pushqi1/0
        push r16                 ;  52  [c=4 l=1]  pushqi1/0
        push r17                 ;  53  [c=4 l=1]  pushqi1/0
/* prologue: function */
        mov r19,r20      ;  66  [c=4 l=1]  movqi_insn/0
        mov r16,r22      ;  70  [c=4 l=1]  movqi_insn/0
        mov r17,r24      ;  72  [c=4 l=1]  movqi_insn/0
        movw r22,r18     ;  78  [c=4 l=1]  *movhi/0
        movw r24,r16     ;  79  [c=4 l=1]  *movhi/0
/* epilogue start */
        pop r17          ;  56  [c=4 l=1]  popqi
        pop r16          ;  57  [c=4 l=1]  popqi
        pop r15          ;  58  [c=4 l=1]  popqi
        pop r14          ;  59  [c=4 l=1]  popqi
        ret              ;  60  [c=0 l=1]  return_from_epilogue

So the body shrinks from 11 instructions to 5.
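
To illustrate why plain register moves are enough here, below is a minimal
sketch in C showing that every byte of the result of join4 is exactly one of
the input bytes, so no real shifting or OR-ing is needed. The helper byte_of
is hypothetical and only serves this illustration; it is not part of the test
case or of GCC.

```c
#include <stdint.h>

/* Shift/OR form, as in the test case from comment #0. */
static uint32_t join4 (uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16)
           | ((uint32_t)c << 8) | d;
}

/* Hypothetical helper (not from the PR): byte number i of v,
   where i = 0 is the least significant byte.  */
static uint8_t byte_of (uint32_t v, unsigned i)
{
    return (uint8_t) (v >> (8 * i));
}
```

For any inputs, byte_of (join4 (a, b, c, d), 0) is d, byte 1 is c, byte 2 is b
and byte 3 is a.  Assuming the register assignment visible in the listings
above (a in r24, b in r22, c in r20, d in r18, result in r22..r25), this is
what the three mov plus two movw in the -mfuse-move version implement.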

Unfortunately, IRA does only a sub-optimal job at register allocation, and
since PR84211 runs after reg-alloc, neither the register pressure nor the
prologue / epilogue size goes down :-(
