https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118089

            Bug ID: 118089
           Summary: [12/13/14 regression] arm thumb2 return sequence is
                    suboptimal, especially at -O2
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---
            Target: arm

This is a regression that originally appeared in gcc-4.8.  

For
int f(void);
int *x;
int g(void)
{
  volatile int y = f();
  return *x = y;
}

When compiled at -Os we would previously generate:

        push    {r0, r1, r2, lr}        
        bl      f   
        ldr     r3, .L2
        str     r0, [sp, #4]
        ldr     r0, [sp, #4]
        ldr     r3, [r3, #0]
        str     r0, [r3, #0]
        pop     {r1, r2, r3, pc}

But we now get:
        push    {r0, r1, r2, lr}       
        bl      f   
        ldr     r3, .L2
        str     r0, [sp, #4]
        ldr     r0, [sp, #4]
        ldr     r3, [r3]  
        str     r0, [r3]  
        add     sp, sp, #12      <-- not merged into return insn 
        ldr     pc, [sp], #4     <-- 4-byte, not 2-byte, instruction

I suspect this is a consequence of moving to an rtl-based prologue. 

When optimizing for speed we want to keep the 'add sp', but we should still use
 'pop {pc}' to return.
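
For illustration, the return sequence for the example above would then look
something like this (hand-written sketch, not actual compiler output; the sizes
assume the 16-bit Thumb encodings are selected):

        add     sp, sp, #12      <-- kept separate, 2-byte instruction
        pop     {pc}             <-- 2-byte instruction instead of the 4-byte ldr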

compile options:
-Os -march=armv7-m -mthumb
