https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118089
            Bug ID: 118089
           Summary: [12/13/14 regression] arm thumb2 return sequence is
                    suboptimal, especially at -O2
           Product: gcc
           Version: 4.8.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: rearnsha at gcc dot gnu.org
  Target Milestone: ---
            Target: arm

This is a regression that originally appeared in gcc-4.8.

For

    int f(void);
    int *x;
    int g(void)
    {
      volatile int y = f();
      return *x = y;
    }

When compiled at -Os we would previously generate:

        push    {r0, r1, r2, lr}
        bl      f
        ldr     r3, .L2
        str     r0, [sp, #4]
        ldr     r0, [sp, #4]
        ldr     r3, [r3, #0]
        str     r0, [r3, #0]
        pop     {r1, r2, r3, pc}

But we now get:

        push    {r0, r1, r2, lr}
        bl      f
        ldr     r3, .L2
        str     r0, [sp, #4]
        ldr     r0, [sp, #4]
        ldr     r3, [r3]
        str     r0, [r3]
        add     sp, sp, #12     <-- not merged into return insn
        ldr     pc, [sp], #4    <-- 4, not 2-byte instruction

I suspect this is a consequence of moving to an rtl-based prologue.

When optimizing for speed we want to keep the 'add sp', but we should still use
'pop {pc}' to return.

compile options: -Os -march=armv7-m -mthumb
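
To put a number on the size cost, here is a rough sketch that tallies the bytes in the two return sequences quoted above. The report states only that 'ldr pc, [sp], #4' is a 4-byte instruction. The 2-byte sizes used for 'pop {r1, r2, r3, pc}' and 'add sp, sp, #12' are assumptions based on the narrow Thumb encodings and should be checked against real assembler output. The struct and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* One instruction of a return sequence and its assumed encoded size. */
struct insn {
    const char *text;
    unsigned bytes;
};

/* Previous -Os epilogue: stack adjustment and return in a single pop.
   Assumed to use the 16-bit Thumb POP encoding (low registers plus pc). */
static const struct insn old_epilogue[] = {
    { "pop {r1, r2, r3, pc}", 2 },
};

/* Current epilogue: separate stack adjustment, then a 32-bit load to pc.
   The 4-byte size of the ldr is from the report; the 2-byte add is assumed. */
static const struct insn new_epilogue[] = {
    { "add sp, sp, #12",  2 },
    { "ldr pc, [sp], #4", 4 },
};

/* Sum the encoded sizes of the n instructions in seq. */
static unsigned sequence_bytes(const struct insn *seq, size_t n)
{
    unsigned total = 0;
    assert(seq != NULL);
    for (size_t i = 0; i < n; i++)
        total += seq[i].bytes;
    return total;
}

/* Extra bytes the current epilogue costs relative to the previous one. */
static unsigned epilogue_growth(void)
{
    unsigned old_size = sequence_bytes(old_epilogue,
                                       sizeof old_epilogue / sizeof old_epilogue[0]);
    unsigned new_size = sequence_bytes(new_epilogue,
                                       sizeof new_epilogue / sizeof new_epilogue[0]);
    return new_size - old_size;
}
```

Under these assumptions the return sequence grows from 2 to 6 bytes per function exit at -Os. Using 'pop {pc}' after the 'add sp' would recover 2 of those bytes even when the two instructions are not merged.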