http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45937
Summary: unnecessary push/pop to reserve stack memory
Product: gcc
Version: 4.6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: car...@google.com
CC: car...@google.com
Host: i686-linux
Target: arm-eabi
Build: i686-linux

Created attachment 21995
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=21995
test case

Compile the attached source code with options -march=armv7-a -mthumb -Os; gcc generates:

tt:
        push    {r4, r5, r6, r7, lr}
        sub     sp, sp, #20
        mov     r5, r2
        ldr     r4, [sp, #40]
        cbz     r4, .L1
        movs    r2, #20
        muls    r2, r1, r2
        adds    r6, r3, r2
        ldr     r3, [r3, r2]
        cbz     r3, .L1
        ldr     lr, [r6, #8]
        ldr     r7, [r6, #12]
        ldr     r3, [r6, #16]
        ldr     r2, [r6, #4]
        ldr     r6, .L5
        str     lr, [sp, #0]
        cmp     r3, #0
        it      eq
        moveq   r3, r6
        str     r7, [sp, #4]
        str     r3, [sp, #8]
        mov     r3, r5
        blx     r4
.L1:
        add     sp, sp, #20
        pop     {r4, r5, r6, r7, pc}

Notice that this function uses only 12 bytes of stack memory to pass parameters, but it allocates 20 bytes, and the other 8 bytes are never used. The prologue and epilogue can therefore be rewritten as follows, where pushing the dummy registers r1, r2 and r3 reserves the 12 bytes for the outgoing arguments. This removes 2 instructions (the sub and the add):

tt:
        push    {r1, r2, r3, r4, r5, r6, r7, lr}
        ...
        pop     {r1, r2, r3, r4, r5, r6, r7, pc}

The root cause of this problem is that the memory for outgoing arguments and the memory for callee-saved registers are allocated and aligned separately. In expand_call(), 12 bytes are needed and 16 bytes are allocated to keep 8-byte alignment. In arm_get_frame_offsets(), 20 bytes are needed to save registers and 24 bytes are allocated. So the function reserves 40 bytes of stack, more than the push/pop of the saved registers can allocate on their own, and extra sub/add instructions are needed to adjust sp. In fact the function uses only 32 bytes of stack (12 for outgoing arguments plus 20 for saved registers), no data element requires 8-byte alignment, and 32 is already a multiple of 8, so a simple push/pop should be enough.