------- Comment #4 from ramana at gcc dot gnu dot org  2009-07-02 09:39 -------
(In reply to comment #3)
> Is there a C test case? Can you add objdump of the gcc-generated asm and the
> fixed asm to show the impact on code size? (/me is surprised that 3*"add
> r0,sp,4" is smaller than 1**"add r0,sp,4"+3*"mov r0,r4"... Thumb is amazing
> :-)

add r0,sp,4 and mov r0,r4 are the same length in Thumb1 (16 bits each).

I suppose the ideal generated code would be something like the following,
modulo errors with stack alignment in the prologue and the epilogue. We also
don't need r4 in that case :). So we save a load and a store as well as 1
instruction overall: smaller and faster by 1 instruction, with reduced
register usage.

        push    {lr}
        sub     sp, sp, #12     @ 8 byte stack alignment
        add     r0, sp, #4      // add r0, sp, 4
        bl      _ZN1XC1Ev
        add     r0, sp, #4      // add r0, sp, 4
        bl      _Z3barP1X
        add     r0, sp, #4      // add r0, sp, 4
        bl      _ZN1XD1Ev
        add     sp, sp, #12     @ 8 byte stack alignment
        @ sp needed for prologue
        pop     {pc}

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=40615