http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50771
Bug #: 50771
Summary: redundant argument passing code (x64) + inefficient use of stack space
Classification: Unclassified
Product: gcc
Version: 4.6.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: c++
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: crusader.m...@gmail.com

I was comparing by-value argument passing with by-reference passing and found some strange things (they look like a compiler bug to me). I tested this on multiple versions of GCC (up to and including 4.6.1), and every version produced similar results. Here is the test code:

#include <cstdio>

using namespace std;

struct Foo
{
    size_t m1, m2;

    Foo(size_t a1, size_t a2) : m1(a1), m2(a2) {}
//  ~Foo() {}
};

void foo1(size_t, Foo, size_t) __attribute__((noinline));
void foo1(size_t, Foo, size_t)
{
    asm("nop");
}

void foo2(size_t, Foo const&, size_t) __attribute__((noinline));
void foo2(size_t, Foo const&, size_t)
{
    asm("nop");
}

int main()
{
    foo1(1, Foo(2, 3), 4);
    foo2(5, Foo(6, 7), 8);
    return 16;
}

If you compile this code ('gcc -O3 -S' on an x64 platform) with ~Foo() commented out, you get:

main:
.LFB17:
        subq    $32, %rsp
.LCFI0:
        movl    $4, %ecx
        movl    $2, %esi
        movl    $3, %edx
        movl    $1, %edi
        movq    $2, (%rsp)      # why?
        movq    $3, 8(%rsp)     # why?
        call    _Z4foo1m3Foom
        leaq    16(%rsp), %rsi
        movl    $8, %edx
        movl    $5, %edi
        movq    $6, 16(%rsp)
        movq    $7, 24(%rsp)
        call    _Z4foo2mRK3Foom
        movl    $16, %eax
        addq    $32, %rsp
.LCFI1:
        ret

There are two unnecessary instructions marked "why?". The Foo argument is already passed in registers (%rsi and %rdx), so these stores look like leftover code that the optimizer forgot to remove.

If we uncomment ~Foo(), the asm looks like this:

main:
.LFB20:
        subq    $32, %rsp
.LCFI0:
        movl    $4, %edx
        movl    $1, %edi
        movq    %rsp, %rsi
        movq    $2, (%rsp)      # <-- here
        movq    $3, 8(%rsp)     # <-- here
        call    _Z4foo1m3Foom
        leaq    16(%rsp), %rsi
        movl    $8, %edx
        movl    $5, %edi
        movq    $6, 16(%rsp)
        movq    $7, 24(%rsp)
        call    _Z4foo2mRK3Foom
        movl    $16, %eax
        addq    $32, %rsp
.LCFI1:
        ret

Here the same two stores are present, and this time they are actually required: foo1 now receives the address of the temporary (movq %rsp, %rsi) instead of its contents.
It also came as a shock to discover that an implicit trivial destructor means something different to the compiler than an explicit empty one. I would really like to know why, if I specify an empty destructor, my value ends up being passed (effectively) by reference (I would expect this to be slower). I recall discovering a similar problem in GCC 4.0.3, where not declaring a destructor meant 'no RVO for you', even if the destructor would have been empty.

Related notes:
- The stack is clearly not used efficiently: the arguments for the foo2 call could occupy the same stack slots as the arguments for the foo1 call.
- For some reason, unused function arguments were not optimized away (as far as I know, -O3 should do this by default since GCC 4.5). This looks like a problem too.
- Surprisingly, if Foo contains only one member variable (just comment out m2 and its initializer), the asm looks better. There are no redundant stores, but the value is still passed by reference if ~Foo() is defined, and unused arguments are still not removed:

1 member, +dtor:

main:
.LFB20:
        subq    $32, %rsp
.LCFI0:
        movl    $4, %edx
        movl    $1, %edi
        movq    %rsp, %rsi      # why by-reference?
        movq    $2, (%rsp)
        call    _Z4foo1m3Foom
        leaq    16(%rsp), %rsi
        movl    $8, %edx
        movl    $5, %edi
        movq    $6, 16(%rsp)
        call    _Z4foo2mRK3Foom
        movl    $16, %eax
        addq    $32, %rsp
.LCFI1:
        ret

1 member, -dtor:

main:
.LFB17:
        subq    $16, %rsp
.LCFI0:
        movl    $4, %edx
        movl    $2, %esi
        movl    $1, %edi        # exactly how it should be
        call    _Z4foo1m3Foom
        movq    %rsp, %rsi
        movl    $8, %edx
        movl    $5, %edi
        movq    $6, (%rsp)
        call    _Z4foo2mRK3Foom
        movl    $16, %eax
        addq    $16, %rsp
.LCFI1:
        ret