The following block of code appears to produce an unneeded memcpy on both Intel and PowerPC platforms. I cannot think of any aliasing or side effect that would force the copy to occur; the problem seems to be that gcc is not aware of the lifetime of large structures kept on the stack.
The full source code:

    class TV
    {
       private:
          float truth;
          float confidence;
          int stuff[444];
       public:
          TV(void);
          float getT(void);
    };

    extern TV my_tv_maker(float tr);
    extern void other(TV *);

    float my_subr(float tr)
    {
       TV tv;
       other (&tv);                // force constructor TV::TV to run first
       tv = my_tv_maker(434.23);   // over-write previous tv.
       return tv.getT();
    }

PowerPC assembly, created with gcc -S -O2 -c

    .L._Z7my_subrf:
    .LFB2:
        mflr 0
    .LCFI0:
        std 28,-32(1)
    .LCFI1:
        std 29,-24(1)
    .LCFI2:
        std 0,16(1)
    .LCFI3:
        stdu 1,-3728(1)         # make room for two instances of TV on stack
    .LCFI4:
        addi 29,1,112           # one instance of TV
        addi 28,1,1904          # second instance of TV
        mr 3,29
        bl _ZN2TVC1Ev           # call constructor on instance 1
        nop
        mr 3,29
        bl _Z5otherP2TV         # call other() on instance 1
        nop
        lfs 1,....@toc(2)
        mr 3,28
        bl _Z11my_tv_makerf     # call my_tv_maker on instance 2
        nop
        mr 4,28
        mr 3,29
        li 5,1784
        bl memcpy               # copy instance 2 over to 1! waste of CPU!
        nop
        mr 3,29
        bl _ZN2TV4getTEv        # call method on instance 1
        nop
        addi 1,1,3728
        ld 0,16(1)
        ld 28,-32(1)
        ld 29,-24(1)
        mtlr 0
        blr

The missed optimization: the second instance of TV on the stack is not needed, and so the copy from it into the first is not needed either. For large structures, this can be a significant waste of time.

Exactly the same problem shows up on Intel as well:

    _Z7my_subrf:
    .LFB2:
        pushl %ebp
    .LCFI0:
        movl %esp, %ebp
    .LCFI1:
        subl $3608, %esp
    .LCFI2:
        movl %ebx, -8(%ebp)
    .LCFI3:
        leal -1792(%ebp), %ebx      # instance 1 of TV
        movl %esi, -4(%ebp)
    .LCFI4:
        leal -3592(%ebp), %esi      # instance 2 of TV
        movl %ebx, (%esp)
        call _ZN2TVC1Ev             # call constructor on instance 1
        movl %ebx, (%esp)
        call _Z5otherP2TV           # call other() on instance 1
        movl %esi, (%esp)
        movl $0x43d91d71, 4(%esp)
        call _Z11my_tv_makerf       # call my_tv_maker on instance 2
        subl $4, %esp
        movl %esi, 4(%esp)
        movl %ebx, (%esp)
        movl $1784, 8(%esp)
        call memcpy                 # copy instance 2 to instance 1
        movl %ebx, (%esp)
        call _ZN2TV4getTEv          # call getT() on instance 1
        movl -8(%ebp), %ebx
        movl -4(%ebp), %esi
        movl %ebp, %esp
        popl %ebp
        ret

-- 
Summary: missed optimization: un-needed copy of structure.
Product: gcc
Version: 4.1.2
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
AssignedTo: unassigned at gcc dot gnu dot org
ReportedBy: linasvepstas at gmail dot com
GCC build triplet: powerpc64-unknown-linux-gnu
GCC host triplet: powerpc64-unknown-linux-gnu
GCC target triplet: powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39081
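
The report turns on the difference between assigning a returned object into an existing one and initializing a new object from the return value. That difference can be sketched in a self-contained way. This is only an illustration and not part of the original test case: the counter `assignments`, the hand-written `operator=`, the constructor bodies, the body of `my_tv_maker`, and the functions `assign_version`, `init_version` and `demo` are all assumptions added here. In the report, TV has an implicit copy assignment, which is what the compiler lowers to the memcpy; the sketch replaces it with a counting one so the copy becomes observable.

```cpp
#include <cassert>

// Hypothetical stand-in for the TV class in the report.
struct TV {
    float truth;
    float confidence;
    int stuff[444];

    static int assignments;  // added for this sketch: counts operator= calls

    TV() : truth(0.0f), confidence(0.0f), stuff() {}
    explicit TV(float tr) : truth(tr), confidence(1.0f), stuff() {}
    TV(const TV&) = default;

    // Hand-written copy assignment so that the copy can be counted.
    TV& operator=(const TV& rhs) {
        ++assignments;
        truth = rhs.truth;
        confidence = rhs.confidence;
        for (int i = 0; i < 444; ++i) stuff[i] = rhs.stuff[i];
        return *this;
    }

    float getT() const { return truth; }
};

int TV::assignments = 0;

// Assumed body; the report only declares this function.
TV my_tv_maker(float tr) { return TV(tr); }

// Same shape as my_subr in the report (without the call to other()):
// tv already exists, so the result is built in a temporary and then
// copy-assigned into tv.
float assign_version(float tr) {
    TV tv;
    tv = my_tv_maker(tr);
    return tv.getT();
}

// Variant: tv is initialized from the return value, so no assignment
// takes place. Compilers normally construct the result directly in tv
// (this is guaranteed from C++17 onward).
float init_version(float tr) {
    TV tv = my_tv_maker(tr);
    return tv.getT();
}

// Runs both variants and checks how many copy assignments happened.
void demo() {
    TV::assignments = 0;
    assign_version(434.23f);
    assert(TV::assignments == 1);  // one copy out of the temporary
    init_version(434.23f);
    assert(TV::assignments == 1);  // the second variant added none
}
```

Calling `demo()` from a `main` runs both variants. Assuming 4-byte `float` and `int`, the non-static members occupy 2*4 + 444*4 = 1784 bytes, which matches the length passed to memcpy in both listings. The sketch does not settle whether gcc could legally drop the copy in the reported code: there, the address of `tv` has already been passed to `other()` before the assignment, which the sketch leaves out.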