Consider this simple class:

class TV
{
   private:
      float truth;
      float confidence;
   public:
      TV(float, float);
      float getT(void);
};

extern TV my_tv_maker(float tr);

float my_subr(float tr)
{
   TV tv = my_tv_maker(434.23);
   return tv.getT();
}

On PowerPC, compiling this with gcc -O2 -S -c generates the annotated
assembly below. Noticeable is a copy of the entire structure, which is
8 bytes long, to and from the stack; this copy is completely unneeded.
When the structure is *larger*, no copy is performed, *even when no
optimization is done (i.e. even when -O2 is not specified)*. So,
basically, large structures work great, but small structures cause some
extra, unnecessary cycles to be wasted. The problem is still there
with -O3.

.L._Z7my_subrf:
.LFB3:
        mflr 0                 # r0 = link reg value
.LCFI4:
        lfs 1,....@toc(2)      # fpr1 = 434.23
        std 0,16(1)            # save link reg value
.LCFI5:
        stdu 1,-144(1)         # grow stack: allocate 144-byte frame (r1 -= 144)
.LCFI6:
        addi 3,1,128           # alloc TV on stack. Note sizeof(TV) is 8 bytes.
        bl _Z11my_tv_makerf    # call
        ld 0,128(1)            # r0 = 8 bytes = copy of entire instance of class TV
        addi 3,1,112           # r3 = ptr to new location
        std 0,112(1)           # store 8-byte copy of the instance to new location
                               # on stack ... this is really not needed. The old
                               # copy, at offset 128, could have been used; there
                               # was no point in making this copy.
        bl _ZN2TV4getTEv       # call
        nop
        addi 1,1,144           # shrink stack: release the frame (r1 += 144)
        ld 0,16(1)             # get link reg val
        mtlr 0                 # move to link reg
        blr                    # branch to link reg

gcc -v
Using built-in specs.
Target: powerpc64-unknown-linux-gnu
Configured with: /var/tmp/portage/sys-devel/gcc-4.1.2/work/gcc-4.1.2/configure --prefix=/usr --bindir=/usr/powerpc64-unknown-linux-gnu/gcc-bin/4.1.2 --includedir=/usr/lib/gcc/powerpc64-unknown-linux-gnu/4.1.2/include --datadir=/usr/share/gcc-data/powerpc64-unknown-linux-gnu/4.1.2 --mandir=/usr/share/gcc-data/powerpc64-unknown-linux-gnu/4.1.2/man --infodir=/usr/share/gcc-data/powerpc64-unknown-linux-gnu/4.1.2/info --with-gxx-include-dir=/usr/lib/gcc/powerpc64-unknown-linux-gnu/4.1.2/include/g++-v4 --host=powerpc64-unknown-linux-gnu --build=powerpc64-unknown-linux-gnu --enable-altivec --enable-nls --without-included-gettext --with-system-zlib --disable-checking --disable-werror --enable-secureplt --disable-libunwind-exceptions --disable-multilib --enable-libmudflap --disable-libssp --enable-objc-gc --enable-languages=c,c++,java,objc,obj-c++,fortran --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu
Thread model: posix
gcc version 4.1.2 (Gentoo 4.1.2 p1.0.2)

Bug is reproducible on:

Welcome to hazelnut.osuosl.org
Server Address   : 140.211.167.137
Hardware         : 8 x POWER5+ (gs), CHRP IBM,9133-55A, 7794 MB RAM
Operating System : Gentoo Linux (default-linux/ppc/ppc64/2007.0/64bit-userland/power5)
Support          : supp...@osuosl.org
Website          : http://powerdev.osuosl.org
Mailing List     : http://lists.osuosl.org/mailman/listinfo/powerdev

--
           Summary: [PPC64] un-needed copy generated for small structs kept on stack
           Product: gcc
           Version: 4.1.2
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: linasvepstas at gmail dot com
  GCC build triplet: powerpc64-unknown-linux-gnu
   GCC host triplet: powerpc64-unknown-linux-gnu
 GCC target triplet: powerpc64-unknown-linux-gnu

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=39067
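
For anyone who wants to compare the small and the larger case side by side, here is a minimal single-file sketch of the test case. Everything beyond TV, my_tv_maker, my_subr and getT is an assumption made for illustration: BigTV, my_big_tv_maker and my_big_subr are hypothetical names, the size of the extra array is arbitrary (the report does not say at what size the copy goes away), and the function bodies are invented stand-ins for the ones that live in other translation units in the original. The GCC-specific noinline attribute is used only so that the calls stay visible in the assembly, as they would with separate compilation.

```cpp
// Small class from the report: two floats, 8 bytes on typical ABIs.
class TV
{
   private:
      float truth;
      float confidence;
   public:
      TV(float t, float c);
      float getT(void) __attribute__((noinline));
};

// Hypothetical larger class; the array size is an arbitrary assumption.
class BigTV
{
   private:
      float truth;
      float confidence;
      float extra[14];
   public:
      BigTV(float t, float c);
      float getT(void) __attribute__((noinline));
};

// Stand-in bodies (assumed); the original report only declares these.
TV::TV(float t, float c) : truth(t), confidence(c) {}
float TV::getT(void) { return truth; }

BigTV::BigTV(float t, float c) : truth(t), confidence(c), extra() {}
float BigTV::getT(void) { return truth; }

// noinline keeps the calls out-of-line, mimicking the extern
// declaration in the original test case.
TV my_tv_maker(float tr) __attribute__((noinline));
TV my_tv_maker(float tr) { return TV(tr, 1.0f); }

BigTV my_big_tv_maker(float tr) __attribute__((noinline));
BigTV my_big_tv_maker(float tr) { return BigTV(tr, 1.0f); }

// Small-struct case: the function whose assembly is shown in the report.
float my_subr(float tr)
{
   (void) tr;  // unused, as in the original test case
   TV tv = my_tv_maker(434.23f);
   return tv.getT();
}

// Larger-struct case, for comparison.
float my_big_subr(float tr)
{
   (void) tr;
   BigTV tv = my_big_tv_maker(434.23f);
   return tv.getT();
}
```

Compiling this with the same command line as above (gcc -O2 -S -c) and comparing the bodies of my_subr and my_big_subr should show whether the extra ld/std pair appears only in the small case. Results may differ with other compiler versions or targets.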