https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118076

            Bug ID: 118076
           Summary: Missed Optimization: Inefficient Stack Usage in
                    Creating and Passing Large Struct Argument
           Product: gcc
           Version: 14.2.1
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: jonathan.gruber.jg at gmail dot com
  Target Milestone: ---

Created attachment 59885
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=59885&action=edit
Minimal test case.

When creating a large struct and passing it as an argument, gcc makes
inefficient use of the stack. A simple test case is in the attached file
test.c. I observed this bug with the native x86_64 gcc as well as with the
aarch64 and riscv64 cross-compilers (I did not check any other targets).
Any one of -O2, -O3, -Os, or -Oz is enough to reproduce the bug.

Below is the test case in the attached test.c, for your convenience:
struct S {
    void *x, *y, *z, *w;
};

extern int extern_func(struct S s);

int fwd_func(void *x, void *y, void *z, void *w) {
    struct S s = { x, y, z, w };

    return extern_func(s);
}

Below is the generated assembly for gcc (native, x86_64) with -O3:
sub    rsp,0x48
mov    QWORD PTR [rsp+0x20],rdi
mov    QWORD PTR [rsp+0x28],rsi
movdqa xmm0,XMMWORD PTR [rsp+0x20]
mov    QWORD PTR [rsp+0x30],rdx
mov    QWORD PTR [rsp+0x38],rcx
movups XMMWORD PTR [rsp],xmm0
movdqa xmm0,XMMWORD PTR [rsp+0x30]
movups XMMWORD PTR [rsp+0x10],xmm0
call   extern_func
add    rsp,0x48
ret

Below is the generated assembly for aarch64-linux-gnu-gcc with -O3:
stp x29, x30, [sp, #-80]!
mov x29, sp
stp x0, x1, [sp, #48]
add x0, sp, #0x10
stp x2, x3, [sp, #64]
ldp q30, q31, [sp, #48]
str q30, [sp, #16]
str q31, [x0, #16]
bl  extern_func
ldp x29, x30, [sp], #80
ret

And below is the generated assembly for riscv64-linux-gnu-gcc with -O3:
addi  sp,sp,-80
mv    a5,a0
mv    a0,sp
sd    ra,72(sp)
sd    a5,0(sp)
sd    a1,8(sp)
sd    a2,16(sp)
sd    a3,24(sp)
auipc ra,0x0
jalr  ra
ld    ra,72(sp)
addi  sp,sp,80
ret

On x86_64 and aarch64, gcc needlessly stores the parameters of fwd_func to the
stack and then copies them to a second stack location to build the struct
argument for extern_func, so each parameter is stored to the stack twice rather
than once. riscv64 appears to have started from the same layout, since it
lowers sp by 80 bytes when the 32-byte struct plus the saved ra would fit in
48 bytes, but it does optimize away the redundant store instructions
themselves.
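
For anyone who wants to check the test case functionally before looking at the
code generation, here is a minimal self-contained sketch. The body of
extern_func below is a hypothetical stand-in (the real test case only declares
it), and the static_assert just records that struct S is four pointers with no
padding, i.e. 32 bytes on the LP64 targets discussed here. Note that defining
extern_func in the same translation unit may change the code gcc emits for
fwd_func, so the assembly above should still be reproduced with the original
test.c, where extern_func is only declared.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* NULL */

struct S {
    void *x, *y, *z, *w;
};

/* Four pointers and no padding: 32 bytes on LP64 targets. */
static_assert(sizeof(struct S) == 4 * sizeof(void *),
              "struct S is expected to have no padding");

/* Hypothetical stand-in for extern_func, for functional checking only.
   It returns a bit mask with one bit per non-null field, so a misplaced
   or dropped field shows up in the result. noinline is a GCC/Clang
   attribute that keeps the call from being inlined. */
__attribute__((noinline))
int extern_func(struct S s) {
    return (s.x != NULL)
         + 2 * (s.y != NULL)
         + 4 * (s.z != NULL)
         + 8 * (s.w != NULL);
}

/* Same as fwd_func in test.c. */
int fwd_func(void *x, void *y, void *z, void *w) {
    struct S s = { x, y, z, w };

    return extern_func(s);
}
```

Calling fwd_func with different combinations of null and non-null pointers and
comparing the returned mask confirms that all four fields reach the callee in
the right order, independent of how many times they are stored on the stack.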


Host system type: Arch Linux, x86_64

gcc information:
Version: 14.2.1 20240910 (GCC)
Configured with: /build/gcc/src/gcc/configure
--enable-languages=ada,c,c++,d,fortran,go,lto,m2,objc,obj-c++,rust
--enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib
--mandir=/usr/share/man --infodir=/usr/share/info
--with-bugurl=https://gitlab.archlinux.org/archlinux/packaging/packages/gcc/-/issues
--with-build-config=bootstrap-lto --with-linker-hash-style=gnu
--with-system-zlib --enable-__cxa_atexit --enable-cet=auto
--enable-checking=release --enable-clocale=gnu --enable-default-pie
--enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object
--enable-libstdcxx-backtrace --enable-link-serialization=1
--enable-linker-build-id --enable-lto --enable-multilib --enable-plugin
--enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-werror

aarch64-linux-gnu-gcc information:
Version: 14.2.0
Configured with: /build/aarch64-linux-gnu-gcc/src/gcc-14.2.0/configure
--prefix=/usr --program-prefix=aarch64-linux-gnu-
--with-local-prefix=/usr/aarch64-linux-gnu
--with-sysroot=/usr/aarch64-linux-gnu
--with-build-sysroot=/usr/aarch64-linux-gnu
--with-native-system-header-dir=/include --libdir=/usr/lib
--libexecdir=/usr/lib --target=aarch64-linux-gnu --host=x86_64-pc-linux-gnu
--build=x86_64-pc-linux-gnu --disable-nls --enable-default-pie
--enable-languages=c,c++,fortran --enable-shared --enable-threads=posix
--with-system-zlib --with-isl --enable-__cxa_atexit
--disable-libunwind-exceptions --enable-clocale=gnu --disable-libstdcxx-pch
--disable-libssp --enable-gnu-unique-object --enable-linker-build-id
--enable-lto --enable-plugin --enable-install-libiberty
--with-linker-hash-style=gnu --enable-gnu-indirect-function --disable-multilib
--disable-werror --enable-checking=release

riscv64-linux-gnu-gcc information:
Version: 14.2.0
Configured with: /build/riscv64-linux-gnu-gcc/src/gcc-14.2.0/configure
--prefix=/usr --program-prefix=riscv64-linux-gnu-
--with-local-prefix=/usr/riscv64-linux-gnu
--with-sysroot=/usr/riscv64-linux-gnu
--with-build-sysroot=/usr/riscv64-linux-gnu --libdir=/usr/lib
--libexecdir=/usr/lib --target=riscv64-linux-gnu --host=x86_64-pc-linux-gnu
--build=x86_64-pc-linux-gnu --with-system-zlib --with-isl
--with-linker-hash-style=gnu --disable-nls --disable-libunwind-exceptions
--disable-libstdcxx-pch --disable-libssp --disable-multilib --disable-werror
--enable-languages=c,c++ --enable-shared --enable-threads=posix
--enable-__cxa_atexit --enable-clocale=gnu --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-plugin
--enable-install-libiberty --enable-gnu-indirect-function --enable-default-pie
--enable-checking=release
