https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105519
Bug ID: 105519
Summary: Unnecessary memcpy() copy for empty asm volatile
Product: gcc
Version: 11.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: alexgpg at gmail dot com
Target Milestone: ---

Created attachment 52937
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52937&action=edit
Source code preprocessed file

Using asm volatile("" : "+m,r"(value) : : "memory") with the specific
constraints "+m,r" on a large object (an array or struct with size > 8192)
causes a full copy of the object, and that copy is never used.

Maybe it's an issue in the register allocator. The compiler picks the "r"
alternative instead of "m" even for large objects.

The following code causes an unnecessary copy with memcpy():

```
struct Large {
  int arr[2049];
};

extern Large obj;

namespace {
template <class Tp>
inline __attribute__((always_inline)) void DoNotOptimize(Tp& value) {
  asm volatile("" : "+m,r"(value) : : "memory");
}
}

void foo() {
  DoNotOptimize(obj);
}
```

Command used to generate the assembly code:

g++ -Wall -Wextra -O3 -save-temps -fno-stack-protector -S do_not_optimize.cpp

Generated assembly code (x86_64):

```
foo():
        subq    $8216, %rsp      # 1. Extend stack size
        movl    $8196, %edx      # 2. %edx = 8196 = sizeof(Large) = sizeof(int) * 2049.
                                 #    Prepare 3rd arg (n - size) for memcpy()
        leaq    obj(%rip), %rsi  # 3. %rsi = &obj. Prepare 2nd arg (src) for memcpy().
        movq    %rsp, %rdi       # 4. %rdi = %rsp. %rdi points to the top of the stack.
                                 #    Prepare 1st arg (dest) for memcpy().
        call    memcpy@PLT       # 5. Call memcpy(dest=%rdi, src=%rsi, n=%edx=8196)
        addq    $8216, %rsp      # 6. Reduce stack size
        ret                      # 7. Return from function
```

What does the code do?
1. Extends the stack (line 1)
2. Copies the object into the newly reserved stack space (lines 2-5)
3. Shrinks the stack back (line 6)

The copy does not look necessary: nothing reads it before the stack space is
released.

Notes
* -fno-stack-protector is used only to keep the assembly code small. Enabling
  stack protection doesn't change the behavior
* gcc generates the memcpy() call only if the size of the object is > 8192
* https://godbolt.org/z/hPYfcrqbW - Godbolt Compiler Explorer Playground

Versions: 11.2.0, but it looks like versions 4.1.2 - 11.12 (all available on
godbolt) are also affected

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure --enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap --prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man --infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/ --with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto --enable-checking=release --enable-clocale=gnu --enable-default-pie --enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object --enable-linker-build-id --enable-lto --enable-multilib --enable-plugin --enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch --disable-werror --with-build-config=bootstrap-lto --enable-link-serialization=1 gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)
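
For comparison with the DoNotOptimize() reproducer above, here is a minimal sketch of a possible source-level workaround: offer the "r" alternative only when the object could plausibly fit in a register, and use plain "+m" otherwise. The name DoNotOptimizeBySize and the sizeof(void*) threshold are my own assumptions for illustration and are not part of the report. I would expect the "+m" variant to leave a large object in place, but that should be confirmed by inspecting the generated assembly for the compiler version in question.

```cpp
#include <cstddef>
#include <type_traits>

struct Large {
  int arr[2049];  // 2049 ints: 8196 bytes when sizeof(int) == 4
};

// An array of int has no padding, so the size is exactly 2049 ints.
static_assert(sizeof(Large) == 2049 * sizeof(int), "unexpected padding");

// Hypothetical helper (not from the report): small objects keep the
// original "+m,r" constraint, so the compiler may choose memory or a register.
template <class Tp>
inline __attribute__((always_inline))
typename std::enable_if<(sizeof(Tp) <= sizeof(void*))>::type
DoNotOptimizeBySize(Tp& value) {
  asm volatile("" : "+m,r"(value) : : "memory");
}

// Larger objects get only the memory alternative, so there is no register
// alternative for the compiler to pick (assumption: this avoids the copy).
template <class Tp>
inline __attribute__((always_inline))
typename std::enable_if<(sizeof(Tp) > sizeof(void*))>::type
DoNotOptimizeBySize(Tp& value) {
  asm volatile("" : "+m"(value) : : "memory");
}
```

Calling DoNotOptimizeBySize(obj) in foo() instead of DoNotOptimize(obj) selects the second overload for Large, since 8196 bytes is larger than a pointer. This only sidesteps the constraint choice in user code; it does not address why the "r" alternative is chosen for an 8196-byte object in the first place.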