https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105519

            Bug ID: 105519
           Summary: Unnecessary memcpy() copy for empty asm volatile
           Product: gcc
           Version: 11.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: rtl-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: alexgpg at gmail dot com
  Target Milestone: ---

Created attachment 52937
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=52937&action=edit
Source code preprocessed file

Using asm volatile("" : "+m,r"(value) : : "memory") with the constraints "+m,r"
on a large object (an array or struct larger than 8192 bytes) causes GCC to make
a full copy of the object, and that copy is never used.

This may be an issue in the register allocator: the compiler selects the "r"
alternative instead of "m", even for large objects.

The following code causes an unnecessary copy via memcpy():

```
struct Large {
 int arr[2049];
};

extern Large obj;

namespace {

template <class Tp>
inline __attribute__((always_inline)) void DoNotOptimize(Tp& value) {
  asm volatile("" : "+m,r"(value) : : "memory");
}

}

void foo() {
  DoNotOptimize(obj);
}
```
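If the register-allocator hypothesis above is correct, one possible workaround is to offer only the "m" alternative for objects that cannot live in a register. The sketch below is an assumption, not a confirmed fix: the names DoNotOptimizeSafe and kNeedsMemoryOperand, and the pointer-size threshold, are hypothetical choices made for illustration, and the code needs C++17 for if constexpr.

```cpp
#include <type_traits>

struct Large {
  int arr[2049];
};

// Hypothetical predicate: true when the object should not be offered a
// register alternative (larger than a pointer, or not trivially copyable).
template <class Tp>
constexpr bool kNeedsMemoryOperand =
    !std::is_trivially_copyable<Tp>::value || sizeof(Tp) > sizeof(Tp*);

// Hypothetical variant of DoNotOptimize(): small trivially copyable values
// keep the original "+m,r" alternatives; everything else gets "+m" only,
// so the compiler has no register alternative to choose.
template <class Tp>
inline __attribute__((always_inline)) void DoNotOptimizeSafe(Tp& value) {
  if constexpr (kNeedsMemoryOperand<Tp>) {
    asm volatile("" : "+m"(value) : : "memory");
  } else {
    asm volatile("" : "+m,r"(value) : : "memory");
  }
}

// Defined here instead of declared extern so the sketch links on its own.
Large obj;

void foo() {
  DoNotOptimizeSafe(obj);
}
```

Whether this actually removes the memcpy() call on GCC 11.2 should be confirmed by regenerating the assembly with the command below; the sketch only removes the "r" alternative that this report suspects.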

Command used to generate the assembly:

g++ -Wall -Wextra -O3 -save-temps -fno-stack-protector -S do_not_optimize.cpp

Generated assembly (x86_64):

```
foo():
  subq  $8216, %rsp      # 1. Grow the stack frame
  movl  $8196, %edx      # 2. %edx = 8196 = sizeof(Large) = sizeof(int) * 2049.
                         #    Prepare the 3rd argument (n, the size) for memcpy()
  leaq  obj(%rip), %rsi  # 3. %rsi = &obj. Prepare the 2nd argument (src) for memcpy()
  movq  %rsp, %rdi       # 4. %rdi = %rsp, i.e. the top of the stack.
                         #    Prepare the 1st argument (dest) for memcpy()
  call  memcpy@PLT       # 5. Call memcpy(dest=%rdi, src=%rsi, n=%edx=8196)
  addq  $8216, %rsp      # 6. Shrink the stack frame back
  ret                    # 7. Return from the function
```

What does the code do?

1. Grows the stack frame (step 1).
2. Copies obj into the newly allocated stack space (steps 2-5).
3. Shrinks the stack frame back (step 6).

The copy appears to be unnecessary: the stack buffer is never read before it is
released.
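In C++ terms, the emitted foo() behaves roughly like foo_equivalent() below, an illustrative equivalent written for this report rather than GCC output: a stack temporary receives a full copy of obj and is then discarded. The static_asserts also check the numbers in the listing, assuming x86-64 with 4-byte int: sizeof(Large) is 2049 * 4 = 8196 bytes, and the 8216-byte frame plus the 8-byte return address is 8224, a multiple of 16, so %rsp is 16-byte aligned at the call to memcpy().

```cpp
#include <cstddef>
#include <cstring>

struct Large {
  int arr[2049];
};

// Numbers that appear in the generated assembly.
constexpr std::size_t kObjectSize = sizeof(Large);  // movl $8196, %edx
constexpr std::size_t kFrameSize = 8216;            // subq $8216, %rsp
constexpr std::size_t kReturnAddress = 8;           // pushed by the caller's call

static_assert(kObjectSize == 2049 * sizeof(int), "Large has no padding");
static_assert(kFrameSize >= kObjectSize, "the frame holds the temporary copy");
// On entry %rsp is 8 mod 16; subtracting 8216 makes it 16-byte aligned
// again before memcpy() is called.
static_assert((kFrameSize + kReturnAddress) % 16 == 0, "aligned at the call");

Large obj;

// Illustrative equivalent of the emitted foo(): copy obj into a stack
// temporary that is never read again.
void foo_equivalent() {
  Large tmp;
  std::memcpy(&tmp, &obj, sizeof(Large));
  // tmp goes out of scope unused; the copy has no observable effect.
}
```

Written out explicitly like this, the copy is a dead store that -O3 would normally be expected to eliminate, which is why its survival next to an empty asm statement looks like a missed optimization.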

Notes

* -fno-stack-protector is used only to keep the assembly short. Enabling stack
protection does not change the behavior.

* GCC generates the memcpy() call only when the object is larger than 8192 bytes.

* https://godbolt.org/z/hPYfcrqbW - Godbolt Compiler Explorer Playground
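To look at the 8192-byte threshold from the notes, one could compile a pair of probe functions with the same command line and compare their assembly. The IntArray template and the probe_8192/probe_8196 functions are hypothetical helpers written for this purpose; what each size compiles to (a memcpy() call, an inline copy, or no copy) should be read from the generated .s file rather than assumed.

```cpp
#include <cstddef>

// Hypothetical helper: struct Large from the report with a configurable length.
template <std::size_t NInts>
struct IntArray {
  int arr[NInts];
};

// Same asm statement and constraints as DoNotOptimize() in the report.
template <class Tp>
inline __attribute__((always_inline)) void DoNotOptimize(Tp& value) {
  asm volatile("" : "+m,r"(value) : : "memory");
}

// Global objects standing in for obj, assuming 4-byte int:
// 2048 ints = 8192 bytes (at the threshold),
// 2049 ints = 8196 bytes (the same size as Large).
IntArray<2048> at_threshold;
IntArray<2049> above_threshold;

// Compile with -O3 -S and compare the bodies of these two functions.
void probe_8192() { DoNotOptimize(at_threshold); }
void probe_8196() { DoNotOptimize(above_threshold); }
```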

Versions: 11.2.0. Versions 4.1.2 through 11.12 (all those available on Godbolt)
also appear to be affected.

# gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/x86_64-pc-linux-gnu/11.2.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: /build/gcc/src/gcc/configure
--enable-languages=c,c++,ada,fortran,go,lto,objc,obj-c++,d --enable-bootstrap
--prefix=/usr --libdir=/usr/lib --libexecdir=/usr/lib --mandir=/usr/share/man
--infodir=/usr/share/info --with-bugurl=https://bugs.archlinux.org/
--with-linker-hash-style=gnu --with-system-zlib --enable-__cxa_atexit --enable-cet=auto
--enable-checking=release --enable-clocale=gnu --enable-default-pie
--enable-default-ssp --enable-gnu-indirect-function --enable-gnu-unique-object
--enable-linker-build-id --enable-lto --enable-multilib --enable-plugin
--enable-shared --enable-threads=posix --disable-libssp --disable-libstdcxx-pch
--disable-werror --with-build-config=bootstrap-lto --enable-link-serialization=1
gdc_include_dir=/usr/include/dlang/gdc
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 11.2.0 (GCC)
