https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68086
Bug ID: 68086
Summary: Expression explicitly defined outside the loop is moved inside the loop by the optimizer
Product: gcc
Version: 5.2.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: igusarov at mail dot ru
Target Milestone: ---

Created attachment 36578
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36578&action=edit
Single function to reproduce the results

The compilable C source "ex324_core.c" does not include any header files. It consists of a single function whose performance is spoiled by the optimizer. Please read the explanatory comments in that file.

"ex324.c" is a compilable test program built around the same core function. It merely measures the number of CPU clock ticks taken by that core function. It includes system headers for printf and mmap, and is provided just for convenience of testing.

The problem was first discovered in the x86_64 gcc 5.2.0 compiler. Brief regression research showed that 4.8.3 has this problem too. 4.7.4 seems to be good.

Problem in a nutshell. Let's start with this loop:

    // Case 1
    for (i = 0; i < size; ++i)
        accumulator += data[i];

and rewrite it in this equivalent form:

    // Case 2
    int* rebased = data + size;
    for (i = -size; i; ++i)
        accumulator += rebased[i];

It looks like the forward propagation pass decides not to allocate a register for the variable 'rebased', but rather to recompute its value every time it is used in the loop. This results in assembly output which, if written in terms of C, would look like this:

    for (i = -size; i; ++i)
        accumulator += *(data + (size + i));

The extra operation inside the loop only slows the program down. This happens at any optimization level above -O0.

Command line:
x86_64-unknown-freebsd9.0_5.2.0-gcc -O2 -S ex324_core.c

Compiler:
x86_64-unknown-freebsd9.0_5.2.0-gcc -v
Using built-in specs.
COLLECT_GCC=x86_64-unknown-freebsd9.0_5.2.0-gcc
COLLECT_LTO_WRAPPER=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0/libexec/gcc/x86_64-unknown-freebsd9.0/5.2.0/lto-wrapper
Target: x86_64-unknown-freebsd9.0
Configured with: /mnt/hdd/usr/home/toolbuilder/build_scripts/x86_64-unknown-freebsd9.0_5.2.0/build_scripts/../tools_build/x86_64-unknown-freebsd9.0_5.2.0/gcc-5.2.0/configure --target=x86_64-unknown-freebsd9.0 --prefix=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0 --with-local-prefix=/usr/local --with-sysroot=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0/sysroot --program-prefix=x86_64-unknown-freebsd9.0_5.2.0- --with-gnu-as --with-gnu-ld --with-as=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0/bin/x86_64-unknown-freebsd9.0_5.2.0-as --with-ld=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0/bin/x86_64-unknown-freebsd9.0_5.2.0-ld --with-nm=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0/bin/x86_64-unknown-freebsd9.0_5.2.0-nm --with-objdump=/usr/toolchain/x86_64-unknown-freebsd9.0_5.2.0/bin/x86_64-unknown-freebsd9.0_5.2.0-objdump --with-gmp=/mnt/hdd/usr/home/toolbuilder/build_scripts/x86_64-unknown-freebsd9.0_5.2.0/build_scripts/../tools_build/x86_64-unknown-freebsd9.0_5.2.0/gmp-root --with-mpfr=/mnt/hdd/usr/home/toolbuilder/build_scripts/x86_64-unknown-freebsd9.0_5.2.0/build_scripts/../tools_build/x86_64-unknown-freebsd9.0_5.2.0/mpfr-root --with-mpc=/mnt/hdd/usr/home/toolbuilder/build_scripts/x86_64-unknown-freebsd9.0_5.2.0/build_scripts/../tools_build/x86_64-unknown-freebsd9.0_5.2.0/mpc-root --disable-__cxa_atexit --enable-languages=c,c++ --disable-multilib --disable-nls --enable-shared=libstdc++ --enable-static --enable-threads
Thread model: posix
gcc version 5.2.0 (GCC)

Operating system: amd64 FreeBSD 9.0-RELEASE

CPU: Intel(R) Core(TM) i7-2700K CPU @ 3.50GHz (3500.10-MHz K8-class CPU)
  Origin = "GenuineIntel"  Id = 0x206a7  Family = 6  Model = 2a  Stepping = 7
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0x179ae3bf<SSE3,PCLMULQDQ,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM,PCID,SSE4.1,SSE4.2,POPCNT,TSCDLT,AESNI,XSAVE,AVX>
  AMD Features=0x28100800<SYSCALL,NX,RDTSCP,LM>
  AMD Features2=0x1<LAHF>
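
For readers without the attachments at hand, the three loop shapes discussed in the report can be written out as standalone functions. This is only a minimal sketch and not the contents of ex324_core.c: the function names (sum_forward, sum_rebased, sum_as_compiled, check_equivalence), the use of long for the index and accumulator, and the const qualifiers are all assumptions made for illustration. It shows that the three forms compute the same sum, so any difference between them is purely a matter of generated code.

```c
#include <assert.h>

/* Case 1 from the report: plain forward loop over data[0..size-1]. */
long sum_forward(const int *data, long size)
{
    long accumulator = 0;
    for (long i = 0; i < size; ++i)
        accumulator += data[i];
    return accumulator;
}

/* Case 2 from the report: the base pointer is moved to one past the end
 * once, outside the loop, and the index counts up from -size to 0. */
long sum_rebased(const int *data, long size)
{
    const int *rebased = data + size;
    long accumulator = 0;
    for (long i = -size; i; ++i)
        accumulator += rebased[i];
    return accumulator;
}

/* C rendering of what the report says the generated assembly does:
 * the addition of size is repeated on every iteration. */
long sum_as_compiled(const int *data, long size)
{
    long accumulator = 0;
    for (long i = -size; i; ++i)
        accumulator += *(data + (size + i));
    return accumulator;
}

/* Hypothetical helper: checks that all three forms agree on one input. */
void check_equivalence(const int *data, long size)
{
    long expected = sum_forward(data, size);
    assert(sum_rebased(data, size) == expected);
    assert(sum_as_compiled(data, size) == expected);
}
```

Compiling such a file with -O2 -S, as in the command line above, and comparing the loop bodies of sum_rebased and sum_as_compiled would be one way to see whether a given GCC version keeps 'rebased' in a register; the outcome may differ from what the report describes for the attached source.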