https://gcc.gnu.org/bugzilla/show_bug.cgi?id=86632

            Bug ID: 86632
           Summary: Incorrect value copied into output array with -O3 ftree-loop-vectorize
           Product: gcc
           Version: 6.3.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: ketan.surender at gmail dot com
  Target Milestone: ---

I am observing incorrect results for the following function with -O3:

static void mDiscBlocks2_repro_output(void)
{
  /* local scratch DWork variables */
  int32_T ForEach_itr_p;
  real_T rtb_ImpAsg_InsertedFor_Out1_a_d[6];
  real_T rtb_ImpAsg_InsertedFor_Out2_a_o[3];
  int32_T i;

  for (ForEach_itr_p = 0; ForEach_itr_p < 3; ForEach_itr_p++) {
    rtb_ImpAsg_InsertedFor_Out2_a_o[ForEach_itr_p] =
      mDiscBlocks2_repro_DW.CoreSubsys[ForEach_itr_p].Memory1_PreviousInput;
    rtb_ImpAsg_InsertedFor_Out1_a_d[ForEach_itr_p << 1] =
      mDiscBlocks2_repro_DW.CoreSubsys[ForEach_itr_p].Memory_PreviousInput[0];
    rtb_ImpAsg_InsertedFor_Out1_a_d[1 + (ForEach_itr_p << 1)] =
      mDiscBlocks2_repro_DW.CoreSubsys[ForEach_itr_p].Memory_PreviousInput[1];
  }

  /* KS REQUIRED */
  for (i = 0; i < 6; i++) {
    mDiscBlocks2_repro_Y.Out14[i] = rtb_ImpAsg_InsertedFor_Out1_a_d[i];
  }

  /* KS REQUIRED */
  mDiscBlocks2_repro_Y.Out15[0] = rtb_ImpAsg_InsertedFor_Out2_a_o[0];
  mDiscBlocks2_repro_Y.Out15[1] = rtb_ImpAsg_InsertedFor_Out2_a_o[1];
  mDiscBlocks2_repro_Y.Out15[2] = rtb_ImpAsg_InsertedFor_Out2_a_o[2];
}

This code copies some global data to a local array, then copies the local
array to a global:

mDiscBlocks2_repro_DW.CoreSubsys[0-2].Memory1_PreviousInput
  --> rtb_ImpAsg_InsertedFor_Out2_a_o[0-2]
  --> mDiscBlocks2_repro_Y.Out15[0-2]

mDiscBlocks2_repro_DW.CoreSubsys[0-2].Memory_PreviousInput[0]
  --> rtb_ImpAsg_InsertedFor_Out1_a_d[0,2,4]
  --> mDiscBlocks2_repro_Y.Out14[0,2,4]

mDiscBlocks2_repro_DW.CoreSubsys[0-2].Memory_PreviousInput[1]
  --> rtb_ImpAsg_InsertedFor_Out1_a_d[1,3,5]
  --> mDiscBlocks2_repro_Y.Out14[1,3,5]

For the global 'mDiscBlocks2_repro_Y.Out14' I am observing the incorrect
value at index 2.

The issue goes away if I add the switch -fno-tree-loop-vectorize.

I looked at the generated asm a little and can see the incorrect assignment.
For some reason it writes to element 2 before writing the remaining elements.

Here is my gcc info:

Using built-in specs.
COLLECT_GCC=[SNIP]/glnxa64/gcc-6.3.0/bin/gcc
COLLECT_LTO_WRAPPER=[SNIP]/glnxa64/gcc-6.3.0/bin/../libexec/gcc/x86_64-pc-linux-gnu/6.3.0/lto-wrapper
Target: x86_64-pc-linux-gnu
Configured with: [SNIP]/sources/gcc-6.3/configure --with-gmp=[SNIP]/glnxa64/gcc-6.3/gmp-4.3 --with-mpfr=[SNIP]/gcc-6.3/mpfr --with-mpc=[SNIP]/gcc-6.3/mpc --enable-languages=c,c++,fortran --with-bugurl=[SNIP],_Debugging --enable-shared --enable-linker-build-id --enable-plugin --enable-checking=release --enable-multiarch --enable-gold --enable-ld=default --enable-libstdcxx-time=no --prefix=[SNIP]/glnxa64/gcc-6.3.0 --with-pkgversion='MW GCC 6.3.0-GLIBC2.12' --with-tune=generic --with-system-zlib --enable-multilib --with-multilib-list=m32,m64 --with-arch-directory=amd64 --with-arch-32=i586 --with-abi=m64
Thread model: posix
gcc version 6.3.0 (MW GCC 6.3.0-GLIBC2.12)
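
The copy chain described above can be checked mechanically. What follows is only a sketch of a self-contained harness, not the original test case: the typedefs, the struct layouts of mDiscBlocks2_repro_DW and mDiscBlocks2_repro_Y, the noinline attribute, the seed values, and the helper count_mismatches() are all assumptions made for illustration. Only the member names and the function body come from the report.

```c
#include <assert.h>
#include <stdio.h>

/* Assumed typedefs; the report does not show them. */
typedef int int32_T;
typedef double real_T;

/* Hypothetical layouts; only the member names appear in the report. */
typedef struct {
  real_T Memory_PreviousInput[2];
  real_T Memory1_PreviousInput;
} CoreSubsys_T;

static struct { CoreSubsys_T CoreSubsys[3]; } mDiscBlocks2_repro_DW;
static struct { real_T Out14[6]; real_T Out15[3]; } mDiscBlocks2_repro_Y;

/* Body as posted. The noinline attribute (GCC extension) is an addition,
   meant to keep the call from being folded into the checker. */
__attribute__((noinline)) static void mDiscBlocks2_repro_output(void)
{
  int32_T ForEach_itr_p;
  real_T rtb_ImpAsg_InsertedFor_Out1_a_d[6];
  real_T rtb_ImpAsg_InsertedFor_Out2_a_o[3];
  int32_T i;

  for (ForEach_itr_p = 0; ForEach_itr_p < 3; ForEach_itr_p++) {
    rtb_ImpAsg_InsertedFor_Out2_a_o[ForEach_itr_p] =
      mDiscBlocks2_repro_DW.CoreSubsys[ForEach_itr_p].Memory1_PreviousInput;
    rtb_ImpAsg_InsertedFor_Out1_a_d[ForEach_itr_p << 1] =
      mDiscBlocks2_repro_DW.CoreSubsys[ForEach_itr_p].Memory_PreviousInput[0];
    rtb_ImpAsg_InsertedFor_Out1_a_d[1 + (ForEach_itr_p << 1)] =
      mDiscBlocks2_repro_DW.CoreSubsys[ForEach_itr_p].Memory_PreviousInput[1];
  }

  for (i = 0; i < 6; i++) {
    mDiscBlocks2_repro_Y.Out14[i] = rtb_ImpAsg_InsertedFor_Out1_a_d[i];
  }

  mDiscBlocks2_repro_Y.Out15[0] = rtb_ImpAsg_InsertedFor_Out2_a_o[0];
  mDiscBlocks2_repro_Y.Out15[1] = rtb_ImpAsg_InsertedFor_Out2_a_o[1];
  mDiscBlocks2_repro_Y.Out15[2] = rtb_ImpAsg_InsertedFor_Out2_a_o[2];
}

/* Hypothetical helper: seeds distinct inputs, runs the function, and
   returns how many output elements differ from the expected mapping. */
int count_mismatches(void)
{
  int k, bad = 0;

  /* Sanity check on the assumed output layout. */
  assert(sizeof mDiscBlocks2_repro_Y.Out14 == 6 * sizeof(real_T));

  for (k = 0; k < 3; k++) {
    mDiscBlocks2_repro_DW.CoreSubsys[k].Memory_PreviousInput[0] = 10.0 * k + 1.0;
    mDiscBlocks2_repro_DW.CoreSubsys[k].Memory_PreviousInput[1] = 10.0 * k + 2.0;
    mDiscBlocks2_repro_DW.CoreSubsys[k].Memory1_PreviousInput = 100.0 + k;
  }

  mDiscBlocks2_repro_output();

  for (k = 0; k < 3; k++) {
    /* Out14[2k] and Out14[2k+1] should mirror Memory_PreviousInput[0..1]. */
    if (mDiscBlocks2_repro_Y.Out14[2 * k] != 10.0 * k + 1.0) {
      printf("Out14[%d] = %g\n", 2 * k, mDiscBlocks2_repro_Y.Out14[2 * k]);
      bad++;
    }
    if (mDiscBlocks2_repro_Y.Out14[2 * k + 1] != 10.0 * k + 2.0) {
      printf("Out14[%d] = %g\n", 2 * k + 1, mDiscBlocks2_repro_Y.Out14[2 * k + 1]);
      bad++;
    }
    /* Out15[k] should mirror Memory1_PreviousInput. */
    if (mDiscBlocks2_repro_Y.Out15[k] != 100.0 + k) {
      printf("Out15[%d] = %g\n", k, mDiscBlocks2_repro_Y.Out15[k]);
      bad++;
    }
  }
  return bad;
}
```

Calling count_mismatches() from a main and building once with -O3 and once with -O3 -fno-tree-loop-vectorize would show whether the two builds disagree. A nonzero return in the first build only would match the behaviour described above. Whether this reduced form actually triggers the problem on 6.3.0 has not been verified, since the real struct layouts may differ.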