https://gcc.gnu.org/bugzilla/show_bug.cgi?id=114676
Bug ID: 114676
Summary: [12/13/14 Regression] DSE removes assignment that is used later
Product: gcc
Version: 12.1.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: tree-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: aleksei.nikiforov at linux dot ibm.com
Target Milestone: ---

Created attachment 57916
--> https://gcc.gnu.org/bugzilla/attachment.cgi?id=57916&action=edit
GridSamplerKernel.cpp.ZVECTOR.cpp.o.prep2.cpp.bz2

When building pytorch on s390x with gcc >= 12, the resulting pytorch application crashes in some tests. It doesn't happen with gcc <= 11.

I've bisected gcc, and the issue first appears with gcc commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6. I've also found the object file in which the miscompilation happens.

gcc configuration:
/bin/sh /var/tmp/portage/sys-devel/gcc-12.3.9999/work/gcc-12.3.9999/configure --host=s390x-ibm-linux-gnu --build=s390x-ibm-linux-gnu --prefix=/usr --bindir=/usr/s390x-ibm-linux-gnu/gcc-bin/12 --includedir=/usr/lib/gcc/s390x-ibm-linux-gnu/12/include --datadir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12 --mandir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12/man --infodir=/usr/share/gcc-data/s390x-ibm-linux-gnu/12/info --with-gxx-include-dir=/usr/lib/gcc/s390x-ibm-linux-gnu/12/include/g++-v12 --disable-silent-rules --disable-dependency-tracking --with-python-dir=/share/gcc-data/s390x-ibm-linux-gnu/12/python --enable-languages='c,c++,fortran' --enable-obsolete --enable-secureplt --disable-werror --with-system-zlib --enable-nls --without-included-gettext --disable-libunwind-exceptions --enable-checking=release --with-bugurl='https://bugs.gentoo.org/' --with-pkgversion='Gentoo 12.0.0, commit 32955416d8040b1fa1ba21cd4179b3264e6c5bd6' --with-gcc-major-version-only --enable-libstdcxx-time --enable-lto --disable-libstdcxx-pch --enable-shared --enable-threads=posix --enable-__cxa_atexit --enable-clocale=gnu --disable-multilib --disable-fixed-point --enable-libgomp --disable-libssp --disable-libada --disable-cet
--disable-systemtap --disable-valgrind-annotations --disable-vtable-verify --disable-libvtv --without-zstd --without-isl --disable-libsanitizer --enable-default-pie --enable-default-ssp --with-arch=z15

I'm attaching the preprocessed file. The full compilation command is:
/usr/bin/g++-12 -DAT_PER_OPERATOR_HEADERS -DCAFFE2_BUILD_MAIN_LIB -DFMT_HEADER_ONLY=1 -DHAVE_MALLOC_USABLE_SIZE=1 -DHAVE_MMAP=1 -DHAVE_SHM_OPEN=1 -DHAVE_SHM_UNLINK=1 -DMINIZ_DISABLE_ZIP_READER_CRC32_CHECKS -DONNXIFI_ENABLE_EXT=1 -DONNX_ML=1 -DONNX_NAMESPACE=onnx_torch -DUSE_C10D_GLOO -DUSE_DISTRIBUTED -DUSE_EXTERNAL_MZCRC -DUSE_RPC -DUSE_TENSORPIPE -D_FILE_OFFSET_BITS=64 -Dtorch_cpu_EXPORTS -I/home/user/work12/pytorch/build/aten/src -I/home/user/work12/pytorch/aten/src -I/home/user/work12/pytorch/build -I/home/user/work12/pytorch -I/home/user/work12/pytorch/cmake/../third_party/benchmark/include -I/home/user/work12/pytorch/third_party/onnx -I/home/user/work12/pytorch/build/third_party/onnx -I/home/user/work12/pytorch/third_party/foxi -I/home/user/work12/pytorch/build/third_party/foxi -I/home/user/work12/pytorch/torch/csrc/api -I/home/user/work12/pytorch/torch/csrc/api/include -I/home/user/work12/pytorch/caffe2/aten/src/TH -I/home/user/work12/pytorch/build/caffe2/aten/src/TH -I/home/user/work12/pytorch/build/caffe2/aten/src -I/home/user/work12/pytorch/build/caffe2/../aten/src -I/home/user/work12/pytorch/torch/csrc -I/home/user/work12/pytorch/third_party/miniz-2.1.0 -I/home/user/work12/pytorch/third_party/kineto/libkineto/include -I/home/user/work12/pytorch/third_party/kineto/libkineto/src -I/home/user/work12/pytorch/aten/src/ATen/.. -I/home/user/work12/pytorch/c10/..
-I/home/user/work12/pytorch/third_party/FP16/include -I/home/user/work12/pytorch/third_party/tensorpipe -I/home/user/work12/pytorch/build/third_party/tensorpipe -I/home/user/work12/pytorch/third_party/tensorpipe/third_party/libnop/include -I/home/user/work12/pytorch/third_party/fmt/include -I/home/user/work12/pytorch/third_party/flatbuffers/include -isystem /home/user/work12/pytorch/build/third_party/gloo -isystem /home/user/work12/pytorch/cmake/../third_party/gloo -isystem /home/user/work12/pytorch/cmake/../third_party/tensorpipe/third_party/libuv/include -isystem /home/user/work12/pytorch/cmake/../third_party/googletest/googlemock/include -isystem /home/user/work12/pytorch/cmake/../third_party/googletest/googletest/include -isystem /home/user/work12/pytorch/third_party/protobuf/src -isystem /home/user/work12/pytorch/cmake/../third_party/eigen -isystem /home/user/work12/pytorch/build/include -march=z15 -D_GLIBCXX_USE_CXX11_ABI=1 -fvisibility-inlines-hidden -DNDEBUG -DUSE_KINETO -DLIBKINETO_NOCUPTI -DLIBKINETO_NOROCTRACER -DSYMBOLICATE_MOBILE_DEBUG_HANDLE -O2 -fPIC -Wall -Wextra -Wnarrowing -Wno-missing-field-initializers -Wno-type-limits -Wno-array-bounds -Wno-unknown-pragmas -Wno-unused-parameter -Wno-unused-function -Wno-unused-result -Wno-strict-overflow -Wno-strict-aliasing -Wno-stringop-overflow -Wsuggest-override -Wno-psabi -Wno-error=pedantic -Wno-error=old-style-cast -Wno-missing-braces -fdiagnostics-color=always -faligned-new -Wno-unused-but-set-variable -Wno-maybe-uninitialized -fno-math-errno -fno-trapping-math -Wno-stringop-overflow -DHAVE_ZVECTOR_CPU_DEFINITION -O3 -DNDEBUG -DNDEBUG -std=gnu++17 -fPIC -DTORCH_USE_LIBUV -DCAFFE2_USE_GLOO -Wall -Wextra -Wdeprecated -Wno-unused-parameter -Wno-unused-function -Wno-missing-field-initializers -Wno-unknown-pragmas -Wno-type-limits -Wno-array-bounds -Wno-strict-overflow -Wno-strict-aliasing -Wno-maybe-uninitialized -fvisibility=hidden -O2 -fopenmp -O3 -mvx -mzvector -march=z15 -mtune=z15 
-DCPU_CAPABILITY=ZVECTOR -DCPU_CAPABILITY_ZVECTOR -MD -MT caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o -MF caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o.d -o caffe2/CMakeFiles/torch_cpu.dir/__/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp.o -c /home/user/work12/pytorch/build/aten/src/ATen/native/cpu/GridSamplerKernel.cpp.ZVECTOR.cpp

The preprocessed file contains the following lines around line 121590:

    integer_t mask_arr[iVec::size()];
    mask.store(mask_arr);
    scalar_t gInp_corner_arr[Vec::size()];
    delta.store(gInp_corner_arr);
    mask_scatter_add(gInp_corner_arr, data, i_gInp_offset_arr, mask_arr, len);

The store call (lines 117929-117940):

    void __attribute__((__always_inline__)) inline store(void* ptr, int count = size()) const {
      if (count == size()) {
# 421 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h" 3 4
        __builtin_s390_vec_xst
# 421 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h"
        (_vec0, offset0, reinterpret_cast<ElementType*>(ptr));
# 422 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h" 3 4
        __builtin_s390_vec_xst
# 422 "/home/user/work12/pytorch/aten/src/ATen/cpu/vec/vec256/zarch/vec256_zarch.h"
        (_vec1, offset16, reinterpret_cast<ElementType*>(ptr));

mask.store(mask_arr) is first replaced by the 2 corresponding calls to __builtin_s390_vec_xst, and those are later incorrectly removed by DSE, even though mask_arr is still read afterwards by mask_scatter_add.

I've also run the compilation command with -fdump-tree-all-all -fdump-rtl-all-all.
In file *.040t.dse1 I've found the following lines:

;; Function at::native::{anonymous}::ApplyGridSample<double, 2, at::native::detail::GridSamplerInterpolation::Bicubic, at::native::detail::GridSamplerPadding::Border, true>::add_value_bounded (_ZNK2at6native12_GLOBAL__N_115ApplyGridSampleIdLi2ELNS0_6detail24GridSamplerInterpolationE2ELNS3_18GridSamplerPaddingE1ELb1EE17add_value_boundedEPdlRKNS_3vec7ZVECTOR10VectorizedIdvEESD_SD_, funcdef_no=13629, decl_uid=274419, cgraph_uid=8075, symbol_order=9478)

Pass statistics of "dse":
----------------

  Deleted dead store: # .MEM_369 = VDEF <.MEM_368>
MEM <const vtypeD.254540> [(ElementTypeD.254545 *)&mask_arrD.383153 + 16B] = _244;

  Deleted dead store: # .MEM_368 = VDEF <.MEM_360>
MEM <const vtypeD.254540> [(ElementTypeD.254545 *)&mask_arrD.383153] = _242;
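
For illustration only, the access pattern described above can be sketched in portable C++. This is not a reproducer and is not taken from the attached file: Vec256i, the simplified mask_scatter_add and run_sketch are hypothetical stand-ins, and std::memcpy takes the place of __builtin_s390_vec_xst so that the sketch builds on any target. It only shows why the two 16-byte stores into mask_arr are live: the array is read by the call that follows.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for the 256-bit vector type: two 16-byte halves,
// playing the role of _vec0 and _vec1 in the report.
struct Vec256i {
    std::int64_t lo[2];
    std::int64_t hi[2];

    static constexpr int size() { return 4; }

    // Same shape as store() in the report: one 16-byte store at offset 0 and
    // one at offset 16. ASSUMPTION: std::memcpy replaces the s390 builtin.
    void store(void* ptr) const {
        auto* p = reinterpret_cast<unsigned char*>(ptr);
        std::memcpy(p, lo, 16);
        std::memcpy(p + 16, hi, 16);
    }
};

// Hypothetical simplified mask_scatter_add: adds src[i] to base[offsets[i]]
// for every lane whose mask entry is non-zero.
inline void mask_scatter_add(const double* src, double* base,
                             const std::int64_t* offsets,
                             const std::int64_t* mask, int len) {
    for (int i = 0; i < len; ++i) {
        if (mask[i] != 0) {
            base[offsets[i]] += src[i];
        }
    }
}

// Stores the mask into a local array and then reads it back, as in the
// excerpt around line 121590.
inline double run_sketch() {
    Vec256i mask{{-1, 0}, {-1, 0}};  // lanes 0 and 2 enabled
    std::int64_t mask_arr[Vec256i::size()];
    mask.store(mask_arr);  // corresponds to the two stores dse1 deleted

    const double delta[4] = {1.0, 2.0, 4.0, 8.0};
    const std::int64_t offsets[4] = {0, 1, 2, 3};
    double data[4] = {0.0, 0.0, 0.0, 0.0};
    mask_scatter_add(delta, data, offsets, mask_arr, 4);  // reads mask_arr
    return data[0] + data[1] + data[2] + data[3];
}
```

With the stores kept, run_sketch() sums only the enabled lanes (1.0 + 4.0). If the stores to mask_arr were dropped, mask_scatter_add would read uninitialized memory, which would be consistent with the crashes seen in the pytorch tests.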