https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95397
Kirill Chilikin <chilikin.k at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chilikin.k at gmail dot com

--- Comment #4 from Kirill Chilikin <chilikin.k at gmail dot com> ---
I found another case of incorrect results depending on the presence of the
"acc loop vector" directive (and the original example also fails):

$ cat test1.f90
PROGRAM TEST
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  IMPLICIT NONE
  INTEGER, PARAMETER :: N1 = 32
  INTEGER, PARAMETER :: N2 = 32
  REAL(REAL64), DIMENSION(N1, N2) :: A
  INTEGER I1, I2
  !$ACC PARALLEL COPYOUT(A)
  !$ACC LOOP WORKER
  DO I2 = 1, N2
    BLOCK
      REAL(REAL64), DIMENSION(N1) :: V
      !$ACC LOOP VECTOR
      DO I1 = 1, N1
        V(I1) = REAL(I1)
      ENDDO
      !$ACC LOOP VECTOR
      DO I1 = 1, N1
        A(I1,I2) = V(I1)
      ENDDO
    END BLOCK
  ENDDO
  !$ACC END PARALLEL
  PRINT *, A(:, 1)
END PROGRAM
$ gfortran -o test1 test1.f90 -fopenacc -foffload=nvptx-none -foffload-options="-misa=sm_35"
chilikin@comp1:/mnt/raid/chilikin/binp/belle2/analysis_physics/psipik/externals/test_offloading/vectorization$ ./test1
 1.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000
 0.0000000000000000 0.0000000000000000 0.0000000000000000 0.0000000000000000

If the first "acc loop vector" directive is removed (test2.f90), then

$ gfortran -o test2 test2.f90 -fopenacc -foffload=nvptx-none -foffload-options="-misa=sm_35"
$ ./test2
 1.0000000000000000 2.0000000000000000 3.0000000000000000 4.0000000000000000
 5.0000000000000000 6.0000000000000000 7.0000000000000000 8.0000000000000000
 9.0000000000000000 10.000000000000000 11.000000000000000 12.000000000000000
 13.000000000000000 14.000000000000000 15.000000000000000 16.000000000000000
 17.000000000000000 18.000000000000000 19.000000000000000 20.000000000000000
 21.000000000000000 22.000000000000000 23.000000000000000 24.000000000000000
 25.000000000000000 26.000000000000000 27.000000000000000 28.000000000000000
 29.000000000000000 30.000000000000000 31.000000000000000 32.000000000000000

GPU is Nvidia GeForce GT 710, compiler is

$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/path/offloading/libexec/gcc/x86_64-pc-linux-gnu/14.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: /source_path/gcc-14.2.0/configure --prefix=/path/offloading --with-gmp=/path --with-mpfr=/path --with-mpc=/path --with-isl=/path --disable-multilib --enable-offload-targets=nvptx-none
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.2.0 (GCC)