https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95397

Kirill Chilikin <chilikin.k at gmail dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |chilikin.k at gmail dot com

--- Comment #4 from Kirill Chilikin <chilikin.k at gmail dot com> ---
I found another case of incorrect results depending on the presence of the "acc
loop vector" directive (and the original example also fails):

$ cat test1.f90 
PROGRAM TEST
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  IMPLICIT NONE
  INTEGER, PARAMETER :: N1 = 32
  INTEGER, PARAMETER :: N2 = 32
  REAL(REAL64), DIMENSION(N1, N2) :: A
  INTEGER I1, I2
  !$ACC PARALLEL COPYOUT(A)
  !$ACC LOOP WORKER
  DO I2 = 1, N2
    BLOCK
      REAL(REAL64), DIMENSION(N1) :: V
      !$ACC LOOP VECTOR
      DO I1 = 1, N1
        V(I1) = REAL(I1)
      ENDDO
      !$ACC LOOP VECTOR
      DO I1 = 1, N1
        A(I1,I2) = V(I1)
      ENDDO
    END BLOCK
  ENDDO
  !$ACC END PARALLEL
  PRINT *, A(:, 1)
END PROGRAM

$ gfortran -o test1 test1.f90 -fopenacc -foffload=nvptx-none -foffload-options="-misa=sm_35"
$ ./test1
   1.0000000000000000        0.0000000000000000        0.0000000000000000      
 0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000        0.0000000000000000       
0.0000000000000000        0.0000000000000000     

If the first "acc loop vector" directive is removed (test2.f90), the result is
correct:

$ gfortran -o test2 test2.f90 -fopenacc -foffload=nvptx-none -foffload-options="-misa=sm_35"
$ ./test2
   1.0000000000000000        2.0000000000000000        3.0000000000000000      
 4.0000000000000000        5.0000000000000000        6.0000000000000000       
7.0000000000000000        8.0000000000000000        9.0000000000000000       
10.000000000000000        11.000000000000000        12.000000000000000       
13.000000000000000        14.000000000000000        15.000000000000000       
16.000000000000000        17.000000000000000        18.000000000000000       
19.000000000000000        20.000000000000000        21.000000000000000       
22.000000000000000        23.000000000000000        24.000000000000000       
25.000000000000000        26.000000000000000        27.000000000000000       
28.000000000000000        29.000000000000000        30.000000000000000       
31.000000000000000        32.000000000000000     
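
For reference, test2.f90 is not reproduced in this comment. The sketch below is
a reconstruction based only on the description above: it assumes test2.f90 is
identical to test1.f90 except that the first "acc loop vector" directive is
dropped, so the loop that fills V is no longer vector-partitioned. The comment
marking the removed directive is added here for illustration.

```fortran
PROGRAM TEST
  USE, INTRINSIC :: ISO_FORTRAN_ENV
  IMPLICIT NONE
  INTEGER, PARAMETER :: N1 = 32
  INTEGER, PARAMETER :: N2 = 32
  REAL(REAL64), DIMENSION(N1, N2) :: A
  INTEGER I1, I2
  !$ACC PARALLEL COPYOUT(A)
  !$ACC LOOP WORKER
  DO I2 = 1, N2
    BLOCK
      ! V is local to each iteration of the worker loop.
      REAL(REAL64), DIMENSION(N1) :: V
      ! Assumed difference from test1.f90: no "!$ACC LOOP VECTOR" on this loop.
      DO I1 = 1, N1
        V(I1) = REAL(I1)
      ENDDO
      !$ACC LOOP VECTOR
      DO I1 = 1, N1
        A(I1,I2) = V(I1)
      ENDDO
    END BLOCK
  ENDDO
  !$ACC END PARALLEL
  ! Print the first column of A.
  PRINT *, A(:, 1)
END PROGRAM
```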

The GPU is an Nvidia GeForce GT 710; the compiler is:
$ gfortran -v
Using built-in specs.
COLLECT_GCC=gfortran
COLLECT_LTO_WRAPPER=/path/offloading/libexec/gcc/x86_64-pc-linux-gnu/14.2.0/lto-wrapper
OFFLOAD_TARGET_NAMES=nvptx-none
Target: x86_64-pc-linux-gnu
Configured with: /source_path/gcc-14.2.0/configure --prefix=/path/offloading
--with-gmp=/path --with-mpfr=/path --with-mpc=/path --with-isl=/path
--disable-multilib --enable-offload-targets=nvptx-none
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 14.2.0 (GCC)
