Issue 153886
Summary [flang][OpenMP] Wrong answer in OpenMP target teams loop min reduction with "-g" or with no optimization
Labels flang
Assignees
Reporter scamp-nvidia
The following test case, reduced from a SPEC Accel code, gives wrong answers when compiled with OpenMP enabled and either the "-g" or the "-O0" flag.

```
MODULE mod_kernel
CONTAINS
SUBROUTINE sub_kernel(ix_min, ix_max, iy_min, iy_max, param_min, arr_min, val_min)
  IMPLICIT NONE
  INTEGER :: ix_min, ix_max, iy_min, iy_max
  REAL(KIND=8) :: param_min, val_min
  REAL(KIND=8), DIMENSION(ix_min-2:ix_max+3,iy_min-2:iy_max+3) :: arr_min
  INTEGER :: jj, kk
  REAL(KIND=8) :: tmp_min
  val_min = 10
!$omp target teams loop
  DO kk = iy_min, iy_max
!$omp loop
    DO jj = ix_min, ix_max
       arr_min(jj,kk) = 1.0
    ENDDO
  ENDDO
  WRITE(*,*) "Minimum before", MINVAL(arr_min(ix_min:ix_max,iy_min:iy_max))
!$omp target teams
  !$omp loop REDUCTION(min:val_min) private(tmp_min)
  DO kk = iy_min, iy_max
    tmp_min = val_min
    DO jj = ix_min, ix_max
      IF (arr_min(jj,kk) .LT. tmp_min) tmp_min = arr_min(jj,kk)
    ENDDO
    IF (tmp_min .LT. val_min) val_min = tmp_min
  ENDDO
!$omp end target teams
  WRITE(*,*) "Claimed minimum", val_min
END SUBROUTINE sub_kernel
END MODULE mod_kernel

PROGRAM prog_main
  USE mod_kernel
  IMPLICIT NONE
  INTEGER, PARAMETER :: nx = 4
  INTEGER, PARAMETER :: ny = 4
  INTEGER, PARAMETER :: ix_min = 1, ix_max = nx
  INTEGER, PARAMETER :: iy_min = 1, iy_max = ny
  REAL(KIND=8), DIMENSION(ix_min-2:ix_max+3,iy_min-2:iy_max+3) :: arr_min
  REAL(KIND=8) :: val_min
  REAL(KIND=8), PARAMETER :: param_min = 0.0000001_8
  CALL sub_kernel(ix_min, ix_max, iy_min, iy_max, param_min, arr_min, val_min)
END PROGRAM prog_main
```
Compiling and running it with a recent build of Flang:

```
[scamp:$ flang -v test.F90 -o test -fopenmp -g
flang version 22.0.0git (https://github.com/llvm/llvm-project 5c51a88f193a4753818b31ca186b3a1ef1a07ecf)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Build config: +assertions
Found candidate GCC installation: /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12
Selected GCC installation: /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
 "llvm/Linux_x86_64/llvm-5810/bin/flang" -fc1 -triple x86_64-unknown-linux-gnu -emit-obj -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu x86-64 -debug-info-kind=standalone -fopenmp -resource-dir clang/22 -mframe-pointer=all -o /tmp/test-dc369d.o -x f95 test.F90
warning: loc("test.F90":26:7): Detected standalone OpenMP `loop` directive with thread binding, the associated loop will be rewritten to `simd`.
 "/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/../../../../bin/ld" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /lib/../lib64/Scrt1.o /lib/../lib64/crti.o /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/crtbeginS.o -L./llvm/Linux_x86_64/llvm-5810/bin/../lib/x86_64-unknown-linux-gnu -L./llvm/Linux_x86_64/llvm-5810/lib/clang/22/lib/x86_64-unknown-linux-gnu -L/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12 -L/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/../../../../lib64 -L/lib/../lib64 -L/usr/lib64 -L/lib -L/usr/lib /tmp/test-dc369d.o -L./lib -lflang_rt.quadmath --as-needed -lquadmath --no-as-needed -lflang_rt.runtime -latomic -lm -lomp -Lllvm/Linux_x86_64/llvm-5810/lib -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/crtendS.o /lib/../lib64/crtn.o
[scamp:$ ./test
 Minimum  1.
 Claimed minimum 0.
[scamp:$ ./test
 Minimum  1.
 Claimed minimum 0.
[scamp:$ ./test
 Minimum  1.
 Claimed minimum 1.
[scamp:$ ./test
 Minimum  1.
 Claimed minimum 1.
[scamp:$ ./test
 Minimum  1.
 Claimed minimum 0.
```
The minimum and the claimed minimum should be the same, but the claimed minimum is wrong in most runs and only occasionally correct. If I set OMP_NUM_THREADS=1, I get the correct result. I see the same incorrect behavior with "-O0" or with no optimization flag at all. With "-O1" or higher, I get the correct result.
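
Because the failure is intermittent, a tally over many runs may describe it better than a handful of manual invocations. The following is a minimal sketch and not part of the original reproduction: `count_mismatches` is a hypothetical helper, and it assumes the program prints the two lines shown in the transcript above, with the value as the last field of each line.

```shell
# count_mismatches N CMD...: run CMD N times and print how many runs
# reported a "Claimed minimum" that differs from the first printed minimum.
# Hypothetical helper; assumes the two-line output format shown above.
count_mismatches() {
  runs=$1
  shift
  bad=0
  i=0
  while [ "$i" -lt "$runs" ]; do
    out=$("$@")
    # Last field of the first line is the MINVAL result.
    expected=$(printf '%s\n' "$out" | awk 'NR == 1 { print $NF }')
    # Last field of the "Claimed minimum" line is the reduction result.
    claimed=$(printf '%s\n' "$out" | awk '/Claimed minimum/ { print $NF }')
    if [ "$expected" != "$claimed" ]; then
      bad=$((bad + 1))
    fi
    i=$((i + 1))
  done
  echo "$bad"
}
```

For example, `count_mismatches 100 ./test` would count the failing runs out of 100, and `count_mismatches 100 env OMP_NUM_THREADS=1 ./test` would do the same for the single-threaded case.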

I also tested this with gfortran 14.1, which gives wrong answers for this code at all optimization levels, but I consider that a separate gfortran issue. NVHPC 25.7 gets the right answer with this pattern.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs
