| Issue | 153886 |
|---|---|
| Summary | [flang][OpenMP] Wrong answer in OpenMP target teams loop min reduction at "-g", and no optimization cases. |
| Labels | flang |
| Assignees | |
| Reporter | scamp-nvidia |
The following test case, reduced from a SPEC Accel code, gives wrong answers when compiled with OpenMP enabled and either "-g" or "-O0".
```
MODULE mod_kernel
CONTAINS
SUBROUTINE sub_kernel(ix_min, ix_max, iy_min, iy_max, param_min, arr_min, val_min)
IMPLICIT NONE
INTEGER :: ix_min, ix_max, iy_min, iy_max
REAL(KIND=8) :: param_min, val_min
REAL(KIND=8), DIMENSION(ix_min-2:ix_max+3,iy_min-2:iy_max+3) :: arr_min
INTEGER :: jj, kk
REAL(KIND=8) :: tmp_min
val_min = 10
!$omp target teams loop
DO kk = iy_min, iy_max
!$omp loop
DO jj = ix_min, ix_max
arr_min(jj,kk) = 1.0
ENDDO
ENDDO
WRITE(*,*) "Minimum before", MINVAL(arr_min(ix_min:ix_max,iy_min:iy_max))
!$omp target teams
!$omp loop REDUCTION(min:val_min) private(tmp_min)
DO kk = iy_min, iy_max
tmp_min = val_min
DO jj = ix_min, ix_max
IF (arr_min(jj,kk) .LT. tmp_min) tmp_min = arr_min(jj,kk)
ENDDO
IF (tmp_min .LT. val_min) val_min = tmp_min
ENDDO
!$omp end target teams
WRITE(*,*) "Claimed minimum", val_min
END SUBROUTINE sub_kernel
END MODULE mod_kernel
PROGRAM prog_main
USE mod_kernel
IMPLICIT NONE
INTEGER, PARAMETER :: nx = 4
INTEGER, PARAMETER :: ny = 4
INTEGER, PARAMETER :: ix_min = 1, ix_max = nx
INTEGER, PARAMETER :: iy_min = 1, iy_max = ny
REAL(KIND=8), DIMENSION(ix_min-2:ix_max+3,iy_min-2:iy_max+3) :: arr_min
REAL(KIND=8) :: val_min
REAL(KIND=8), PARAMETER :: param_min = 0.0000001_8
CALL sub_kernel(ix_min, ix_max, iy_min, iy_max, param_min, arr_min, val_min)
END PROGRAM prog_main
```
Compiling and running it with a recent build of Flang:
```
[scamp:$ flang -v test.F90 -o test -fopenmp -g
flang version 22.0.0git (https://github.com/llvm/llvm-project 5c51a88f193a4753818b31ca186b3a1ef1a07ecf)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Build config: +assertions
Found candidate GCC installation: /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12
Selected GCC installation: /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12
Candidate multilib: .;@m64
Candidate multilib: 32;@m32
Selected multilib: .;@m64
"llvm/Linux_x86_64/llvm-5810/bin/flang" -fc1 -triple x86_64-unknown-linux-gnu -emit-obj -fcolor-diagnostics -mrelocation-model pic -pic-level 2 -pic-is-pie -target-cpu x86-64 -debug-info-kind=standalone -fopenmp -resource-dir clang/22 -mframe-pointer=all -o /tmp/test-dc369d.o -x f95 test.F90
warning: loc("test.F90":26:7): Detected standalone OpenMP `loop` directive with thread binding, the associated loop will be rewritten to `simd`.
"/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/../../../../bin/ld" --hash-style=gnu --eh-frame-hdr -m elf_x86_64 -pie -dynamic-linker /lib64/ld-linux-x86-64.so.2 -o test /lib/../lib64/Scrt1.o /lib/../lib64/crti.o /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/crtbeginS.o -L./llvm/Linux_x86_64/llvm-5810/bin/../lib/x86_64-unknown-linux-gnu -L./llvm/Linux_x86_64/llvm-5810/lib/clang/22/lib/x86_64-unknown-linux-gnu -L/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12 -L/opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/../../../../lib64 -L/lib/../lib64 -L/usr/lib64 -L/lib -L/usr/lib /tmp/test-dc369d.o -L./lib -lflang_rt.quadmath --as-needed -lquadmath --no-as-needed -lflang_rt.runtime -latomic -lm -lomp -Lllvm/Linux_x86_64/llvm-5810/lib -lgcc --as-needed -lgcc_s --no-as-needed -lpthread -lc -lgcc --as-needed -lgcc_s --no-as-needed /opt/rh/gcc-toolset-12/root/usr/lib/gcc/x86_64-redhat-linux/12/crtendS.o /lib/../lib64/crtn.o
[scamp:$ ./test
Minimum 1.
Claimed minimum 0.
[scamp:$ ./test
Minimum 1.
Claimed minimum 0.
[scamp:$ ./test
Minimum 1.
Claimed minimum 1.
[scamp:$ ./test
Minimum 1.
Claimed minimum 1.
[scamp:$ ./test
Minimum 1.
Claimed minimum 0.
```
In this case, the minimum and the claimed minimum should be the same, but most runs print the wrong answer, although some runs print the right one. Setting OMP_NUM_THREADS=1 gives the correct result. I see the same incorrect behavior with "-O0" or with no optimization flag at all; with "-O1" or higher, the result is correct.
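For reference, the module below emulates serially what the `REDUCTION(min:val_min)` clause is expected to compute. It assumes the OpenMP rule that each private copy of a `min` reduction variable starts at the largest representable value of its type (`HUGE()` in Fortran) and that the partial results are combined with the original value using `MIN`. The names `min_reduction_sketch` and `emulated_min_reduction`, and the `nchunks` parameter (standing in for the number of threads or teams sharing the `kk` loop), are hypothetical and are not part of the reproducer or of Flang.

```fortran
! Hypothetical helper, not part of the reproducer: a serial emulation of
! OpenMP min-reduction semantics for the kernel in sub_kernel.
MODULE min_reduction_sketch
  IMPLICIT NONE
CONTAINS
  ! Splits the kk (column) loop into nchunks contiguous blocks, one per
  ! hypothetical thread. Each block works on a private copy initialized to
  ! HUGE(), the OpenMP initializer for a min reduction; each partial result
  ! is then combined into the original value of val_min with MIN.
  FUNCTION emulated_min_reduction(arr, val_init, nchunks) RESULT(val_min)
    REAL(KIND=8), INTENT(IN) :: arr(:,:)
    REAL(KIND=8), INTENT(IN) :: val_init
    INTEGER, INTENT(IN) :: nchunks
    REAL(KIND=8) :: val_min
    REAL(KIND=8) :: priv_min, tmp_min
    INTEGER :: c, jj, kk, ncols, kk_lo, kk_hi

    ncols = SIZE(arr, 2)
    val_min = val_init
    DO c = 0, nchunks - 1
      ! Columns owned by chunk c (the range may be empty).
      kk_lo = 1 + (c * ncols) / nchunks
      kk_hi = ((c + 1) * ncols) / nchunks
      priv_min = HUGE(priv_min)          ! private copy of val_min
      DO kk = kk_lo, kk_hi
        tmp_min = priv_min               ! mirrors "tmp_min = val_min"
        DO jj = 1, SIZE(arr, 1)
          IF (arr(jj, kk) .LT. tmp_min) tmp_min = arr(jj, kk)
        END DO
        IF (tmp_min .LT. priv_min) priv_min = tmp_min
      END DO
      val_min = MIN(val_min, priv_min)   ! combine step
    END DO
  END FUNCTION emulated_min_reduction
END MODULE min_reduction_sketch
```

With the reproducer's all-ones interior and an initial `val_min` of 10, `emulated_min_reduction(arr_min(ix_min:ix_max,iy_min:iy_max), 10.0_8, nchunks)` returns 1.0 for every `nchunks >= 1`, so a claimed minimum of 0 cannot come from any valid partitioning of the loop.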
I also tested this with gfortran 14.1, which similarly gets this code wrong at all optimization levels, but I consider that a gfortran issue. NVHPC 25.7 gets the right answer with this pattern.
_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs