| Field | Value |
|---|---|
| Issue | 136357 |
| Summary | [flang][OpenMP] flaky firstprivate/lastprivate behavior due to misplaced barriers |
| Labels | flang |
| Assignees | |
| Reporter | eugeneepshteyn |
(Thanks go to @ebaskakov for doing the actual investigation.)
The variables in firstprivate/lastprivate don't get set properly in some cases, resulting in flaky behavior.
Consider the following test:
```
implicit none
integer :: i, n
logical :: first
first = .true.
n = 42
!$omp parallel do firstprivate(n) firstprivate(first) lastprivate(n)
do i=1,10
if (first) then
if (n/=42) stop -1*n
first = .false.
end if
n = 100
end do
!$omp end parallel do
if (n/=100) stop -2*n
print *,'passed'
end
```
Using the following flang on x86_64:
```
$ flang --version
flang version 21.0.0git (https://github.com/eugeneepshteyn/llvm-project.git 31ddaef8d18d643ff4c343d03ddfe2edae7d22a2)
Target: x86_64-unknown-linux-gnu
Thread model: posix
Build config: +unoptimized, +assertions
```
... results in the following behavior:
```
$ ./a.out
Fortran STOP: code -100
Fortran STOP: code -100
Fortran STOP: code -100
$ ./a.out
Fortran STOP: code -100
$ ./a.out
Fortran STOP: code -100
$ ./a.out
Fortran STOP: code -100
Fortran STOP: code -100
$ ./a.out
passed
```
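The stop code -100 comes from the `stop -1*n` check inside the loop, so a thread's private `n` was already 100 on its first iteration, when it should still have been 42.
The LLVM IR produced by `flang -fopenmp -g -O0 -S -emit-llvm firstprivate.f90` shows why.
The outlined function receives the addresses of `first` and `n` as a structure of two pointer fields and loads those pointers at entry: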
```
define internal void @_QQmain..omp_par(ptr noalias %tid.addr, ptr noalias %zero.addr, ptr %0) #1 !dbg !26 {
omp.par.entry:
%gep_ = getelementptr { ptr, ptr }, ptr %0, i32 0, i32 0
%loadgep_ = load ptr, ptr %gep_, align 8, !align !29
%gep_1 = getelementptr { ptr, ptr }, ptr %0, i32 0, i32 1
%loadgep_2 = load ptr, ptr %gep_1, align 8, !align !29
...
```
Barrier:
```
omp.par.region1: ; preds = %omp.par.region
%omp_global_thread_num2 = call i32 @__kmpc_global_thread_num(ptr @4)
call void @__kmpc_barrier(ptr @3, i32 %omp_global_thread_num2)
br label %omp.private.init, !dbg !30
omp.private.init: ; preds = %omp.par.region1
br label %omp.private.copy, !dbg !30
```
... but then the values for `first` and `n` are loaded via `loadgep_*` pointers:
```
omp.private.copy: ; preds = %omp.private.init
%2 = load i32, ptr %loadgep_, align 4, !dbg !31
store i32 %2, ptr %omp.private.alloc, align 4, !dbg !31
%3 = load i32, ptr %loadgep_2, align 4, !dbg !32
store i32 %3, ptr %omp.private.alloc4, align 4, !dbg !32
br label %omp.wsloop.region, !dbg !32
```
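Since these loads happen after the barrier, a faster thread could have already run its iterations and written 100 back to the original `n` (the `lastprivate` update) before a slower thread copies `n` into its private variable.
It seems that the loads of the values through the pointers from the passed structure should happen before the barrier.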
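To make the expected ordering concrete, here is a hand-written sketch of the same test with the privatization spelled out explicitly: copy-in first, then the barrier, then the loop. This is only an illustration of the intended semantics, not what flang generates or should generate verbatim; the names `n_priv` and `first_priv` are hypothetical stand-ins for the compiler's private copies.

```
implicit none
integer :: i, n, n_priv
logical :: first, first_priv
first = .true.
n = 42
!$omp parallel private(n_priv, first_priv)
! firstprivate copy-in: every thread reads the original values ...
n_priv = n
first_priv = first
! ... and only then synchronizes, so no thread can reach the
! lastprivate write-back below while another is still copying.
!$omp barrier
!$omp do
do i=1,10
if (first_priv) then
if (n_priv/=42) stop 1
first_priv = .false.
end if
n_priv = 100
! lastprivate write-back: only the sequentially last iteration updates n
if (i==10) n = n_priv
end do
!$omp end do
!$omp end parallel
if (n/=100) stop 2
print *,'passed'
end
```

With this ordering, all reads of the original `n` complete before any thread can leave the barrier and write to it, so the in-loop check should never observe 100.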