Tobias Burnus wrote:
> I had also a glance at the patch - and it looks reasonable; in
> particular, I failed to generate a failing test case.

Actually, the test case is *not* OK.
If one compiles the original test case of the PR (or your
workshare2.f90) with "-O" and looks at "-fdump-tree-original", one finds:

  #pragma omp parallel default(shared)
    {
      {
        real(kind=4) __var_1;
        {
          #pragma omp single
            {
              __var_1 = __builtin_cosf (b[0]);
            }
          ...
          #pragma omp for schedule(static) nowait
          for (S.1 = 1; S.1 <= 5; S.1 = S.1 + 1)
            {
              a[S.1 + -1] = a[S.1 + -1] * D.1730 + a[S.1 + -1] *
                            D.1731;

Thus, __var_1 is declared inside the parallel region and is therefore a
thread-local variable; however, COS() is not executed in all threads but
only in one, due to the omp single: "The single construct specifies that
the associated structured block is executed by only one of the threads in
the team" (2.5.3 single Construct, OpenMP 3.1). The other threads never
assign their own copy of __var_1.

Jakub remarks that omp single is what we expand omp workshare to if the
construct is not simple enough for us.
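
To illustrate the point about single, here is a minimal C sketch (not the code gfortran generates) of a placement that would be safe. It assumes the temporary is declared outside the parallel region, so that it is shared and the implicit barrier at the end of single publishes the value to the whole team. The names scale_shared and scalar_term are hypothetical; scalar_term merely stands in for cos(B(1)) so that the sketch links without libm.

```c
#include <assert.h>

/* Hypothetical stand-in for the scalar call (cos(B(1)) in the PR). */
static float scalar_term(float x)
{
    return 0.5f * x + 1.0f;
}

/* Computes a[i] = a[i]*f + a[i]*f with f = scalar_term(b0), evaluating
   f only once for the whole team. */
void scale_shared(float *a, int n, float b0)
{
    assert(n >= 0);

    /* Declared outside the parallel region, hence shared by all threads.
       Declaring it inside the region would make it private, which is the
       problem visible in the dump above. */
    float factor = 0.0f;

    #pragma omp parallel default(shared)
    {
        /* Only one thread executes this block; the implicit barrier at
           the end of single makes the store visible to the others. */
        #pragma omp single
        {
            factor = scalar_term(b0);
        }

        #pragma omp for schedule(static) nowait
        for (int i = 0; i < n; i++)
            a[i] = a[i] * factor + a[i] * factor;
    }
}
```

Compiled without OpenMP support the pragmas are ignored and the function runs serially with the same result, which makes it easy to compare against a build with OpenMP enabled.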
* * *
With the test case below, the dump looks OK, but the FE optimization
does not combine the two cos() calls, and I have no idea why. The dump
looks like this:

  #pragma omp parallel default(shared)
    {
      D.1743 = __builtin_cosf (b[0]);
      D.1745 = __builtin_cosf (b[0]);
      ...
      #pragma omp for schedule(static) nowait
      for (S.2 = 1; S.2 <= 10; S.2 = S.2 + 1)
        a[S.2 + D.1750] = a[S.2 + D.1748] * D.1743 +
                          a[S.2 + D.1749] * D.1745;

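
As a sketch of what combining the two calls would amount to (in C, with hypothetical names, and not a description of how the front-end pass is implemented): the first function mirrors the dump above with two evaluations of the same scalar term, the second reuses a single temporary. The counter term_calls exists only to make the difference observable, and counted_term is again a stand-in for cos(B(1)).

```c
#include <assert.h>

/* Counts evaluations of the scalar term; for illustration only. */
int term_calls = 0;

/* Hypothetical stand-in for cos(B(1)); the result depends only on x. */
static float counted_term(float x)
{
    term_calls++;
    return 0.5f * x + 1.0f;
}

/* Mirrors the dump: the same term is evaluated twice. */
void update_two_calls(float *a, int n, float b0)
{
    assert(n >= 0);
    float t1 = counted_term(b0);
    float t2 = counted_term(b0);
    for (int i = 0; i < n; i++)
        a[i] = a[i] * t1 + a[i] * t2;
}

/* The combined form: one evaluation, reused for both products. */
void update_one_call(float *a, int n, float b0)
{
    assert(n >= 0);
    float t = counted_term(b0);
    for (int i = 0; i < n; i++)
        a[i] = a[i] * t + a[i] * t;
}
```

Both variants store the same values; only the number of evaluations differs.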
Tobias
PS: The test case is:

program workshare
  implicit none
  real, parameter :: eps = 3e-7
  integer :: j
  real :: A(10,5), B(5)

  B(1) = 3.344
  call random_number(a)
  !$omp parallel default(shared)
  !$omp workshare
  forall (j=1:5)
    A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
  end forall
  !$omp end workshare
  !$omp end parallel
  print *, A
end program workshare

subroutine parallel_workshare
  implicit none
  real, parameter :: eps = 3e-7
  integer :: j
  real :: A(10,5), B(5)

  B(1) = 3.344
  call random_number(a)
  !$omp parallel workshare default(shared)
  forall (j=1:5)
    A(:,j) = A(:,j)*cos(B(1))+A(:,j)*cos(B(1))
  end forall
  !$omp end parallel workshare
  print *, A
end subroutine parallel_workshare