https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859

--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Thorsten Kurth from comment #7)
> Hello Jakub,
> 
> thanks for your comment, but I think the parallel for is not racy. Every
> thread is working on a block of i-indices, so that is fine. The dotprod
> kernel is actually a kernel from the OpenMP standard documentation and I am
> sure that this is not racy.

I was not talking about the parallel for, but about the parallel I've cited.
Even if you write the same value from all threads, it is at least pedantically
racy, even when you might get away with it in practice.  Which is why you should
assign it just once, e.g. through #pragma omp master or single.
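
E.g. (just a sketch; get_nthreads and nthreads are made-up names, not from your
testcase):

#include <omp.h>

int
get_nthreads (void)
{
  int nthreads = 0;

  /* Pedantically racy: every thread stores to nthreads, even though
     the stored value happens to be the same.  */
  #pragma omp parallel
  nthreads = omp_get_num_threads ();

  /* Race-free: only one thread in the team performs the store.  */
  #pragma omp parallel
  #pragma omp single
  nthreads = omp_get_num_threads ();

  return nthreads;
}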

> I do not see a problem with the example with the regions you mentioned
> either. By default, everything is shared, so the variable is updated by all
> the threads/teams with the same value.

The omp target I've cited above is by default handled in OpenMP 4.0 as
#pragma omp target teams map(tofrom:num_teams)
and will work that way, although it is again pedantically racy, because multiple
teams write the same value.
In OpenMP 4.5 it is
#pragma omp target teams firstprivate(num_teams)
and you will always end up with 1, even if there is an accelerator that has,
say, 1024 teams by default.  So you really need an explicit map(from:num_teams)
or similar to get the value back.  And to be pedantically correct, also
assign it only once, e.g. by doing the assignment only if (omp_get_team_num ()
== 0).
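
E.g. something along these lines (again just a sketch, the names are made up):

#include <omp.h>

int
get_num_teams (void)
{
  int num_teams = 0;

  /* Explicit map(from:) so the value is copied back to the host even
     with the OpenMP 4.5 firstprivate default.  */
  #pragma omp target teams map(from:num_teams)
  {
    /* Assign only once, from team 0, so no two teams race on the store.  */
    if (omp_get_team_num () == 0)
      num_teams = omp_get_num_teams ();
  }

  return num_teams;
}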

> Concerning splitting distribute and parallel: I tried both combinations and
> found that they behave the same. But in the end I split it so that I could
> comment out the distribute section to see if that makes a performance
> difference (and it does).

I was just asking why you are doing it; I haven't yet analyzed the code to see
whether there is something that could be easily improved.
