https://gcc.gnu.org/bugzilla/show_bug.cgi?id=80859
--- Comment #10 from Jakub Jelinek <jakub at gcc dot gnu.org> ---
(In reply to Thorsten Kurth from comment #7)
> Hello Jakub,
>
> thanks for your comment, but I think the parallel for is not racy. Every
> thread is working on a block of i-indices, so that is fine. The dotprod
> kernel is actually a kernel from the OpenMP standard documentation and I
> am sure that it is not racy.

I was not talking about the parallel for, but about the parallel I've cited.
Even if you write the same value from all threads, it is at least
pedantically racy, even when you might get away with it. Which is why you
should assign it just once, e.g. through #pragma omp master or single.

> As for the example with the regions you mentioned, I do not see a problem
> with that either. By default, everything is shared, so the variable is
> updated by all the threads/teams with the same value.

The omp target I've cited above is by default handled in OpenMP 4.0 as
#pragma omp target teams map(tofrom:num_teams)
and will work that way, although it is again pedantically racy: multiple
teams write the same value. In OpenMP 4.5 it is
#pragma omp target teams firstprivate(num_teams)
and you will always end up with 1, even if there is an accelerator that has,
say, 1024 teams by default. So you really need an explicit
map(from:num_teams) or similar to get the value back. And to be pedantically
correct, also assign it only once, e.g. by doing the assignment only if
(omp_get_team_num () == 0).

> Concerning splitting distribute and parallel: I tried both combinations
> and found that they behave the same. But in the end I split it so that I
> could comment out the distribute section to see if that makes a
> performance difference (and it does).

I was just asking why you are doing it; I haven't yet analyzed the code to
see whether there is something that could be easily improved.
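
For illustration, a minimal sketch of the single-assignment pattern for the
parallel case (the variable name num_threads is just illustrative, not taken
from the testcase):

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  int num_threads = 0;

  /* The shared variable is assigned by exactly one thread, so the
     region is race-free even under a strict reading of the standard.  */
  #pragma omp parallel shared(num_threads)
  {
    #pragma omp single
    num_threads = omp_get_num_threads ();
  }

  printf ("num_threads = %d\n", num_threads);
  return 0;
}

With #pragma omp single (or master) only one thread performs the store
instead of all of them writing the same value.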
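
And a similar sketch for the target teams case, assuming an OpenMP 4.5
compiler (again, num_teams is only an illustrative name):

#include <omp.h>
#include <stdio.h>

int
main (void)
{
  int num_teams = 0;

  /* Explicit map(from:) so the value is copied back even under the
     OpenMP 4.5 default (scalars are firstprivate on target), and only
     team 0 writes it, so there is no race between teams.  */
  #pragma omp target teams map(from:num_teams)
  {
    if (omp_get_team_num () == 0)
      num_teams = omp_get_num_teams ();
  }

  printf ("num_teams = %d\n", num_teams);
  return 0;
}

The map(from:num_teams) clause gets the value back regardless of the 4.5
firstprivate default, and restricting the store to team 0 avoids the
pedantic race between teams.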