https://gcc.gnu.org/bugzilla/show_bug.cgi?id=82895
--- Comment #5 from Harald Anlauf <anlauf at gmx dot de> --- (In reply to Arjen Markus from comment #4) > It was precisely this sort of issues that led me to write this small sample > program. And I realised this morning, that indeed the host variables need to > be avoided. The OpenMP standard says that host or use associated variables are shared. For q, this means that different threads would access this variable, i.e. you get a data race. > But does: > - loop_body accesses the host-associated, non-threadprivate variables q, > x, and id, and you get data races > > mean that it is utterly unsafe to access even the shared variables (x and id > in this case)? That would defy the intentions of using a(n internal) routine > altogether. I need to have access to the entire array and the pattern of > iterations avoids data collisions. For your code and for your iteration pattern (different threads will use different j ranges), you could declare q in the main as threadprivate, (now gfortran-7 complains that it should be save'd), and you should pass the private variables i and j to your worker subroutine. Try the following (checked with valgrind): ! chkloop_internal.f90 -- ! I want to know: ! - OpenMP-enabled outer loops with a small number of iterations (but a large body with long loops) ! - Combination with internal routines ! program chkloop_internal use omp_lib implicit none integer, parameter :: noelems = 100000 real, dimension(:), allocatable :: x, y, z integer, dimension(:), allocatable :: id integer :: i, j, noperlayer, nolayers !$omp threadprivate(q) real, save :: q allocate( x(noelems), y(noelems), z(noelems), id(noelems) ) id = 0 call random_number( x ) call random_number( y ) call random_number( z ) nolayers = 10 noperlayer = noelems / nolayers !$omp parallel private(i, j) !$omp do schedule(dynamic) do i = 1,nolayers do j = noperlayer * (i - 1) + 1, noperlayer * i call loop_body (i,j) enddo enddo !$omp end do !$omp end parallel do i = 1,omp_get_max_threads() write(*,*) i, count( id == i-1 ) enddo contains subroutine loop_body (i,j) integer :: i, j ! real :: q ! use local variables if possible q = z(j) x(j) = x(j) + q * y(j) id(j) = omp_get_thread_num() write(80+id(j),*) i,j,x(j),y(j),q end subroutine loop_body end program chkloop_internal