https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83017
--- Comment #7 from Richard Biener <rguenth at gcc dot gnu.org> ---
If I "fix" GCC to consider the loop you annotate as parallel:

  do concurrent (i = 1:nsplit)
    pi(i) = sum(compute( low(i), high(i) ))
  end do

then the loop is estimated to run 4 iterations, and with 2 threads and
MIN_PER_THREAD 100 (an arbitrary define) we run into

      if (!flag_loop_parallelize_all
          && !oacc_kernels_p
          && ((estimated != -1
               && estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
              /* Do not bother with loops in cold areas.  */
              || optimize_loop_nest_for_size_p (loop)))
        continue;

(estimated is 4). With -floop-parallelize-all I then get:

> ./f951 -quiet t.f90 -Ofast -ftree-parallelize-loops=2 -fdump-tree-parloops-details -floop-parallelize-all -fopt-info-loop
t.f90:28:0: note: loop with 5 iterations completely unrolled (header execution count 375)
t.f90:26:0: note: loop with 5 iterations completely unrolled (header execution count 1500)
t.f90:38:0: note: loop with 5 iterations completely unrolled (header execution count 1500)
t.f90:18:0: note: loop with 4 iterations completely unrolled (header execution count 375)
t.f90:15:0: note: loop with 5 iterations completely unrolled (header execution count 375)
t.f90:26:0: note: parallelizing outer loop 3
t.f90:24:0: note: basic block vectorized
t.f90:41:0: note: basic block vectorized
t.f90:41:0: note: basic block vectorized

yay.