https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83064
--- Comment #7 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> I looked at the IL from the Fortran FE and it clearly uses a single memory
> area for tmp for each outer loop iteration. That is, the memory is allocated
> by the caller.

I confirm that using

    pik = compute( low(i), high(i) )
    pi(i) = sum(pik)

gives the right result. Does that mean that the 'sum' in
'sum(compute( low(i), high(i) ))' is not part of the parallelization?

> > Do you understand why the code is not parallelized with
> > -ftree-parallelize-loops=4?
> Because the outer loop has four iterations and we statically require
> at least two per thread for outer loops.

Why is that so? And is it documented?