https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83017

--- Comment #12 from Dominique d'Humieres <dominiq at lps dot ens.fr> ---
> Please use -fopt-info-loop to verify the loop is parallelized. You have
> to use -floop-parallelize-all as well due to the cost model issue. 

If I use the commented-out loop, I get the following both with and without the patch:

% gfc -Ofast -ftree-parallelize-loops=4 -fopt-info-loop pr83017_db.f90
pr83017_db.f90:28:0: note: loop with 5 iterations completely unrolled (header execution count 379)
pr83017_db.f90:26:0: note: loop with 5 iterations completely unrolled (header execution count 1515)
pr83017_db.f90:38:0: note: loop with 5 iterations completely unrolled (header execution count 1515)
pr83017_db.f90:18:0: note: loop with 4 iterations completely unrolled (header execution count 379)
pr83017_db.f90:15:0: note: loop with 5 iterations completely unrolled (header execution count 379)
pr83017_db.f90:47:0: note: parallelizing inner loop 6
pr83017_db.f90:24:0: note: basic block vectorized
pr83017_db.f90:47:0: note: basic block vectorized
% time ./a.out
 PI   2.98875999    
 PI   3.14159274    
4.027u 0.015s 0:01.02 395.0%    0+0k 0+0io 7pf+0w

i.e., a loop is parallelized and the run uses about four cores (395.0% CPU). With -floop-parallelize-all I get:

% gfc -Ofast -ftree-parallelize-loops=4 -floop-parallelize-all -fopt-info-loop pr83017_db.f90
pr83017_db.f90:28:0: note: loop with 5 iterations completely unrolled (header execution count 379)
pr83017_db.f90:26:0: note: loop with 5 iterations completely unrolled (header execution count 1515)
pr83017_db.f90:38:0: note: loop with 5 iterations completely unrolled (header execution count 1515)
pr83017_db.f90:18:0: note: loop with 4 iterations completely unrolled (header execution count 379)
pr83017_db.f90:15:0: note: loop with 5 iterations completely unrolled (header execution count 379)
pr83017_db.f90:26:0: note: parallelizing outer loop 3
pr83017_db.f90:24:0: note: basic block vectorized
% time ./a.out 
 PI   2.98876095    
 PI   3.14159274    
4.152u 0.011s 0:04.16 100.0%    0+0k 0+0io 0pf+0w

i.e., the report says outer loop 3 is parallelized, but this is not reflected at run time: CPU usage is 100.0% and the wall-clock time is 4.16s instead of 1.02s. The same happens with the original test.
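The run-time check above rests on comparing CPU time with wall-clock time. As a rough sketch (Python; the helper names and the regex for the tcsh `time` summary line are assumptions, not part of GCC or the test case), the ratio (user + sys) / elapsed can be computed from the two summaries: about 3.96 for the first run and about 1.00 for the second, i.e., only one core is busy even though -fopt-info-loop reports a parallelized loop.

```python
import re

# Assumed format of the default tcsh/csh `time` summary, e.g.
# "4.027u 0.015s 0:01.02 395.0%    0+0k 0+0io 7pf+0w"
TIME_LINE = re.compile(r"^([\d.]+)u\s+([\d.]+)s\s+([\d:.]+)\s+([\d.]+)%")


def parse_elapsed(text):
    """Convert an elapsed field such as '0:01.02' or '1:02:03.5' to seconds."""
    seconds = 0.0
    for part in text.split(":"):
        seconds = seconds * 60 + float(part)
    return seconds


def cpu_utilization(line):
    """Return (user, sys, elapsed, ratio) with ratio = (user + sys) / elapsed.

    Heuristic: a ratio well above 1 means several threads were busy;
    a ratio close to 1 means the program effectively ran on one core.
    """
    m = TIME_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a csh time line: {line!r}")
    user, sys_time = float(m.group(1)), float(m.group(2))
    elapsed = parse_elapsed(m.group(3))
    return user, sys_time, elapsed, (user + sys_time) / elapsed


if __name__ == "__main__":
    for line in ("4.027u 0.015s 0:01.02 395.0%    0+0k 0+0io 7pf+0w",
                 "4.152u 0.011s 0:04.16 100.0%    0+0k 0+0io 0pf+0w"):
        *_, ratio = cpu_utilization(line)
        print(f"{ratio:.2f}")  # prints 3.96, then 1.00
```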
