https://gcc.gnu.org/bugzilla/show_bug.cgi?id=51119
--- Comment #33 from Jerry DeLisle <jvdelisle at gcc dot gnu.org> ---
With #pragma GCC optimize ( "-O3" )

$ gfc -static -O2 -finline-matmul-limit=0 compare.f90
$ ./a.out
========================================================
================          MEASURED GIGAFLOPS           =
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   2000      0.055      0.051      0.038      0.056
    4   2000      0.408      0.274      0.318      0.408
    8   2000      0.644      0.711      1.287      1.831
   16   2000      2.507      2.591      2.521      2.579
   32   2000      3.573      2.300      3.506      3.573
   64   2000      4.628      2.196      4.462      4.629
  128   2000      5.030      2.393      5.304      5.054
  256    477      4.802      2.367      5.573      4.854
  512     59      3.907      1.856      5.234      4.035
 1024      7      3.891      1.178      5.222      4.022
 2048      1      3.901      1.500      5.238      4.033

and with no #pragma it is better than the -O3 version

$ gfc -static -O2 -finline-matmul-limit=0 compare.f90
$ ./a.out
========================================================
================          MEASURED GIGAFLOPS           =
========================================================
                 Matmul                           Matmul
                  fixed                Matmul   variable
 Size  Loops   explicit  refMatmul    assumed   explicit
========================================================
    2   2000      0.054      0.052      0.043      0.057
    4   2000      0.397      0.281      0.316      0.414
    8   2000      0.691      0.773      1.831      1.995
   16   2000      2.493      2.691      2.521      2.512
   32   2000      3.629      2.301      3.623      3.572
   64   2000      4.557      2.072      4.568      4.468
  128   2000      5.282      2.387      5.291      5.284
  256    477      5.629      2.369      5.620      5.605
  512     59      5.215      1.874      5.240      5.216
 1024      7      5.212      1.174      5.217      5.217
 2048      1      5.230      1.499      5.234      5.229

Still a good improvement over gfortran6 on the larger matrices.
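
To put a rough number on "better", here is a small sketch that divides the no-pragma figures by the pragma figures for the "Matmul fixed explicit" column at the four largest sizes. The values are copied from the two tables above; the names `fixed_explicit` and `speedup` are only illustrative and are not part of compare.f90.

```python
# GFLOPS for the "Matmul fixed explicit" column, copied from the two runs.
# Key: matrix size. Value: (with #pragma GCC optimize "-O3", without #pragma).
fixed_explicit = {
    256: (4.802, 5.629),
    512: (3.907, 5.215),
    1024: (3.891, 5.212),
    2048: (3.901, 5.230),
}


def speedup(with_pragma, without_pragma):
    """Return how many times faster the no-pragma run is than the pragma run."""
    return without_pragma / with_pragma


# Ratio per matrix size; a value above 1.0 means the no-pragma build is faster.
ratios = {n: speedup(*pair) for n, pair in fixed_explicit.items()}

for n in sorted(ratios):
    print(f"{n:5d}  {ratios[n]:.2f}x")
```

For this column the ratio is about 1.17 at size 256 and about 1.33 to 1.34 from 512 upward. The "Matmul assumed" column is nearly identical between the two runs, so the ratio there would be close to 1.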