http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48636
--- Comment #22 from Dominique d'Humieres <dominiq at lps dot ens.fr> 2012-10-16 20:58:58 UTC ---

With the patch I see a ~10% slowdown in Test4 - Lapack 2 (1001x1001) of test_fpu.f90 compared to revision 192449:

[macbook] lin/test% time /opt/gcc/gcc4.8c/bin/gfortran -fprotect-parens -Ofast -funroll-loops test_lap.f90
6.742u 0.097s 0:06.87 99.4% 0+0k 0+20io 0pf+0w
[macbook] lin/test% a.out
 Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts 2.6 sec Err= 0.000000000000250
 total = 2.6 sec

[macbook] lin/test% time gfc -fprotect-parens -Ofast -funroll-all-loops test_lap.f90
9.489u 0.116s 0:09.62 99.6% 0+0k 0+16io 0pf+0w
[macbook] lin/test% a.out
 Benchmark running, hopefully as only ACTIVE task
Test4 - Lapack 2 (1001x1001) inverts 2.8 sec Err= 0.000000000000250
 total = 2.8 sec

This looks similar to what I saw in comment #5. However, dgetri is now never inlined, while dgetrf is inlined with the patch. In addition, dtrmv and dscal are inlined with the patch (without the patch there are 20 and 21 occurrences, respectively). The last difference I see is 35 occurrences of dswap with the patch compared to 32 without.