------- Comment #5 from burnus at gcc dot gnu dot org 2008-11-29 16:18 -------
(In reply to comment #4)
> Timings on x86_64-unknown-linux-gnu:
>  matmul = 12.840802 s
>  subroutine without explicit interface: 0.88805580 s
>  subroutine with explicit interface: 0.87605572 s
>  inline with sum 2.0721283 s

With -O2 I get:
 matmul = 10.724670 s
 subroutine without explicit interface: 7.7324829 s
 subroutine with explicit interface: 7.8684921 s
 inline with sum 7.7684860 s

Only with -O3 -ffast-math -march=native on AMD64 do I get the following:
 matmul = 10.656666 s
 subroutine without explicit interface: 0.91205692 s
 subroutine with explicit interface: 0.82805157 s
 inline with sum 2.4521542 s

For comparison, with ifort ("loop was vectorized" in lines 40, 41, 43):
 matmul = 2.660166 s
 subroutine without explicit interface: 0.0000000E+00 s
 subroutine with explicit interface: 0.0000000E+00 s
 inline with sum 0.0000000E+00 s

and with openf95 -O3 (-O2 timings in parentheses):
 matmul = 1.26807904 s (-O2: 28.2537651 s)
 subroutine without explicit interface: 1.07606697 s (4.07225418)
 subroutine with explicit interface: 1.05206609 s (4.08025742)
 inline with sum 0.748046875 s (3.7522316)

--

burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot
                   |                            |org

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37131