------- Comment #5 from burnus at gcc dot gnu dot org  2008-11-29 16:18 -------
(In reply to comment #4)
> Timings on x86_64-unknown-linux-gnu:
>  matmul =    12.840802      s
>  subroutine without explicit interface:   0.88805580      s
>  subroutine with explicit interface:   0.87605572      s
>  inline with sum   2.0721283      s

With -O2 I get:
 matmul =    10.724670      s
 subroutine without explicit interface:    7.7324829      s
 subroutine with explicit interface:    7.8684921      s
 inline with sum   7.7684860      s

Only with -O3 -ffast-math -march=native on AMD64 do I get the following:
 matmul =    10.656666      s
 subroutine without explicit interface:   0.91205692      s
 subroutine with explicit interface:   0.82805157      s
 inline with sum   2.4521542      s

For comparison, ifort gives ("loop was vectorized" in lines 40, 41, 43):
 matmul =    2.660166      s
 subroutine without explicit interface:   0.0000000E+00  s
 subroutine with explicit interface:   0.0000000E+00  s
 inline with sum  0.0000000E+00  s
and openf95 -O3 (-O2 timings in parentheses):
 matmul =  1.26807904  s  (-O2: 28.2537651  s)
 subroutine without explicit interface:  1.07606697  s (4.07225418)
 subroutine with explicit interface:  1.05206609  s (4.08025742)
 inline with sum 0.748046875  s (3.7522316)


-- 

burnus at gcc dot gnu dot org changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |burnus at gcc dot gnu dot
                   |                            |org


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=37131
