Hello!

I just got interested and did a test myself. Comparing gcc 4.0 (-O2
-funroll-loops -D__NO_MATH_INLINES -ffast-math -march=pentium4
-mfpmath=sse -ftree-vectorize)
and icc 9.0 beta (-O3 -xW -ip):


Here are the results of scimark with '-O3 -march=pentium4 -mfpmath=... -funroll-loops -ftree-vectorize -ffast-math -D__NO_MATH_INLINES -fomit-frame-pointer' and various -mfpmath settings:

-mfpmath=sse:

Composite Score:          664.47
FFT             Mflops:   371.12    (N=1024)
SOR             Mflops:   511.13    (100 x 100)
MonteCarlo:     Mflops:   130.94
Sparse matmult  Mflops:   856.68    (N=1000, nz=5000)
LU              Mflops:  1452.48    (M=100, N=100)

-mfpmath=387:

Composite Score:          624.14
FFT             Mflops:   391.09    (N=1024)
SOR             Mflops:   465.45    (100 x 100)
MonteCarlo:     Mflops:   188.38
Sparse matmult  Mflops:   811.59    (N=1000, nz=5000)
LU              Mflops:  1264.20    (M=100, N=100)

-mfpmath=sse,387:

Composite Score:          665.51
FFT             Mflops:   372.70    (N=1024)
SOR             Mflops:   509.78    (100 x 100)
MonteCarlo:     Mflops:   148.72
Sparse matmult  Mflops:   832.20    (N=1000, nz=5000)
LU              Mflops:  1464.16    (M=100, N=100)
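
As a quick sanity check on the tables above: the composite score appears to be the plain arithmetic mean of the five kernel figures (for the sse run, 3322.35 / 5 = 664.47). A minimal sketch in C of that calculation follows; the function name composite_score is made up for illustration and is not part of scimark.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: arithmetic mean of the per-kernel Mflops figures. */
double composite_score(const double *mflops, size_t n)
{
    assert(mflops != NULL && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++)
        sum += mflops[i];

    return sum / (double)n;
}
```

Feeding it the five kernel values of any one run (FFT, SOR, MonteCarlo, Sparse matmult, LU) should give back that run's composite line, up to rounding.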

I think that the results will be even better once PR18463 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463) is fixed. The LU benchmark is one of the testcases where these problems were found. You can check the asm code for sequences like:

   leal    0(,%ecx,8), %edx
   movsd    (%ebx,%edx), %xmm0

instead of:

   movsd   (%ebx,%ecx,8), %xmm0
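
For illustration, the kind of source loop where this matters is an indexed access into an array of doubles, as in the row-elimination step of an LU factorization. The sketch below is a hypothetical stand-in, not the scimark source; row_update and its parameter names are invented. Each row_k[j] is an 8-byte load at base + j*8, which the scaled addressing mode (%ebx,%ecx,8) can express directly, so a separate leal to scale the index should not be needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical example: row_i[j] -= factor * row_k[j] for j in [start, n).
 * Every element access is base + j * sizeof(double), i.e. base + j*8. */
void row_update(double *row_i, const double *row_k,
                double factor, int start, int n)
{
    assert(row_i != NULL && row_k != NULL);

    for (int j = start; j < n; j++)
        row_i[j] -= factor * row_k[j];
}
```

Compiling a file like this with -S and the flags listed above is one way to see which of the two sequences the compiler emits for the inner loop.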

Uros.


