Hello!
> I just got interested and did a test myself. Comparing gcc 4.0 (-O2
> -funroll-loops -D__NO_MATH_INLINES -ffast-math -march=pentium4
> -mfpmath=sse -ftree-vectorize)
> and icc 9.0 beta (-O3 -xW -ip):

Here are the results of scimark with '-O3 -march=pentium4 -mfpmath=... -funroll-loops -ftree-vectorize -ffast-math -D__NO_MATH_INLINES -fomit-frame-pointer' and various -mfpmath settings:
-mfpmath=sse:
Composite Score: 664.47
FFT Mflops: 371.12 (N=1024)
SOR Mflops: 511.13 (100 x 100)
MonteCarlo: Mflops: 130.94
Sparse matmult Mflops: 856.68 (N=1000, nz=5000)
LU Mflops: 1452.48 (M=100, N=100)
-mfpmath=387:
Composite Score: 624.14
FFT Mflops: 391.09 (N=1024)
SOR Mflops: 465.45 (100 x 100)
MonteCarlo: Mflops: 188.38
Sparse matmult Mflops: 811.59 (N=1000, nz=5000)
LU Mflops: 1264.20 (M=100, N=100)
-mfpmath=sse,387:
Composite Score: 665.51
FFT Mflops: 372.70 (N=1024)
SOR Mflops: 509.78 (100 x 100)
MonteCarlo: Mflops: 148.72
Sparse matmult Mflops: 832.20 (N=1000, nz=5000)
LU Mflops: 1464.16 (M=100, N=100)
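
As a quick sanity check on the numbers above: in all three runs the composite score works out to the plain arithmetic mean of the five kernel scores (FFT, SOR, MonteCarlo, sparse matmult, LU). A minimal sketch of that calculation follows; composite_score is a hypothetical helper written for this mail, not a function taken from the scimark sources.

```c
#include <stddef.h>

/* Hypothetical helper (not from scimark): arithmetic mean of the
   per-kernel Mflops values, which is what the reported composite
   score corresponds to in the three runs above. */
double composite_score(const double *mflops, size_t n)
{
    double sum = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        sum += mflops[i];

    /* Guard against an empty array instead of dividing by zero. */
    return n ? sum / (double)n : 0.0;
}
```

Since the composite is just a mean, the large LU and sparse matmult numbers dominate it, so a gain in LU moves the composite far more than the same relative gain in MonteCarlo.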
I think that the results will be even better once PR18463 (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=18463) is fixed. The LU benchmark is one of the testcases where these problems were found. You can check the asm code for sequences like:
leal 0(,%ecx,8), %edx
movsd (%ebx,%edx), %xmm0
instead of:
movsd (%ebx,%ecx,8), %xmm0
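
For reference, the kind of source loop where this shows up is an indexed access to an array of doubles, as in the row update of an LU factorization. The function below is only a sketch under that assumption: eliminate_row and its parameters are hypothetical names, not the actual scimark code.

```c
#include <stddef.h>

/* Hypothetical stand-in for the LU elimination inner loop:
   row_i[j] -= factor * row_k[j] for j in [first, n).
   Each row_k[j] / row_i[j] access is base + j*8, so it can be encoded
   as a single scaled-index operand such as (%ebx,%ecx,8) instead of a
   separate leal that computes j*8 first. */
void eliminate_row(double *row_i, const double *row_k,
                   double factor, size_t first, size_t n)
{
    size_t j;

    for (j = first; j < n; j++)
        row_i[j] -= factor * row_k[j];
}
```

Compiling something like this with gcc -S and the flags above, then looking for a leal immediately feeding a movsd, should show whether the extra address computation is still being generated.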
Uros.