Biagio Lucini wrote:
I run, for my personal pleasure (since I am a number cruncher), the Scimark2 tests on my P4 Linux machine. I tested GCC 4.0 (today's CVS) vs. GCC 3.4.1 vs. Intel's ICC 8.1.
For GCC, I used in both cases the flags -march=pentium4 -mfpmath=sse -O3 -fomit-frame-pointer -ffast-math
In case it is of interest, for ICC I used -ipo -tpp7 -xW -align -Zp16 -O3
The results were surprisingly bad, and this is why I am writing this message:
                   GCC 4.0   GCC 3.4.1      ICC
Composite Score:    270.51      345.28   430.47
FFT Mflops:         192.10      203.77   206.66
SOR Mflops:         257.61      252.88   258.30
MC Mflops:           58.61       67.96   312.13
matmult Mflops:     376.64      557.75   564.97
LU Mflops:          467.58      644.03   810.29
I leave aside any personal comments, except that, being involved in Monte Carlo calculations, I would love it if GCC were not outperformed by a factor of ~4.5 in MC by ICC.
I would also like to ask whether you see anything wrong with these benchmarks and/or whether you have suggestions to improve them.
Thanks for reporting this, although it would be more useful if you had analyzed what goes wrong with gcc. For example, icc reports loop vectorization. Or maybe it is a memory hierarchy optimization, or the use of a better standard function such as the random function (I am not very familiar with MC). Usually vectorization is the reason for such a big difference. People in the gcc community are working on vectorization, although I don't know when it will be usable for x86. We do not have the resources that Intel has (several hundred engineers working mainly on optimizations for just their 3 architectures).
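For readers not familiar with the MC kernel discussed here: it estimates pi by drawing random points in the unit square and counting how many fall inside the quarter circle. Below is a minimal C sketch of that kind of loop, not the SciMark2 source itself. The names demo_rng, demo_next_double and mc_integrate are hypothetical, and the generator is a simple 64-bit LCG chosen for brevity, not SciMark2's own random number generator.

```c
#include <stdint.h>

/* Hypothetical stand-in generator (a 64-bit LCG); NOT SciMark2's own RNG. */
typedef struct { uint64_t state; } demo_rng;

/* Return a pseudo-random double in [0, 1). */
static double demo_next_double(demo_rng *r)
{
    r->state = r->state * 6364136223846793005ULL + 1442695040888963407ULL;
    /* Keep the top 53 bits and scale by 2^-53. */
    return (double)(r->state >> 11) * (1.0 / 9007199254740992.0);
}

/* Estimate pi: 4 * (points inside the quarter circle) / (points drawn). */
double mc_integrate(int num_samples, uint64_t seed)
{
    demo_rng r = { seed };
    int under_curve = 0;

    for (int i = 0; i < num_samples; i++) {
        /* Two generator calls per sample: this is the hot path. */
        double x = demo_next_double(&r);
        double y = demo_next_double(&r);
        if (x * x + y * y <= 1.0)
            under_curve++;
    }
    return 4.0 * (double)under_curve / (double)num_samples;
}
```

The loop does very little arithmetic per sample, so its speed depends heavily on how cheap the two generator calls are. If the generator lives in a different source file, a compiler can inline it only with some form of cross-file optimization, which is what ICC's -ipo enables. Whether that accounts for the MC gap reported above is an assumption that would have to be checked against the generated code.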
As for gcc4 vs gcc3.4, the degradation on the x86 architecture is most probably due to higher register pressure created by the more aggressive SSA optimizations in gcc4. The current register allocator does not deal well with this problem, so code generated by gcc4 can be worse for architectures with few registers. For architectures with many registers (like ia64), gcc4 generates better code than gcc3.4. Again, the gcc community is working on the register allocator problem too.
Vlad