------- Comment #9 from uros at kss-loka dot si  2006-06-01 08:43 -------

The benchmark run on a Pentium4 3.2G/800MHz FSB (32bit):

vendor_id       : GenuineIntel
cpu family      : 15
model           : 2
model name      : Intel(R) Pentium(R) 4 CPU 3.20GHz
stepping        : 9
cpu MHz         : 3191.917
cache size      : 512 KB

shows even more interesting results:

gcc version 3.4.6
vs.
gcc version 4.2.0 20060601 (experimental)

-fomit-frame-pointer -O -msse2 -mfpmath=sse

GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.162     2664.87

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.164     2633.13

and

-fomit-frame-pointer -O -mfpmath=387

GCC 3.x performance:
./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.160     2697.37

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.164     2633.15

There is a small performance drop on gcc-4.x, but nothing critical.

I can confirm that the code indeed runs >50% slower on a 64-bit Athlon. Perhaps
the problem is in the order of instructions (Software Optimization Guide for
AMD Athlon 64, Section 10.2). The gcc-3.4 code looks similar to the example of
how things should be done, and the gcc-4.2 code looks similar to the example of
how things should _NOT_ be done.

BTW: Did you try to run the benchmark on an AMD target with -march=k8? The
effects of this flag are devastating on a Pentium4 CPU:

-O -msse2 -mfpmath=sse -march=k8

./xmm_gcc
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.836      516.79

GCC 4.x performance:
./xmm_gc4
ALGORITHM     NB   REPS        TIME      MFLOPS
=========  =====  =====  ==========  ==========
  atlasmm     60   1000       0.287     1504.66

-- 

uros at kss-loka dot si changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever Confirmed|0                           |1
   Last reconfirmed|0000-00-00 00:00:00         |2006-06-01 08:43:34
               date|                            |

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827