> For GCC, I used in both cases the flags
> -march=pentium4 -mfpmath=sse -O3 -fomit-frame-pointer -ffast-math
>
As for gcc4 vs. gcc3.4, the degradation on x86 is most probably due to the higher register pressure created by the more aggressive SSA optimizations in gcc4.

Try these five combinations:

-O2 -fomit-frame-pointer -ffast-math
-O2 -fomit-frame-pointer -ffast-math -fno-tree-pre
-O2 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse

-O3 -fomit-frame-pointer -ffast-math -fno-tree-pre
-O3 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
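
If it saves some typing, here is a rough sketch of a script that builds one binary per combination. The names bench.c, bench-1 .. bench-5 and the CC, SRC, ARCH and RUN variables are placeholders of mine, not anything from your setup. I am also assuming you want to keep -march=pentium4 -mfpmath=sse from your original command line; set ARCH to something else if not. It only prints the compile commands unless RUN=1 is set.

```sh
#!/bin/sh
# Sketch only: bench.c, bench-N and the variables below are assumed names.
CC=${CC:-gcc}
SRC=${SRC:-bench.c}
# Assumption: keep the architecture flags from the original command line.
ARCH=${ARCH:-"-march=pentium4 -mfpmath=sse"}

# The five flag combinations suggested above, one per line.
combos() {
    cat <<'EOF'
-O2 -fomit-frame-pointer -ffast-math
-O2 -fomit-frame-pointer -ffast-math -fno-tree-pre
-O2 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
-O3 -fomit-frame-pointer -ffast-math -fno-tree-pre
-O3 -fomit-frame-pointer -ffast-math -fno-tree-pre -fno-gcse
EOF
}

# Print one compile command per combination; binaries are bench-1 .. bench-5.
build_commands() {
    n=0
    combos | while IFS= read -r flags; do
        n=$((n + 1))
        echo "$CC $ARCH $flags -o bench-$n $SRC -lm"
    done
}

# Dry run by default; RUN=1 feeds the commands to a shell that stops on error.
if [ "${RUN:-0}" = 1 ]; then
    build_commands | sh -e
else
    build_commands
fi
```

Run it once without RUN to check the command lines, then with RUN=1, and time each bench-N on the same input. Source paths containing spaces will not survive the dry-run/execute round trip, so keep SRC simple.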

You may also want to try -mfpmath=sse,387 in case your benchmarks use sin, cos and other transcendental functions, which GCC knows how to handle when it can use 387 instructions.

Paolo
