------- Additional Comments From uros at kss-loka dot si  2005-01-27 09:07 
-------
I don't think this has anything to do with reg-stack or sched2. The fact is
that for FP-intensive applications, 8 FP registers (either stacked x87 or
non-stacked SSE) are not enough. When there is a shortage of registers, gcc
starts to spill registers to memory and reload them.
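
To illustrate the kind of code I mean, here is a minimal sketch (a
hypothetical kernel, not taken from the testcase in this PR). It keeps 12
independent FP accumulators live across the loop, which is more than the 8
registers available, so I would expect at least some of them to end up in
stack slots:

```c
#include <stddef.h>

/* Hypothetical example: sum of squares with 12 independent accumulators.
   All 12 values are live across every loop iteration, so with only 8 FP
   registers the compiler will probably have to keep some in memory.  */
double sumsq12(const double *x, size_t n)
{
    double a0 = 0.0, a1 = 0.0, a2 = 0.0, a3 = 0.0, a4 = 0.0, a5 = 0.0;
    double a6 = 0.0, a7 = 0.0, a8 = 0.0, a9 = 0.0, a10 = 0.0, a11 = 0.0;
    size_t i = 0;

    /* Main loop: 12 elements per iteration, one accumulator each.  */
    for (; i + 12 <= n; i += 12) {
        a0  += x[i + 0]  * x[i + 0];
        a1  += x[i + 1]  * x[i + 1];
        a2  += x[i + 2]  * x[i + 2];
        a3  += x[i + 3]  * x[i + 3];
        a4  += x[i + 4]  * x[i + 4];
        a5  += x[i + 5]  * x[i + 5];
        a6  += x[i + 6]  * x[i + 6];
        a7  += x[i + 7]  * x[i + 7];
        a8  += x[i + 8]  * x[i + 8];
        a9  += x[i + 9]  * x[i + 9];
        a10 += x[i + 10] * x[i + 10];
        a11 += x[i + 11] * x[i + 11];
    }

    /* Tail: leftover elements go into the first accumulator.  */
    for (; i < n; i++)
        a0 += x[i] * x[i];

    return ((a0 + a1) + (a2 + a3)) + ((a4 + a5) + (a6 + a7))
           + ((a8 + a9) + (a10 + a11));
}
```

Compiling this for a 32-bit x86 target with e.g. -O2 -march=pentium4
-mfpmath=387 -S and looking for FP stores and loads to stack slots inside the
loop should show whether the accumulators are spilled. The exact result will
depend on the gcc version.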

Please note that reg/reg and reg/mem FP operations have the same
latency/throughput on P4, but moving FP registers to and from memory carries
a big performance penalty, so these moves should be minimised as far as
possible.

Here are some measurements to support this (-O2 only, to avoid fast-math
intrinsic shortcuts; timings on a 3.2 GHz P4):

a) -march=pentium -mfpmath=387: scheduling and reg-stack interactions:
real    0m34.073s
user    0m33.756s
sys     0m0.018s

b) -march=pentium -msse2 -mfpmath=sse: scheduling and no reg-stack:
real    0m35.063s
user    0m34.674s
sys     0m0.076s

c) -march=pentium4 -mfpmath=387: no scheduling with reg-stack:
real    0m33.720s
user    0m33.348s
sys     0m0.037s

d) -march=pentium4 -mfpmath=sse: no scheduling and no reg-stack:
real    0m35.399s
user    0m35.016s
sys     0m0.035s

The question I would like to ask: is there functionality in gcc to optimise
register moves, taking into account the cost of reg/reg vs. reg/mem FP
operations and the cost of register<->memory moves?
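
To make the question concrete, the trade-off I have in mind could be sketched
roughly as below. This is only an illustration: the struct, the function
names and any cost values passed in are hypothetical, and this is not how
gcc's cost model is actually implemented.

```c
/* Hypothetical per-target costs in arbitrary units (not measured values).  */
struct fp_costs {
    int op_reg;   /* reg/reg FP operation */
    int op_mem;   /* reg/mem FP operation */
    int load;     /* memory -> FP register move */
    int store;    /* FP register -> memory move */
};

/* Cost of using a value 'uses' times directly as a memory operand.  */
int cost_as_mem_operand(const struct fp_costs *c, int uses)
{
    return uses * c->op_mem;
}

/* Cost of spilling another register to make room, loading the value
   once, and then using it 'uses' times from a register.  */
int cost_via_reload(const struct fp_costs *c, int uses)
{
    return c->store + c->load + uses * c->op_reg;
}

/* Nonzero if leaving the value in memory is no more expensive.  */
int prefer_mem_operand(const struct fp_costs *c, int uses)
{
    return cost_as_mem_operand(c, uses) <= cost_via_reload(c, uses);
}
```

When op_reg and op_mem are equal, as described above for P4, the memory
operand form always wins in this simplified model, because the extra store
and load are pure overhead.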

-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=8126
