Richard Henderson wrote:
I do not see why you should discourage the register allocator from
using MMX registers; moving through memory is clearly inefficient and
enlarges the resulting code (even more so if the function containing
the moves is inlined in several places).
First, what you think is "clearly inefficient" is at least two cycles
faster, at least for AMD (Intel hasn't published anything as useful as
instruction latencies since early PentiumPro). I'm not sure what sort
of pipeline bypasses are or are not responsible, but *all* cross function
unit moves are discouraged.
I see. I ran a speed test with GCC 4.1.0 and 3.4.4 on my Athlon XP and
you are right. Direct moves between general registers and MMX registers
are still useful when optimizing for size. GCC could make a somewhat
more sophisticated guess about whether to use secondary memory for such
moves; right now it just disables them all.
Second, proper use of MMX requires proper placement of emms instructions.
Allowing the register allocator to use MMX registers at will breaks that.
r~
That is true, but would the register allocator choose MMX regs for
anything other than MMX ops?
Regards,
Vahur
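
For readers following the emms point above, here is a minimal sketch in C.
It is not taken from the thread or from GCC. The helper name
add4_then_scale is hypothetical, and the scalar fallback is an assumption
added so the example also builds where MMX is unavailable. It only
illustrates that MMX work has to be followed by emms (_mm_empty) before
any floating-point code that might use the x87 stack.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper (not from GCC or the thread): adds four 16-bit
   lanes of a and b into out, then returns the lane sum times scale. */
static double add4_then_scale(const int16_t a[4], const int16_t b[4],
                              int16_t out[4], double scale)
{
    assert(a != NULL && b != NULL && out != NULL);

#if defined(__MMX__)
    /* _mm_set_pi16 takes the highest lane first, so a[0] ends up lowest. */
    __m64 va = _mm_set_pi16(a[3], a[2], a[1], a[0]);
    __m64 vb = _mm_set_pi16(b[3], b[2], b[1], b[0]);
    __m64 vs = _mm_add_pi16(va, vb);   /* wrapping 16-bit add per lane */
    memcpy(out, &vs, sizeof vs);       /* copy the 8 result bytes out */

    /* MMX registers alias the x87 register stack. emms marks that stack
       as empty again, so later x87 code (which a compiler may emit for
       the double arithmetic below on 32-bit x86) works correctly. */
    _mm_empty();
#else
    /* Assumed scalar fallback for targets without MMX. */
    for (int i = 0; i < 4; i++)
        out[i] = (int16_t)(a[i] + b[i]);
#endif

    double sum = 0.0;
    for (int i = 0; i < 4; i++)
        sum += out[i];
    return sum * scale;
}
```

With a = {1, 2, 3, 4}, b = {10, 20, 30, 40} and scale = 0.5, out becomes
{11, 22, 33, 44} and the function returns 55.0. The emms here is placed
by hand, right where the MMX section ends; if the register allocator
picked MMX registers on its own, there would be no such well-defined
point.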