When registers are saved in the prologue, there can be stalls if there
are lingering latencies because the values haven't been computed or loaded
yet.  Likewise, when the epilogue restores registers, there will be stalls
if the last register restored (or one of the last few, depending on
latency) is accessed immediately after the call.
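
To make the prologue case concrete, here is a toy model, not a description
of any real pipeline.  It assumes an in-order machine that issues one
register save per cycle and cannot issue a save before the register's value
is ready.  The register names, the 3-cycle latency and the function name
prologue_stalls are all made up for illustration.

```python
def prologue_stalls(save_order, ready_at, start_cycle=0):
    """Count stall cycles for a sequence of register saves.

    Toy in-order model (an assumption): one save issues per cycle, and a
    save cannot issue before the cycle at which its register is ready.
    """
    cycle = start_cycle
    stalls = 0
    for reg in save_order:
        ready = ready_at.get(reg, 0)   # registers not listed are ready at once
        if ready > cycle:
            stalls += ready - cycle    # wait for the pending result
            cycle = ready
        cycle += 1                     # the save itself takes one issue slot
    return stalls


# Hypothetical situation: r8 is still being loaded when the callee is entered.
ready_at = {"r8": 3}
stalls_first = prologue_stalls(["r8", "r9", "r10", "r11"], ready_at)  # 3
stalls_last = prologue_stalls(["r9", "r10", "r11", "r8"], ready_at)   # 0
```

In this model, saving the pending register first costs three stall cycles,
while saving it last hides the latency behind the other saves.  The epilogue
case is the mirror image: the register restored last is the one whose value
is not yet available right after the call returns.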

While interprocedural optimization could in principle address this,
you would have to recompile all the libraries to use it, and also
somehow bring all the lib1funcs into the IPO framework.  I wouldn't
say that's impossible, but it doesn't seem feasible in the short term,
nor for general use even in the medium to long term.

If we arrange for the registers that are saved first / restored last to
come from a small subset of registers, could the register allocator take
this into account by not allocating to these registers any values that
are set with lingering latencies before calls or used immediately after
a call?  Of course it does no good to generally avoid these registers
if you need any call-saved registers, because one register *has* to be saved
first / restored last; if the register allocator fails to use one of the
right set, reorg would have to renumber the registers to find something -
or otherwise save/restore a register first/last which is not supposed to be
in that role.
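
A rough sketch of the allocation preference I have in mind follows.  This is
not how GCC's register allocator or reorg is structured; the register names,
the EDGE_REGS / OTHER_REGS split and both function names are hypothetical.
A value counts as "sensitive" if it is set with lingering latency before a
call or used immediately after one.

```python
EDGE_REGS = ("r8", "r9")            # hypothetical: saved first / restored last
OTHER_REGS = ("r10", "r11", "r12")  # hypothetical: remaining call-saved regs


def allocate(values):
    """values: list of (name, sensitive) pairs.  Returns name -> register."""
    free_edge = list(EDGE_REGS)
    free_other = list(OTHER_REGS)
    assignment = {}
    for name, sensitive in values:
        # Sensitive values avoid the edge registers; all others prefer them,
        # so that an edge register is in use whenever possible.
        pools = (free_other, free_edge) if sensitive else (free_edge, free_other)
        pool = next((p for p in pools if p), None)
        if pool is None:
            raise ValueError("out of call-saved registers; would need to spill")
        assignment[name] = pool.pop(0)
    return assignment


def needs_renumbering(assignment):
    """True if call-saved registers are used but none is an edge register."""
    used = set(assignment.values())
    return bool(used) and not used & set(EDGE_REGS)


mixed = allocate([("a", True), ("b", False)])  # {'a': 'r10', 'b': 'r8'}
only_sensitive = allocate([("a", True)])       # {'a': 'r10'}
```

With one sensitive and one ordinary value, the ordinary one lands in an edge
register and nothing more is needed.  With only a sensitive value, no edge
register is used, so needs_renumbering returns True: that is the case where
a later pass would have to renumber, or save/restore a register first/last
that is not meant for that role.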
