Dave Hudson wrote:
I did some more digging into this over the last couple of days and it
seems that the RA over-emphasizes the benefits of propagating hard
regs - in this case one or more hard regs that are holding incoming
function arguments. The RA ends up believing that the hard reg (a data
register here) has the same cost as copying and using an address
register, but this means we end up with reload trying to clean up the
result after the RA has assigned a register that can't actually be
used for a MEM.
OK. Note that the debugging dumps will tell us something about the
coalescing that's going on as well as the contents of the conflict graph
and some cost information.
As an experiment I modified the weighting for a MEM in
record_operand_costs(), multiplying frequency by 33/16 instead of 2
and that shifts the balance sufficiently to get the correct choice.
Obviously, though, that's not a solution - it just indicates something
about the problem.
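To make the arithmetic of that experiment concrete, here is a toy sketch in C. It is not GCC code: the helper names and the competing cost of 2000 are hypothetical, chosen only to show how a weight slightly above 2x can break what would otherwise be a tie.

```c
#include <assert.h>

/* Toy model only: these helpers are hypothetical and are not GCC code. */

/* Weight given to a MEM operand: frequency * num / den, computed with
   integer arithmetic (so the division truncates). */
int mem_weight(int frequency, int num, int den)
{
    assert(frequency >= 0 && num >= 0 && den > 0);
    return frequency * num / den;
}

/* Returns 1 if copying to an address register is strictly cheaper than
   keeping the value in the data register, 0 otherwise (a tie returns 0). */
int prefers_address_reg(int data_reg_cost, int addr_copy_cost)
{
    return addr_copy_cost < data_reg_cost;
}
```

With a frequency of 1000, mem_weight(1000, 2, 1) gives 2000 and mem_weight(1000, 33, 16) gives 2062. If the competing copy cost happened to be exactly 2000 (an assumed figure), the first weight leaves a tie and the second tips the choice towards the address register, which is the kind of shift described above.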
Right. Also note that even with IRA, register allocation is often
imperfect. That's fundamental to the nature of the problem. Also note that
aggressive propagation sometimes helps and sometimes hurts code -- it
helps more often than it hurts, but it's sometimes hard to predict how a
particular register coalesce is going to affect the final register
allocation.
Curiously though (and seemingly quite unrelated to what I'm seeing) I
did notice that if the RA chooses to use a callee-saved register it
only adjusts the cost associated with that register by a very small
amount - the comment in assign_hard_reg() in ira-color.c mentions the
need to increase the cost of the hard reg because of the need to
save/restore in the prologue/epilogue (actually it appears to have
these back-to-front in the comment), but the value is never adjusted
by any frequency, just the sum of the 2 memory move costs - 1.
The small amount is to account for the store in the prologue and the
load in the epilogue that will be necessary if we use a callee-saved
register. It shouldn't be frequency adjusted because the prologue and
epilogue are executed once.
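As a rough illustration of that adjustment, here is a simplified sketch in C. It is not the actual assign_hard_reg() code: the function and parameter names are hypothetical, and it only models the "sum of the 2 memory move costs - 1" described above being added once, with no frequency scaling.

```c
#include <assert.h>

/* Simplified sketch, not the ira-color.c implementation.  All names
   here are hypothetical. */

/* One-off penalty for a callee-saved register: one store in the
   prologue plus one load in the epilogue, minus 1. */
int callee_saved_penalty(int store_cost, int load_cost)
{
    assert(store_cost >= 0 && load_cost >= 0);
    return store_cost + load_cost - 1;
}

/* Total cost of a hard reg.  base_cost is assumed to be already
   frequency-weighted; the penalty is added once, unscaled. */
int hard_reg_total_cost(int base_cost, int is_callee_saved,
                        int store_cost, int load_cost)
{
    int penalty = is_callee_saved
                  ? callee_saved_penalty(store_cost, load_cost)
                  : 0;
    return base_cost + penalty;
}
```

With assumed memory move costs of 4 each, the penalty is 7, so a base cost of 2000 becomes 2007 for a callee-saved register. The penalty stays at 7 however large the frequency-weighted base cost gets, which is why it looks so small next to the other costs.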
It might help if you could provide some debugging dumps. The .ira
dump in particular contains a wealth of information about the
decisions the allocator is making.
They don't show anything terribly helpful unfortunately because they
don't appear to show any of the costs associated with the hard regs -
I ended up instrumenting the code to work out what I've found so far.
Yea, we could easily be missing hard reg costing information in the
dumps, but it should show coalescing decisions, which may be playing a
role here.
Jeff