On 11/19/09 08:40, Ian Bolton wrote:
Jeff Law wrote:
On 11/16/09 10:33, Ian Bolton wrote:
The question is: how to fix this? I initially put the REG_ALLOC_ORDER
back to how it was and changed the operand constraints in our MD file,
so that each of the "apathetic" instructions will take either a 't'
(TOP_CREG) or a '?b' (BOTTOM_REG). The '?' shows that this alternative
is slightly more costly than using 't'. On the benchmark that benefitted
the most from the new REG_ALLOC_ORDER, these constraints almost achieve
the same thing. It is only "almost" there because I am struggling with
how to show two alternatives for loads and stores, which currently have
an 'm' constraint.
I'm not aware of any way to describe this to IRA. In theory I guess IRA
could be twiddled to use TARGET_ADDRESS_COST to derive some kind of
cost difference based on the registers used in the MEM, but it seems
rather hackish.
I found a place in record_address_reg to achieve what I needed:

  for (k = 0; k < cost_classes_num; k++)
    {
      i = cost_classes[k];
      pp->cost[k]
        += (ira_get_may_move_cost (Pmode, i, rclass, true) * scale) / 2;

      /* Slightly nudge memory addresses away from using BOTTOM_REGS and
         C_REGS, so they take TOP_CREGS instead - should this pseudo later
         need BOTTOM_REGS, there will be a higher cost to use TOP_CREGS
         and it will still get BOTTOM_REGS.  This is equivalent to adding
         a '?b' on each instruction that currently has an 'm' constraint.
         Writing this generically might look something like:

           pp->cost[k] += TARGET_ADDRESS_EXTRA_COST_P (cost_classes[k])
                          ? (scale / 2) : 0;
      */
      if (cost_classes[k] == BOTTOM_REGS || cost_classes[k] == C_REGS)
        pp->cost[k] += scale / 2;
    }
Right. My point was that I wasn't aware of an _existing_ way to handle
this. I'm seeing similar issues with costing model problems on x86_64
where it is _sometimes_ more costly to use the new GPRs.
I don't have a good handle on how to model this for x86_64 right now
(and that problem is several deep on my queue).
I was then able to alter all our register-agnostic instructions in our
.md file to take either a 't' for TOP_CREGS or a '?b' for BOTTOM_REGS.
Sounds right.
Initial results showed that IRA was moving input arguments out of their
BOTTOM_REGS (e.g. $c1) into TOP_CREGS to do work on them, since it
thought TOP_CREGS were less costly to use, despite the cost of the move
instruction to get the input argument into a TOP_CREG.
That may indicate a cost scaling issue or more general weaknesses in
IRA's cost modeling.
I addressed this problem by splitting my register bank a little
differently: instead of making a distinction between BOTTOM_REGS and
TOP_CREGS, I made it so there was only a penalty if you used one of the
non-argument BOTTOM_REGS (i.e. a callee-save BOTTOM_REG). This meant
that IRA was happy to leave input arguments in their BOTTOM_REGS but
erred towards using TOP_CREGS once the caller-save BOTTOM_REGS had run
out. This was an improvement, but there was still a case where these
'?' penalties were not aligned with reality:
T1 = A + B;  // can use any register, TOP_CREGS appears cheaper
T2 = A - C;  // can use any register, TOP_CREGS appears cheaper
T3 = A & D;  // must use BOTTOM_REGS
The constraints for the first two instructions show that TOP_CREGS is
cheaper, but then you have to plant a move to get A into a BOTTOM_REG
to do the AND; in reality, we know it is cheaper to have A in a
BOTTOM_REG all along, but the '?' constraint suggests there is a cost
in doing this for the ADD and SUB, and so IRA will put A in a TOP_CREG
at first and incur the cost of the move, because it is still cheaper
than the costs I have defined with my constraints. I don't believe
there is a way to communicate a conditional cost, so I'm thinking that
constraints are not the solution for me at this time. What are your
thoughts?
See above. This might be a problem with scaling or weaknesses in IRA's
costing model. In theory IRA accumulates the cost of using each class
over the set of insns using a particular pseudo.
You might try something like this:
1. Crank up the callee-saved register cost adjustment in
assign_hard_reg so that it's scaled based on REG_FREQ. That will
probably lead to some regressions based on my experiments.
2. Then add a check that completely avoids the cost adjustment in
cases where we pushed a MAY_SPILL_P allocno. This was on my todo list,
but I haven't got to it yet.
If you wanted to get fancy, you could track the maximum number of
neighbors in each class as allocnos are pushed and use that to adjust
how many registers are cost adjusted in assign_hard_reg. The idea being
that the more neighbors the allocno has, the more callee-saved
registers we're likely to need.
You could also try to account for the fact that once allocated, the
callee-saved register is available "for free" to non-conflicting
allocnos. So you could iterate over those and decrease the penalty for
using a callee-saved register on the current allocno. Given the
interfaces provided by IRA, this could be compile-time expensive.
Your "small improvement to IRA allocations" patch posted yesterday
achieves something very similar to this, right?
Not really. That small improvement is a well known heuristic to make it
more likely that a node with a high degree can be colored by encouraging
the use of the same color for conflicting nodes which do not conflict
with each other.
It may help in some of your cases, but they really aren't the motivation
behind the change.
In the case of our BOTTOM_REGS and TOP_CREGS split, whether this pays
off depends on which allocno is colored first.
Absolutely. Note that we use a tiny adjustment to costing for this
heuristic so that, in general, it only applies when other costing
heuristics don't result in a particular register preference.
[ ... ]
Obviously, this will make things even more expensive for compilation, but
I'm not sure I can get what I require without some kind of "look-ahead"
mechanism like in the two ideas above. Unless I simply change the
coloring algorithm so that we always color those that need BOTTOM_REGS
first? I think this is effectively what I was achieving with my early
hacks involving the cost adjustments of 6 and 20.
Well, always coloring BOTTOM_REGs first simply means that you're
bypassing the costing metrics rather than fixing or enhancing them.
The class of problems you're running into aren't unique to your
processor and thus I'd rather see you helping with fixing the costing
heuristics which will help just about everyone.
It's also worth remembering that allocation is based on heuristics,
which often don't lead to perfect allocation, merely a good one.
Jeff