On 05/16/2011 05:00 PM, Pat Haugen wrote:
I'm seeing some odd behavior in ira for PowerPC, starting, as best I
can tell, with the big ira merge (r171649).
void foo(float *f1, float *f2) {
  *f1 = *f2;
}
If I compile with gcc -S -m64 -O3 -mcpu=power7 and look at the ira
dump, I see that the pseudo used to copy the data, r120, is spilled.
Reload comes along and fixes up this simple example so we end up with
just a load/store for the copy, but spilling when we have plenty of
available registers is obviously wrong.
Portion of the ira dump:
Pass 0 for finding pseudo/allocno costs
r120 costs: BASE_REGS:0 GENERAL_REGS:0 FLOAT_REGS:0 VSX_REGS:2000000
NON_SPECIAL_REGS:16000 LINK_REGS:4000 CTR_REGS:4000
LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000
NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000
Pass 1 for finding pseudo/allocno costs
r122: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r121: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r120: preferred SPEC_OR_GEN_REGS, alternative NO_REGS, allocno
SPEC_OR_GEN_REGS
r119: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r118: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r117: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r116: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r115: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r114: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r120 costs: VSX_REGS:2000000 NON_SPECIAL_REGS:16000 LINK_REGS:4000
CTR_REGS:4000 LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000
SPEC_OR_GEN_REGS:4000 NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
init_insns for 120: (insn_list:REG_DEP_TRUE 8 (nil))
Pass 1 for finding pseudo/allocno costs
r120: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
a0 (r120,l0) best NO_REGS, allocno NO_REGS
a0(r120,l0) costs: FLOAT_REGS:16000,16000 VSX_REGS:2000000,2000000
NON_SPECIAL_REGS:16000,16000 SPEC_OR_GEN_REGS:4000,4000
NON_FLOAT_REGS:2000000,2000000 ALL_REGS:2000000,2000000 MEM:0,0
...
**** Allocnos coloring:
Loop 0 (parent -1, header bb0, depth 0)
bbs: 2
all: 0r120
modified regnos: 120
border:
Pressure: NON_FLOAT_REGS=2
Hard reg set forest:
0:( 0 3-12 14-63 65 66 68-72 74 75 77-108)@0
Spill a0(r120,l0)
Disposition:
0:r120 l0 mem
Things start to go wrong during the first pass of
find_costs_and_classes, while walking the list of cost_classes to find
the best. If two classes have the same cost (such as GENERAL_REGS and
FLOAT_REGS in this example) the following portion of code grabs a
union of them.
  else if (i_costs[k] == best_cost)
    best = ira_reg_class_subunion[best][rclass];
In this case that class is NON_SPECIAL_REGS, which costs more than
either of them: a GPR<->FPR move has to go through memory, and
may_move_[in|out]_cost use the maximal cost when computing costs such
as NON_SPECIAL<->[GENERAL|FLOAT]. Picking NON_SPECIAL as the best
class during the first iteration then affects subsequent iterations
until it's decided that memory is best.
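To make the tie-breaking effect concrete, here is a small standalone
sketch. It is not the code from ira-costs.c. The enum, toy_subunion,
pick_best and the guard_union flag are all hypothetical names made up
for illustration. The three costs are the pass 0 values for r120 from
the dump above (NO_REGS gets a placeholder and is never looked up). The
guarded variant mirrors only the cost comparison of the proposed
change and assumes every class has a cost entry.

```c
#include <assert.h>
#include <limits.h>

/* Toy subset of the rs6000 register classes seen in the dump.  */
enum toy_class
{
  TOY_NO_REGS,
  TOY_GENERAL_REGS,
  TOY_FLOAT_REGS,
  TOY_NON_SPECIAL_REGS,
  TOY_N_CLASSES
};

/* Pass 0 costs for r120, copied from the dump above.  */
static const int r120_cost[TOY_N_CLASSES] = {
  [TOY_NO_REGS] = 0,            /* placeholder, never looked up */
  [TOY_GENERAL_REGS] = 0,
  [TOY_FLOAT_REGS] = 0,
  [TOY_NON_SPECIAL_REGS] = 16000
};

/* Hypothetical stand-in for ira_reg_class_subunion: in this toy
   model the union of two different classes is NON_SPECIAL_REGS.  */
static enum toy_class
toy_subunion (enum toy_class a, enum toy_class b)
{
  if (a == b)
    return a;
  if (a == TOY_NO_REGS)
    return b;
  if (b == TOY_NO_REGS)
    return a;
  return TOY_NON_SPECIAL_REGS;
}

/* Walk the classes and pick the cheapest one.  On a tie, take the
   union of the tied classes; when guard_union is nonzero, do so only
   if the union is no more expensive than the current best cost.  */
static enum toy_class
pick_best (const int *cost, int guard_union)
{
  enum toy_class best = TOY_NO_REGS;
  int best_cost = INT_MAX;

  assert (cost != 0);
  for (int c = TOY_GENERAL_REGS; c < TOY_N_CLASSES; c++)
    {
      if (cost[c] < best_cost)
        {
          best_cost = cost[c];
          best = (enum toy_class) c;
        }
      else if (cost[c] == best_cost)
        {
          enum toy_class u = toy_subunion (best, (enum toy_class) c);
          if (!guard_union || cost[u] <= best_cost)
            best = u;
        }
    }
  return best;
}
```

With these costs, pick_best (r120_cost, 0) ends up on
TOY_NON_SPECIAL_REGS even though that class costs 16000, while
pick_best (r120_cost, 1) stays on TOY_GENERAL_REGS.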
The following change fixes the problem by not updating the best_cost
if the union has a greater cost. Is this the correct approach or is
there more to it than this?
Thanks for pointing this out, Pat. Your patch could fix this particular
problem, but using GENERAL_REGS only is wrong. The final allocno class
should be NON_SPECIAL_REGS. I will search for a better solution.
Unfortunately, such changes in the code should be benchmarked on a few
major targets, so it will take some time (a week or two) to fix the
problem.
===================================================================
--- gcc/ira-costs.c (revision 173392)
+++ gcc/ira-costs.c (working copy)
@@ -1697,7 +1697,14 @@ find_costs_and_classes (FILE *dump_file)
best = (enum reg_class) rclass;
}
else if (i_costs[k] == best_cost)
- best = ira_reg_class_subunion[best][rclass];
+ {
+ enum reg_class temp_class;
+ temp_class = ira_reg_class_subunion[best][rclass];
+ if (cost_classes_ptr->index[temp_class] != -1
+ && i_costs[cost_classes_ptr->index[temp_class]]
+ <= best_cost)
+ best = temp_class;
+ }
if (pass == flag_expensive_optimizations
&& i_costs[k] < i_mem_cost
&& (reg_class_size[reg_class_subunion[alt_class][rclass]]
One thing I did notice with this change is that we'll now pick
GENERAL_REGS as best on the first pass, which then causes FLOAT_REGS
to be expensive on subsequent passes. Seems like for this example
where GENERAL/FLOAT are equally best, one would be the preferred class
and the other would be the alternative class. But the same thing
happens with compilers prior to the ira merge mentioned above, so
guessing it's a separate issue.
Yes, it is a separate issue. Setting up the right alternative/preferred
class is necessary mainly for reload to work correctly, since reload
is for some reason very sensitive to this. The alternative/preferred
class is not used by IRA itself.