I'm seeing some odd behavior in ira for PowerPC, starting, as best I can tell,
with the big ira merge (r171649). A minimal test case:
void foo(float *f1, float *f2) {
  *f1 = *f2;
}
If I compile with gcc -S -m64 -O3 -mcpu=power7 and look at the ira dump, I see
that the pseudo used to copy the data, r120, is spilled. Reload fixes up this
simple example, so we end up with just a load/store pair for the copy, but
spilling a pseudo when plenty of registers are available is obviously wrong.
Portion of the ira dump:
Pass 0 for finding pseudo/allocno costs
r120 costs: BASE_REGS:0 GENERAL_REGS:0 FLOAT_REGS:0 VSX_REGS:2000000
NON_SPECIAL_REGS:16000 LINK_REGS:4000 CTR_REGS:4000 LINK_OR_CTR_REGS:4000
SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000 NON_FLOAT_REGS:2000000 ALL_REGS:2000000
MEM:8000
Pass 1 for finding pseudo/allocno costs
r122: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r121: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r120: preferred SPEC_OR_GEN_REGS, alternative NO_REGS, allocno SPEC_OR_GEN_REGS
r119: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r118: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r117: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r116: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r115: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r114: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
r120 costs: VSX_REGS:2000000 NON_SPECIAL_REGS:16000 LINK_REGS:4000
CTR_REGS:4000 LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000
NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000
Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
init_insns for 120: (insn_list:REG_DEP_TRUE 8 (nil))
Pass 1 for finding pseudo/allocno costs
r120: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
a0 (r120,l0) best NO_REGS, allocno NO_REGS
a0(r120,l0) costs: FLOAT_REGS:16000,16000 VSX_REGS:2000000,2000000
NON_SPECIAL_REGS:16000,16000 SPEC_OR_GEN_REGS:4000,4000
NON_FLOAT_REGS:2000000,2000000 ALL_REGS:2000000,2000000 MEM:0,0
...
**** Allocnos coloring:
Loop 0 (parent -1, header bb0, depth 0)
bbs: 2
all: 0r120
modified regnos: 120
border:
Pressure: NON_FLOAT_REGS=2
Hard reg set forest:
0:( 0 3-12 14-63 65 66 68-72 74 75 77-108)@0
Spill a0(r120,l0)
Disposition:
0:r120 l0 mem
Things start to go wrong during the first pass of find_costs_and_classes, while
walking the list of cost_classes to find the best class. If two classes have the
same cost (such as GENERAL_REGS and FLOAT_REGS in this example), the following
code replaces the current best class with the union of the two:
else if (i_costs[k] == best_cost)
best = ira_reg_class_subunion[best][rclass];
In this case the union is NON_SPECIAL_REGS, whose cost is greater than that of
either class. A GPR<->FPR move has to go through memory, and
may_move_[in|out]_cost use the maximal cost when computing costs such as
NON_SPECIAL<->[GENERAL|FLOAT]. Picking NON_SPECIAL_REGS as the best class during
the first pass then skews the subsequent passes until it is decided that memory
is best.
The following change fixes the problem by leaving best unchanged when the union
class costs more than best_cost (or has no entry in the cost classes at all). Is
this the correct approach, or is there more to it than this?
===================================================================
--- gcc/ira-costs.c (revision 173392)
+++ gcc/ira-costs.c (working copy)
@@ -1697,7 +1697,14 @@ find_costs_and_classes (FILE *dump_file)
best = (enum reg_class) rclass;
}
else if (i_costs[k] == best_cost)
- best = ira_reg_class_subunion[best][rclass];
+ {
+ enum reg_class temp_class;
+ temp_class = ira_reg_class_subunion[best][rclass];
+ if (cost_classes_ptr->index[temp_class] != -1
+ && i_costs[cost_classes_ptr->index[temp_class]]
+ <= best_cost)
+ best = temp_class;
+ }
if (pass == flag_expensive_optimizations
&& i_costs[k] < i_mem_cost
&& (reg_class_size[reg_class_subunion[alt_class][rclass]]
One thing I did notice with this change is that we now pick GENERAL_REGS as the
best class on the first pass, which then makes FLOAT_REGS expensive on
subsequent passes. It seems that for this example, where GENERAL_REGS and
FLOAT_REGS are equally good, one should be the preferred class and the other the
alternative class. But the same thing happens with compilers prior to the ira
merge mentioned above, so I'm guessing it's a separate issue.
-Pat