I'm seeing some odd behavior in IRA for PowerPC, starting, as best I can tell, with the big IRA merge (r171649).

void foo(float *f1, float*f2) {
  *f1 = *f2;
}

If I compile with gcc -S -m64 -O3 -mcpu=power7 and look at the ira dump, I see that the pseudo used to copy the data, r120, is spilled. Reload comes along and fixes up this simple example so we end up with just a load/store for the copy, but spilling when we have plenty of available registers is obviously wrong.

Portion of the ira dump:


Pass 0 for finding pseudo/allocno costs


r120 costs: BASE_REGS:0 GENERAL_REGS:0 FLOAT_REGS:0 VSX_REGS:2000000 NON_SPECIAL_REGS:16000 LINK_REGS:4000 CTR_REGS:4000 LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000 NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000


Pass 1 for finding pseudo/allocno costs

    r122: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r121: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r120: preferred SPEC_OR_GEN_REGS, alternative NO_REGS, allocno SPEC_OR_GEN_REGS
    r119: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r118: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r117: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r116: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r115: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r114: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS

r120 costs: VSX_REGS:2000000 NON_SPECIAL_REGS:16000 LINK_REGS:4000 CTR_REGS:4000 LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000 NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000

Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
init_insns for 120: (insn_list:REG_DEP_TRUE 8 (nil))

Pass 1 for finding pseudo/allocno costs

    r120: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
    a0 (r120,l0) best NO_REGS, allocno NO_REGS

a0(r120,l0) costs: FLOAT_REGS:16000,16000 VSX_REGS:2000000,2000000 NON_SPECIAL_REGS:16000,16000 SPEC_OR_GEN_REGS:4000,4000 NON_FLOAT_REGS:2000000,2000000 ALL_REGS:2000000,2000000 MEM:0,0

...

**** Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
    bbs: 2
    all: 0r120
    modified regnos: 120
    border:
    Pressure: NON_FLOAT_REGS=2
    Hard reg set forest:
      0:( 0 3-12 14-63 65 66 68-72 74 75 77-108)@0
      Spill a0(r120,l0)
Disposition:
    0:r120 l0   mem


Things start to go wrong during the first pass of find_costs_and_classes, while walking the list of cost_classes to find the best. If two classes have the same cost (such as GENERAL_REGS and FLOAT_REGS in this example) the following portion of code grabs a union of them.

              else if (i_costs[k] == best_cost)
                best = ira_reg_class_subunion[best][rclass];

In this case that class is NON_SPECIAL_REGS, which costs more than either of them: a GPR<->FPR move has to go through memory, and may_move_[in|out]_cost use the maximal cost when computing costs such as NON_SPECIAL<->[GENERAL|FLOAT]. Picking NON_SPECIAL_REGS as the best class during the first iteration then affects subsequent iterations until it's decided that memory is best.

The following change fixes the problem by leaving best unchanged when the union is not one of the cost classes or has a greater cost than best_cost. Is this the correct approach or is there more to it than this?

===================================================================
--- gcc/ira-costs.c     (revision 173392)
+++ gcc/ira-costs.c     (working copy)
@@ -1697,7 +1697,14 @@ find_costs_and_classes (FILE *dump_file)
                  best = (enum reg_class) rclass;
                }
              else if (i_costs[k] == best_cost)
-               best = ira_reg_class_subunion[best][rclass];
+               {
+                 enum reg_class temp_class;
+                 temp_class = ira_reg_class_subunion[best][rclass];
+                 if (cost_classes_ptr->index[temp_class] != -1
+                     && i_costs[cost_classes_ptr->index[temp_class]]
+                        <= best_cost)
+                   best = temp_class;
+               }
              if (pass == flag_expensive_optimizations
                  && i_costs[k] < i_mem_cost
                  && (reg_class_size[reg_class_subunion[alt_class][rclass]]


One thing I did notice with this change is that we'll now pick GENERAL_REGS as best on the first pass, which then causes FLOAT_REGS to be expensive on subsequent passes. It seems that for this example, where GENERAL/FLOAT are equally best, one should be the preferred class and the other the alternative class. But the same thing happens with compilers prior to the IRA merge mentioned above, so I'm guessing it's a separate issue.

-Pat
