IRA observation/question

Pat Haugen Mon, 16 May 2011 14:00:55 -0700

I'm seeing some odd behavior in ira for PowerPC, starting with the big ira merge, best I can tell (r171649).


void foo(float *f1, float*f2) {
  *f1 = *f2;
}

If I compile with gcc -S -m64 -O3 -mcpu=power7 and look at the ira dump, I see that the pseudo used to copy the data, r120, is spilled. Reload comes along and fixes up this simple example so we end up with just a load/store for the copy, but spilling when we have plenty of available registers is obviously wrong.


Portion of the ira dump:


Pass 0 for finding pseudo/allocno costs

r120 costs: BASE_REGS:0 GENERAL_REGS:0 FLOAT_REGS:0 VSX_REGS:2000000 NON_SPECIAL_REGS:16000 LINK_REGS:4000 CTR_REGS:4000 LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000 NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000



Pass 1 for finding pseudo/allocno costs

    r122: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r121: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r120: preferred SPEC_OR_GEN_REGS, alternative NO_REGS, allocno SPEC_OR_GEN_REGS
    r119: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r118: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r117: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r116: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r115: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS
    r114: preferred ALL_REGS, alternative NO_REGS, allocno ALL_REGS

r120 costs: VSX_REGS:2000000 NON_SPECIAL_REGS:16000 LINK_REGS:4000 CTR_REGS:4000 LINK_OR_CTR_REGS:4000 SPECIAL_REGS:4000 SPEC_OR_GEN_REGS:4000 NON_FLOAT_REGS:2000000 ALL_REGS:2000000 MEM:8000


Building IRA IR
starting the processing of deferred insns
ending the processing of deferred insns
df_analyze called
init_insns for 120: (insn_list:REG_DEP_TRUE 8 (nil))

Pass 1 for finding pseudo/allocno costs

    r120: preferred NO_REGS, alternative NO_REGS, allocno NO_REGS
    a0 (r120,l0) best NO_REGS, allocno NO_REGS

a0(r120,l0) costs: FLOAT_REGS:16000,16000 VSX_REGS:2000000,2000000 NON_SPECIAL_REGS:16000,16000 SPEC_OR_GEN_REGS:4000,4000 NON_FLOAT_REGS:2000000,2000000 ALL_REGS:2000000,2000000 MEM:0,0


...

**** Allocnos coloring:


  Loop 0 (parent -1, header bb0, depth 0)
    bbs: 2
    all: 0r120
    modified regnos: 120
    border:
    Pressure: NON_FLOAT_REGS=2
    Hard reg set forest:
      0:( 0 3-12 14-63 65 66 68-72 74 75 77-108)@0
      Spill a0(r120,l0)
Disposition:
    0:r120 l0   mem

Things start to go wrong during the first pass of find_costs_and_classes, while walking the list of cost_classes to find the best. If two classes have the same cost (such as GENERAL_REGS and FLOAT_REGS in this example) the following portion of code grabs a union of them.


              else if (i_costs[k] == best_cost)
                best = ira_reg_class_subunion[best][rclass];

In this case that class is NON_SPECIAL_REGS, which costs more than either of them: a GPR<->FPR move needs to go through memory, and may_move_[in|out]_cost use the maximal cost when computing costs such as NON_SPECIAL<->[GENERAL|FLOAT]. Picking NON_SPECIAL for the best class during the first iteration then affects subsequent iterations until it's decided that memory is best.
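
To make the "maximal cost" point concrete, here is a toy model in C. It is only a sketch of the idea, not the code in GCC: the toy_* names are hypothetical and the move costs (2 and 16) are invented; only the class names come from the dump.

```c
/* Toy classes: two leaf classes and their union.  Names echo the dump;
   the numbers below are invented for illustration.  */
enum toy_class { TOY_GENERAL, TOY_FLOAT, TOY_NON_SPECIAL };

/* Hypothetical leaf-to-leaf move costs.  GPR<->FPR is priced high
   because the move has to go through memory.  */
static const int toy_leaf_move_cost[2][2] = {
  { 2, 16 },   /* from GENERAL to GENERAL, FLOAT */
  { 16, 2 },   /* from FLOAT   to GENERAL, FLOAT */
};

/* Nonzero if leaf class LEAF is contained in class CL.  */
static int
toy_contains (enum toy_class cl, int leaf)
{
  return cl == TOY_NON_SPECIAL || (int) cl == leaf;
}

/* Maximal leaf move cost over every (from, to) pair that the two
   classes cover.  This is the assumption being illustrated: a union
   class is charged for its worst-case member.  */
static int
toy_may_move_cost (enum toy_class from, enum toy_class to)
{
  int worst = 0;
  for (int f = 0; f < 2; f++)
    for (int t = 0; t < 2; t++)
      if (toy_contains (from, f) && toy_contains (to, t)
          && toy_leaf_move_cost[f][t] > worst)
        worst = toy_leaf_move_cost[f][t];
  return worst;
}
```

With these made-up numbers, a move within GENERAL or within FLOAT costs 2, but any move involving the union class is charged 16, because the union has to allow for the GPR<->FPR case.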

The following change fixes the problem by not updating the best class if the union has a greater cost. Is this the correct approach or is there more to it than this?


===================================================================
--- gcc/ira-costs.c     (revision 173392)
+++ gcc/ira-costs.c     (working copy)
@@ -1697,7 +1697,14 @@ find_costs_and_classes (FILE *dump_file)
                  best = (enum reg_class) rclass;
                }
              else if (i_costs[k] == best_cost)
-               best = ira_reg_class_subunion[best][rclass];
+               {
+                 enum reg_class temp_class;
+                 temp_class = ira_reg_class_subunion[best][rclass];
+                 if (cost_classes_ptr->index[temp_class] != -1
+                     && i_costs[cost_classes_ptr->index[temp_class]]
+                        <= best_cost)
+                   best = temp_class;
+               }
              if (pass == flag_expensive_optimizations
                  && i_costs[k] < i_mem_cost
                  && (reg_class_size[reg_class_subunion[alt_class][rclass]]
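
For anyone who wants to see the effect in isolation, here is a stripped-down model of the tie-break in C. It is a sketch under assumptions, not the real loop: the toy_* names are hypothetical, only three classes are modelled, and toy_subunion stands in for ira_reg_class_subunion. The costs (0, 0, 16000) are the pass 0 numbers for r120 from the dump above. Every toy class has a cost, so the index != -1 check from the patch has no counterpart here.

```c
#include <limits.h>

enum toy_class { TOY_GENERAL, TOY_FLOAT, TOY_NON_SPECIAL, TOY_N_CLASSES };

/* Pass 0 costs for r120 taken from the dump:
   GENERAL_REGS:0 FLOAT_REGS:0 NON_SPECIAL_REGS:16000.  */
static const int toy_costs[TOY_N_CLASSES] = { 0, 0, 16000 };

/* Hypothetical stand-in for ira_reg_class_subunion: the union of two
   different toy classes is always TOY_NON_SPECIAL.  */
static enum toy_class
toy_subunion (enum toy_class a, enum toy_class b)
{
  return a == b ? a : TOY_NON_SPECIAL;
}

/* Walk the classes and pick the best one.  With GUARD_UNION zero, a tie
   always widens BEST to the union (the old behavior).  With GUARD_UNION
   nonzero, the union is taken only if it costs no more than BEST_COST
   (the behavior of the patch).  */
static enum toy_class
toy_find_best (int guard_union)
{
  enum toy_class best = TOY_N_CLASSES;   /* nothing chosen yet */
  int best_cost = INT_MAX;

  for (int k = 0; k < TOY_N_CLASSES; k++)
    {
      if (toy_costs[k] < best_cost)
        {
          best_cost = toy_costs[k];
          best = (enum toy_class) k;
        }
      else if (toy_costs[k] == best_cost)
        {
          enum toy_class temp_class = toy_subunion (best, (enum toy_class) k);
          if (!guard_union || toy_costs[temp_class] <= best_cost)
            best = temp_class;
        }
    }
  return best;
}
```

In this model toy_find_best (0) ends up with TOY_NON_SPECIAL, while toy_find_best (1) stays on TOY_GENERAL, the first of the two tied classes.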

One thing I did notice with this change is that we'll now pick GENERAL_REGS as best on the first pass, which then causes FLOAT_REGS to be expensive on subsequent passes. Seems like for this example where GENERAL/FLOAT are equally best, one would be the preferred class and the other would be the alternative class. But the same thing happens with compilers prior to the ira merge mentioned above, so guessing it's a separate issue.


-Pat
