Hi! On Thu, Mar 28, 2019 at 12:10:06AM +1030, Alan Modra wrote: > This patch makes a number of corrections to rs6000_register_move_cost, > adds a new register union class, GEN_OR_VSX_REGS, and adjusts insn > alternative to suit.
Cool beans. [ snip various great explanations, thanks! ] > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS > as an allocno class. It is similar to the aarch64 version but without > any selection by regno mode if the best class is a union class. It would be nice if we could do without such hacks. Alas. > OK assuming Pat's latest spec test run doesn't contain any nasty > surprises? Not for stage 4, no. Sorry. But it should be great in stage 1 :-) > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h > index 9fb36e41e7d..20c59f89c8f 100644 > --- a/gcc/config/rs6000/darwin.h > +++ b/gcc/config/rs6000/darwin.h > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands; > && reg_class_subset_p (BASE_REGS, (CLASS))) \ > ? BASE_REGS \ > : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT \ > - && (CLASS) == GEN_OR_FLOAT_REGS) \ > + && ((CLASS) == GEN_OR_FLOAT_REGS \ > + || (CLASS) == GEN_OR_VSX_REGS)) \ > ? GENERAL_REGS \ > : (CLASS)) Darwin doesn't do VSX at all... But maybe there is something that can get allocated to both FPRs and VRs, sure. And GPRs. This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places where we should care about this change for correctness. > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum reg_class > rclass) > return NO_REGS; > } > > - if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS) > + if (GET_MODE_CLASS (mode) == MODE_INT > + && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS)) > return GENERAL_REGS; Maybe do this whenever rclass contains the GPRs? > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode, > reg_class_t from, reg_class_t to) > { > int ret; > + reg_class_t rclass; > > if (TARGET_DEBUG_COST) > dbg_cost_ctrl++; > > + /* If we have VSX, we can easily move between FPR or Altivec registers, > + otherwise we can only easily move within classes. > + Do this first so we give best-case answers for union classes > + containing both gprs and vsx regs. */ > + HARD_REG_SET to_vsx, from_vsx; > + COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]); > + AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]); > + COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]); > + AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]); This is a bit expensive to run at every call of rs6000_register_move_cost. Can it be precomputed easily? > + { > + if (TARGET_DIRECT_MOVE > + && (rs6000_tune == PROCESSOR_POWER8 > + || rs6000_tune == PROCESSOR_POWER9)) TARGET_DIRECT_MOVE is always on for these CPUs. Should this also use the m*vsr* cost with say -mcpu=power7 -mtune=power9? If so you can simplify this condition. Or maybe it give problems elsewhere? It's not really worth spending to much time on, separate -mtune isn't used much at all. > +static reg_class_t > +rs6000_ira_change_pseudo_allocno_class (int regno ATTRIBUTE_UNUSED, > + reg_class_t allocno_class, > + reg_class_t best_class) > +{ > + switch (allocno_class) > + { > + default: > + break; Please put default cases at the end. > + gcc_checking_assert (best_class == GEN_OR_VSX_REGS > + || best_class == GEN_OR_FLOAT_REGS > + || best_class == VSX_REGS > + || best_class == ALTIVEC_REGS > + || best_class == FLOAT_REGS > + || best_class == GENERAL_REGS > + || best_class == BASE_REGS); This list makes me think... Should there be a GEN_OR_ALTIVEC as well? Segher