On Thu, Mar 28, 2019 at 01:08:55PM -0500, Segher Boessenkool wrote: > > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order > > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS > > as an allocno class. It is similar to the aarch64 version but without > > any selection by regno mode if the best class is a union class. > > It would be nice if we could do without such hacks. Alas.
Yeah. We have some serious problems with register pressure calculations in sched1 and ira. This is merely a workaround for the regression. I intend to poke a little more at the scheduler to see whether I can figure out a proper fix, but for now this is the best I have. > > OK assuming Pat's latest spec test run doesn't contain any nasty > > surprises? > > Not for stage 4, no. Sorry. But it should be great in stage 1 :-) Good. I'm happy to leave this until next stage 1. > > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h > > index 9fb36e41e7d..20c59f89c8f 100644 > > --- a/gcc/config/rs6000/darwin.h > > +++ b/gcc/config/rs6000/darwin.h > > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands; > > && reg_class_subset_p (BASE_REGS, (CLASS))) \ > > ? BASE_REGS \ > > : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT \ > > - && (CLASS) == GEN_OR_FLOAT_REGS) \ > > + && ((CLASS) == GEN_OR_FLOAT_REGS \ > > + || (CLASS) == GEN_OR_VSX_REGS)) \ > > ? GENERAL_REGS \ > > : (CLASS)) > > Darwin doesn't do VSX at all... But maybe there is something that can get > allocated to both FPRs and VRs, sure. And GPRs. > > This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places > where we should care about this change for correctness. For sure there are other places. A new union register class trips all those places where union classes fail. For example, ira.c:setup_class_translate_array doesn't give useful answers for GEN_OR_VSX_REGS, or GEN_OR_FLOAT_REGS for that matter. That makes uses of ira_allocno_class_translate and ira_pressure_class_translate for pseudos suspect whenever one of the union classes is involved. rs6000_ira_change_pseudo_allocno_class helps of course, in that GEN_OR_VSX_REGS mostly won't be used. > > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum > > reg_class rclass) > > return NO_REGS; > > } > > > > - if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS) > > + if (GET_MODE_CLASS (mode) == MODE_INT > > + && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS)) > > return GENERAL_REGS; > > Maybe do this whenever rclass contains the GPRs? Well, you caught me out here. Like the darwin.h change I made this change early on in the process of fixing register_move_cost, on the grounds that whatever we did for GEN_OR_FLOAT_REGS probably should be done for GEN_OR_VSX_REGS. That's not a really good reason particularly since this code is so old (git a99459e46d, svn r35162). Maybe it's just a reload hack? So you might be better questioning the need for this change at all. I'll see how the test results look if I remove it. Int modes can live in VSX regs, after all. > > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode, > > reg_class_t from, reg_class_t to) > > { > > int ret; > > + reg_class_t rclass; > > > > if (TARGET_DEBUG_COST) > > dbg_cost_ctrl++; > > > > + /* If we have VSX, we can easily move between FPR or Altivec registers, > > + otherwise we can only easily move within classes. > > + Do this first so we give best-case answers for union classes > > + containing both gprs and vsx regs. */ > > + HARD_REG_SET to_vsx, from_vsx; > > + COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]); > > + AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]); > > + COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]); > > + AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]); > > This is a bit expensive to run at every call of rs6000_register_move_cost. > Can it be precomputed easily? register_move_cost tends to be cached by callers of this function, so I'm inclined to go for simple and correct rather than fast and complicated. > > + { > > + if (TARGET_DIRECT_MOVE > > + && (rs6000_tune == PROCESSOR_POWER8 > > + || rs6000_tune == PROCESSOR_POWER9)) > > TARGET_DIRECT_MOVE is always on for these CPUs. Should this also use the > m*vsr* cost with say -mcpu=power7 -mtune=power9? No, because if we don't generate m*vsr*, and we shouldn't, then that would be telling a lie. > This list makes me think... Should there be a GEN_OR_ALTIVEC as well? That might make sense for pre-vsx processors, if you can find a testcase where the GENERAL_REGS cost is the same as ALTIVEC_REGS cost. Extending rs6000_ira_change_pseudo_allocno_class to handle GEN_OR_FLOAT_REGS and VSX_REGS for !TARGET_VSX might be an idea too. Where the cost for moves between float and altivec is low, you'll find that VSX_REGS is always used as the allocno class whenever ALTIVEC_REGS is preferred. So for power7 and later I expect GEN_OR_ALTIVEC_REGS wouldn't ever be used as an allocno class. -- Alan Modra Australia Development Lab, IBM