On Thu, Mar 28, 2019 at 01:08:55PM -0500, Segher Boessenkool wrote:
> > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order
> > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS
> > as an allocno class. It is similar to the aarch64 version but without
> > any selection by regno mode if the best class is a union class.
>
> It would be nice if we could do without such hacks. Alas.
Yeah. We have some serious problems with register pressure
calculations in sched1 and ira. This is merely a workaround for the
regression. I intend to poke a little more at the scheduler to see
whether I can figure out a proper fix, but for now this is the best I
have.
> > OK assuming Pat's latest spec test run doesn't contain any nasty
> > surprises?
>
> Not for stage 4, no. Sorry. But it should be great in stage 1 :-)
Good. I'm happy to leave this until next stage 1.
> > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
> > index 9fb36e41e7d..20c59f89c8f 100644
> > --- a/gcc/config/rs6000/darwin.h
> > +++ b/gcc/config/rs6000/darwin.h
> > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands;
> > && reg_class_subset_p (BASE_REGS, (CLASS))) \
> > ? BASE_REGS \
> > : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT \
> > - && (CLASS) == GEN_OR_FLOAT_REGS) \
> > + && ((CLASS) == GEN_OR_FLOAT_REGS \
> > + || (CLASS) == GEN_OR_VSX_REGS)) \
> > ? GENERAL_REGS \
> > : (CLASS))
>
> Darwin doesn't do VSX at all... But maybe there is something that can get
> allocated to both FPRs and VRs, sure. And GPRs.
>
> This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places
> where we should care about this change for correctness.
For sure there are other places. A new union register class trips all
those places where union classes fail. For example,
ira.c:setup_class_translate_array doesn't give useful answers for
GEN_OR_VSX_REGS, or GEN_OR_FLOAT_REGS for that matter. That makes
uses of ira_allocno_class_translate and ira_pressure_class_translate
for pseudos suspect whenever one of the union classes is involved.
rs6000_ira_change_pseudo_allocno_class helps of course, in that
GEN_OR_VSX_REGS mostly won't be used.
> > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum
> > reg_class rclass)
> > return NO_REGS;
> > }
> >
> > - if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
> > + if (GET_MODE_CLASS (mode) == MODE_INT
> > + && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS))
> > return GENERAL_REGS;
>
> Maybe do this whenever rclass contains the GPRs?
Well, you caught me out here. Like the darwin.h change I made this
change early on in the process of fixing register_move_cost, on the
grounds that whatever we did for GEN_OR_FLOAT_REGS probably should be
done for GEN_OR_VSX_REGS. That's not a really good reason
particularly since this code is so old (git a99459e46d, svn r35162).
Maybe it's just a reload hack?
So you might be better questioning the need for this change at all.
I'll see how the test results look if I remove it. Int modes can live
in VSX regs, after all.
> > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode,
> > reg_class_t from, reg_class_t to)
> > {
> > int ret;
> > + reg_class_t rclass;
> >
> > if (TARGET_DEBUG_COST)
> > dbg_cost_ctrl++;
> >
> > + /* If we have VSX, we can easily move between FPR or Altivec registers,
> > + otherwise we can only easily move within classes.
> > + Do this first so we give best-case answers for union classes
> > + containing both gprs and vsx regs. */
> > + HARD_REG_SET to_vsx, from_vsx;
> > + COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]);
> > + AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]);
> > + COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]);
> > + AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]);
>
> This is a bit expensive to run at every call of rs6000_register_move_cost.
> Can it be precomputed easily?
register_move_cost tends to be cached by callers of this function, so
I'm inclined to go for simple and correct rather than fast and
complicated.
> > + {
> > + if (TARGET_DIRECT_MOVE
> > + && (rs6000_tune == PROCESSOR_POWER8
> > + || rs6000_tune == PROCESSOR_POWER9))
>
> TARGET_DIRECT_MOVE is always on for these CPUs. Should this also use the
> m*vsr* cost with say -mcpu=power7 -mtune=power9?
No, because if we don't generate m*vsr*, and we shouldn't, then that
would be telling a lie.
> This list makes me think... Should there be a GEN_OR_ALTIVEC as well?
That might make sense for pre-vsx processors, if you can find a
testcase where the GENERAL_REGS cost is the same as ALTIVEC_REGS cost.
Extending rs6000_ira_change_pseudo_allocno_class to handle
GEN_OR_FLOAT_REGS and VSX_REGS for !TARGET_VSX might be an idea too.
Where the cost for moves between float and altivec is low, you'll find
that VSX_REGS is always used as the allocno class whenever
ALTIVEC_REGS is preferred. So for power7 and later I expect
GEN_OR_ALTIVEC_REGS wouldn't ever be used as an allocno class.
--
Alan Modra
Australia Development Lab, IBM