On Thu, Mar 28, 2019 at 01:08:55PM -0500, Segher Boessenkool wrote:
> > TARGET_IRA_CHANGE_PSEUDO_ALLOCNO_CLASS is needed for rs6000 in order
> > to fix the 20% cactus_adm spec regression when using GEN_OR_VSX_REGS
> > as an allocno class.  It is similar to the aarch64 version but without
> > any selection by regno mode if the best class is a union class.
> 
> It would be nice if we could do without such hacks.  Alas.

Yeah.  We have some serious problems with register pressure
calculations in sched1 and ira.  This is merely a workaround for the
regression.  I intend to poke a little more at the scheduler to see
whether I can figure out a proper fix, but for now this is the best I
have.

> > OK assuming Pat's latest spec test run doesn't contain any nasty
> > surprises?
> 
> Not for stage 4, no.  Sorry.  But it should be great in stage 1 :-)

Good.  I'm happy to leave this until next stage 1.

> > diff --git a/gcc/config/rs6000/darwin.h b/gcc/config/rs6000/darwin.h
> > index 9fb36e41e7d..20c59f89c8f 100644
> > --- a/gcc/config/rs6000/darwin.h
> > +++ b/gcc/config/rs6000/darwin.h
> > @@ -346,7 +346,8 @@ extern int darwin_emit_branch_islands;
> >        && reg_class_subset_p (BASE_REGS, (CLASS)))          \
> >     ? BASE_REGS                                                     \
> >     : (GET_MODE_CLASS (GET_MODE (X)) == MODE_INT                    \
> > -      && (CLASS) == GEN_OR_FLOAT_REGS)                             \
> > +      && ((CLASS) == GEN_OR_FLOAT_REGS                             \
> > +     || (CLASS) == GEN_OR_VSX_REGS))                       \
> >     ? GENERAL_REGS                                          \
> >     : (CLASS))
> 
> Darwin doesn't do VSX at all...  But maybe there is something that can get
> allocated to both FPRs and VRs, sure.  And GPRs.
> 
> This PREFERRED_RELOAD_CLASS, but it makes me worried if there are places
> where we should care about this change for correctness.

For sure there are other places.  A new union register class trips all
those places where union classes fail.  For example,
ira.c:setup_class_translate_array doesn't give useful answers for
GEN_OR_VSX_REGS, or GEN_OR_FLOAT_REGS for that matter.  That makes
uses of ira_allocno_class_translate and ira_pressure_class_translate
for pseudos suspect whenever one of the union classes is involved.

rs6000_ira_change_pseudo_allocno_class helps of course, in that
GEN_OR_VSX_REGS mostly won't be used.

> > @@ -20236,7 +20239,8 @@ rs6000_preferred_reload_class (rtx x, enum 
> > reg_class rclass)
> >        return NO_REGS;
> >      }
> >  
> > -  if (GET_MODE_CLASS (mode) == MODE_INT && rclass == GEN_OR_FLOAT_REGS)
> > +  if (GET_MODE_CLASS (mode) == MODE_INT
> > +      && (rclass == GEN_OR_FLOAT_REGS || rclass == GEN_OR_VSX_REGS))
> >      return GENERAL_REGS;
> 
> Maybe do this whenever rclass contains the GPRs?

Well, you caught me out here.  Like the darwin.h change I made this
change early on in the process of fixing register_move_cost, on the
grounds that whatever we did for GEN_OR_FLOAT_REGS probably should be
done for GEN_OR_VSX_REGS.  That's not a really good reason
particularly since this code is so old (git a99459e46d, svn r35162).
Maybe it's just a reload hack?

So you might be better questioning the need for this change at all.
I'll see how the test results look if I remove it.  Int modes can live
in VSX regs, after all.

> > @@ -34966,22 +34970,56 @@ rs6000_register_move_cost (machine_mode mode,
> >                        reg_class_t from, reg_class_t to)
> >  {
> >    int ret;
> > +  reg_class_t rclass;
> >  
> >    if (TARGET_DEBUG_COST)
> >      dbg_cost_ctrl++;
> >  
> > +  /* If we have VSX, we can easily move between FPR or Altivec registers,
> > +     otherwise we can only easily move within classes.
> > +     Do this first so we give best-case answers for union classes
> > +     containing both gprs and vsx regs.  */
> > +  HARD_REG_SET to_vsx, from_vsx;
> > +  COPY_HARD_REG_SET (to_vsx, reg_class_contents[to]);
> > +  AND_HARD_REG_SET (to_vsx, reg_class_contents[VSX_REGS]);
> > +  COPY_HARD_REG_SET (from_vsx, reg_class_contents[from]);
> > +  AND_HARD_REG_SET (from_vsx, reg_class_contents[VSX_REGS]);
> 
> This is a bit expensive to run at every call of rs6000_register_move_cost.
> Can it be precomputed easily?

register_move_cost tends to be cached by callers of this function, so
I'm inclined to go for simple and correct rather than fast and
complicated.

> > +   {
> > +     if (TARGET_DIRECT_MOVE
> > +         && (rs6000_tune == PROCESSOR_POWER8
> > +             || rs6000_tune == PROCESSOR_POWER9))
> 
> TARGET_DIRECT_MOVE is always on for these CPUs.  Should this also use the
> m*vsr* cost with say -mcpu=power7 -mtune=power9?

No, because if we don't generate m*vsr*, and we shouldn't, then that
would be telling a lie.

> This list makes me think...  Should there be a GEN_OR_ALTIVEC as well?

That might make sense for pre-vsx processors, if you can find a
testcase where the GENERAL_REGS cost is the same as ALTIVEC_REGS cost.
Extending rs6000_ira_change_pseudo_allocno_class to handle
GEN_OR_FLOAT_REGS and VSX_REGS for !TARGET_VSX might be an idea too.

Where the cost for moves between float and altivec is low, you'll find
that VSX_REGS is always used as the allocno class whenever
ALTIVEC_REGS is preferred.  So for power7 and later I expect
GEN_OR_ALTIVEC_REGS wouldn't ever be used as an allocno class.

-- 
Alan Modra
Australia Development Lab, IBM

Reply via email to