Hi!

On Tue, Jun 18, 2019 at 01:53:36PM -0400, Michael Meissner wrote:
> On Tue, Jun 18, 2019 at 06:37:54AM -0500, Segher Boessenkool wrote:
> > On Mon, Jun 17, 2019 at 05:24:37PM -0400, Michael Meissner wrote:
> > > I wrote the code to generate LFIWAX and LFIWZX originally for the power7 
> > > in the
> > > 2010 time frame.  At the time, we did not allow SImode to go into floating
> > > point and vector registers.  As part of the power9 work, we now allow 
> > > SImode to
> > > go into FP/vector registers with for 64-bit code targetting -mcpu=power8 
> > > or
> > > higher.  But we never went back and tweaked the LFIWAX/LFIWZX support.
> > 
> > Why do we allow it only in 64-bit mode?  I mean, it sounds like only
> > handling 64-bit mode causes us to have more code and more complexity
> > instead of less.
> 
> The main reason is extendsidi2 and zero_extendsidi2.  These are not enabled on
> 32-bit (due to the EXTSI mode iterator),

That's no reason.  Just change that iterator; see below.

> so that the common code is done to do
> sign/zero extension.  And I felt that if you allowed it, the compiler would
> move the extensions to the fp/vector unit.  Note that direct moves of 64-bit
> items to/from the GPRs is somewhat messy.

Yes, you need to set up costing properly (but you have to *anyway*); and
you might want to expand things to fit GPRs, so that GPRs are used
preferably, and VSRs are only used if there is a benefit to that.  E.g.
even if you allow DImode in VSRs, don't expand normal DImode ops to one
pseudo.

> > > In general, the 32-bit code seems to generate a lot less instructions,
> > > including fewer lfiwax/lfiwzx instructions.  On power8/power9 32-bit code,
> > > there was more mtvsrwz mtvsrwa instructions.
> > 
> > Interesting.  Is that caused by less register pressure?

?

> > > --- gcc/config/rs6000/rs6000.md   (revision 272166)
> > > +++ gcc/config/rs6000/rs6000.md   (working copy)
> > 
> > This patch is very hard to read.  It mixes insertions and deletions of
> > different definitions, where the only thing they have in common is some
> > braces or parens or whitespace usually.
> 
> I was trying to move things so that related things were together (i.e. the
> basic lfiwax and lfiwzx patterns and the two define_insn_and_splits that
> generate it).  I tend to think that when you look at the code and not the
> patches, that it makes more sense.

There are 14k+ lines of rs6000.md...  I prefer looking at diffs ;-)

> > > +; On 32-bit systems, we need to have special versions of LFIWAX and 
> > > LFIWZX because
> > > +; the sign/zero extend insns are not defined.
> > 
> > I don't understand what this means.
> 
> See above about EXTSI.  For reference here is the code from rs6000.md.
> 
> ; Everything we can extend SImode to.
> (define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")])

So change it?  Condition "TARGET_POWERPC64 || TARGET_VSX", perhaps.

If things have proper costs, and you expand the named patterns to just
the GPR version, all should work fine.

> (define_insn "zero_extendsi<mode>2"

So you probably want a separate define_expand for this name, and maybe
only have that for TARGET_POWERPC64 even?

> > > +(define_insn_and_split "lfiwax"
> > 
> > This could use a better name?  Why is it separate from extendsidi2 anyway?
> 
> I was using the name that is currently in the code, i.e. the instruction.

But it is not, it is one of four different insns, _or_ a split even.

> > > +  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v")
> > > + (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")]
> > >              UNSPEC_LFIWAX))]
> > >    "TARGET_HARD_FLOAT && TARGET_LFIWAX"
> > >    "@
> > >     lfiwax %0,%y1
> > >     lxsiwax %x0,%y1
> > >     mtvsrwa %x0,%1
> > > +   vextsw2d %0,%1
> > > +   #"
> > > +  "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR
> > > +   && altivec_register_operand (operands[1], SImode)"
> > 
> > "&& reload_completed && which_alternative == 3" works fine for that; but
> > just "&& reload_completed" should work as well, this is the only alternative
> > with "#" template.
> 
> No, it has been my experience that if you do not limit the split, that it will
> be done.  It does not look for the '#' code.

Ah yes, that's for the split-during-output only.  So use which_alternative
please.

> > > +;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode 
> > > through
> > > +;; register allocation so that the register allocator generates a LFIWAX 
> > > or
> > > +;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* 
> > > instruction
> > > +;; on power8 and LWA + STD + LFD on power7/power6 systems.
> > > +
> > > +;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT
> > 
> > Not sure what this line means?
> 
> Those are the instructions generated by the different alternatives.  Similar 
> to
> the lines we have in front of the moves when grouping alternatives and
> attribute setting.

Ah.  Not sure if it helps anything, here.

> > > --- gcc/testsuite/gcc.target/powerpc/pr81348.c    (revision 272165)
> > > +++ gcc/testsuite/gcc.target/powerpc/pr81348.c    (working copy)
> > > @@ -1,9 +1,11 @@
> > >  /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> > 
> > powerpc64*-*-* is almost never correct.  Here it isn't, either.  You don't
> > care what the default target of your compiler is: you care what the
> > *current* target is.
> 
> Note, I wasn't changing that line.

I know, but it still is wrong ;-)

> > So, why are these patterns separate from {zero_,}extendsidi2?  That's where
> > all complexity starts, afaics?
> 
> Yes, and the desire to do the lfiwax instruction instead of the lwa 
> instruction
> and then mtvsr{d,wa} instruction that the register allocator would tend to
> pick.

There are other, simpler, ways to do that.  Even a peephole would be
simpler, and you know how I feel about peepholes :-)


Segher

Reply via email to