Hi! On Tue, Jun 18, 2019 at 01:53:36PM -0400, Michael Meissner wrote: > On Tue, Jun 18, 2019 at 06:37:54AM -0500, Segher Boessenkool wrote: > > On Mon, Jun 17, 2019 at 05:24:37PM -0400, Michael Meissner wrote: > > > I wrote the code to generate LFIWAX and LFIWZX originally for the power7 > > > in the > > > 2010 time frame. At the time, we did not allow SImode to go into floating > > > point and vector registers. As part of the power9 work, we now allow > > > SImode to > > > go into FP/vector registers with for 64-bit code targetting -mcpu=power8 > > > or > > > higher. But we never went back and tweaked the LFIWAX/LFIWZX support. > > > > Why do we allow it only in 64-bit mode? I mean, it sounds like only > > handling 64-bit mode causes us to have more code and more complexity > > instead of less. > > The main reason is extendsidi2 and zero_extendsidi2. These are not enabled on > 32-bit (due to the EXTSI mode iterator),
That's no reason. Just change that iterator; see below. > so that the common code is done to do > sign/zero extension. And I felt that if you allowed it, the compiler would > move the extensions to the fp/vector unit. Note that direct moves of 64-bit > items to/from the GPRs is somewhat messy. Yes, you need to set up costing properly (but you have to *anyway*); and you might want to expand things to fit GPRs, so that GPRs are used preferably, and VSRs are only used if there is a benefit to that. E.g. even if you allow DImode in VSRs, don't expand normal DImode ops to one pseudo. > > > In general, the 32-bit code seems to generate a lot less instructions, > > > including fewer lfiwax/lfiwzx instructions. On power8/power9 32-bit code, > > > there was more mtvsrwz mtvsrwa instructions. > > > > Interesting. Is that caused by less register pressure? ? > > > --- gcc/config/rs6000/rs6000.md (revision 272166) > > > +++ gcc/config/rs6000/rs6000.md (working copy) > > > > This patch is very hard to read. It mixes insertions and deletions of > > different definitions, where the only thing they have in common is some > > braces or parens or whitespace usually. > > I was trying to move things so that related things were together (i.e. the > basic lfiwax and lfiwzx patterns and the two define_insn_and_splits that > generate it). I tend to think that when you look at the code and not the > patches, that it makes more sense. There are 14k+ lines of rs6000.md... I prefer looking at diffs ;-) > > > +; On 32-bit systems, we need to have special versions of LFIWAX and > > > LFIWZX because > > > +; the sign/zero extend insns are not defined. > > > > I don't understand what this means. > > See above about EXTSI. For reference here is the code from rs6000.md. > > ; Everything we can extend SImode to. > (define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")]) So change it? Condition "TARGET_POWERPC64 || TARGET_VSX", perhaps. If things have proper costs, and you expand the named patterns to just the GPR version, all should work fine. > (define_insn "zero_extendsi<mode>2" So you probably want a separate define_expand for this name, and maybe only have that for TARGET_POWERPC64 even? > > > +(define_insn_and_split "lfiwax" > > > > This could use a better name? Why is it separate from extendsidi2 anyway? > > I was using the name that is currently in the code, i.e. the instruction. But it is not, it is one of four different insns, _or_ a split even. > > > + [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v") > > > + (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")] > > > UNSPEC_LFIWAX))] > > > "TARGET_HARD_FLOAT && TARGET_LFIWAX" > > > "@ > > > lfiwax %0,%y1 > > > lxsiwax %x0,%y1 > > > mtvsrwa %x0,%1 > > > + vextsw2d %0,%1 > > > + #" > > > + "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR > > > + && altivec_register_operand (operands[1], SImode)" > > > > "&& reload_completed && which_alternative == 3" works fine for that; but > > just "&& reload_completed" should work as well, this is the only alternative > > with "#" template. > > No, it has been my experience that if you do not limit the split, that it will > be done. It does not look for the '#' code. Ah yes, that's for the split-during-output only. So use which_alternative please. > > > +;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode > > > through > > > +;; register allocation so that the register allocator generates a LFIWAX > > > or > > > +;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* > > > instruction > > > +;; on power8 and LWA + STD + LFD on power7/power6 systems. > > > + > > > +;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT > > > > Not sure what this line means? > > Those are the instructions generated by the different alternatives. Similar > to > the lines we have in front of the moves when grouping alternatives and > attribute setting. Ah. Not sure if it helps anything, here. > > > --- gcc/testsuite/gcc.target/powerpc/pr81348.c (revision 272165) > > > +++ gcc/testsuite/gcc.target/powerpc/pr81348.c (working copy) > > > @@ -1,9 +1,11 @@ > > > /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ > > > > powerpc64*-*-* is almost never correct. Here it isn't, either. You don't > > care what the default target of your compiler is: you care what the > > *current* target is. > > Note, I wasn't changing that line. I know, but it still is wrong ;-) > > So, why are these patterns separate from {zero_,}extendsidi2? That's where > > all complexity starts, afaics? > > Yes, and the desire to do the lfiwax instruction instead of the lwa > instruction > and then mtvsr{d,wa} instruction that the register allocator would tend to > pick. There are other, simpler, ways to do that. Even a peephole would be simpler, and you know how I feel about peepholes :-) Segher