On Tue, Jun 18, 2019 at 06:37:54AM -0500, Segher Boessenkool wrote: > On Mon, Jun 17, 2019 at 05:24:37PM -0400, Michael Meissner wrote: > > I wrote the code to generate LFIWAX and LFIWZX originally for the power7 in > > the > > 2010 time frame. At the time, we did not allow SImode to go into floating > > point and vector registers. As part of the power9 work, we now allow > > SImode to > > go into FP/vector registers with for 64-bit code targetting -mcpu=power8 or > > higher. But we never went back and tweaked the LFIWAX/LFIWZX support. > > Why do we allow it only in 64-bit mode? I mean, it sounds like only > handling 64-bit mode causes us to have more code and more complexity > instead of less.
The main reason is extendsidi2 and zero_extendsidi2. These are not enabled on 32-bit (due to the EXTSI mode iterator), so that the common code is done to do sign/zero extension. And I felt that if you allowed it, the compiler would move the extensions to the fp/vector unit. Note that direct moves of 64-bit items to/from the GPRs is somewhat messy. > > I was writing code for a possible future PowerPC machine, and the new code > > added an attribute that caused some of the -mno-vsx tests to fail. This was > > due to the floatsi<mode>2_lfiwax and floatunssi<mode>2_lfiwzx patterns did > > not > > have a non-VSX alternative, and the attribute processing needed to process > > the > > alternatives before the first split pass. > > I don't understand what you mean... "attribute processing"? In my current "prefixed" attribute support (that says whether a prefixed instruction is used and changes the length), I eliminated the "maybe_prefix" attribute, since you have complained about it. This means that when the "prefixed" attribute is checked, it has to figure out which alternative it is is, to see what the "type" attribute is. The floatsi<mode>2_lfiwax and floatunssi<mode>2_lfiwzx patterns do not have a non-VSX alternative. In the past before the "isa" attribute enabled removing alternatives based on the machine, this was harmless, because the split that those instructions do (before register allocation) would create appropriate code. Now that "isa" removes the alternatives, there are no alternatives that will process the insn, so the compiler dies with insn not found. Obviously, I can make a simpler patch just to fix that problem. > > In general, the 32-bit code seems to generate a lot less instructions, > > including fewer lfiwax/lfiwzx instructions. On power8/power9 32-bit code, > > there was more mtvsrwz mtvsrwa instructions. > > Interesting. Is that caused by less register pressure? > > > --- gcc/config/rs6000/rs6000.md (revision 272166) > > +++ gcc/config/rs6000/rs6000.md (working copy) > > This patch is very hard to read. It mixes insertions and deletions of > different definitions, where the only thing they have in common is some > braces or parens or whitespace usually. I was trying to move things so that related things were together (i.e. the basic lfiwax and lfiwzx patterns and the two define_insn_and_splits that generate it). I tend to think that when you look at the code and not the patches, that it makes more sense. > > Maybe more context (-U<n>) helps, maybe whole-function mode is better (-W), > maybe something else. It also sometimes helps to do things as a patch > series instead of as one patch. Please experiment. > > > +; On 32-bit systems, we need to have special versions of LFIWAX and LFIWZX > > because > > +; the sign/zero extend insns are not defined. > > I don't understand what this means. See above about EXTSI. For reference here is the code from rs6000.md. ; Everything we can extend SImode to. (define_mode_iterator EXTSI [(DI "TARGET_POWERPC64")]) ; ... (define_insn "zero_extendsi<mode>2" [(set (match_operand:EXTSI 0 "gpc_reg_operand" "=r,r,d,wa,wa,r,wa") (zero_extend:EXTSI (match_operand:SI 1 "reg_or_mem_operand" "m,r,Z,Z,r,wa,wa")))] "" "@ lwz%U1%X1 %0,%1 rldicl %0,%1,0,32 lfiwzx %0,%y1 lxsiwzx %x0,%y1 mtvsrwz %x0,%1 mfvsrwz %0,%x1 xxextractuw %x0,%x1,4" [(set_attr "type" "load,shift,fpload,fpload,mffgpr,mftgpr,vecexts") (set_attr "isa" "*,*,p7,p8v,p8v,p8v,p9v")]) ; and similar for extendsi<mode>2. > > [ Deleted all "-" lines below, to make some sense of it. ] > > > > +(define_insn_and_split "lfiwax" > > This could use a better name? Why is it separate from extendsidi2 anyway? I was using the name that is currently in the code, i.e. the instruction. > > + [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v") > > + (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")] > > UNSPEC_LFIWAX))] > > "TARGET_HARD_FLOAT && TARGET_LFIWAX" > > "@ > > lfiwax %0,%y1 > > lxsiwax %x0,%y1 > > mtvsrwa %x0,%1 > > + vextsw2d %0,%1 > > + #" > > + "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR > > + && altivec_register_operand (operands[1], SImode)" > > "&& reload_completed && which_alternative == 3" works fine for that; but > just "&& reload_completed" should work as well, this is the only alternative > with "#" template. No, it has been my experience that if you do not limit the split, that it will be done. It does not look for the '#' code. I.e. if we don't limit it to just the power8, on power9 it will do the split instead of doing the vextsw2d instruction. > > + [(const_int 0)] > > { > > rtx dest = operands[0]; > > rtx src = operands[1]; > > + int dest_regno = REGNO (dest); > > + int src_regno = REGNO (src); > > + rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno); > > + rtx src_v4si = gen_rtx_REG (V4SImode, src_regno); > > > > + if (BYTES_BIG_ENDIAN) > > { > > + emit_insn (gen_altivec_vupkhsw (dest_v2di, src_v4si)); > > + emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const1_rtx)); > > + DONE; > > } > > else > > + { > > + emit_insn (gen_altivec_vupklsw (dest_v2di, src_v4si)); > > + emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const0_rtx)); > > + DONE; > > + } > > } > > + [(set_attr "type" "fpload,fpload,mffgpr,vecexts,vecexts") > > + (set_attr "isa" "*,p8v,p8v,p9v,p8v") > > + (set_attr "length" "*,*,*,*,8")]) > > > > +;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode > > through > > +;; register allocation so that the register allocator generates a LFIWAX or > > +;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* > > instruction > > +;; on power8 and LWA + STD + LFD on power7/power6 systems. > > + > > +;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT > > Not sure what this line means? Those are the instructions generated by the different alternatives. Similar to the lines we have in front of the moves when grouping alternatives and attribute setting. > > +;; The first alternative is to support -mno-vsx and -mcpu=power6. > > +(define_insn_and_split "floatsi<mode>2_lfiwax" > > + [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa,wa") > > + (float:SFDF > > + (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,v,v"))) > > + (clobber (match_scratch:DI 2 "=d,d,v,wa,v,v"))] > > + "TARGET_HARD_FLOAT && TARGET_LFIWAX && <SI_CONVERT_FP>" > > "#" > > + "&& reload_completed" > > + [(match_dup 3) > > + (set (match_dup 0) > > + (float:SFDF (match_dup 2)))] > > { > > rtx src = operands[1]; > > + rtx tmp = operands[2]; > > > > + operands[3] = (TARGET_DIRECT_MOVE_64BIT > > + ? gen_extendsidi2 (tmp, src) > > + : gen_lfiwax (tmp, src)); > > } > > + [(set_attr "length" "8,8,8,8,8,12") > > + (set_attr "type" "fpload,fpload,fpload,mffgpr,fp,fp") > > + (set_attr "isa" "*,p7v,p8v,p8v,p9v,p8v")]) > > So this says to convert a SI to SF or DF, first sign-extend it to DImode > and then do the conversion? Yes, but on 32-bit anything and 64-bit power7, we don't enable the insn that does the sign extend via the normal pattern, so we have to use the lfiwax unspec insn. > > +;; LFIWZX LXSIWZX MTVSRWZ XXEXTRACTUW > > +;; The first alternative is to support -mno-vsx. > > +(define_insn_and_split "floatunssi<mode>2_lfiwzx" > > + [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa") > > (unsigned_float:SFDF > > + (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,wa"))) > > + (clobber (match_scratch:DI 2 "=d,d,v,wa,wa"))] > > "TARGET_HARD_FLOAT && TARGET_LFIWZX && <SI_CONVERT_FP>" > > "#" > > + "&& reload_completed" > > + [(match_dup 3) > > + (set (match_dup 0) > > + (float:SFDF (match_dup 2)))] > > { > > + rtx src = operands[1]; > > + rtx tmp = operands[2]; > > + > > + operands[3] = (TARGET_DIRECT_MOVE_64BIT > > + ? gen_zero_extendsidi2 (tmp, src) > > + : gen_lfiwzx (tmp, src)); > > + > > } > > [(set_attr "length" "8") > > - (set_attr "type" "fpload")]) > > + (set_attr "type" "fpload,fpload,fpload,mffgpr,vecexts") > > + (set_attr "isa" "*,p7v,p8v,p8v,p9v")]) > > > > --- gcc/testsuite/gcc.target/powerpc/pr81348.c (revision 272165) > > +++ gcc/testsuite/gcc.target/powerpc/pr81348.c (working copy) > > @@ -1,9 +1,11 @@ > > /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */ > > powerpc64*-*-* is almost never correct. Here it isn't, either. You don't > care what the default target of your compiler is: you care what the > *current* target is. Note, I wasn't changing that line. I just needed to bump up the optimization level so it generated the code that was being checked for. Otherwise it would optimize it to do a direct move. > > + the wrong register in a define_split. Originially it failed with -Og. > > Typo ("originally"). Ok. > So, why are these patterns separate from {zero_,}extendsidi2? That's where > all complexity starts, afaics? Yes, and the desire to do the lfiwax instruction instead of the lwa instruction and then mtvsr{d,wa} instruction that the register allocator would tend to pick. -- Michael Meissner, IBM IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797