On Mon, Jun 17, 2019 at 05:24:37PM -0400, Michael Meissner wrote:
> I wrote the code to generate LFIWAX and LFIWZX originally for the power7 in 
> the
> 2010 time frame.  At the time, we did not allow SImode to go into floating
> point and vector registers.  As part of the power9 work, we now allow SImode 
> to
> go into FP/vector registers with for 64-bit code targetting -mcpu=power8 or
> higher.  But we never went back and tweaked the LFIWAX/LFIWZX support.

Why do we allow it only in 64-bit mode?  I mean, it sounds like only
handling 64-bit mode causes us to have more code and more complexity
instead of less.

> I was writing code for a possible future PowerPC machine, and the new code
> added an attribute that caused some of the -mno-vsx tests to fail.  This was
> due to the floatsi<mode>2_lfiwax and floatunssi<mode>2_lfiwzx patterns did not
> have a non-VSX alternative, and the attribute processing needed to process the
> alternatives before the first split pass.

I don't understand what you mean...  "attribute processing"?

> In general, the 32-bit code seems to generate a lot less instructions,
> including fewer lfiwax/lfiwzx instructions.  On power8/power9 32-bit code,
> there was more mtvsrwz mtvsrwa instructions.

Interesting.  Is that caused by less register pressure?

> --- gcc/config/rs6000/rs6000.md       (revision 272166)
> +++ gcc/config/rs6000/rs6000.md       (working copy)

This patch is very hard to read.  It mixes insertions and deletions of
different definitions, where the only thing they have in common is some
braces or parens or whitespace usually.

Maybe more context (-U<n>) helps, maybe whole-function mode is better (-W),
maybe something else.  It also sometimes helps to do things as a patch
series instead of as one patch.  Please experiment.

> +; On 32-bit systems, we need to have special versions of LFIWAX and LFIWZX 
> because
> +; the sign/zero extend insns are not defined.

I don't understand what this means.

[ Deleted all "-" lines below, to make some sense of it. ]


> +(define_insn_and_split "lfiwax"

This could use a better name?  Why is it separate from extendsidi2 anyway?

> +  [(set (match_operand:DI 0 "gpc_reg_operand" "=d,wa,wa,v,v")
> +     (unspec:DI [(match_operand:SI 1 "reg_or_indexed_operand" "Z,Z,r,v,v")]
>                  UNSPEC_LFIWAX))]
>    "TARGET_HARD_FLOAT && TARGET_LFIWAX"
>    "@
>     lfiwax %0,%y1
>     lxsiwax %x0,%y1
>     mtvsrwa %x0,%1
> +   vextsw2d %0,%1
> +   #"
> +  "&& reload_completed && TARGET_P8_VECTOR && !TARGET_P9_VECTOR
> +   && altivec_register_operand (operands[1], SImode)"

"&& reload_completed && which_alternative == 3" works fine for that; but
just "&& reload_completed" should work as well, this is the only alternative
with "#" template.

> +  [(const_int 0)]
>  {
>    rtx dest = operands[0];
>    rtx src = operands[1];
> +  int dest_regno = REGNO (dest);
> +  int src_regno = REGNO (src);
> +  rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> +  rtx src_v4si = gen_rtx_REG (V4SImode, src_regno);
>  
> +  if (BYTES_BIG_ENDIAN)
>      {
> +      emit_insn (gen_altivec_vupkhsw (dest_v2di, src_v4si));
> +      emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const1_rtx));
> +      DONE;
>      }
>    else
> +    {
> +      emit_insn (gen_altivec_vupklsw (dest_v2di, src_v4si));
> +      emit_insn (gen_vsx_xxspltd_v2di (dest_v2di, dest_v2di, const0_rtx));
> +      DONE;
> +    }
>  }
> +  [(set_attr "type" "fpload,fpload,mffgpr,vecexts,vecexts")
> +   (set_attr "isa" "*,p8v,p8v,p9v,p8v")
> +   (set_attr "length" "*,*,*,*,8")])


> +;; Keep the SImode -> DImode conversion along with DImode -> SF/DFmode 
> through
> +;; register allocation so that the register allocator generates a LFIWAX or
> +;; LXSIWAX instruction instead of a LWA instruction plus a MTVSRD* 
> instruction
> +;; on power8 and LWA + STD + LFD on power7/power6 systems.
> +
> +;; LFIWAX LFIWAX LXSIWAX MTVSRWA VEXTSW2D VUPKLSW+SPLAT

Not sure what this line means?

> +;; The first alternative is to support -mno-vsx and -mcpu=power6.
> +(define_insn_and_split "floatsi<mode>2_lfiwax"
> +  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa,wa")
> +     (float:SFDF
> +      (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,v,v")))
> +   (clobber (match_scratch:DI 2 "=d,d,v,wa,v,v"))]
> +  "TARGET_HARD_FLOAT && TARGET_LFIWAX && <SI_CONVERT_FP>"
>    "#"
> +  "&& reload_completed"
> +  [(match_dup 3)
> +   (set (match_dup 0)
> +     (float:SFDF (match_dup 2)))]
>  {
>    rtx src = operands[1];
> +  rtx tmp = operands[2];
>  
> +  operands[3] = (TARGET_DIRECT_MOVE_64BIT
> +              ? gen_extendsidi2 (tmp, src)
> +              : gen_lfiwax (tmp, src));
>  }
> +  [(set_attr "length" "8,8,8,8,8,12")
> +   (set_attr "type" "fpload,fpload,fpload,mffgpr,fp,fp")
> +   (set_attr "isa" "*,p7v,p8v,p8v,p9v,p8v")])

So this says to convert a SI to SF or DF, first sign-extend it to DImode
and then do the conversion?

> +;; LFIWZX LXSIWZX MTVSRWZ XXEXTRACTUW
> +;; The first alternative is to support -mno-vsx.
> +(define_insn_and_split "floatunssi<mode>2_lfiwzx"
> +  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,wa,wa,wa,wa")
>       (unsigned_float:SFDF
> +      (match_operand:SI 1 "nonimmediate_operand" "Z,Z,Z,r,wa")))
> +   (clobber (match_scratch:DI 2 "=d,d,v,wa,wa"))]
>    "TARGET_HARD_FLOAT && TARGET_LFIWZX && <SI_CONVERT_FP>"
>    "#"
> +  "&& reload_completed"
> +  [(match_dup 3)
> +   (set (match_dup 0)
> +     (float:SFDF (match_dup 2)))]
>  {
> +  rtx src = operands[1];
> +  rtx tmp = operands[2];
> +
> +  operands[3] = (TARGET_DIRECT_MOVE_64BIT
> +              ? gen_zero_extendsidi2 (tmp, src)
> +              : gen_lfiwzx (tmp, src));
> +
>  }
>    [(set_attr "length" "8")
> -   (set_attr "type" "fpload")])
> +   (set_attr "type" "fpload,fpload,fpload,mffgpr,vecexts")
> +   (set_attr "isa" "*,p7v,p8v,p8v,p9v")])


> --- gcc/testsuite/gcc.target/powerpc/pr81348.c        (revision 272165)
> +++ gcc/testsuite/gcc.target/powerpc/pr81348.c        (working copy)
> @@ -1,9 +1,11 @@
>  /* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */

powerpc64*-*-* is almost never correct.  Here it isn't, either.  You don't
care what the default target of your compiler is: you care what the
*current* target is.

> +   the wrong register in a define_split.  Originially it failed with -Og.

Typo ("originally").

So, why are these patterns separate from {zero_,}extendsidi2?  That's where
all complexity starts, afaics?


Segher

Reply via email to