On Wed, Mar 02, 2022 at 06:47:39PM -0600, Segher Boessenkool wrote:
> On Wed, Mar 02, 2022 at 03:54:29PM -0500, Michael Meissner wrote:
> > Optimize signed DImode -> TImode on power10.
> 
> > On power10, GCC tries to optimize the signed conversion from DImode to
> > TImode by using the vextsd2q instruction.  However to generate this
> > instruction, it would have to generate 3 direct moves (1 from the GPR
> > registers to the altivec registers, and 2 from the altivec registers to
> > the GPR register).
> > 
> > This patch generates the shift right immediate instruction to do the
> > conversion if the target/source registers are GPR registers like it does
> > on earlier systems.  If the target/source registers are Altivec registers,
> > it will generate the vextsd2q instruction.
> 
> >     PR target/104698
> >     * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
> >     (extendditi2): Convert from define_expand to
> >     define_insn_and_split.  Replace with code to deal with both GPR
> >     registers and with altivec registers.
> > 
> > gcc/testsuite/
> >     PR target/104698
> >     * gcc.target/powerpc/pr104698-1.c: New test.
> >     * gcc.target/powerpc/pr104698-2.c: New test.
> 
> > +;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
> > +;; power10.  On earlier systems, the machine independent code will generate a
> > +;; shift left to sign extend the 64-bit value to 128-bit.
> > +;;
> > +;; If the register allocator prefers to use GPR registers, we will use a shift
> > +;; left instruction to sign extend the 64-bit value to 128-bit.
> > +;;
> > +;; If the register allocator prefers to use Altivec registers on power10,
> > +;; generate the vextsd2q instruction.
> > +(define_insn_and_split "extendditi2"
> > +  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> > +        (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> > +   (clobber (reg:DI CA_REGNO))]
> > +  "TARGET_POWERPC64 && TARGET_POWER10"
> 
> What happens with -m32 -m{no,}-powerpc64?

The __int128_t and __uint128_t types are not defined for 32-bit targets, so
a DImode to TImode conversion is never generated there.
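
As a rough illustration (not part of the patch), the conversion in question
only exists where the compiler provides a 128-bit integer type.  GCC defines
__SIZEOF_INT128__ when __int128 is supported, so a sketch can guard on it.
The function names below are made up for this example:

```c
#include <stdint.h>

/* Returns 1 if the compiler provides a 128-bit integer type, 0 otherwise.
   GCC defines __SIZEOF_INT128__ when __int128 is supported.  */
int
have_int128 (void)
{
#ifdef __SIZEOF_INT128__
  return 1;
#else
  return 0;
#endif
}

#ifdef __SIZEOF_INT128__
/* A signed 64-bit to 128-bit conversion, i.e. the source-level form of a
   DImode -> TImode sign extension.  Hypothetical name, for illustration.  */
__int128
extend_di_to_ti (int64_t x)
{
  return (__int128) x;
}
#endif
```

Without the 128-bit type the second function is not compiled at all, which
is why the new pattern cannot be reached on such targets.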

> > +  "#"
> > +  "&& reload_completed"
> > +  [(pc)]
> > +{
> > +  rtx dest = operands[0];
> > +  rtx src = operands[1];
> > +  int dest_regno = reg_or_subregno (dest);
> > +
> > +  /* Handle conversion to GPR registers.  Load up the low part and then do
> > +     a sign extension to the upper part.  */
> > +  if (INT_REGNO_P (dest_regno))
> > +    {
> > +      rtx dest_hi = gen_highpart (DImode, dest);
> > +      rtx dest_lo = gen_lowpart (DImode, dest);
> > +
> > +      emit_move_insn (dest_lo, src);
> > +      emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));
> 
> Please use src instead of dest_lo.  This always works, because you did
> the low-part move first.

Ok.
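
For reference, a minimal C sketch of what the GPR split computes, with the
shift taken from src rather than from the low half.  This is only an
illustration, not code from the patch; the struct and function names are
made up, and it relies on GCC's behavior of treating >> on a negative signed
value as an arithmetic shift (ISO C leaves that implementation-defined):

```c
#include <stdint.h>

/* Hypothetical stand-in for the two DImode halves of a TImode GPR pair.  */
struct ti_parts
{
  uint64_t lo;   /* low 64 bits of the 128-bit result */
  int64_t hi;    /* high 64 bits of the 128-bit result */
};

/* Sign extend a 64-bit value to 128 bits the way the GPR path does:
   move the value into the low half, then fill the high half with copies
   of the sign bit by shifting right arithmetically by 63.  */
struct ti_parts
sign_extend_di (int64_t src)
{
  struct ti_parts r;

  r.lo = (uint64_t) src;
  /* Assumes an arithmetic right shift, as GCC implements it.  */
  r.hi = src >> 63;
  return r;
}
```

The high half ends up as all ones for a negative input and all zeros
otherwise.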

> > +      DONE;
> > +    }
> > +
> > +  /* For conversion to an Altivec register, generate either a splat operation
> > +     or a load rightmost double word instruction.  Both instructions get the
> > +     DImode value into the lower 64 bits, and then do the vextsd2q
> > +     instruction.  */
> > +     
> 
> (trailing whitespace)

Ok.

> > +  else if (ALTIVEC_REGNO_P (dest_regno))
> > +    {
> > +      if (MEM_P (src))
> > +        emit_insn (gen_vsx_lxvrdx (dest, src));
> > +      else
> > +        {
> > +          rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> > +          emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> > +        }
> > +
> > +      emit_insn (gen_extendditi2_vector (dest, dest));
> > +      DONE;
> > +    }
> 
> This patch needs testing on BE (and 32-bit as well of course).

Will do, though on 32-bit it will be a no-op, since the 128-bit types do not
exist there.

-- 
Michael Meissner, IBM
PO Box 98, Ayer, Massachusetts, USA, 01432
email: meiss...@linux.ibm.com
