On Wed, Mar 02, 2022 at 03:54:29PM -0500, Michael Meissner wrote:
> Optimize signed DImode -> TImode on power10.

> On power10, GCC tries to optimize the signed conversion from DImode to
> TImode by using the vextsd2q instruction.  However to generate this
> instruction, it would have to generate 3 direct moves (1 from the GPR
> registers to the altivec registers, and 2 from the altivec registers to
> the GPR register).
> 
> This patch generates the shift right immediate instruction to do the
> conversion if the target/source registers ares GPR registers like it does
> on earlier systems.  If the target/source registers are Altivec registers,
> it will generate the vextsd2q instruction.

>       PR target/104698
>       * config/rs6000/vsx.md (mtvsrdd_diti_w1): Delete.
>       (extendditi2): Convert from define_expand to
>       define_insn_and_split.  Replace with code to deal with both GPR
>       registers and with altivec registers.
> 
> gcc/testsuite/
>       PR target/104698
>       * gcc.target/powerpc/pr104698-1.c: New test.
>       * gcc.target/powerpc/pr104698-2.c: New test.

> +;; Sign extend DI to TI.  We provide both GPR targets and Altivec targets on
> +;; power10.  On earlier systems, the machine independent code will generate a
> +;; shift left to sign extend the 64-bit value to 128-bit.
> +;;
> +;; If the register allocator prefers to use GPR registers, we will use a 
> shift
> +;; left instruction to sign extend the 64-bit value to 128-bit.
> +;;
> +;; If the register allocator prefers to use Altivec registers on power10,
> +;; generate the vextsd2q instruction.
> +(define_insn_and_split "extendditi2"
> +  [(set (match_operand:TI 0 "register_operand" "=r,r,v,v,v")
> +     (sign_extend:TI (match_operand:DI 1 "input_operand" "r,m,r,wa,Z")))
> +   (clobber (reg:DI CA_REGNO))]
> +  "TARGET_POWERPC64 && TARGET_POWER10"

What happens with -m32 -m{no,}-powerpc64?

> +  "#"
> +  "&& reload_completed"
> +  [(pc)]
> +{
> +  rtx dest = operands[0];
> +  rtx src = operands[1];
> +  int dest_regno = reg_or_subregno (dest);
> +
> +  /* Handle conversion to GPR registers.  Load up the low part and then do
> +     a sign extension to the upper part.  */
> +  if (INT_REGNO_P (dest_regno))
> +    {
> +      rtx dest_hi = gen_highpart (DImode, dest);
> +      rtx dest_lo = gen_lowpart (DImode, dest);
> +
> +      emit_move_insn (dest_lo, src);
> +      emit_insn (gen_ashrdi3 (dest_hi, dest_lo, GEN_INT (63)));

Please use src instead of dest_lo.  This always works, because you did
the low-part move first.

> +      DONE;
> +    }
> +
> +  /* For conversion to an Altivec register, generate either a splat operation
> +     or a load rightmost double word instruction.  Both instructions gets the
> +     DImode value into the lower 64 bits, and then do the vextsd2q
> +     instruction.  */
> +     

(trailing whitespace)

> +  else if (ALTIVEC_REGNO_P (dest_regno))
> +    {
> +      if (MEM_P (src))
> +     emit_insn (gen_vsx_lxvrdx (dest, src));
> +      else
> +     {
> +       rtx dest_v2di = gen_rtx_REG (V2DImode, dest_regno);
> +       emit_insn (gen_vsx_splat_v2di (dest_v2di, src));
> +     }
> +
> +      emit_insn (gen_extendditi2_vector (dest, dest));
> +      DONE;
> +    }

This patch needs testing on BE (and 32-bit as well of course).

You did not get rid of the unspecs this way.  Oh well, that can happen
later.

> +
> +  else
> +    gcc_unreachable ();
> +}
> +  [(set_attr "length" "8")])

This needs to set the insn type for each alternative (they have
different costs, for memory at least!)

> +/* { dg-final { scan-assembler-not   {\mmfvsrd\M}     } } */
> +/* { dg-final { scan-assembler-not   {\mmfvsrld\M}    } } */
> +/* { dg-final { scan-assembler-not   {\mmtvsrdd\M}    } } */

You could just do

> +/* { dg-final { scan-assembler-not   {\mm[ft]vsr}    } } */

and even all future variants would be handled :-)

Looks good, just those last details :-)


Segher

Reply via email to