Re: [PATCH] Generate XXSPLTIDP on power10.

will schmidt via Gcc-patches Thu, 26 Aug 2021 12:18:25 -0700

On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP on power10.
> 
> This patch implements XXSPLTIDP support for SF and DF scalar constants and V2DF
> vector constants.  The XXSPLTIDP instruction is given a 32-bit immediate that
> is converted to a vector of two DFmode constants.  The immediate is in SFmode
> format, so only constants that fit as SFmode values can be loaded with
> XXSPLTIDP.


ok
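
To make the encoding concrete, here is a minimal host-side sketch of my
understanding of the immediate: the 32 bits are read as an IEEE
single-precision value and widened to double, and that double is what ends
up in both V2DF elements.  The helper name widen_sf_immediate is
hypothetical and not part of the patch, and the sketch assumes an IEEE 754
host.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper, not from the patch: model how a 32-bit
   single-precision immediate widens to the double value that would be
   splatted into each V2DF element.  Assumes IEEE 754 binary32/binary64
   on the host.  */
double
widen_sf_immediate (uint32_t imm32)
{
  float f;

  /* Reinterpret the immediate's bits as a float without aliasing issues.  */
  memcpy (&f, &imm32, sizeof f);

  /* Every float value is exactly representable as a double.  */
  return (double) f;
}
```

For example, the immediate 0x3F800000 is the bit pattern of 1.0f, so it
widens to 1.0.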

> 
> I added a new constraint (eF) to match constants that can be loaded with the
> XXSPLTIDP instruction.

> 
> I have added a temporary switch (-mxxspltidp) to control whether or not the
> XXSPLTIDP instruction is generated.

How temporary?  

> 
> I added 3 new tests to test loading up SF/DF scalar and V2DF vector
> constants.
> 
> I have tested this with bootstrap compilers on power10 systems and there was no
> regression.  I have built GCC with these patches on little endian power9 and
> big endian power8 systems, and there were no regressions.
> 
> In addition, I have built and run the full Spec 2017 rate suite, comparing with
> the patches enabled and not enabled.  There were roughly 66,000 XXSPLTIDP's
> generated in the rate build for Spec 2017.  On a stand-alone system that is
> running single threaded, blender_r has a 1.9% increase in performance, and rest
> of the benchmarks are performance neutral.  However, I would expect that in a
> real world scenario, switching to use XXSPLTIDP will increase performance due
> to removing all of the loads.

ok

> 
> Can I check this into the master branch?
> 
> 2021-08-25  Michael Meissner  <meiss...@linux.ibm.com>
> 
> gcc/
>       * config/rs6000/constraints.md (eF): New constraint.
>       * config/rs6000/predicates.md (easy_fp_constant): If we can load
>       the scalar constant with XXSPLTIDP, the floating point constant is
>       easy.

Could be shortened to something like:
  Add clause to accept xxspltidp_operand as easy.

>       (xxspltidp_operand): New predicate.

Will there ever be another instruction using the SF/DF CONST_DOUBLE or
V2DF CONST_VECTOR?  I tentatively question the name of the operand,
but defer.

>       (easy_vector_constant): If we can generate XXSPLTIDP, mark the
>       vector constant as easy.

Duplicated from above.

>       * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
>       declaration.
>       (prefixed_permute_p): Likewise.


>       * config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
>       (output_vec_const_move): Add support for XXSPLTIDP.
>       (prefixed_permute_p): New function.

Duplicated.

>       * config/rs6000/rs6000.md (prefixed attribute): Add support for
>       permute prefixed instructions.
>       (movsf_hardfloat): Add XXSPLTIDP support.
>       (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
>       (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
>       * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
>       * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
>       support.
>       (vsx_move<mode>_32bit): Likewise.

No 'e' in mov: the patch contents below define vsx_mov<mode>_64bit and
vsx_mov<mode>_32bit.

>       (vsx_splat_v2df_xxspltidp): New insn.
>       (XXSPLTIDP): New mode iterator.
>       (xxspltidp_<mode>_internal): New insn and splits.
>       (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
>       iterated form that also does SFmode, and DFmode.
Swap "an iterated form" with "xxspltidp_<mode>_inst"?




> 
> gcc/testsuite/
>       * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
>       * gcc.target/powerpc/vec-splat-constant-df.c: New test.
>       * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> ---
>  gcc/config/rs6000/constraints.md              |   5 +
>  gcc/config/rs6000/predicates.md               |  17 +++
>  gcc/config/rs6000/rs6000-protos.h             |   2 +
>  gcc/config/rs6000/rs6000.c                    | 106 ++++++++++++++++++
>  gcc/config/rs6000/rs6000.md                   |  45 +++++---
>  gcc/config/rs6000/rs6000.opt                  |   4 +
>  gcc/config/rs6000/vsx.md                      |  64 ++++++++++-
>  .../powerpc/vec-splat-constant-df.c           |  60 ++++++++++
>  .../powerpc/vec-splat-constant-sf.c           |  60 ++++++++++
>  .../powerpc/vec-splat-constant-v2df.c         |  64 +++++++++++
>  10 files changed, 405 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> 
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..ea2e4a267c3 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -208,6 +208,11 @@ (define_constraint "P"
>    (and (match_code "const_int")
>         (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
> 
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constraint "eF"
> +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> +  (match_operand 0 "xxspltidp_operand"))
> +
>  ;; 34-bit signed integer constant
>  (define_constraint "eI"
>    "A signed 34-bit integer constant if prefixed instructions are supported."
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 956e42bc514..134243e404b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
>    if (TARGET_VSX && op == CONST0_RTX (mode))
>      return 1;
> 
> +  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
> +     be loaded with that instruction.  */
> +  if (xxspltidp_operand (op, mode))
> +    return 1;
> +
>    /* Otherwise consider floating point constants hard, so that the
>       constant gets pushed to memory during the early RTL phases.  This
>       has the advantage that double precision constants that can be
> @@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit"
>    return num_insns == 1;
>  })
> 
> +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> +;; loaded via the ISA 3.1 XXSPLTIDP instruction.

"Return 1 if" doesn't seem right given the return statement here.

> +(define_predicate "xxspltidp_operand"
> +  (match_code "const_double,const_vector,vec_duplicate")
> +{
> +  HOST_WIDE_INT value = 0;
> +  return xxspltidp_constant_p (op, mode, &value);
> +})
> +
>  ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
>  ;; vector register without using memory.
>  (define_predicate "easy_vector_constant"
> @@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant"
>        if (zero_constant (op, mode) || all_ones_constant (op, mode))
>       return true;
> 
> +      if (xxspltidp_operand (op, mode))
> +     return true;
> +
>        if (TARGET_P9_VECTOR
>            && xxspltib_constant_p (op, mode, &num_insns, &value))
>       return true;
> diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h
> index 14f6b313105..9bba57c22f2 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, int, int, int,
> 
>  extern int easy_altivec_constant (rtx, machine_mode);
>  extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *);
> +extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *);
>  extern int vspltis_shifted (rtx);
>  extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int);
>  extern bool macho_lo_sum_memory_operand (rtx, machine_mode);
> @@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, machine_mode mode);
>  extern bool prefixed_load_p (rtx_insn *);
>  extern bool prefixed_store_p (rtx_insn *);
>  extern bool prefixed_paddi_p (rtx_insn *);
> +extern bool prefixed_permute_p (rtx_insn *);
>  extern void rs6000_asm_output_opcode (FILE *);
>  extern void output_pcrel_opt_reloc (rtx);
>  extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index e073b26b430..322b3c83925 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op,
>    return true;
>  }
> 
> +/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1
> +   XXSPLTIDP instruction.
> +
> +   Return the constant that is being split via CONSTANT_PTR to use in the
> +   XXSPLTIDP instruction.  */

Appears to return true or false.  Is the "Return the constant" comment
meant to go on the predicate definition earlier?

> +
> +bool
> +xxspltidp_constant_p (rtx op,
> +                   machine_mode mode,
> +                   HOST_WIDE_INT *constant_ptr)
> +{
> +  *constant_ptr = 0;
> +
> +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> +    return false;
> +
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op);
> +
> +  rtx element = op;
> +  if (mode == V2DFmode)
> +    {
> +      if (CONST_VECTOR_P (op))
> +     {
> +       element = CONST_VECTOR_ELT (op, 0);
> +       if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> +         return false;
> +     }
> +
> +      else if (GET_CODE (op) == VEC_DUPLICATE)
> +     element = XEXP (op, 0);
> +
> +      else
> +     return false;
> +
> +      mode = DFmode;
> +    }
> +
> +  if (mode != SFmode && mode != DFmode)
> +    return false;
> +
> +  if (GET_MODE (element) != mode)
> +    return false;
> +
> +  if (!CONST_DOUBLE_P (element))
> +    return false;
> +
> +  /* Don't return true for 0.0 since that is easy to create without
> +     XXSPLTIDP.  */
> +  if (element == CONST0_RTX (mode))
> +    return false;
> +
> +  /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP.  */
> +  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
> +  if (!exact_real_truncate (SFmode, rv))
> +    return false;

The 'exactly' caught my eye.  Per a glance at the comments in
exact_real_truncate, this indicates that the value is identical after
conversion to the new format.  Ok.
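
As a rough host-side analogue of that "identical after conversion" check
(my own sketch, not how exact_real_truncate is implemented, since GCC works
on its internal real_value representation), a double fits exactly in SFmode
when a round trip through float gives back the same value.  The function
name fits_exactly_in_float is hypothetical.  Treating NaN as loadable is an
assumption taken from the new tests, which expect __builtin_nan ("") to use
XXSPLTIDP; NaN payload bits are not modelled.

```c
#include <float.h>
#include <stdbool.h>

/* Hypothetical host-side analogue of the exact-truncation test.
   Assumes IEEE 754 float/double on the host.  */
bool
fits_exactly_in_float (double d)
{
  /* NaN compares unequal to itself.  Assumption: treat it as loadable,
     matching what the patch's tests expect for a default NaN.  */
  if (d != d)
    return true;

  /* Infinities exist in both formats.  */
  if (d > DBL_MAX || d < -DBL_MAX)
    return true;

  /* Avoid converting a finite value that is out of float range.  */
  if (d > FLT_MAX || d < -FLT_MAX)
    return false;

  /* The value fits exactly if narrowing and widening again is lossless.  */
  float f = (float) d;
  return (double) f == d;
}
```

With this sketch 1.0 and 0.5 fit, while M_PI and 0.1 do not, which lines up
with the tests expecting PLFD for the DFmode M_PI constant.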


> +
> +  long value;
> +  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
> +
> +  /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero).  */
> +  if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0))
> +    return false;
> +
> +  *constant_ptr = value;
> +  return true;
> +}

ok
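
For reference, the denormal test above can be tried on the host with a
small sketch like the following.  The two masks are the ones used in the
patch; the helper names float_bits and sf_bits_are_denormal are
hypothetical, and the sketch assumes the host float is IEEE 754 binary32.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a host float.  */
uint32_t
float_bits (float f)
{
  uint32_t bits;

  memcpy (&bits, &f, sizeof bits);
  return bits;
}

/* Same test as in the patch: the exponent field is zero and the
   mantissa field is non-zero.  The sign bit is ignored.  */
bool
sf_bits_are_denormal (uint32_t bits)
{
  return (bits & 0x7F800000u) == 0 && (bits & 0x007FFFFFu) != 0;
}
```

The smallest positive denormal, 0x1p-149f, has bit pattern 0x00000001 and is
rejected, while 0.0f, -0.0f and the smallest normal value (0x00800000) are
not classified as denormal.  Zero is filtered out separately by the
CONST0_RTX check earlier in the function.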


> +
>  const char *
>  output_vec_const_move (rtx *operands)
>  {
> @@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands)
>      {
>        bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest));
>        int xxspltib_value = 256;
> +      HOST_WIDE_INT xxspltidp_value = 0;
>        int num_insns = -1;
> 
>        if (zero_constant (vec, mode))
> @@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands)
>           gcc_unreachable ();
>       }
> 
> +      if (xxspltidp_constant_p (vec, mode, &xxspltidp_value))
> +     {
> +       operands[2] = GEN_INT (xxspltidp_value);
> +       return "xxspltidp %x0,%2";
> +     }
> +
>        if (TARGET_P9_VECTOR
>         && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value))
>       {

ok

> @@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn)
>    return (iform == INSN_FORM_PCREL_EXTERNAL || iform == INSN_FORM_PCREL_LOCAL);
>  }
> 
> +/* Whether a permute type instruction is a prefixed instruction.  This is
> +   called from the prefixed attribute processing.  */
> +
> +bool
> +prefixed_permute_p (rtx_insn *insn)
> +{
> +  rtx set = single_set (insn);
> +  if (!set)
> +    return false;
> +
> +  rtx dest = SET_DEST (set);
> +  rtx src = SET_SRC (set);
> +  machine_mode mode = GET_MODE (dest);
> +
> +  if (!REG_P (dest) && !SUBREG_P (dest))
> +    return false;
> +
> +  switch (mode)
> +    {
> +    case DFmode:
> +    case SFmode:
> +    case V2DFmode:
> +      return xxspltidp_operand (src, mode);
> +
> +    default:
> +      break;
> +    }
> +
> +  return false;
> +}
> +
ok


>  /* Whether the next instruction needs a 'p' prefix issued before the
>     instruction is printed out.  */
>  static bool prepend_p_to_next_insn;
> diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
> index a84438f8545..bf3bfed3b88 100644
> --- a/gcc/config/rs6000/rs6000.md
> +++ b/gcc/config/rs6000/rs6000.md
> @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes"
> 
>        (eq_attr "type" "integer,add")
>        (if_then_else (match_test "prefixed_paddi_p (insn)")
> +                    (const_string "yes")
> +                    (const_string "no"))
> +
> +      (eq_attr "type" "vecperm")
> +      (if_then_else (match_test "prefixed_permute_p (insn)")
>                      (const_string "yes")
>                      (const_string "no"))]
> 
> @@ -7723,17 +7728,17 @@ (define_split
>  ;;
>  ;;   LWZ          LFS        LXSSP       LXSSPX     STFS       STXSSP
>  ;;   STXSSPX      STW        XXLXOR      LI         FMR        XSCPSGNDP
> -;;   MR           MT<x>      MF<x>       NOP
> +;;   MR           MT<x>      MF<x>       NOP        XXSPLTIDP
> 
>  (define_insn "movsf_hardfloat"
>    [(set (match_operand:SF 0 "nonimmediate_operand"
>        "=!r,       f,         v,          wa,        m,         wY,
>         Z,         m,         wa,         !r,        f,         wa,
> -       !r,        *c*l,      !r,         *h")
> +       !r,        *c*l,      !r,         *h,        wa")
>       (match_operand:SF 1 "input_operand"
>        "m,         m,         wY,         Z,         f,         v,
>         wa,        r,         j,          j,         f,         wa,
> -       r,         r,         *h,         0"))]
> +       r,         r,         *h,         0,         eF"))]
>    "(register_operand (operands[0], SFmode)
>     || register_operand (operands[1], SFmode))
>     && TARGET_HARD_FLOAT
> @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat"
>     mr %0,%1
>     mt%0 %1
>     mf%1 %0
> -   nop"
> +   nop
> +   #"
>    [(set_attr "type"
>       "load,       fpload,    fpload,     fpload,    fpstore,   fpstore,
>        fpstore,    store,     veclogical, integer,   fpsimple,  fpsimple,
> -      *,          mtjmpr,    mfjmpr,     *")
> +      *,          mtjmpr,    mfjmpr,     *,         vecperm")
>     (set_attr "isa"
>       "*,          *,         p9v,        p8v,       *,         p9v,
>        p8v,        *,         *,          *,         *,         *,
> -      *,          *,         *,          *")])
> +      *,          *,         *,          *,         p10")])

OK, I think.  The addition of vecperm for type and p10 for the isa
entries caught my eye, but I expect this is obvious to others.

> 
>  ;;   LWZ          LFIWZX     STW        STFIWX     MTVSRWZ    MFVSRWZ
>  ;;   FMR          MR         MT%0       MF%1       NOP
> @@ -8023,18 +8029,18 @@ (define_split
> 
>  ;;           STFD         LFD         FMR         LXSD        STXSD
>  ;;           LXSD         STXSD       XXLOR       XXLXOR      GPR<-0
> -;;           LWZ          STW         MR
> +;;           LWZ          STW         MR          XXSPLTIDP
> 
> 
>  (define_insn "*mov<mode>_hardfloat32"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>              "=m,          d,          d,          <f64_p9>,   wY,
>                <f64_av>,   Z,          <f64_vsx>,  <f64_vsx>,  !r,
> -              Y,          r,          !r")
> +              Y,          r,          !r,         wa")
>       (match_operand:FMOVE64 1 "input_operand"
>               "d,          m,          d,          wY,         <f64_p9>,
>                Z,          <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
> -              r,          Y,          r"))]
> +              r,          Y,          r,          eF"))]
>    "! TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32"
>     #
>     #
>     #
> +   #
>     #"
>    [(set_attr "type"
>              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, two,
> -             store,       load,       two")
> +             store,       load,       two,        vecperm")
>     (set_attr "size" "64")
>     (set_attr "length"
>              "*,           *,          *,          *,          *,
>               *,           *,          *,          *,          8,
> -             8,           8,          8")
> +             8,           8,          8,          *")
>     (set_attr "isa"
>              "*,           *,          *,          p9v,        p9v,
>               p7v,         p7v,        *,          *,          *,
> -             *,           *,          *")])
> +             *,           *,          *,          p10")])
> 
>  ;;           STW      LWZ     MR      G-const H-const F-const
> 
> @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32"
>  ;;           STFD         LFD         FMR         LXSD        STXSD
>  ;;           LXSDX        STXSDX      XXLOR       XXLXOR      LI 0
>  ;;           STD          LD          MR          MT{CTR,LR}  MF{CTR,LR}
> -;;           NOP          MFVSRD      MTVSRD
> +;;           NOP          MFVSRD      MTVSRD      XXSPLTIDP
> 
>  (define_insn "*mov<mode>_hardfloat64"
>    [(set (match_operand:FMOVE64 0 "nonimmediate_operand"
>             "=m,           d,          d,          <f64_p9>,   wY,
>               <f64_av>,    Z,          <f64_vsx>,  <f64_vsx>,  !r,
>               YZ,          r,          !r,         *c*l,       !r,
> -            *h,           r,          <f64_dm>")
> +            *h,           r,          <f64_dm>,   wa")
>       (match_operand:FMOVE64 1 "input_operand"
>              "d,           m,          d,          wY,         <f64_p9>,
>               Z,           <f64_av>,   <f64_vsx>,  <zero_fp>,  <zero_fp>,
>               r,           YZ,         r,          r,          *h,
> -             0,           <f64_dm>,   r"))]
> +             0,           <f64_dm>,   r,          eF"))]
>    "TARGET_POWERPC64 && TARGET_HARD_FLOAT
>     && (gpc_reg_operand (operands[0], <MODE>mode)
>         || gpc_reg_operand (operands[1], <MODE>mode))"
> @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
>     mf%1 %0
>     nop
>     mfvsrd %0,%x1
> -   mtvsrd %x0,%1"
> +   mtvsrd %x0,%1
> +   #"
>    [(set_attr "type"
>              "fpstore,     fpload,     fpsimple,   fpload,     fpstore,
>               fpload,      fpstore,    veclogical, veclogical, integer,
>               store,       load,       *,          mtjmpr,     mfjmpr,
> -             *,           mfvsr,      mtvsr")
> +             *,           mfvsr,      mtvsr,      vecperm")
>     (set_attr "size" "64")
>     (set_attr "isa"
>              "*,           *,          *,          p9v,        p9v,
>               p7v,         p7v,        *,          *,          *,
>               *,           *,          *,          *,          *,
> -             *,           p8v,        p8v")])
> +             *,           p8v,        p8v,        p10")])
> 
>  ;;           STD      LD       MR      MT<SPR> MF<SPR> G-const
>  ;;           H-const  F-const  Special


Ok.

> @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
>     (set_attr "length"
>              "*,       *,      *,      *,      *,      8,
>               12,      16,     *")])
> +
>  

Unnecessary blank line?


>  (define_expand "mov<mode>"
>    [(set (match_operand:FMOVE128 0 "general_operand")
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 0538db387dc..928c4fafe07 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
>  mprivileged
>  Target Var(rs6000_privileged) Init(0)
>  Generate code that will run in privileged state.
> +
> +mxxspltidp
> +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
> +Generate (do not generate) XXSPLTIDP instructions.


Ok.


> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..af9a04870d4 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
>  ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
> 
>  ;;              VSX store  VSX load   VSX move  VSX->GPR   GPR->VSX    LQ (GPR)
> +;;           XXSPLTIDP
>  ;;              STQ (GPR)  GPR load   GPR store GPR move   XXSPLTIB    VSPLTISW
>  ;;              VSX 0/-1   VMX const  GPR const LVX (VMX)  STVX (VMX)
>  (define_insn "vsx_mov<mode>_64bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        r,         we,        ?wQ,
> +                wa,
>                  ?&r,       ??r,       ??Y,       <??r>,     wa,        v,
>                  ?wa,       v,         <??r>,     wZ,        v")
> 
>       (match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        we,        r,         r,
> +                eF,
>                  wQ,        Y,         r,         r,         wE,        jwM,
>                  ?jwM,      W,         <nW>,      v,         wZ"))]
> 
> @@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit"
>  }
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, mtvsr,     mfvsr,     load,
> +                vecperm,
>                  store,     load,      store,     *,         vecsimple, vecsimple,
>                  vecsimple, *,         *,         vecstore,  vecload")
>     (set_attr "num_insns"
>                 "*,         *,         *,         2,         *,         2,
> +                *,
>                  2,         2,         2,         2,         *,         *,
>                  *,         5,         2,         *,         *")
>     (set_attr "max_prefixed_insns"
>                 "*,         *,         *,         *,         *,         2,
> +                *,
>                  2,         2,         2,         2,         *,         *,
>                  *,         *,         *,         *,         *")
>     (set_attr "length"
>                 "*,         *,         *,         8,         *,         8,
> +                *,
>                  8,         8,         8,         8,         *,         *,
>                  *,         20,        8,         *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> +                p10,
>                  *,         *,         *,         *,         p9v,       *,
>                  <VSisa>,   *,         *,         *,         *")])
> 
>  ;;              VSX store  VSX load   VSX move   GPR load   GPR store  GPR move
> +;;           XXSPLTIDP
>  ;;              XXSPLTIB   VSPLTISW   VSX 0/-1   VMX const  GPR const
>  ;;              LVX (VMX)  STVX (VMX)
>  (define_insn "*vsx_mov<mode>_32bit"
>    [(set (match_operand:VSX_M 0 "nonimmediate_operand"
>                 "=ZwO,      wa,        wa,        ??r,       ??Y,       <??r>,
> +                wa,
>                  wa,        v,         ?wa,       v,         <??r>,
>                  wZ,        v")
> 
>       (match_operand:VSX_M 1 "input_operand" 
>                 "wa,        ZwO,       wa,        Y,         r,         r,
> +                eF,
>                  wE,        jwM,       ?jwM,      W,         <nW>,
>                  v,         wZ"))]
> 
> @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit"
>  }
>    [(set_attr "type"
>                 "vecstore,  vecload,   vecsimple, load,      store,    *,
> +                vecperm,
>                  vecsimple, vecsimple, vecsimple, *,         *,
>                  vecstore,  vecload")
>     (set_attr "length"
>                 "*,         *,         *,         16,        16,        16,
> +                *,
>                  *,         *,         *,         20,        16,
>                  *,         *")
>     (set_attr "isa"
>                 "<VSisa>,   <VSisa>,   <VSisa>,   *,         *,         *,
> +                p10,
>                  p9v,       *,         <VSisa>,   *,         *,
>                  *,         *")])
> 
ok

> @@ -4580,6 +4594,23 @@ (define_insn "vsx_splat_<mode>_reg"
>     mtvsrdd %x0,%1,%1"
>    [(set_attr "type" "vecperm,vecmove")])
> 
> +(define_insn "*vsx_splat_v2df_xxspltidp"
> +  [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa")
> +     (vec_duplicate:V2DF
> +      (match_operand:DF 1 "xxspltidp_operand" "eF")))]
> +  "TARGET_POWER10"
> +{
> +  HOST_WIDE_INT value;
> +
> +  if (!xxspltidp_constant_p (operands[1], DFmode, &value))
> +    gcc_unreachable ();
> +
> +  operands[2] = GEN_INT (value);
> +  return "xxspltidp %x0,%1";
> +}
> +  [(set_attr "type" "vecperm")
> +   (set_attr "prefixed" "yes")])
> +
>  (define_insn "vsx_splat_<mode>_mem"
>    [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa")
>       (vec_duplicate:VSX_D
> @@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df"
>    DONE;
>  })
> 
> -(define_insn "xxspltidp_v2df_inst"
> -  [(set (match_operand:V2DF 0 "register_operand" "=wa")
> -     (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> -                  UNSPEC_XXSPLTIDP))]
> +(define_mode_iterator XXSPLTIDP [SF DF V2DF])
> +
> +(define_insn "xxspltidp_<mode>_inst"
> +  [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa")
> +     (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")]
> +                       UNSPEC_XXSPLTIDP))]
>    "TARGET_POWER10"
>    "xxspltidp %x0,%1"
>    [(set_attr "type" "vecperm")
>     (set_attr "prefixed" "yes")])
> 
> +;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar
> +;; constants and V2DF vector constants where both elements are the same.  The
> +;; constant has to be expressible as a SFmode constant that is not a SFmode
> +;; denormal value.
> +(define_insn_and_split "*xxspltidp_<mode>_internal"
> +  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa")
> +     (match_operand:XXSPLTIDP 1 "xxspltidp_operand"     "eF"))]

Extra spaces there.


> +  "TARGET_POWER10"
> +  "#"
> +  "&& 1"
> +  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
> +     (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
> +{
> +  HOST_WIDE_INT value = 0;
> +
> +  if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value))
> +    gcc_unreachable ();
> +
> +  operands[2] = GEN_INT (value);
> +}
> + [(set_attr "type" "vecperm")
> +  (set_attr "prefixed" "yes")])
> +
>  ;; XXSPLTI32DX built-in function support
>  (define_expand "xxsplti32dx_v4si"
>    [(set (match_operand:V4SI 0 "register_operand" "=wa")

ok


Just briefly looked at the testcases; nothing jumped out at me below.
Thanks
-Will

> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> new file mode 100644
> index 00000000000..8f6e176f9af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +double
> +scalar_double_0 (void)
> +{
> +  return 0.0;                        /* XXSPLTIB or XXLXOR.  */
> +}
> +
> +double
> +scalar_double_1 (void)
> +{
> +  return 1.0;                        /* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +double
> +scalar_double_m0 (void)
> +{
> +  return -0.0;                       /* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_nan (void)
> +{
> +  return __builtin_nan (""); /* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_inf (void)
> +{
> +  return __builtin_inf ();   /* XXSPLTIDP.  */
> +}
> +
> +double
> +scalar_double_m_inf (void)   /* XXSPLTIDP.  */
> +{
> +  return - __builtin_inf ();
> +}
> +#endif
> +
> +double
> +scalar_double_pi (void)
> +{
> +  return M_PI;                       /* PLFD.  */
> +}
> +
> +double
> +scalar_double_denorm (void)
> +{
> +  return 0x1p-149f;          /* PLFD.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> new file mode 100644
> index 00000000000..72504bdfbbd
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +float
> +scalar_float_0 (void)
> +{
> +  return 0.0f;                       /* XXSPLTIB or XXLXOR.  */
> +}
> +
> +float
> +scalar_float_1 (void)
> +{
> +  return 1.0f;                       /* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +float
> +scalar_float_m0 (void)
> +{
> +  return -0.0f;                      /* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_nan (void)
> +{
> +  return __builtin_nanf ("");        /* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_inf (void)
> +{
> +  return __builtin_inff ();  /* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_m_inf (void)    /* XXSPLTIDP.  */
> +{
> +  return - __builtin_inff ();
> +}
> +#endif
> +
> +float
> +scalar_float_pi (void)
> +{
> +  return (float)M_PI;                /* XXSPLTIDP.  */
> +}
> +
> +float
> +scalar_float_denorm (void)
> +{
> +  return 0x1p-149f;          /* PLFS.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */
> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> new file mode 100644
> index 00000000000..82ffc86f8aa
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
> @@ -0,0 +1,64 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction.  */
> +
> +vector double
> +v2df_double_0 (void)
> +{
> +  return (vector double) { 0.0, 0.0 };                       /* XXSPLTIB or XXLXOR.  */
> +}
> +
> +vector double
> +v2df_double_1 (void)
> +{
> +  return (vector double) { 1.0, 1.0 };                       /* XXSPLTIDP.  */
> +}
> +
> +#ifndef __FAST_MATH__
> +vector double
> +v2df_double_m0 (void)
> +{
> +  return (vector double) { -0.0, -0.0 };             /* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_nan (void)
> +{
> +  return (vector double) { __builtin_nan (""),
> +                        __builtin_nan ("") };        /* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_inf (void)
> +{
> +  return (vector double) { __builtin_inf (),
> +                        __builtin_inf () };          /* XXSPLTIDP.  */
> +}
> +
> +vector double
> +v2df_double_m_inf (void)
> +{
> +  return (vector double) { - __builtin_inf (),
> +                        - __builtin_inf () };        /* XXSPLTIDP.  */
> +}
> +#endif
> +
> +vector double
> +v2df_double_pi (void)
> +{
> +  return (vector double) { M_PI, M_PI };             /* PLVX.  */
> +}
> +
> +vector double
> +v2df_double_denorm (void)
> +{
> +  return (vector double) { (double)0x1p-149f,
> +                        (double)0x1p-149f };         /* PLVX.  */
> +}
> +
> +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */
> -- 
> 2.31.1
> 
>
