On Wed, 2021-08-25 at 15:46 -0400, Michael Meissner wrote:
> Generate XXSPLTIDP on power10.
>
> This patch implements XXSPLTIDP support for SF and DF scalar constants and
> V2DF vector constants. The XXSPLTIDP instruction is given a 32-bit immediate
> that is converted to a vector of two DFmode constants. The immediate is in
> SFmode format, so only constants that fit as SFmode values can be loaded with
> XXSPLTIDP.
ok

>
> I added a new constraint (eF) to match constants that can be loaded with the
> XXSPLTIDP instruction.
>
> I have added a temporary switch (-mxxspltidp) to control whether or not the
> XXSPLTIDP instruction is generated.

How temporary?

>
> I added 3 new tests to test loading up SF/DF scalar and V2DF vector
> constants.
>
> I have tested this with bootstrap compilers on power10 systems and there was
> no regression. I have built GCC with these patches on little endian power9 and
> big endian power8 systems, and there were no regressions.
>
> In addition, I have built and run the full Spec 2017 rate suite, comparing
> with the patches enabled and not enabled. There were roughly 66,000 XXSPLTIDP's
> generated in the rate build for Spec 2017. On a stand-alone system that is
> running single threaded, blender_r has a 1.9% increase in performance, and
> rest of the benchmarks are performance neutral. However, I would expect that in a
> real world scenario, switching to use XXSPLTIDP will increase performance due
> to removing all of the loads.

ok

>
> Can I check this into the master branch?
>
> 2021-08-25 Michael Meissner <meiss...@linux.ibm.com>
>
> gcc/
> * config/rs6000/constraints.md (eF): New constraint.
> * config/rs6000/predicates.md (easy_fp_constant): If we can load
> the scalar constant with XXSPLTIDP, the floating point constant is
> easy.

Could be shortened to something like ?
  Add clause to accept xxspltidp_operand as easy.

> (xxspltidp_operand): New predicate.

Will there ever be another instruction using the SF/DF CONST_DOUBLE or V2DF
CONST_VECTOR ? I tentatively question the name of the operand, but defer..

> (easy_vector_constant): If we can generate XXSPLTIDP, mark the
> vector constant as easy.

Duplicated from above.

> * config/rs6000/rs6000-protos.h (xxspltidp_constant_p): New
> declaration.
> (prefixed_permute_p): Likewise.
> * config/rs6000/rs6000.c (xxspltidp_constant_p): New function.
> (output_vec_const_move): Add support for XXSPLTIDP.
> (prefixed_permute_p): New function.

Duplicated.

> * config/rs6000/rs6000.md (prefixed attribute): Add support for
> permute prefixed instructions.
> (movsf_hardfloat): Add XXSPLTIDP support.
> (mov<mode>_hardfloat32, FMOVE64 iterator): Likewise.
> (mov<mode>_hardfloat64, FMOVE64 iterator): Likewise.
> * config/rs6000/rs6000.opt (-mxxspltidp): New switch.
> * config/rs6000/vsx.md (vsx_move<mode>_64bit): Add XXSPLTIDP
> support.
> (vsx_move<mode>_32bit): Likewise.

No "e" in "mov" (per patch contents below).

> (vsx_splat_v2df_xxspltidp): New insn.
> (XXSPLTIDP): New mode iterator.
> (xxspltidp_<mode>_internal): New insn and splits.
> (xxspltidp_<mode>_inst): Replace xxspltidp_v2df_inst with an
> iterated form that also does SFmode, and DFmode.

Swap "an iterated form" with "xxspltidp_<mode>_inst" ?

>
> gcc/testsuite/
> * gcc.target/powerpc/vec-splat-constant-sf.c: New test.
> * gcc.target/powerpc/vec-splat-constant-df.c: New test.
> * gcc.target/powerpc/vec-splat-constant-v2df.c: New test.
> ---
>  gcc/config/rs6000/constraints.md          |   5 +
>  gcc/config/rs6000/predicates.md           |  17 +++
>  gcc/config/rs6000/rs6000-protos.h         |   2 +
>  gcc/config/rs6000/rs6000.c                | 106 ++++++++++++++++++
>  gcc/config/rs6000/rs6000.md               |  45 +++++---
>  gcc/config/rs6000/rs6000.opt              |   4 +
>  gcc/config/rs6000/vsx.md                  |  64 ++++++++++-
>  .../powerpc/vec-splat-constant-df.c       |  60 ++++++++++
>  .../powerpc/vec-splat-constant-sf.c       |  60 ++++++++++
>  .../powerpc/vec-splat-constant-v2df.c     |  64 +++++++++++
>  10 files changed, 405 insertions(+), 22 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c
>  create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c
>
> diff --git a/gcc/config/rs6000/constraints.md b/gcc/config/rs6000/constraints.md
> index c8cff1a3038..ea2e4a267c3 100644
> --- a/gcc/config/rs6000/constraints.md
> +++ b/gcc/config/rs6000/constraints.md
> @@ -208,6 +208,11 @@
(define_constraint "P"
>    (and (match_code "const_int")
>         (match_test "((- (unsigned HOST_WIDE_INT) ival) + 0x8000) < 0x10000")))
>
> +;; SF/DF/V2DF scalar or vector constant that can be loaded with XXSPLTIDP
> +(define_constraint "eF"
> +  "A vector constant that can be loaded with the XXSPLTIDP instruction."
> +  (match_operand 0 "xxspltidp_operand"))
> +
>  ;; 34-bit signed integer constant
>  (define_constraint "eI"
>    "A signed 34-bit integer constant if prefixed instructions are supported."
> diff --git a/gcc/config/rs6000/predicates.md b/gcc/config/rs6000/predicates.md
> index 956e42bc514..134243e404b 100644
> --- a/gcc/config/rs6000/predicates.md
> +++ b/gcc/config/rs6000/predicates.md
> @@ -601,6 +601,11 @@ (define_predicate "easy_fp_constant"
>    if (TARGET_VSX && op == CONST0_RTX (mode))
>      return 1;
>
> +  /* If we have the ISA 3.1 XXSPLTIDP instruction, see if the constant can
> +     be loaded with that instruction. */
> +  if (xxspltidp_operand (op, mode))
> +    return 1;
> +
>    /* Otherwise consider floating point constants hard, so that the
>       constant gets pushed to memory during the early RTL phases. This
>       has the advantage that double precision constants that can be
> @@ -640,6 +645,15 @@ (define_predicate "xxspltib_constant_nosplit"
>    return num_insns == 1;
>  })
>
> +;; Return 1 if operand is a SF/DF CONST_DOUBLE or V2DF CONST_VECTOR that can be
> +;; loaded via the ISA 3.1 XXSPLTIDP instruction.

"Return 1 if" doesn't seem right given the return statement here.

> +(define_predicate "xxspltidp_operand"
> +  (match_code "const_double,const_vector,vec_duplicate")
> +{
> +  HOST_WIDE_INT value = 0;
> +  return xxspltidp_constant_p (op, mode, &value);
> +})
> +
>  ;; Return 1 if the operand is a CONST_VECTOR and can be loaded into a
>  ;; vector register without using memory.
> (define_predicate "easy_vector_constant" > @@ -653,6 +667,9 @@ (define_predicate "easy_vector_constant" > if (zero_constant (op, mode) || all_ones_constant (op, mode)) > return true; > > + if (xxspltidp_operand (op, mode)) > + return true; > + > if (TARGET_P9_VECTOR > && xxspltib_constant_p (op, mode, &num_insns, &value)) > return true; > diff --git a/gcc/config/rs6000/rs6000-protos.h > b/gcc/config/rs6000/rs6000-protos.h > index 14f6b313105..9bba57c22f2 100644 > --- a/gcc/config/rs6000/rs6000-protos.h > +++ b/gcc/config/rs6000/rs6000-protos.h > @@ -32,6 +32,7 @@ extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, > rtx, int, int, int, > > extern int easy_altivec_constant (rtx, machine_mode); > extern bool xxspltib_constant_p (rtx, machine_mode, int *, int *); > +extern bool xxspltidp_constant_p (rtx, machine_mode, HOST_WIDE_INT *); > extern int vspltis_shifted (rtx); > extern HOST_WIDE_INT const_vector_elt_as_int (rtx, unsigned int); > extern bool macho_lo_sum_memory_operand (rtx, machine_mode); > @@ -198,6 +199,7 @@ enum non_prefixed_form reg_to_non_prefixed (rtx reg, > machine_mode mode); > extern bool prefixed_load_p (rtx_insn *); > extern bool prefixed_store_p (rtx_insn *); > extern bool prefixed_paddi_p (rtx_insn *); > +extern bool prefixed_permute_p (rtx_insn *); > extern void rs6000_asm_output_opcode (FILE *); > extern void output_pcrel_opt_reloc (rtx); > extern void rs6000_final_prescan_insn (rtx_insn *, rtx [], int); > diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c > index e073b26b430..322b3c83925 100644 > --- a/gcc/config/rs6000/rs6000.c > +++ b/gcc/config/rs6000/rs6000.c > @@ -6533,6 +6533,74 @@ xxspltib_constant_p (rtx op, > return true; > } > > +/* Return true if OP is of the given MODE and can be synthesized with ISA 3.1 > + XXSPLTIDP instruction. > + > + Return the constant that is being split via CONSTANT_PTR to use in the > + XXSPLTIDP instruction. */ Appears to return true or false. 
Is the "Return the constant" comment meant to go on the predicate definition
earlier?

> +
> +bool
> +xxspltidp_constant_p (rtx op,
> +                      machine_mode mode,
> +                      HOST_WIDE_INT *constant_ptr)
> +{
> +  *constant_ptr = 0;
> +
> +  if (!TARGET_XXSPLTIDP || !TARGET_PREFIXED || !TARGET_VSX)
> +    return false;
> +
> +  if (mode == VOIDmode)
> +    mode = GET_MODE (op);
> +
> +  rtx element = op;
> +  if (mode == V2DFmode)
> +    {
> +      if (CONST_VECTOR_P (op))
> +        {
> +          element = CONST_VECTOR_ELT (op, 0);
> +          if (!rtx_equal_p (element, CONST_VECTOR_ELT (op, 1)))
> +            return false;
> +        }
> +
> +      else if (GET_CODE (op) == VEC_DUPLICATE)
> +        element = XEXP (op, 0);
> +
> +      else
> +        return false;
> +
> +      mode = DFmode;
> +    }
> +
> +  if (mode != SFmode && mode != DFmode)
> +    return false;
> +
> +  if (GET_MODE (element) != mode)
> +    return false;
> +
> +  if (!CONST_DOUBLE_P (element))
> +    return false;
> +
> +  /* Don't return true for 0.0 since that is easy to create without
> +     XXSPLTIDP. */
> +  if (element == CONST0_RTX (mode))
> +    return false;
> +
> +  /* If the value doesn't fit in a SFmode, exactly, we can't use XXSPLTIDP. */
> +  const struct real_value *rv = CONST_DOUBLE_REAL_VALUE (element);
> +  if (!exact_real_truncate (SFmode, rv))
> +    return false;

The 'exactly' caught my eye. Per a glance at the comments in
exact_real_truncate, this indicates that the value is identical after
conversion to the new format. Ok.

> +
> +  long value;
> +  REAL_VALUE_TO_TARGET_SINGLE (*rv, value);
> +
> +  /* Test for SFmode denormal (exponent is 0, mantissa field is non-zero).
> */ > + if (((value & 0x7F800000) == 0) && ((value & 0x7FFFFF) != 0)) > + return false; > + > + *constant_ptr = value; > + return true; > +} ok > + > const char * > output_vec_const_move (rtx *operands) > { > @@ -6548,6 +6616,7 @@ output_vec_const_move (rtx *operands) > { > bool dest_vmx_p = ALTIVEC_REGNO_P (REGNO (dest)); > int xxspltib_value = 256; > + HOST_WIDE_INT xxspltidp_value = 0; > int num_insns = -1; > > if (zero_constant (vec, mode)) > @@ -6577,6 +6646,12 @@ output_vec_const_move (rtx *operands) > gcc_unreachable (); > } > > + if (xxspltidp_constant_p (vec, mode, &xxspltidp_value)) > + { > + operands[2] = GEN_INT (xxspltidp_value); > + return "xxspltidp %x0,%2"; > + } > + > if (TARGET_P9_VECTOR > && xxspltib_constant_p (vec, mode, &num_insns, &xxspltib_value)) > { ok > @@ -26219,6 +26294,37 @@ prefixed_paddi_p (rtx_insn *insn) > return (iform == INSN_FORM_PCREL_EXTERNAL || iform == > INSN_FORM_PCREL_LOCAL); > } > > +/* Whether a permute type instruction is a prefixed instruction. This is > + called from the prefixed attribute processing. */ > + > +bool > +prefixed_permute_p (rtx_insn *insn) > +{ > + rtx set = single_set (insn); > + if (!set) > + return false; > + > + rtx dest = SET_DEST (set); > + rtx src = SET_SRC (set); > + machine_mode mode = GET_MODE (dest); > + > + if (!REG_P (dest) && !SUBREG_P (dest)) > + return false; > + > + switch (mode) > + { > + case DFmode: > + case SFmode: > + case V2DFmode: > + return xxspltidp_operand (src, mode); > + > + default: > + break; > + } > + > + return false; > +} > + ok > /* Whether the next instruction needs a 'p' prefix issued before the > instruction is printed out. 
*/ > static bool prepend_p_to_next_insn; > diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md > index a84438f8545..bf3bfed3b88 100644 > --- a/gcc/config/rs6000/rs6000.md > +++ b/gcc/config/rs6000/rs6000.md > @@ -314,6 +314,11 @@ (define_attr "prefixed" "no,yes" > > (eq_attr "type" "integer,add") > (if_then_else (match_test "prefixed_paddi_p (insn)") > + (const_string "yes") > + (const_string "no")) > + > + (eq_attr "type" "vecperm") > + (if_then_else (match_test "prefixed_permute_p (insn)") > (const_string "yes") > (const_string "no"))] > > @@ -7723,17 +7728,17 @@ (define_split > ;; > ;; LWZ LFS LXSSP LXSSPX STFS STXSSP > ;; STXSSPX STW XXLXOR LI FMR XSCPSGNDP > -;; MR MT<x> MF<x> NOP > +;; MR MT<x> MF<x> NOP XXSPLTIDP > > (define_insn "movsf_hardfloat" > [(set (match_operand:SF 0 "nonimmediate_operand" > "=!r, f, v, wa, m, wY, > Z, m, wa, !r, f, wa, > - !r, *c*l, !r, *h") > + !r, *c*l, !r, *h, wa") > (match_operand:SF 1 "input_operand" > "m, m, wY, Z, f, v, > wa, r, j, j, f, wa, > - r, r, *h, 0"))] > + r, r, *h, 0, eF"))] > "(register_operand (operands[0], SFmode) > || register_operand (operands[1], SFmode)) > && TARGET_HARD_FLOAT > @@ -7755,15 +7760,16 @@ (define_insn "movsf_hardfloat" > mr %0,%1 > mt%0 %1 > mf%1 %0 > - nop" > + nop > + #" > [(set_attr "type" > "load, fpload, fpload, fpload, fpstore, fpstore, > fpstore, store, veclogical, integer, fpsimple, fpsimple, > - *, mtjmpr, mfjmpr, *") > + *, mtjmpr, mfjmpr, *, vecperm") > (set_attr "isa" > "*, *, p9v, p8v, *, p9v, > p8v, *, *, *, *, *, > - *, *, *, *")]) > + *, *, *, *, p10")]) OK, i think. The addition of vecperm for type and p10 for the isa entries catch my eye, but I expect this is obvious to others. 
> > ;; LWZ LFIWZX STW STFIWX MTVSRWZ MFVSRWZ > ;; FMR MR MT%0 MF%1 NOP > @@ -8023,18 +8029,18 @@ (define_split > > ;; STFD LFD FMR LXSD STXSD > ;; LXSD STXSD XXLOR XXLXOR GPR<-0 > -;; LWZ STW MR > +;; LWZ STW MR XXSPLTIDP > > > (define_insn "*mov<mode>_hardfloat32" > [(set (match_operand:FMOVE64 0 "nonimmediate_operand" > "=m, d, d, <f64_p9>, wY, > <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, > - Y, r, !r") > + Y, r, !r, wa") > (match_operand:FMOVE64 1 "input_operand" > "d, m, d, wY, <f64_p9>, > Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, > - r, Y, r"))] > + r, Y, r, eF"))] > "! TARGET_POWERPC64 && TARGET_HARD_FLOAT > && (gpc_reg_operand (operands[0], <MODE>mode) > || gpc_reg_operand (operands[1], <MODE>mode))" > @@ -8051,20 +8057,21 @@ (define_insn "*mov<mode>_hardfloat32" > # > # > # > + # > #" > [(set_attr "type" > "fpstore, fpload, fpsimple, fpload, fpstore, > fpload, fpstore, veclogical, veclogical, two, > - store, load, two") > + store, load, two, vecperm") > (set_attr "size" "64") > (set_attr "length" > "*, *, *, *, *, > *, *, *, *, 8, > - 8, 8, 8") > + 8, 8, 8, *") > (set_attr "isa" > "*, *, *, p9v, p9v, > p7v, p7v, *, *, *, > - *, *, *")]) > + *, *, *, p10")]) > > ;; STW LWZ MR G-const H-const F-const > > @@ -8091,19 +8098,19 @@ (define_insn "*mov<mode>_softfloat32" > ;; STFD LFD FMR LXSD STXSD > ;; LXSDX STXSDX XXLOR XXLXOR LI 0 > ;; STD LD MR MT{CTR,LR} MF{CTR,LR} > -;; NOP MFVSRD MTVSRD > +;; NOP MFVSRD MTVSRD XXSPLTIDP > > (define_insn "*mov<mode>_hardfloat64" > [(set (match_operand:FMOVE64 0 "nonimmediate_operand" > "=m, d, d, <f64_p9>, wY, > <f64_av>, Z, <f64_vsx>, <f64_vsx>, !r, > YZ, r, !r, *c*l, !r, > - *h, r, <f64_dm>") > + *h, r, <f64_dm>, wa") > (match_operand:FMOVE64 1 "input_operand" > "d, m, d, wY, <f64_p9>, > Z, <f64_av>, <f64_vsx>, <zero_fp>, <zero_fp>, > r, YZ, r, r, *h, > - 0, <f64_dm>, r"))] > + 0, <f64_dm>, r, eF"))] > "TARGET_POWERPC64 && TARGET_HARD_FLOAT > && (gpc_reg_operand (operands[0], <MODE>mode) > || gpc_reg_operand (operands[1], 
<MODE>mode))"
> @@ -8125,18 +8132,19 @@ (define_insn "*mov<mode>_hardfloat64"
>     mf%1 %0
>     nop
>     mfvsrd %0,%x1
> -   mtvsrd %x0,%1"
> +   mtvsrd %x0,%1
> +   #"
>    [(set_attr "type"
>      "fpstore, fpload, fpsimple, fpload, fpstore,
>       fpload, fpstore, veclogical, veclogical, integer,
>       store, load, *, mtjmpr, mfjmpr,
> -     *, mfvsr, mtvsr")
> +     *, mfvsr, mtvsr, vecperm")
>     (set_attr "size" "64")
>     (set_attr "isa"
>      "*, *, *, p9v, p9v,
>       p7v, p7v, *, *, *,
>       *, *, *, *, *,
> -     *, p8v, p8v")])
> +     *, p8v, p8v, p10")])
>
>  ;; STD LD MR MT<SPR> MF<SPR> G-const
>  ;; H-const F-const Special

Ok.

> @@ -8170,6 +8178,7 @@ (define_insn "*mov<mode>_softfloat64"
>     (set_attr "length"
>      "*, *, *, *, *, 8,
>       12, 16, *")])
> +
>

Unnecessary blank line?

>  (define_expand "mov<mode>"
>    [(set (match_operand:FMOVE128 0 "general_operand")
> diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
> index 0538db387dc..928c4fafe07 100644
> --- a/gcc/config/rs6000/rs6000.opt
> +++ b/gcc/config/rs6000/rs6000.opt
> @@ -639,3 +639,7 @@ Enable instructions that guard against return-oriented programming attacks.
>  mprivileged
>  Target Var(rs6000_privileged) Init(0)
>  Generate code that will run in privileged state.
> +
> +mxxspltidp
> +Target Undocumented Var(TARGET_XXSPLTIDP) Init(1) Save
> +Generate (do not generate) XXSPLTIDP instructions.

Ok.

> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
> index bf033e31c1c..af9a04870d4 100644
> --- a/gcc/config/rs6000/vsx.md
> +++ b/gcc/config/rs6000/vsx.md
> @@ -1191,16 +1191,19 @@ (define_insn_and_split "*xxspltib_<mode>_split"
>  ;; instruction). But generate XXLXOR/XXLORC if it will avoid a register move.
> > ;; VSX store VSX load VSX move VSX->GPR GPR->VSX LQ > (GPR) > +;; XXSPLTIDP > ;; STQ (GPR) GPR load GPR store GPR move XXSPLTIB > VSPLTISW > ;; VSX 0/-1 VMX const GPR const LVX (VMX) STVX (VMX) > (define_insn "vsx_mov<mode>_64bit" > [(set (match_operand:VSX_M 0 "nonimmediate_operand" > "=ZwO, wa, wa, r, we, ?wQ, > + wa, > ?&r, ??r, ??Y, <??r>, wa, v, > ?wa, v, <??r>, wZ, v") > > (match_operand:VSX_M 1 "input_operand" > "wa, ZwO, wa, we, r, r, > + eF, > wQ, Y, r, r, wE, jwM, > ?jwM, W, <nW>, v, wZ"))] > > @@ -1212,36 +1215,44 @@ (define_insn "vsx_mov<mode>_64bit" > } > [(set_attr "type" > "vecstore, vecload, vecsimple, mtvsr, mfvsr, load, > + vecperm, > store, load, store, *, vecsimple, > vecsimple, > vecsimple, *, *, vecstore, vecload") > (set_attr "num_insns" > "*, *, *, 2, *, 2, > + *, > 2, 2, 2, 2, *, *, > *, 5, 2, *, *") > (set_attr "max_prefixed_insns" > "*, *, *, *, *, 2, > + *, > 2, 2, 2, 2, *, *, > *, *, *, *, *") > (set_attr "length" > "*, *, *, 8, *, 8, > + *, > 8, 8, 8, 8, *, *, > *, 20, 8, *, *") > (set_attr "isa" > "<VSisa>, <VSisa>, <VSisa>, *, *, *, > + p10, > *, *, *, *, p9v, *, > <VSisa>, *, *, *, *")]) > > ;; VSX store VSX load VSX move GPR load GPR store GPR > move > +;; XXSPLTIDP > ;; XXSPLTIB VSPLTISW VSX 0/-1 VMX const GPR const > ;; LVX (VMX) STVX (VMX) > (define_insn "*vsx_mov<mode>_32bit" > [(set (match_operand:VSX_M 0 "nonimmediate_operand" > "=ZwO, wa, wa, ??r, ??Y, <??r>, > + wa, > wa, v, ?wa, v, <??r>, > wZ, v") > > (match_operand:VSX_M 1 "input_operand" > "wa, ZwO, wa, Y, r, r, > + eF, > wE, jwM, ?jwM, W, <nW>, > v, wZ"))] > > @@ -1253,14 +1264,17 @@ (define_insn "*vsx_mov<mode>_32bit" > } > [(set_attr "type" > "vecstore, vecload, vecsimple, load, store, *, > + vecperm, > vecsimple, vecsimple, vecsimple, *, *, > vecstore, vecload") > (set_attr "length" > "*, *, *, 16, 16, 16, > + *, > *, *, *, 20, 16, > *, *") > (set_attr "isa" > "<VSisa>, <VSisa>, <VSisa>, *, *, *, > + p10, > p9v, *, <VSisa>, *, *, > *, *")]) > ok > @@ -4580,6 
+4594,23 @@ (define_insn "vsx_splat_<mode>_reg" > mtvsrdd %x0,%1,%1" > [(set_attr "type" "vecperm,vecmove")]) > > +(define_insn "*vsx_splat_v2df_xxspltidp" > + [(set (match_operand:V2DF 0 "vsx_register_operand" "=wa") > + (vec_duplicate:V2DF > + (match_operand:DF 1 "xxspltidp_operand" "eF")))] > + "TARGET_POWER10" > +{ > + HOST_WIDE_INT value; > + > + if (!xxspltidp_constant_p (operands[1], DFmode, &value)) > + gcc_unreachable (); > + > + operands[2] = GEN_INT (value); > + return "xxspltidp %x0,%1"; > +} > + [(set_attr "type" "vecperm") > + (set_attr "prefixed" "yes")]) > + > (define_insn "vsx_splat_<mode>_mem" > [(set (match_operand:VSX_D 0 "vsx_register_operand" "=wa") > (vec_duplicate:VSX_D > @@ -6449,15 +6480,40 @@ (define_expand "xxspltidp_v2df" > DONE; > }) > > -(define_insn "xxspltidp_v2df_inst" > - [(set (match_operand:V2DF 0 "register_operand" "=wa") > - (unspec:V2DF [(match_operand:SI 1 "c32bit_cint_operand" "n")] > - UNSPEC_XXSPLTIDP))] > +(define_mode_iterator XXSPLTIDP [SF DF V2DF]) > + > +(define_insn "xxspltidp_<mode>_inst" > + [(set (match_operand:XXSPLTIDP 0 "register_operand" "=wa") > + (unspec:XXSPLTIDP [(match_operand:SI 1 "c32bit_cint_operand" "n")] > + UNSPEC_XXSPLTIDP))] > "TARGET_POWER10" > "xxspltidp %x0,%1" > [(set_attr "type" "vecperm") > (set_attr "prefixed" "yes")]) > > +;; Generate the XXSPLTIDP instruction to support SFmode and DFmode scalar > +;; constants and V2DF vector constants where both elements are the same. The > +;; constant has to be expressible as a SFmode constant that is not a SFmode > +;; denormal value. > +(define_insn_and_split "*xxspltidp_<mode>_internal" > + [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand" "=wa") > + (match_operand:XXSPLTIDP 1 "xxspltidp_operand" "eF"))] Extra spaces there. 
> +  "TARGET_POWER10"
> +  "#"
> +  "&& 1"
> +  [(set (match_operand:XXSPLTIDP 0 "vsx_register_operand")
> +        (unspec:XXSPLTIDP [(match_dup 2)] UNSPEC_XXSPLTIDP))]
> +{
> +  HOST_WIDE_INT value = 0;
> +
> +  if (!xxspltidp_constant_p (operands[1], <MODE>mode, &value))
> +    gcc_unreachable ();
> +
> +  operands[2] = GEN_INT (value);
> +}
> +  [(set_attr "type" "vecperm")
> +   (set_attr "prefixed" "yes")])
> +
>  ;; XXSPLTI32DX built-in function support
>  (define_expand "xxsplti32dx_v4si"
>    [(set (match_operand:V4SI 0 "register_operand" "=wa")

ok

Just briefly looked at the testcases; nothing jumped out at me below.

Thanks
-Will

> diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> new file mode 100644
> index 00000000000..8f6e176f9af
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-df.c
> @@ -0,0 +1,60 @@
> +/* { dg-do compile } */
> +/* { dg-require-effective-target power10_ok } */
> +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */
> +
> +#include <math.h>
> +
> +/* Test generating DFmode constants with the ISA 3.1 (power10) XXSPLTIDP
> +   instruction. */
> +
> +double
> +scalar_double_0 (void)
> +{
> +  return 0.0; /* XXSPLTIB or XXLXOR. */
> +}
> +
> +double
> +scalar_double_1 (void)
> +{
> +  return 1.0; /* XXSPLTIDP. */
> +}
> +
> +#ifndef __FAST_MATH__
> +double
> +scalar_double_m0 (void)
> +{
> +  return -0.0; /* XXSPLTIDP. */
> +}
> +
> +double
> +scalar_double_nan (void)
> +{
> +  return __builtin_nan (""); /* XXSPLTIDP. */
> +}
> +
> +double
> +scalar_double_inf (void)
> +{
> +  return __builtin_inf (); /* XXSPLTIDP. */
> +}
> +
> +double
> +scalar_double_m_inf (void) /* XXSPLTIDP. */
> +{
> +  return - __builtin_inf ();
> +}
> +#endif
> +
> +double
> +scalar_double_pi (void)
> +{
> +  return M_PI; /* PLFD. */
> +}
> +
> +double
> +scalar_double_denorm (void)
> +{
> +  return 0x1p-149f; /* PLFD.
*/ > +} > + > +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > new file mode 100644 > index 00000000000..72504bdfbbd > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-sf.c > @@ -0,0 +1,60 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +#include <math.h> > + > +/* Test generating SFmode constants with the ISA 3.1 (power10) XXSPLTIDP > + instruction. */ > + > +float > +scalar_float_0 (void) > +{ > + return 0.0f; /* XXSPLTIB or XXLXOR. */ > +} > + > +float > +scalar_float_1 (void) > +{ > + return 1.0f; /* XXSPLTIDP. */ > +} > + > +#ifndef __FAST_MATH__ > +float > +scalar_float_m0 (void) > +{ > + return -0.0f; /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_nan (void) > +{ > + return __builtin_nanf (""); /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_inf (void) > +{ > + return __builtin_inff (); /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_m_inf (void) /* XXSPLTIDP. */ > +{ > + return - __builtin_inff (); > +} > +#endif > + > +float > +scalar_float_pi (void) > +{ > + return (float)M_PI; /* XXSPLTIDP. */ > +} > + > +float > +scalar_float_denorm (void) > +{ > + return 0x1p-149f; /* PLFS. 
*/ > +} > + > +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 6 } } */ > diff --git a/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > new file mode 100644 > index 00000000000..82ffc86f8aa > --- /dev/null > +++ b/gcc/testsuite/gcc.target/powerpc/vec-splat-constant-v2df.c > @@ -0,0 +1,64 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target power10_ok } */ > +/* { dg-options "-mdejagnu-cpu=power10 -O2" } */ > + > +#include <math.h> > + > +/* Test generating V2DFmode constants with the ISA 3.1 (power10) XXSPLTIDP > + instruction. */ > + > +vector double > +v2df_double_0 (void) > +{ > + return (vector double) { 0.0, 0.0 }; /* XXSPLTIB or > XXLXOR. */ > +} > + > +vector double > +v2df_double_1 (void) > +{ > + return (vector double) { 1.0, 1.0 }; /* XXSPLTIDP. > */ > +} > + > +#ifndef __FAST_MATH__ > +vector double > +v2df_double_m0 (void) > +{ > + return (vector double) { -0.0, -0.0 }; /* XXSPLTIDP. */ > +} > + > +vector double > +v2df_double_nan (void) > +{ > + return (vector double) { __builtin_nan (""), > + __builtin_nan ("") }; /* XXSPLTIDP. */ > +} > + > +vector double > +v2df_double_inf (void) > +{ > + return (vector double) { __builtin_inf (), > + __builtin_inf () }; /* XXSPLTIDP. */ > +} > + > +vector double > +v2df_double_m_inf (void) > +{ > + return (vector double) { - __builtin_inf (), > + - __builtin_inf () }; /* XXSPLTIDP. */ > +} > +#endif > + > +vector double > +v2df_double_pi (void) > +{ > + return (vector double) { M_PI, M_PI }; /* PLVX. */ > +} > + > +vector double > +v2df_double_denorm (void) > +{ > + return (vector double) { (double)0x1p-149f, > + (double)0x1p-149f }; /* PLVX. */ > +} > + > +/* { dg-final { scan-assembler-times {\mxxspltidp\M} 5 } } */ > -- > 2.31.1 > >
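
For anyone wanting to check by hand which constants qualify, the rule in xxspltidp_constant_p can be approximated outside the compiler. The following is only a rough standalone sketch, not code from the patch: the function name fits_xxspltidp is made up for illustration, it works on host floating point rather than on GCC's real_value, and it assumes the host uses IEEE 754 binary32/binary64. It mirrors the three conditions in the patch: the value must convert to single precision exactly, must not be +0.0, and must not be a single precision denormal.

```c
#include <assert.h>
#include <float.h>
#include <math.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of the patch or of GCC): return true if D
   could be encoded as an XXSPLTIDP immediate under the rules described in
   the patch, and store the 32-bit single precision bit pattern in *IMM.
   Assumes IEEE 754 binary32 float and binary64 double on the host.  */
bool
fits_xxspltidp (double d, uint32_t *imm)
{
  assert (imm != NULL);
  *imm = 0;

  /* A finite value beyond the float range cannot convert exactly; reject
     it before attempting the narrowing conversion.  */
  if (isfinite (d) && fabs (d) > FLT_MAX)
    return false;

  float f = (float) d;
  double back = (double) f;

  /* Compare bit patterns rather than values, so that NaNs and the sign of
     zero need no special cases.  */
  uint64_t d_bits, back_bits;
  memcpy (&d_bits, &d, sizeof d_bits);
  memcpy (&back_bits, &back, sizeof back_bits);
  if (d_bits != back_bits)
    return false;

  uint32_t f_bits;
  memcpy (&f_bits, &f, sizeof f_bits);

  /* +0.0 is cheap to create by other means (the patch compares against
     CONST0_RTX for the same reason).  -0.0 is still accepted.  */
  if (f_bits == 0)
    return false;

  /* Reject single precision denormals: exponent field is zero and the
     mantissa field is non-zero.  */
  if ((f_bits & 0x7F800000u) == 0 && (f_bits & 0x007FFFFFu) != 0)
    return false;

  *imm = f_bits;
  return true;
}
```

With this sketch, 1.0 is accepted with the immediate 0x3F800000, while a double-precision pi and (double)0x1p-149f are rejected, which lines up with the PLFD/PLVX expectations in the new tests.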