On Wed, 2020-07-08 at 12:59 -0700, Carl Love wrote:
> [PATCH 6/6] rs6000 Add vector blend, permute builtin support
>
> ----------------------------------
> V4 Fixes:
>
> Rebased on mainline. Changed FUTURE to P10.
> ---------
>
> v3 fixes:
> Replace spaces with tabs in ChangeLog description.
> Fix implementation comments for define_expand "xxpermx" in file
> gcc/config/rs6000/alitvec.md.
> Fix minor typos in the comments for the changes in
> gcc/config/rs6000/rs6000-call.c.
>
> --------------------
> v2 changes:
>
> Updated ChangeLog per comments.
>
> Updated implementation of the define_expand "xxpermx".
>
> Fixed the comments and check for 3-bit immediate field for the
> CODE_FOR_xxpermx check.
>
> gcc/doc/extend.texi:
> comment "Maybe it should say it is related to vsel/xxsel, but
> per
> bigger element?", added comment. I took the description
> directly
> from spec. Don't really don't want to mess with the approved
> description.
>
> fixed typo for Vector Permute Extendedextracth
>
> ----------
>
> GCC maintainers:
>
> The following patch adds support for the vec_blendv and vec_permx
> builtins.
>
> The patch has been compiled and tested on
>
> powerpc64le-unknown-linux-gnu (Power 9 LE)
>
> with no regression errors.
>
> The test cases were compiled on a Power 9 system and then tested on
> Mambo.
>
> Carl Love
>
> ---------------------------------------------------------------
> rs6000 RFC2609 vector blend, permute instructions
>
> gcc/ChangeLog
>
> 2020-07-06 Carl Love <c...@us.ibm.com>
>
> * config/rs6000/altivec.h (vec_blendv, vec_permx): Add define.
> * config/rs6000/altivec.md (UNSPEC_XXBLEND, UNSPEC_XXPERMX.):
> New
> unspecs.
> (VM3): New define_mode.
> (VM3_char): New define_attr.
> (xxblend_<mode> mode VM3): New define_insn.
> (xxpermx): New define_expand.
> (xxpermx_inst): New define_insn.
> * config/rs6000/rs6000-builtin.def (VXXBLEND_V16QI,
> VXXBLEND_V8HI,
> VXXBLEND_V4SI, VXXBLEND_V2DI, VXXBLEND_V4SF, VXXBLEND_V2DF):
> New
> BU_P10V_3 definitions.
> (XXBLENDBU_P10_OVERLOAD_3): New BU_P10_OVERLOAD_3 definition.

Extra noise in the parentheses; this should be just XXBLEND.

> (XXPERMX): New BU_P10_OVERLOAD_4 definition.
> * config/rs6000/rs6000-c.c
> (altivec_resolve_overloaded_builtin):
> (P10_BUILTIN_VXXPERMX): Add if case support.
> * config/rs6000/rs6000-call.c (P10_BUILTIN_VXXBLEND_V16QI,
> P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
> P10_BUILTIN_VXXBLEND_V2DI, P10_BUILTIN_VXXBLEND_V4SF,
> P10_BUILTIN_VXXBLEND_V2DF, P10_BUILTIN_VXXPERMX): Define
> overloaded arguments.
> (rs6000_expand_quaternop_builtin): Add if case for
> CODE_FOR_xxpermx.

s/if//

> (builtin_quaternary_function_type): Add v16uqi_type and
> xxpermx_type
> variables, add case statement for P10_BUILTIN_VXXPERMX.
> (builtin_function_type)[P10_BUILTIN_VXXBLEND_V16QI,
> P10_BUILTIN_VXXBLEND_V8HI, P10_BUILTIN_VXXBLEND_V4SI,
> P10_BUILTIN_VXXBLEND_V2DI]: Add case statements.

Add a space after (builtin_function_type). Reverse the phrasing? Something like:

    (builtin_function_type): Add case statements for P10_BUILTIN_..., P10_BUILTIN_...

> * doc/extend.texi: Add documentation for vec_blendv and
> vec_permx.
>
> gcc/testsuite/ChangeLog
>
> 2020-07-06 Carl Love <c...@us.ibm.com>
> gcc.target/powerpc/vec-blend-runnable.c: New test.
> gcc.target/powerpc/vec-permute-ext-runnable.c: New test.
> ---
> gcc/config/rs6000/altivec.h | 2 +
> gcc/config/rs6000/altivec.md | 71 +++++
> gcc/config/rs6000/rs6000-builtin.def | 13 +
> gcc/config/rs6000/rs6000-c.c | 27 +-
> gcc/config/rs6000/rs6000-call.c | 95 ++++++
> gcc/doc/extend.texi | 63 ++++
> .../gcc.target/powerpc/vec-blend-runnable.c | 276 ++++++++++++++++
> .../powerpc/vec-permute-ext-runnable.c | 294
> ++++++++++++++++++
> 8 files changed, 835 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-blend-runnable.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/vec-permute-ext-runnable.c
>
> diff --git a/gcc/config/rs6000/altivec.h
> b/gcc/config/rs6000/altivec.h
> index 126409c168b..e8fdeb31b0b 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -708,6 +708,8 @@ __altivec_scalar_pred(vec_any_nle,
> #define vec_splati(a) __builtin_vec_xxspltiw (a)
> #define vec_splatid(a) __builtin_vec_xxspltid (a)
> #define vec_splati_ins(a, b, c) __builtin_vec_xxsplti32dx (a,
> b, c)
> +#define vec_blendv(a, b, c) __builtin_vec_xxblend (a, b, c)
> +#define vec_permx(a, b, c, d) __builtin_vec_xxpermx (a, b, c, d)
>
> #define vec_gnb(a, b) __builtin_vec_gnb (a, b)
> #define vec_clrl(a, b) __builtin_vec_clrl (a, b)
> diff --git a/gcc/config/rs6000/altivec.md
> b/gcc/config/rs6000/altivec.md
> index f6858b5bf2a..226cf121f12 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -177,6 +177,8 @@
> UNSPEC_XXSPLTIW
> UNSPEC_XXSPLTID
> UNSPEC_XXSPLTI32DX
> + UNSPEC_XXBLEND
> + UNSPEC_XXPERMX
> ])
>
> (define_c_enum "unspecv"
> @@ -219,6 +221,21 @@
> (KF "FLOAT128_VECTOR_P (KFmode)")
> (TF "FLOAT128_VECTOR_P (TFmode)")])
>
> +;; Like VM2, just do char, short, int, long, float and double
> +(define_mode_iterator VM3 [V4SI
> + V8HI
> + V16QI
> + V4SF
> + V2DF
> + V2DI])
> +
> +(define_mode_attr VM3_char [(V2DI "d")
> + (V4SI "w")
> + (V8HI "h")
> + (V16QI "b")
> + (V2DF "d")
> + (V4SF "w")])
> +
> ;; Map the Vector convert single precision to double precision for
> integer
> ;; versus floating point
> (define_mode_attr VS_sxwsp [(V4SI "sxw") (V4SF "sp")])
> @@ -916,6 +933,60 @@
> "xxsplti32dx %x0,%2,%3"
> [(set_attr "type" "vecsimple")])
>
> +(define_insn "xxblend_<mode>"
> + [(set (match_operand:VM3 0 "register_operand" "=wa")
> + (unspec:VM3 [(match_operand:VM3 1 "register_operand" "wa")
> + (match_operand:VM3 2 "register_operand" "wa")
> + (match_operand:VM3 3 "register_operand" "wa")]
> + UNSPEC_XXBLEND))]
> + "TARGET_POWER10"
> + "xxblendv<VM3_char> %x0,%x1,%x2,%x3"
> + [(set_attr "type" "vecsimple")])
> +
> +(define_expand "xxpermx"
> + [(set (match_operand:V2DI 0 "register_operand" "+wa")
> + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "wa")
> + (match_operand:V2DI 2 "register_operand" "wa")
> + (match_operand:V16QI 3 "register_operand" "wa")
> + (match_operand:QI 4 "u8bit_cint_operand" "n")]
> + UNSPEC_XXPERMX))]
> + "TARGET_POWER10"
> +{
> + if (BYTES_BIG_ENDIAN)
> + emit_insn (gen_xxpermx_inst (operands[0], operands[1],
> + operands[2], operands[3],
> + operands[4]));
> + else
> + {
> + /* Reverse value of byte element indexes by XORing with 0xFF.
> + Reverse the 32-byte section identifier match by subracting
> bits [0:2]
> of elemet from 7. */
> + int value = INTVAL (operands[4]);
> + rtx vreg = gen_reg_rtx (V16QImode);
> +
> + emit_insn (gen_xxspltib_v16qi (vreg, GEN_INT (-1)));
> + emit_insn (gen_xorv16qi3 (operands[3], operands[3], vreg));
> + value = 7 - value;
> + emit_insn (gen_xxpermx_inst (operands[0], operands[2],
> + operands[1], operands[3],
> + GEN_INT (value)));
> + }
> +
> + DONE;
> +}
> + [(set_attr "type" "vecsimple")])
> +
> +(define_insn "xxpermx_inst"
> + [(set (match_operand:V2DI 0 "register_operand" "+v")
> + (unspec:V2DI [(match_operand:V2DI 1 "register_operand" "v")
> + (match_operand:V2DI 2 "register_operand" "v")
> + (match_operand:V16QI 3 "register_operand" "v")
> + (match_operand:QI 4 "u3bit_cint_operand" "n")]
> + UNSPEC_XXPERMX))]
> + "TARGET_POWER10"
> + "xxpermx %x0,%x1,%x2,%x3,%4"
> + [(set_attr "type" "vecsimple")])
> +
> (define_expand "vstrir_<mode>"
> [(set (match_operand:VIshort 0 "altivec_register_operand")
> (unspec:VIshort [(match_operand:VIshort 1
> "altivec_register_operand")]
> diff --git a/gcc/config/rs6000/rs6000-builtin.def
> b/gcc/config/rs6000/rs6000-builtin.def
> index ddfe287efc8..3d45354c573 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -2756,6 +2756,15 @@ BU_P10V_1 (VXXSPLTID, "vxxspltidp", CONST,
> xxspltidp_v2df)
> BU_P10V_3 (VXXSPLTI32DX_V4SI, "vxxsplti32dx_v4si", CONST,
> xxsplti32dx_v4si)
> BU_P10V_3 (VXXSPLTI32DX_V4SF, "vxxsplti32dx_v4sf", CONST,
> xxsplti32dx_v4sf)
>
> +BU_P10V_3 (VXXBLEND_V16QI, "xxblend_v16qi", CONST, xxblend_v16qi)
> +BU_P10V_3 (VXXBLEND_V8HI, "xxblend_v8hi", CONST, xxblend_v8hi)
> +BU_P10V_3 (VXXBLEND_V4SI, "xxblend_v4si", CONST, xxblend_v4si)
> +BU_P10V_3 (VXXBLEND_V2DI, "xxblend_v2di", CONST, xxblend_v2di)
> +BU_P10V_3 (VXXBLEND_V4SF, "xxblend_v4sf", CONST, xxblend_v4sf)
> +BU_P10V_3 (VXXBLEND_V2DF, "xxblend_v2df", CONST, xxblend_v2df)
> +
> +BU_P10V_4 (VXXPERMX, "xxpermx", CONST, xxpermx)
> +
> BU_P10V_1 (VSTRIBR, "vstribr", CONST, vstrir_v16qi)
> BU_P10V_1 (VSTRIHR, "vstrihr", CONST, vstrir_v8hi)
> BU_P10V_1 (VSTRIBL, "vstribl", CONST, vstril_v16qi)
> @@ -2791,6 +2800,10 @@ BU_P10_OVERLOAD_1 (VSTRIL_P, "stril_p")
> BU_P10_OVERLOAD_1 (XXSPLTIW, "xxspltiw")
> BU_P10_OVERLOAD_1 (XXSPLTID, "xxspltid")
> BU_P10_OVERLOAD_3 (XXSPLTI32DX, "xxsplti32dx")
> +
> +BU_P10_OVERLOAD_3 (XXBLEND, "xxblend")
> +BU_P10_OVERLOAD_4 (XXPERMX, "xxpermx")
> +
>
> /* 1 argument crypto functions. */
> BU_CRYPTO_1 (VSBOX, "vsbox", CONST, crypto_vsbox_v2di)
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index cb7d34dcdb5..db6aecfad2d 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -1800,22 +1800,37 @@ altivec_resolve_overloaded_builtin
> (location_t loc, tree fndecl,
> unsupported_builtin = true;
> }
> }
> - else if (fcode == P10_BUILTIN_VEC_XXEVAL)
> + else if ((fcode == P10_BUILTIN_VEC_XXEVAL)
> + || (fcode == P10_BUILTIN_VXXPERMX))
> {
> - /* Need to special case __builtin_vec_xxeval because this takes
> - 4 arguments, and the existing infrastructure handles no
> - more than three. */
> + signed char op3_type;
> +
> + /* Need to special case the builins_xxeval because it takes
> + 4 arguments, and the existing infrastructure handles
> three. */

A couple of typos here, and the comment should probably also mention xxpermx. So, something like:

    .. the builtins __builtin_vec_xxeval and __builtin_vec_xxpermx because they require 4 arguments ..

> if (nargs != 4)
> {
> - error ("builtin %qs requires 4 arguments",
> - "__builtin_vec_xxeval");
> + if (fcode == P10_BUILTIN_VEC_XXEVAL)
> + error ("builtin %qs requires 4 arguments",
> + "__builtin_vec_xxeval");
> + else
> + error ("builtin %qs requires 4 arguments",
> + "__builtin_vec_xxpermx");
> +

You may be able to compress this a bit; see what was done with the argument checking for ALTIVEC_BUILTIN_VEC_ADDEC.

> return error_mark_node;
> }
> +
> + /* Set value for vec_xxpermx here as it is a constant. */
> + op3_type = RS6000_BTI_V16QI;
> +
> for ( ; desc->code == fcode; desc++)
> {
> + if (fcode == P10_BUILTIN_VEC_XXEVAL)
> + op3_type = desc->op3;

I had to confirm that the op3_type change below was proper. Since the only use of op3_type is within this sub-block, I'd say combine the previous assignment with the logic here, so it's clear that op3_type has been properly set before the call into rs6000_builtin_type_compatible(). Something like:

    if (fcode == P10_BUILTIN_VEC_XXEVAL)
      op3_type = desc->op3;
    else /* P10_BUILTIN_VXXPERMX */
      op3_type = RS6000_BTI_V16QI;

> +
> if (rs6000_builtin_type_compatible (types[0], desc->op1)
> && rs6000_builtin_type_compatible (types[1], desc->op2)
> && rs6000_builtin_type_compatible (types[2], desc->op3)
> + && rs6000_builtin_type_compatible (types[2], op3_type)
> && rs6000_builtin_type_compatible (types[3],
> RS6000_BTI_UINTSI))
> {
> diff --git a/gcc/config/rs6000/rs6000-call.c
> b/gcc/config/rs6000/rs6000-call.c
> index 06320279138..dc69d4873a0 100644
> --- a/gcc/config/rs6000/rs6000-call.c
> +++ b/gcc/config/rs6000/rs6000-call.c
> @@ -5563,6 +5563,39 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
> RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI,
> RS6000_BTI_unsigned_V1TI, RS6000_BTI_unsigned_V1TI },
>
> + /* The overloaded XXPERMX definitions are handled specially
> because the
> + fourth unsigned char operand is not encoded in this table. */
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> + RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI,
> + RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI,
> + RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> + RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF,
> + RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXPERMX, P10_BUILTIN_VXXPERMX,
> + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF,
> + RS6000_BTI_unsigned_V16QI },
> +
> { P10_BUILTIN_VEC_EXTRACTL, P10_BUILTIN_VEXTRACTBL,
> RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V16QI,
> RS6000_BTI_unsigned_V16QI, RS6000_BTI_UINTQI },
> @@ -5704,6 +5737,37 @@ const struct altivec_builtin_types
> altivec_overloaded_builtins[] = {
> { P10_BUILTIN_VEC_XXSPLTI32DX, P10_BUILTIN_VXXSPLTI32DX_V4SF,
> RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_UINTQI,
> RS6000_BTI_float },
>
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V16QI,
> + RS6000_BTI_V16QI, RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> + RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V16QI,
> + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI,
> + RS6000_BTI_unsigned_V16QI, RS6000_BTI_unsigned_V16QI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V8HI,
> + RS6000_BTI_V8HI, RS6000_BTI_V8HI, RS6000_BTI_V8HI,
> + RS6000_BTI_unsigned_V8HI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V8HI,
> + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI,
> + RS6000_BTI_unsigned_V8HI, RS6000_BTI_unsigned_V8HI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SI,
> + RS6000_BTI_V4SI, RS6000_BTI_V4SI, RS6000_BTI_V4SI,
> + RS6000_BTI_unsigned_V4SI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SI,
> + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI,
> + RS6000_BTI_unsigned_V4SI, RS6000_BTI_unsigned_V4SI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DI,
> + RS6000_BTI_V2DI, RS6000_BTI_V2DI, RS6000_BTI_V2DI,
> + RS6000_BTI_unsigned_V2DI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DI,
> + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI,
> + RS6000_BTI_unsigned_V2DI, RS6000_BTI_unsigned_V2DI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V4SF,
> + RS6000_BTI_V4SF, RS6000_BTI_V4SF, RS6000_BTI_V4SF,
> + RS6000_BTI_unsigned_V4SI },
> + { P10_BUILTIN_VEC_XXBLEND, P10_BUILTIN_VXXBLEND_V2DF,
> + RS6000_BTI_V2DF, RS6000_BTI_V2DF, RS6000_BTI_V2DF,
> + RS6000_BTI_unsigned_V2DI },
> +
> { P10_BUILTIN_VEC_SRDB, P10_BUILTIN_VSRDB_V16QI,
> RS6000_BTI_V16QI, RS6000_BTI_V16QI,
> RS6000_BTI_V16QI, RS6000_BTI_UINTQI },
> @@ -10101,6 +10165,19 @@ rs6000_expand_quaternop_builtin (enum
> insn_code icode, tree exp, rtx target)
> return CONST0_RTX (tmode);
> }
> }
> +
> + else if (icode == CODE_FOR_xxpermx)
> + {
> + /* Only allow 3-bit unsigned literals. */
> + STRIP_NOPS (arg3);
> + if (TREE_CODE (arg3) != INTEGER_CST
> + || TREE_INT_CST_LOW (arg3) & ~0x7)
> + {
> + error ("argument 4 must be an 3-bit unsigned literal");

s/an/a/

> + return CONST0_RTX (tmode);
> + }
> + }
> +
> else if (icode == CODE_FOR_vreplace_elt_v4si
> || icode == CODE_FOR_vreplace_elt_v4sf)
> {
> @@ -13788,12 +13865,17 @@ builtin_quaternary_function_type
> (machine_mode mode_ret,
> tree function_type = NULL;
>
> static tree v2udi_type = builtin_mode_to_type[V2DImode][1];
> + static tree v16uqi_type = builtin_mode_to_type[V16QImode][1];
> static tree uchar_type = builtin_mode_to_type[QImode][1];
>
> static tree xxeval_type =
> build_function_type_list (v2udi_type, v2udi_type, v2udi_type,
> v2udi_type, uchar_type, NULL_TREE);
>
> + static tree xxpermx_type =
> + build_function_type_list (v2udi_type, v2udi_type, v2udi_type,
> + v16uqi_type, uchar_type, NULL_TREE);
> +
> switch (builtin) {
>
> case P10_BUILTIN_XXEVAL:
> @@ -13805,6 +13887,15 @@ builtin_quaternary_function_type
> (machine_mode mode_ret,
> function_type = xxeval_type;
> break;
>
> + case P10_BUILTIN_VXXPERMX:
> + gcc_assert ((mode_ret == V2DImode)
> + && (mode_arg0 == V2DImode)
> + && (mode_arg1 == V2DImode)
> + && (mode_arg2 == V16QImode)
> + && (mode_arg3 == QImode));
> + function_type = xxpermx_type;
> + break;
> +
> default:
> /* A case for each quaternary built-in must be provided
> above. */
> gcc_unreachable ();
> @@ -13986,6 +14077,10 @@ builtin_function_type (machine_mode
> mode_ret, machine_mode mode_arg0,
> case P10_BUILTIN_VREPLACE_ELT_UV2DI:
> case P10_BUILTIN_VREPLACE_UN_UV4SI:
> case P10_BUILTIN_VREPLACE_UN_UV2DI:
> + case P10_BUILTIN_VXXBLEND_V16QI:
> + case P10_BUILTIN_VXXBLEND_V8HI:
> + case P10_BUILTIN_VXXBLEND_V4SI:
> + case P10_BUILTIN_VXXBLEND_V2DI:
> h.uns_p[0] = 1;
> h.uns_p[1] = 1;
> h.uns_p[2] = 1;

ok

> diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi
> index e9aa06553aa..0e4d91a43f6 100644
> --- a/gcc/doc/extend.texi
> +++ b/gcc/doc/extend.texi
> @@ -21200,6 +21200,69 @@ result. The other words of argument 1 are
> unchanged.
>
> @findex vec_splati_ins
>
> +Vector Blend Variable
> +
> +@smallexample
> +@exdent vector signed char vec_blendv (vector signed char, vector
> signed char,
> +vector unsigned char);
> +@exdent vector unsigned char vec_blendv (vector unsigned char,
> +vector unsigned char, vector unsigned char);
> +@exdent vector signed short vec_blendv (vector signed short,
> +vector signed short, vector unsigned short);
> +@exdent vector unsigned short vec_blendv (vector unsigned short,
> +vector unsigned short, vector unsigned short);
> +@exdent vector signed int vec_blendv (vector signed int, vector
> signed int,
> +vector unsigned int);
> +@exdent vector unsigned int vec_blendv (vector unsigned int,
> +vector unsigned int, vector unsigned int);
> +@exdent vector signed long long vec_blendv (vector signed long long,
> +vector signed long long, vector unsigned long long);
> +@exdent vector unsigned long long vec_blendv (vector unsigned long
> long,
> +vector unsigned long long, vector unsigned long long);
> +@exdent vector float vec_blendv (vector float, vector float,
> +vector unsigned int);
> +@exdent vector double vec_blendv (vector double, vector double,
> +vector unsigned long long);
> +@end smallexample
> +
> +Blend the first and second argument vectors according to the sign
> bits of the
> +corresponding elements of the third argument vector. This is
> similar to the
> +vsel and xxsel instructions but for bigger elements.

Put @code{} around vsel and xxsel.

> +
> +@findex vec_blendv
> +
> +Vector Permute Extended
> +
> +@smallexample
> +@exdent vector signed char vec_permx (vector signed char, vector
> signed char,
> +vector unsigned char, const int);
> +@exdent vector unsigned char vec_permx (vector unsigned char,
> +vector unsigned char, vector unsigned char, const int);
> +@exdent vector signed short vec_permx (vector signed short,
> +vector signed short, vector unsigned char, const int);
> +@exdent vector unsigned short vec_permx (vector unsigned short,
> +vector unsigned short, vector unsigned char, const int);
> +@exdent vector signed int vec_permx (vector signed int, vector
> signed int,
> +vector unsigned char, const int);
> +@exdent vector unsigned int vec_permx (vector unsigned int,
> +vector unsigned int, vector unsigned char, const int);
> +@exdent vector signed long long vec_permx (vector signed long long,
> +vector signed long long, vector unsigned char, const int);
> +@exdent vector unsigned long long vec_permx (vector unsigned long
> long,
> +vector unsigned long long, vector unsigned char, const int);
> +@exdent vector float (vector float, vector float, vector unsigned
> char,
> +const int);
> +@exdent vector double (vector double, vector double, vector unsigned
> char,
> +const int);
> +@end smallexample
> +
> +Perform a partial permute of the first two arguments, which form a
> 32-byte
> +section of an emulated vector up to 256 bytes wide, using the
> partial permute
> +control vector in the third argument. The fourth argument
> (constrained to
> +values of 0-7) identifies which 32-byte section of the emulated
> vector is
> +contained in the first two arguments.
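
For what it's worth, here is a rough scalar model of how I read the two descriptions above. It is only a sketch and not part of the patch: the names model_blendv_u8 and model_permx are made up, it only covers byte elements for the blend, and it numbers bytes in plain array order, so it ignores the little-endian adjustment done in the define_expand. Two things are my assumptions rather than something the texinfo text states: that a set sign bit selects the second argument, and that result bytes whose section does not match the fourth argument come back as zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of vec_blendv for byte elements.  Assumption: where
   the sign (most significant) bit of mask[i] is set the result takes
   b[i], otherwise a[i].  */
void
model_blendv_u8 (const uint8_t a[16], const uint8_t b[16],
                 const uint8_t mask[16], uint8_t out[16])
{
  for (int i = 0; i < 16; i++)
    out[i] = (mask[i] & 0x80) ? b[i] : a[i];
}

/* Hypothetical model of vec_permx.  a followed by b is one 32-byte
   section of the emulated vector.  In each control byte the top three
   bits name a section and the low five bits an offset inside it.
   Assumption: bytes that name a different section are zeroed.  */
void
model_permx (const uint8_t a[16], const uint8_t b[16],
             const uint8_t pcv[16], unsigned section, uint8_t out[16])
{
  uint8_t src[32];

  /* Mirrors the 3-bit literal check in rs6000_expand_quaternop_builtin.  */
  assert ((section & ~0x7u) == 0);

  memcpy (src, a, 16);
  memcpy (src + 16, b, 16);

  for (int i = 0; i < 16; i++)
    out[i] = ((unsigned) (pcv[i] >> 5) == section) ? src[pcv[i] & 0x1f] : 0;
}
```

If the zero-fill assumption holds, ORing the results of model_permx over sections 0 through 7, each with its own pair of inputs, would give the full lookup into the emulated 256-byte vector.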
> +@findex vec_permx
> +
> @smallexample
> @exdent vector unsigned long long int
> @exdent vec_pext (vector unsigned long long int, vector unsigned
> long long int)

ok

I glanced at the tests; nothing jumped out at me there.

<snip>

Thanks,
-Will