Hi, on 2023/6/7 03:54, Carl Love wrote: > On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote: >> Hi Carl, >> >> on 2023/5/2 23:52, Carl Love via Gcc-patches wrote: >>> GCC maintainers: >>> >>> The following patch adds three buitins for inserting and extracting >>> the >>> exponent and significand for an IEEE 128-bit floating point >>> values. >>> The builtins are valid for Power 9 and Power 10. >> >> We already have: >> >> unsigned long long int scalar_extract_exp (__ieee128 source); >> unsigned __int128 scalar_extract_sig (__ieee128 source); >> ieee_128 scalar_insert_exp (unsigned __int128 significand, >> unsigned long long int exponent); >> ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long >> int exponent); >> >> you need to say something about the requirements or the justification >> for >> adding more, for this patch itself, some comments are inline below, >> thanks! > > I implemented the patch based on a request for the builtins. It didn't > include any justification so I reached out to Steve Monroe who > requested the builtins to understand why he wanted them. Here is his > reply: > > Basically there is no clean and performant way to transfer between a > vector type and the ieee128 scalar, despite the fact that both > reside in vector registers. Also a union transfer does not work > correctly on most GCC versions (and will likely break again in the > next release). I offer the long sad history of the IBM long double > float runtime.
Thanks for clarifying this. As the proposed changes, I think he meant to say "Basically there is no clean and performant way to transfer between a vector type and the scalar **types**". :) Because the proposed changes are: scalar_extract_exp: unsigned long long => vector unsigned long long scalar_extract_sig: unsigned __int128 => vector unsigned __int128 scalar_insert_exp: unsigned __int128 => vector unsigned __int128 unsigned long long => vector unsigned long long. > > Also there are __ieee128 operations that are provided by builtins > for POWER9 but are not provided by libgcc (for POWER8). > > Finally I can prove that a softfloat __ieee128 implementation using > VMX integer operations, out-performs the current libgcc > implementation using DW GPRs. > > The details are in the PVECLIB documentation > pveclib/vec__f128__ppc.h > > >> >>> The patch has been tested on both Power 9 and Power 10. >>> >>> Please let me know if this patch is acceptable for >>> mainline. Thanks. >>> >>> Carl >>> >>> >>> ---------------------- >>> From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00 >>> 2001 >>> From: Carl Love <c...@us.ibm.com> >>> Date: Wed, 12 Apr 2023 17:46:37 -0400 >>> Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating >>> point values >>> >>> Add support for the following builtins: >>> >>> __vector unsigned long long int __builtin_extractf128_exp >>> (__ieee128); >> >> Could you make the name similar to the existing one? The existing >> one >> >> unsigned long long int scalar_extract_exp (__ieee128 source); >> >> has nothing like f128 on its name, this variant is just to change the >> return type to vector type, how about scalar_extract_exp_to_vec? > > I changed the name __builtin_extractf128_exp to > __builtin_scalar_extract_exp_to_vec. > >> >>> __vector unsigned __int128 __builtin_extractf128_sig (__ieee128); >> >> Ditto. > > I changed the name __builtin_extractf128_sig to > __builtin_scalar_extract_sig_to_vec. > >> >>> __ieee128 __builtin_insertf128_exp (__vector unsigned __int128, >>> __vector unsigned long long); >> >> This one can just overload the existing scalar_insert_exp? > > > I tried making this one an overloaded version of > scalar_insert_exp. However, the overload with the vector arguments > isn't recognized when I put the overload definition at the end of the > list of overloads. When I tried putting the vector version as the > first overloaded definition, I get an internal error > on __builtin_vsx_scalar_insert_exp_q which is has the same arguments > types but as scalars not vectors. Best I can tell, there is an issue > with mixing scalar and vector arguments in an overloaded builtin. No, it's not due to mixing scalar and vector arguments, I looked into this and found we have some special handling for this builtin in altivec_resolve_overloaded_builtin, see the code: case RS6000_OVLD_VEC_VSIE: { machine_mode arg1_mode = TYPE_MODE (types[0]); /* If supplied first argument is wider than 64 bits, resolve to 128-bit variant of built-in function. */ if (GET_MODE_PRECISION (arg1_mode) > 64) { /* If first argument is of float variety, choose variant that expects __ieee128 argument. Otherwise, expect __int128 argument. */ if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT) instance_code = RS6000_BIF_VSIEQPF; else instance_code = RS6000_BIF_VSIEQP; } .... When the 1st arg's precision is larger than 64, it just picks up RS6000_BIF_VSIEQPF or RS6000_BIF_VSIEQP, you need to update this for what you newly added. Something like (against your v2): const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull); - VSIEDP_VULL xsiexpqpf_f128_kf {} + VSIEQPV xsiexpqpf_f128_kf {} @@ -1934,6 +1934,8 @@ altivec_resolve_overloaded_builtin (location_t loc, tree fndecl, __int128 argument. */ if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT) instance_code = RS6000_BIF_VSIEQPF; + else if (VECTOR_MODE_P (arg1_mode)) + instance_code = RS6000_BIF_VSIEQPV; else instance_code = RS6000_BIF_VSIEQP; } diff --git a/gcc/config/rs6000/rs6000-overload.def b/gcc/config/rs6000/rs6000-overload.def index 102ead9f80b..84743114c8e 100644 --- a/gcc/config/rs6000/rs6000-overload.def +++ b/gcc/config/rs6000/rs6000-overload.def @@ -4515,8 +4515,8 @@ VSIEQP _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned long long); VSIEQPF - _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull); - VSIEDP_VULL + _Float128 __builtin_vec_scalar_insert_exp (vuq, vull); + VSIEQPV > > I renamed __builtin_insertf128_exp as > __builtin_vsx_scalar_insert_exp_vqp which is just the vector version of > the existing __builtin_vsx_scalar_insert_exp_qp builtin. >> >> gcc/ >>> * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp, >>> __builtin_extractf128_sig, __builtin_insertf128_exp): Add new >>> builtin definitions. >>> * config/rs6000.md (extractf128_exp_<mode>, >>> insertf128_exp_<mode>, >>> extractf128_sig_<mode>): Add define_expand for new builtins. >>> (xsxexpqp_f128_<mode>, xsxsigqp_f128_<mode>, >>> siexpqpf_f128_<mode>): >>> Add define_insn for new builtins. >>> * doc/extend.texi (__builtin_extractf128_exp, >>> __builtin_extractf128_sig, >>> __builtin_insertf128_exp): Add documentation for new builtins. >>> >>> gcc/testsuite/ >>> * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case. >>> * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case. >>> * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case. >>> --- >>> gcc/config/rs6000/rs6000-builtins.def | 9 +++ >>> gcc/config/rs6000/vsx.md | 66 >>> ++++++++++++++++++- >>> gcc/doc/extend.texi | 28 ++++++++ >>> .../powerpc/bfp/extract-exp-ieee128.c | 49 ++++++++++++++ >>> .../powerpc/bfp/extract-sig-ieee128.c | 56 >>> ++++++++++++++++ >>> .../powerpc/bfp/insert-exp-ieee128.c | 58 >>> ++++++++++++++++ >>> 6 files changed, 265 insertions(+), 1 deletion(-) >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract- >>> exp-ieee128.c >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract- >>> sig-ieee128.c >>> create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert- >>> exp-ieee128.c >>> >>> diff --git a/gcc/config/rs6000/rs6000-builtins.def >>> b/gcc/config/rs6000/rs6000-builtins.def >>> index 638d0bc72ca..3247a7f7673 100644 >>> --- a/gcc/config/rs6000/rs6000-builtins.def >>> +++ b/gcc/config/rs6000/rs6000-builtins.def >>> @@ -2876,6 +2876,15 @@ >>> pure vsc __builtin_vsx_xl_len_r (void *, signed long); >>> XL_LEN_R xl_len_r {} >>> >>> + vull __builtin_extractf128_exp (_Float128); >>> + EEXPKF extractf128_exp_kf {} >>> + >>> + vuq __builtin_extractf128_sig (_Float128); >>> + ESIGKF extractf128_sig_kf {} >>> + >>> + _Float128 __builtin_insertf128_exp (vuq, vull); >>> + IEXPKF_VULL insertf128_exp_kf {} >>> + >> >> Put them to be near its related ones like >> __builtin_vsx_scalar_extract_expq >> etc. >> > > I moved the definitions to be near the other definitions. > >>> >>> ; Builtins requiring hardware support for IEEE-128 floating-point. >>> [ieee128-hw] >>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md >>> index 7d845df5c2d..2a9f875ba57 100644 >>> --- a/gcc/config/rs6000/vsx.md >>> +++ b/gcc/config/rs6000/vsx.md >>> @@ -369,7 +369,10 @@ >>> UNSPEC_XXSPLTI32DX >>> UNSPEC_XXBLEND >>> UNSPEC_XXPERMX >>> - ]) >>> + UNSPEC_EXTRACTEXPIEEE >>> + UNSPEC_EXTRACTSIGIEEE >>> + UNSPEC_INSERTEXPIEEE >> >> These are not necessary, just use the existing UNSPEC_VSX_SXEXPDP >> etc. > > I removed these, as they are not needed. > >> >>> +]) >>> >>> (define_int_iterator XVCVBF16 [UNSPEC_VSX_XVCVSPBF16 >>> UNSPEC_VSX_XVCVBF16SPN]) >>> @@ -4155,6 +4158,38 @@ >>> "vins<wd>rx %0,%1,%2" >>> [(set_attr "type" "vecsimple")]) >>> >>> +(define_expand "extractf128_exp_<mode>" >>> + [(set (match_operand:V2DI 0 "altivec_register_operand") >>> + (unspec:IEEE128 [(match_operand:IEEE128 1 >>> "altivec_register_operand")] >>> + UNSPEC_EXTRACTEXPIEEE))] >>> +"TARGET_P9_VECTOR" >>> +{ >>> + emit_insn (gen_xsxexpqp_f128_<mode> (operands[0], operands[1])); >>> + DONE; >>> +}) >>> + >>> +(define_expand "insertf128_exp_<mode>" >>> + [(set (match_operand:IEEE128 0 "altivec_register_operand") >>> + (unspec:IEEE128 [(match_operand:V1TI 1 >>> "altivec_register_operand") >>> + (match_operand:V2DI 2 "altivec_register_operand")] >>> + UNSPEC_INSERTEXPIEEE))] >>> +"TARGET_P9_VECTOR" >>> +{ >>> + emit_insn (gen_xsiexpqpf_f128_<mode> (operands[0], operands[1], >>> + operands[2])); >>> + DONE; >>> +}) >>> + >>> +(define_expand "extractf128_sig_<mode>" >>> + [(set (match_operand:V2DI 0 "altivec_register_operand") >>> + (unspec:IEEE128 [(match_operand:IEEE128 1 >>> "altivec_register_operand")] >>> + UNSPEC_EXTRACTSIGIEEE))] >>> +"TARGET_P9_VECTOR" >>> +{ >>> + emit_insn (gen_xsxsigqp_f128_<mode> (operands[0], operands[1])); >>> + DONE; >>> +}) >> >> These define_expands are not necessary, the called gen functions from >> the >> define_insns can be directly used as bif pattern in rs6000- >> builtins.def. > > Right, I just mapped directly to the instruction the above were > calling. >> >>> + >>> (define_expand "vreplace_elt_<mode>" >>> [(set (match_operand:REPLACE_ELT 0 "register_operand") >>> (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 >>> "register_operand") >>> @@ -5016,6 +5051,15 @@ >>> "xsxexpqp %0,%1" >>> [(set_attr "type" "vecmove")]) >>> >>> +;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating >>> point format >>> +(define_insn "xsxexpqp_f128_<mode>" >>> + [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") >>> + (unspec:V2DI [(match_operand:IEEE128 1 >>> "altivec_register_operand" "v")] >>> + UNSPEC_VSX_SXEXPDP))] >>> + "TARGET_P9_VECTOR" >>> + "xsxexpqp %0,%1" >>> + [(set_attr "type" "vecmove")]) >> >> I think you can just merge this into the existing xsxexpqp_<mode> by >> extending >> with one mode like xsxexpqp_<IEEE128:mode>_<xxx:mode> for unspec:xxx. >> >> xxx can be one mode iterator for DI and V2DI. > > > OK, I played with this, it is rather sophisticated syntax. I didn't > seem to be able to get the syntax correct and thus get all this to > work. It also seemed to require updating some case statements for the > instructions in various files. I opted to not change this and just > keep it simple. Not sure if that is OK or not? > I meant something like: (define_mode_iterator VSEEQP_DI [V2DI DI]) ;; VSX Scalar Extract Exponent Quad-Precision (define_insn "xsxexpqp_<IEEE128:mode>_<VSEEQP_DI:mode>" [(set (match_operand:VSEEQP_DI 0 "altivec_register_operand" "=v") (unspec:VSEEQP_DI [(match_operand:IEEE128 1 "altivec_register_operand" "v")] UNSPEC_VSX_SXEXPDP))] "TARGET_P9_VECTOR" "xsxexpqp %0,%1" [(set_attr "type" "vecmove")]) since what you need just one different mode from DI for the operand 0, using one mode iterator can make you not need to duplicate it with a different mode. With above, you can use xsxexpqp_kf_di for the previous xsxexpqp_kf, and xsxexpqp_kf_v2di for your proposed __builtin_scalar_extract_exp_to_vec. BR, Kewen