Kewen: On Wed, 2023-06-07 at 17:36 +0800, Kewen.Lin wrote: > Hi, > > on 2023/6/7 03:54, Carl Love wrote: > > On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote: > > > Hi Carl, > > > > > > on 2023/5/2 23:52, Carl Love via Gcc-patches wrote: > > > > GCC maintainers: > > > > > > > > The following patch adds three buitins for inserting and > > > > extracting > > > > the > > > > exponent and significand for an IEEE 128-bit floating point > > > > values. > > > > The builtins are valid for Power 9 and Power 10. > > > > > > We already have: > > > > > > unsigned long long int scalar_extract_exp (__ieee128 source); > > > unsigned __int128 scalar_extract_sig (__ieee128 source); > > > ieee_128 scalar_insert_exp (unsigned __int128 significand, > > > unsigned long long int exponent); > > > ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long > > > long > > > int exponent); > > > > > > you need to say something about the requirements or the > > > justification > > > for > > > adding more, for this patch itself, some comments are inline > > > below, > > > thanks! > > > > I implemented the patch based on a request for the builtins. It > > didn't > > include any justification so I reached out to Steve Monroe who > > requested the builtins to understand why he wanted them. Here is > > his > > reply: > > > > Basically there is no clean and performant way to transfer > > between a > > vector type and the ieee128 scalar, despite the fact that both > > reside in vector registers. Also a union transfer does not work > > correctly on most GCC versions (and will likely break again in > > the > > next release). I offer the long sad history of the IBM long > > double > > float runtime. > > Thanks for clarifying this. As the proposed changes, I think he > meant > to say "Basically there is no clean and performant way to transfer > between > a vector type and the scalar **types**". :) Because the proposed > changes > are: > scalar_extract_exp: unsigned long long => vector unsigned long long > scalar_extract_sig: unsigned __int128 => vector unsigned __int128 > scalar_insert_exp: unsigned __int128 => vector unsigned __int128 > unsigned long long => vector unsigned long long. > > > Also there are __ieee128 operations that are provided by > > builtins > > for POWER9 but are not provided by libgcc (for POWER8). > > > > Finally I can prove that a softfloat __ieee128 implementation > > using > > VMX integer operations, out-performs the current libgcc > > implementation using DW GPRs. > > > > The details are in the PVECLIB documentation > > pveclib/vec__f128__ppc.h > > > > > > > > The patch has been tested on both Power 9 and Power 10. > > > > > > > > Please let me know if this patch is acceptable for > > > > mainline. Thanks. > > > > > > > > Carl > > > > > > > > > > > > ---------------------- > > > > From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 > > > > 00:00:00 > > > > 2001 > > > > From: Carl Love <c...@us.ibm.com> > > > > Date: Wed, 12 Apr 2023 17:46:37 -0400 > > > > Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating > > > > point values > > > > > > > > Add support for the following builtins: > > > > > > > > __vector unsigned long long int __builtin_extractf128_exp > > > > (__ieee128); > > > > > > Could you make the name similar to the existing one? The > > > existing > > > one > > > > > > unsigned long long int scalar_extract_exp (__ieee128 source); > > > > > > has nothing like f128 on its name, this variant is just to change > > > the > > > return type to vector type, how about scalar_extract_exp_to_vec? > > > > I changed the name __builtin_extractf128_exp to > > __builtin_scalar_extract_exp_to_vec. > > > > > > __vector unsigned __int128 __builtin_extractf128_sig > > > > (__ieee128); > > > > > > Ditto. > > > > I changed the name __builtin_extractf128_sig to > > __builtin_scalar_extract_sig_to_vec. > > > > > > __ieee128 __builtin_insertf128_exp (__vector unsigned > > > > __int128, > > > > __vector unsigned long > > > > long); > > > > > > This one can just overload the existing scalar_insert_exp? > > > > I tried making this one an overloaded version of > > scalar_insert_exp. However, the overload with the vector arguments > > isn't recognized when I put the overload definition at the end of > > the > > list of overloads. When I tried putting the vector version as the > > first overloaded definition, I get an internal error > > on __builtin_vsx_scalar_insert_exp_q which is has the same > > arguments > > types but as scalars not vectors. Best I can tell, there is an > > issue > > with mixing scalar and vector arguments in an overloaded builtin. > > No, it's not due to mixing scalar and vector arguments, I looked into > this and found we have some special handling for this builtin in > altivec_resolve_overloaded_builtin, see the code: > > case RS6000_OVLD_VEC_VSIE: > { > machine_mode arg1_mode = TYPE_MODE (types[0]); > > /* If supplied first argument is wider than 64 bits, resolve to > 128-bit variant of built-in function. */ > if (GET_MODE_PRECISION (arg1_mode) > 64) > { > /* If first argument is of float variety, choose variant > that expects __ieee128 argument. Otherwise, expect > __int128 argument. */ > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT) > instance_code = RS6000_BIF_VSIEQPF; > else > instance_code = RS6000_BIF_VSIEQP; > } > .... > > When the 1st arg's precision is larger than 64, it just picks up > RS6000_BIF_VSIEQPF or RS6000_BIF_VSIEQP, you need to update this for > what you newly added. Something like (against your v2): > > const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull); > - VSIEDP_VULL xsiexpqpf_f128_kf {} > + VSIEQPV xsiexpqpf_f128_kf {} > > > @@ -1934,6 +1934,8 @@ altivec_resolve_overloaded_builtin (location_t > loc, tree fndecl, > __int128 argument. */ > if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT) > instance_code = RS6000_BIF_VSIEQPF; > + else if (VECTOR_MODE_P (arg1_mode)) > + instance_code = RS6000_BIF_VSIEQPV; > else > instance_code = RS6000_BIF_VSIEQP; > } > > diff --git a/gcc/config/rs6000/rs6000-overload.def > b/gcc/config/rs6000/rs6000-overload.def > index 102ead9f80b..84743114c8e 100644 > --- a/gcc/config/rs6000/rs6000-overload.def > +++ b/gcc/config/rs6000/rs6000-overload.def > @@ -4515,8 +4515,8 @@ > VSIEQP > _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned > long long); > VSIEQPF > - _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull); > - VSIEDP_VULL > + _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
OK, I missed the special handling of the VSIE builtin in the function. With the suggested changes the overloading works. Thanks for the help. > <snip> > > > > + > > > > (define_expand "vreplace_elt_<mode>" > > > > [(set (match_operand:REPLACE_ELT 0 "register_operand") > > > > (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1 > > > > "register_operand") > > > > @@ -5016,6 +5051,15 @@ > > > > "xsxexpqp %0,%1" > > > > [(set_attr "type" "vecmove")]) > > > > > > > > +;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating > > > > point format > > > > +(define_insn "xsxexpqp_f128_<mode>" > > > > + [(set (match_operand:V2DI 0 "altivec_register_operand" "=v") > > > > + (unspec:V2DI [(match_operand:IEEE128 1 > > > > "altivec_register_operand" "v")] > > > > + UNSPEC_VSX_SXEXPDP))] > > > > + "TARGET_P9_VECTOR" > > > > + "xsxexpqp %0,%1" > > > > + [(set_attr "type" "vecmove")]) > > > > > > I think you can just merge this into the existing xsxexpqp_<mode> > > > by > > > extending > > > with one mode like xsxexpqp_<IEEE128:mode>_<xxx:mode> for > > > unspec:xxx. > > > > > > xxx can be one mode iterator for DI and V2DI. > > > > OK, I played with this, it is rather sophisticated syntax. I > > didn't > > seem to be able to get the syntax correct and thus get all this to > > work. It also seemed to require updating some case statements for > > the > > instructions in various files. I opted to not change this and just > > keep it simple. Not sure if that is OK or not? > > > > I meant something like: > > (define_mode_iterator VSEEQP_DI [V2DI DI]) > > ;; VSX Scalar Extract Exponent Quad-Precision > (define_insn "xsxexpqp_<IEEE128:mode>_<VSEEQP_DI:mode>" > [(set (match_operand:VSEEQP_DI 0 "altivec_register_operand" "=v") > (unspec:VSEEQP_DI > [(match_operand:IEEE128 1 "altivec_register_operand" "v")] > UNSPEC_VSX_SXEXPDP))] > "TARGET_P9_VECTOR" > "xsxexpqp %0,%1" > [(set_attr "type" "vecmove")]) OK, I the define_insn part right. I was using an existing iterator that included both DI and V2DI as well as two additional types. That results in two unused patterns. Buit it looks like the issue I got stuck on was not getting the tweaks in file rs6000-builtins.def all correct. I thought the errors I was getting was due to incorrect syntax in the define_insn, but were actually errors in rs6000- builtins.def. Once I sorted that out, it works. I did go with the define_mode iterator you suggested so we don't generate additional unused patterns. Again, thanks for the help on that. > > since what you need just one different mode from DI for the operand > 0, > using one mode iterator can make you not need to duplicate it with > a different mode. With above, you can use xsxexpqp_kf_di for the > previous xsxexpqp_kf, and xsxexpqp_kf_v2di for your proposed > __builtin_scalar_extract_exp_to_vec. > > BR, > Kewen Carl