Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

Carl Love via Gcc-patches Thu, 08 Jun 2023 08:35:06 -0700


Kewen:
On Wed, 2023-06-07 at 17:36 +0800, Kewen.Lin wrote:
> Hi,
> 
> on 2023/6/7 03:54, Carl Love wrote:
> > On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
> > > Hi Carl,
> > > 
> > > on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
> > > > GCC maintainers:
> > > > 
> > > > The following patch adds three buitins for inserting and
> > > > extracting
> > > > the
> > > > exponent and significand for an IEEE 128-bit floating point
> > > > values. 
> > > > The builtins are valid for Power 9 and Power 10.  
> > > 
> > > We already have:
> > > 
> > > unsigned long long int scalar_extract_exp (__ieee128 source);
> > > unsigned __int128 scalar_extract_sig (__ieee128 source);
> > > ieee_128 scalar_insert_exp (unsigned __int128 significand,
> > >                             unsigned long long int exponent);
> > > ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long
> > > long
> > > int exponent);
> > > 
> > > you need to say something about the requirements or the
> > > justification
> > > for
> > > adding more, for this patch itself, some comments are inline
> > > below,
> > > thanks!
> > 
> > I implemented the patch based on a request for the builtins.  It
> > didn't
> > include any justification so I reached out to Steve Monroe who
> > requested the builtins to understand why he wanted them.  Here is
> > his
> > reply:
> > 
> >    Basically there is no clean and performant way to transfer
> > between a
> >    vector type and the ieee128 scalar, despite the fact that both
> >    reside in vector registers. Also a union transfer does not work
> >    correctly on most GCC versions (and will likely break again in
> > the
> >    next release). I offer the long sad history of the IBM long
> > double
> >    float runtime.
> 
> Thanks for clarifying this.  As the proposed changes, I think he
> meant
> to say "Basically there is no clean and performant way to transfer
> between
> a vector type and the scalar **types**". :) Because the proposed
> changes
> are:
>   scalar_extract_exp: unsigned long long => vector unsigned long long
>   scalar_extract_sig: unsigned __int128  => vector unsigned __int128
>   scalar_insert_exp: unsigned __int128 => vector unsigned __int128
>                      unsigned long long => vector unsigned long long.
> 
> >    Also there are __ieee128 operations that are provided by
> > builtins
> >    for POWER9 but are not provided by libgcc (for POWER8).
> > 
> >    Finally I can prove that a softfloat __ieee128 implementation
> > using
> >    VMX integer operations, out-performs the current libgcc
> >    implementation using DW GPRs.
> > 
> >    The details are in the PVECLIB documentation
> >    pveclib/vec__f128__ppc.h
> > 
> > 
> > > > The patch has been tested on both Power 9 and Power 10.
> > > > 
> > > > Please let me know if this patch is acceptable for
> > > > mainline.  Thanks.
> > > > 
> > > >             Carl 
> > > > 
> > > > 
> > > > ----------------------
> > > > From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17
> > > > 00:00:00
> > > > 2001
> > > > From: Carl Love <c...@us.ibm.com>
> > > > Date: Wed, 12 Apr 2023 17:46:37 -0400
> > > > Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
> > > > point values
> > > > 
> > > > Add support for the following builtins:
> > > > 
> > > >  __vector unsigned long long int __builtin_extractf128_exp
> > > > (__ieee128);
> > > 
> > > Could you make the name similar to the existing one?  The
> > > existing
> > > one
> > >   
> > >   unsigned long long int scalar_extract_exp (__ieee128 source);
> > > 
> > > has nothing like f128 on its name, this variant is just to change
> > > the
> > > return type to vector type, how about scalar_extract_exp_to_vec?
> > 
> > I changed the name  __builtin_extractf128_exp  to
> > __builtin_scalar_extract_exp_to_vec.
> > 
> > > >  __vector unsigned __int128 __builtin_extractf128_sig
> > > > (__ieee128);
> > > 
> > > Ditto.
> > 
> > I changed the name  __builtin_extractf128_sig to
> > __builtin_scalar_extract_sig_to_vec.
> > 
> > > >  __ieee128 __builtin_insertf128_exp (__vector unsigned
> > > > __int128,
> > > >                                      __vector unsigned long
> > > > long);
> > > 
> > > This one can just overload the existing scalar_insert_exp?
> > 
> > I tried making this one an overloaded version of
> > scalar_insert_exp.  However, the overload with the vector arguments
> > isn't recognized when I put the overload definition at the end of
> > the
> > list of overloads.  When I tried putting the vector version as the
> > first overloaded definition, I get an internal error
> > on  __builtin_vsx_scalar_insert_exp_q which is has the same
> > arguments
> > types but as scalars not vectors.  Best I can tell, there is an
> > issue
> > with mixing scalar and vector arguments in an overloaded builtin.
> 
> No, it's not due to mixing scalar and vector arguments, I looked into
> this and found we have some special handling for this builtin in
> altivec_resolve_overloaded_builtin, see the code:
> 
>     case RS6000_OVLD_VEC_VSIE:
>       {
>       machine_mode arg1_mode = TYPE_MODE (types[0]);
> 
>       /* If supplied first argument is wider than 64 bits, resolve to
>          128-bit variant of built-in function.  */
>       if (GET_MODE_PRECISION (arg1_mode) > 64)
>         {
>           /* If first argument is of float variety, choose variant
>              that expects __ieee128 argument.  Otherwise, expect
>              __int128 argument.  */
>           if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
>             instance_code = RS6000_BIF_VSIEQPF;
>           else
>             instance_code = RS6000_BIF_VSIEQP;
>         }
>           ....
> 
> When the 1st arg's precision is larger than 64, it just picks up
> RS6000_BIF_VSIEQPF or RS6000_BIF_VSIEQP, you need to update this for
> what you newly added.  Something like (against your v2):
> 
>    const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> -    VSIEDP_VULL xsiexpqpf_f128_kf {}
> +    VSIEQPV xsiexpqpf_f128_kf {}
> 
> 
> @@ -1934,6 +1934,8 @@ altivec_resolve_overloaded_builtin (location_t
> loc, tree fndecl,
>                 __int128 argument.  */
>              if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
>                instance_code = RS6000_BIF_VSIEQPF;
> +            else if (VECTOR_MODE_P (arg1_mode))
> +              instance_code = RS6000_BIF_VSIEQPV;
>              else
>                instance_code = RS6000_BIF_VSIEQP;
>            }
> 
> diff --git a/gcc/config/rs6000/rs6000-overload.def
> b/gcc/config/rs6000/rs6000-overload.def
> index 102ead9f80b..84743114c8e 100644
> --- a/gcc/config/rs6000/rs6000-overload.def
> +++ b/gcc/config/rs6000/rs6000-overload.def
> @@ -4515,8 +4515,8 @@
>      VSIEQP
>    _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned
> long long);
>      VSIEQPF
> -  _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
> -    VSIEDP_VULL
> +  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);


OK, I missed the special handling of the VSIE builtin in the function. 
With the suggested changes the overloading works.  Thanks for the help.

> 

<snip>

> > > > +
> > > >  (define_expand "vreplace_elt_<mode>"
> > > >    [(set (match_operand:REPLACE_ELT 0 "register_operand")
> > > >    (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1
> > > > "register_operand")
> > > > @@ -5016,6 +5051,15 @@
> > > >    "xsxexpqp %0,%1"
> > > >    [(set_attr "type" "vecmove")])
> > > >  
> > > > +;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating
> > > > point format
> > > > +(define_insn "xsxexpqp_f128_<mode>"
> > > > +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
> > > > +       (unspec:V2DI [(match_operand:IEEE128 1
> > > > "altivec_register_operand" "v")]
> > > > +        UNSPEC_VSX_SXEXPDP))]
> > > > +  "TARGET_P9_VECTOR"
> > > > +  "xsxexpqp %0,%1"
> > > > +  [(set_attr "type" "vecmove")])
> > > 
> > > I think you can just merge this into the existing xsxexpqp_<mode>
> > > by
> > > extending
> > > with one mode like xsxexpqp_<IEEE128:mode>_<xxx:mode> for
> > > unspec:xxx.
> > > 
> > > xxx can be one mode iterator for DI and V2DI.
> > 
> > OK, I played with this, it is rather sophisticated syntax.  I
> > didn't
> > seem to be able to get the syntax correct and thus get all this to
> > work.  It also seemed to require updating some case statements for
> > the
> > instructions in various files.  I opted to not change this and just
> > keep it simple.  Not sure if that is OK or not?
> > 
> 
> I meant something like:
> 
> (define_mode_iterator VSEEQP_DI  [V2DI DI])
> 
> ;; VSX Scalar Extract Exponent Quad-Precision
> (define_insn "xsxexpqp_<IEEE128:mode>_<VSEEQP_DI:mode>"
>   [(set (match_operand:VSEEQP_DI 0 "altivec_register_operand" "=v")
>         (unspec:VSEEQP_DI
>           [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
>           UNSPEC_VSX_SXEXPDP))]
>   "TARGET_P9_VECTOR"
>   "xsxexpqp %0,%1"
>   [(set_attr "type" "vecmove")])

OK, I the define_insn part right.  I was using an existing iterator
that included both DI and V2DI as well as two additional types.  That
results in two unused patterns.  Buit it looks like the issue I got
stuck on was not getting the tweaks in file rs6000-builtins.def all
correct.  I thought the errors I was getting was due to incorrect
syntax in the define_insn, but were actually errors in rs6000-
builtins.def.  Once I sorted that out, it works.  

I did go with the define_mode iterator you suggested so we don't
generate additional unused patterns.  Again, thanks for the help on
that.
> 
> since what you need just one different mode from DI for the operand
> 0,
> using one mode iterator can make you not need to duplicate it with
> a different mode.  With above, you can use xsxexpqp_kf_di for the
> previous xsxexpqp_kf, and xsxexpqp_kf_v2di for your proposed
> __builtin_scalar_extract_exp_to_vec.
> 
> BR,
> Kewen

              Carl

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

Reply via email to