Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

Kewen.Lin via Gcc-patches Wed, 07 Jun 2023 02:36:45 -0700

Hi,

on 2023/6/7 03:54, Carl Love wrote:
> On Mon, 2023-06-05 at 16:45 +0800, Kewen.Lin wrote:
>> Hi Carl,
>>
>> on 2023/5/2 23:52, Carl Love via Gcc-patches wrote:
>>> GCC maintainers:
>>>
>>> The following patch adds three buitins for inserting and extracting
>>> the
>>> exponent and significand for an IEEE 128-bit floating point
>>> values. 
>>> The builtins are valid for Power 9 and Power 10.  
>>
>> We already have:
>>
>> unsigned long long int scalar_extract_exp (__ieee128 source);
>> unsigned __int128 scalar_extract_sig (__ieee128 source);
>> ieee_128 scalar_insert_exp (unsigned __int128 significand,
>>                             unsigned long long int exponent);
>> ieee_128 scalar_insert_exp (ieee_128 significand, unsigned long long
>> int exponent);
>>
>> you need to say something about the requirements or the justification
>> for
>> adding more, for this patch itself, some comments are inline below,
>> thanks!
> 
> I implemented the patch based on a request for the builtins.  It didn't
> include any justification so I reached out to Steve Monroe who
> requested the builtins to understand why he wanted them.  Here is his
> reply:
> 
>    Basically there is no clean and performant way to transfer between a
>    vector type and the ieee128 scalar, despite the fact that both
>    reside in vector registers. Also a union transfer does not work
>    correctly on most GCC versions (and will likely break again in the
>    next release). I offer the long sad history of the IBM long double
>    float runtime.


Thanks for clarifying this.  As the proposed changes, I think he meant
to say "Basically there is no clean and performant way to transfer between
a vector type and the scalar **types**". :) Because the proposed changes
are:
  scalar_extract_exp: unsigned long long => vector unsigned long long
  scalar_extract_sig: unsigned __int128  => vector unsigned __int128
  scalar_insert_exp: unsigned __int128 => vector unsigned __int128
                     unsigned long long => vector unsigned long long.

> 
>    Also there are __ieee128 operations that are provided by builtins
>    for POWER9 but are not provided by libgcc (for POWER8).
> 
>    Finally I can prove that a softfloat __ieee128 implementation using
>    VMX integer operations, out-performs the current libgcc
>    implementation using DW GPRs.
> 
>    The details are in the PVECLIB documentation
>    pveclib/vec__f128__ppc.h
> 
> 
>>
>>> The patch has been tested on both Power 9 and Power 10.
>>>
>>> Please let me know if this patch is acceptable for
>>> mainline.  Thanks.
>>>
>>>             Carl 
>>>
>>>
>>> ----------------------
>>> From a20cc81f98cce1140fc95775a7c25b55d1ca7cba Mon Sep 17 00:00:00
>>> 2001
>>> From: Carl Love <c...@us.ibm.com>
>>> Date: Wed, 12 Apr 2023 17:46:37 -0400
>>> Subject: [PATCH] rs6000: Add builtins for IEEE 128-bit floating
>>> point values
>>>
>>> Add support for the following builtins:
>>>
>>>  __vector unsigned long long int __builtin_extractf128_exp
>>> (__ieee128);
>>
>> Could you make the name similar to the existing one?  The existing
>> one
>>   
>>   unsigned long long int scalar_extract_exp (__ieee128 source);
>>
>> has nothing like f128 on its name, this variant is just to change the
>> return type to vector type, how about scalar_extract_exp_to_vec?
> 
> I changed the name  __builtin_extractf128_exp  to
> __builtin_scalar_extract_exp_to_vec.
> 
>>
>>>  __vector unsigned __int128 __builtin_extractf128_sig (__ieee128);
>>
>> Ditto.
> 
> I changed the name  __builtin_extractf128_sig to
> __builtin_scalar_extract_sig_to_vec.
> 
>>
>>>  __ieee128 __builtin_insertf128_exp (__vector unsigned __int128,
>>>                                  __vector unsigned long long);
>>
>> This one can just overload the existing scalar_insert_exp?
> 
> 
> I tried making this one an overloaded version of
> scalar_insert_exp.  However, the overload with the vector arguments
> isn't recognized when I put the overload definition at the end of the
> list of overloads.  When I tried putting the vector version as the
> first overloaded definition, I get an internal error
> on  __builtin_vsx_scalar_insert_exp_q which is has the same arguments
> types but as scalars not vectors.  Best I can tell, there is an issue
> with mixing scalar and vector arguments in an overloaded builtin.

No, it's not due to mixing scalar and vector arguments, I looked into
this and found we have some special handling for this builtin in
altivec_resolve_overloaded_builtin, see the code:

    case RS6000_OVLD_VEC_VSIE:
      {
        machine_mode arg1_mode = TYPE_MODE (types[0]);

        /* If supplied first argument is wider than 64 bits, resolve to
           128-bit variant of built-in function.  */
        if (GET_MODE_PRECISION (arg1_mode) > 64)
          {
            /* If first argument is of float variety, choose variant
               that expects __ieee128 argument.  Otherwise, expect
               __int128 argument.  */
            if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
              instance_code = RS6000_BIF_VSIEQPF;
            else
              instance_code = RS6000_BIF_VSIEQP;
          }
          ....
 
When the 1st arg's precision is larger than 64, it just picks up
RS6000_BIF_VSIEQPF or RS6000_BIF_VSIEQP, you need to update this for
what you newly added.  Something like (against your v2):

   const _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
-    VSIEDP_VULL xsiexpqpf_f128_kf {}
+    VSIEQPV xsiexpqpf_f128_kf {}


@@ -1934,6 +1934,8 @@ altivec_resolve_overloaded_builtin (location_t loc, tree 
fndecl,
                __int128 argument.  */
             if (GET_MODE_CLASS (arg1_mode) == MODE_FLOAT)
               instance_code = RS6000_BIF_VSIEQPF;
+            else if (VECTOR_MODE_P (arg1_mode))
+              instance_code = RS6000_BIF_VSIEQPV;
             else
               instance_code = RS6000_BIF_VSIEQP;
           }

diff --git a/gcc/config/rs6000/rs6000-overload.def 
b/gcc/config/rs6000/rs6000-overload.def
index 102ead9f80b..84743114c8e 100644
--- a/gcc/config/rs6000/rs6000-overload.def
+++ b/gcc/config/rs6000/rs6000-overload.def
@@ -4515,8 +4515,8 @@
     VSIEQP
   _Float128 __builtin_vec_scalar_insert_exp (_Float128, unsigned long long);
     VSIEQPF
-  _Float128 __builtin_vsx_scalar_insert_exp_vqp (vuq, vull);
-    VSIEDP_VULL
+  _Float128 __builtin_vec_scalar_insert_exp (vuq, vull);
+    VSIEQPV

> 
> I renamed __builtin_insertf128_exp as
> __builtin_vsx_scalar_insert_exp_vqp which is just the vector version of
>   the existing __builtin_vsx_scalar_insert_exp_qp builtin.
>>
>> gcc/
>>>     * config/rs6000/rs6000-buildin.def (__builtin_extractf128_exp,
>>>      __builtin_extractf128_sig, __builtin_insertf128_exp): Add new
>>>     builtin definitions.
>>>     * config/rs6000.md (extractf128_exp_<mode>,
>>> insertf128_exp_<mode>,
>>>     extractf128_sig_<mode>): Add define_expand for new builtins.
>>>     (xsxexpqp_f128_<mode>, xsxsigqp_f128_<mode>,
>>> siexpqpf_f128_<mode>):
>>>     Add define_insn for new builtins.
>>>     * doc/extend.texi (__builtin_extractf128_exp,
>>> __builtin_extractf128_sig,
>>>     __builtin_insertf128_exp): Add documentation for new builtins.
>>>
>>> gcc/testsuite/
>>>     * gcc.target/powerpc/bfp/extract-exp-ieee128.c: New test case.
>>>     * gcc.target/powerpc/bfp/extract-sig-ieee128.c: New test case.
>>>     * gcc.target/powerpc/bfp/insert-exp-ieee128.c: New test case.
>>> ---
>>>  gcc/config/rs6000/rs6000-builtins.def         |  9 +++
>>>  gcc/config/rs6000/vsx.md                      | 66
>>> ++++++++++++++++++-
>>>  gcc/doc/extend.texi                           | 28 ++++++++
>>>  .../powerpc/bfp/extract-exp-ieee128.c         | 49 ++++++++++++++
>>>  .../powerpc/bfp/extract-sig-ieee128.c         | 56
>>> ++++++++++++++++
>>>  .../powerpc/bfp/insert-exp-ieee128.c          | 58
>>> ++++++++++++++++
>>>  6 files changed, 265 insertions(+), 1 deletion(-)
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
>>> exp-ieee128.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/extract-
>>> sig-ieee128.c
>>>  create mode 100644 gcc/testsuite/gcc.target/powerpc/bfp/insert-
>>> exp-ieee128.c
>>>
>>> diff --git a/gcc/config/rs6000/rs6000-builtins.def
>>> b/gcc/config/rs6000/rs6000-builtins.def
>>> index 638d0bc72ca..3247a7f7673 100644
>>> --- a/gcc/config/rs6000/rs6000-builtins.def
>>> +++ b/gcc/config/rs6000/rs6000-builtins.def
>>> @@ -2876,6 +2876,15 @@
>>>    pure vsc __builtin_vsx_xl_len_r (void *, signed long);
>>>      XL_LEN_R xl_len_r {}
>>>  
>>> +  vull __builtin_extractf128_exp (_Float128);
>>> +    EEXPKF extractf128_exp_kf {}
>>> +
>>> +  vuq __builtin_extractf128_sig (_Float128);
>>> +    ESIGKF extractf128_sig_kf {}
>>> +
>>> +  _Float128 __builtin_insertf128_exp (vuq, vull);
>>> +    IEXPKF_VULL insertf128_exp_kf {}
>>> +
>>
>> Put them to be near its related ones like
>> __builtin_vsx_scalar_extract_expq
>> etc.
>>
> 
> I moved the definitions to be near the other definitions.
> 
>>>  
>>>  ; Builtins requiring hardware support for IEEE-128 floating-point.
>>>  [ieee128-hw]
>>> diff --git a/gcc/config/rs6000/vsx.md b/gcc/config/rs6000/vsx.md
>>> index 7d845df5c2d..2a9f875ba57 100644
>>> --- a/gcc/config/rs6000/vsx.md
>>> +++ b/gcc/config/rs6000/vsx.md
>>> @@ -369,7 +369,10 @@
>>>     UNSPEC_XXSPLTI32DX
>>>     UNSPEC_XXBLEND
>>>     UNSPEC_XXPERMX
>>> -  ])
>>> +   UNSPEC_EXTRACTEXPIEEE
>>> +   UNSPEC_EXTRACTSIGIEEE
>>> +   UNSPEC_INSERTEXPIEEE
>>
>> These are not necessary, just use the existing UNSPEC_VSX_SXEXPDP
>> etc.
> 
> I removed these, as they are not needed.
> 
>>
>>> +])
>>>  
>>>  (define_int_iterator XVCVBF16      [UNSPEC_VSX_XVCVSPBF16
>>>                              UNSPEC_VSX_XVCVBF16SPN])
>>> @@ -4155,6 +4158,38 @@
>>>   "vins<wd>rx %0,%1,%2"
>>>   [(set_attr "type" "vecsimple")])
>>>  
>>> +(define_expand "extractf128_exp_<mode>"
>>> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
>>> +  (unspec:IEEE128 [(match_operand:IEEE128 1
>>> "altivec_register_operand")]
>>> +             UNSPEC_EXTRACTEXPIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> +  emit_insn (gen_xsxexpqp_f128_<mode> (operands[0], operands[1]));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_expand "insertf128_exp_<mode>"
>>> +  [(set (match_operand:IEEE128 0 "altivec_register_operand")
>>> +  (unspec:IEEE128 [(match_operand:V1TI 1
>>> "altivec_register_operand")
>>> +              (match_operand:V2DI 2 "altivec_register_operand")]
>>> +             UNSPEC_INSERTEXPIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> +  emit_insn (gen_xsiexpqpf_f128_<mode> (operands[0], operands[1],
>>> +                                   operands[2]));
>>> +  DONE;
>>> +})
>>> +
>>> +(define_expand "extractf128_sig_<mode>"
>>> +  [(set (match_operand:V2DI 0 "altivec_register_operand")
>>> +  (unspec:IEEE128 [(match_operand:IEEE128 1
>>> "altivec_register_operand")]
>>> +             UNSPEC_EXTRACTSIGIEEE))]
>>> +"TARGET_P9_VECTOR"
>>> +{
>>> +  emit_insn (gen_xsxsigqp_f128_<mode> (operands[0], operands[1]));
>>> +  DONE;
>>> +})
>>
>> These define_expands are not necessary, the called gen functions from
>> the
>> define_insns can be directly used as bif pattern in rs6000-
>> builtins.def.
> 
> Right, I just mapped directly to the instruction the above were
> calling.
>>
>>> +
>>>  (define_expand "vreplace_elt_<mode>"
>>>    [(set (match_operand:REPLACE_ELT 0 "register_operand")
>>>    (unspec:REPLACE_ELT [(match_operand:REPLACE_ELT 1
>>> "register_operand")
>>> @@ -5016,6 +5051,15 @@
>>>    "xsxexpqp %0,%1"
>>>    [(set_attr "type" "vecmove")])
>>>  
>>> +;; VSX Scalar to Vector Extract Exponent IEEE 128-bit floating
>>> point format
>>> +(define_insn "xsxexpqp_f128_<mode>"
>>> +  [(set (match_operand:V2DI 0 "altivec_register_operand" "=v")
>>> +   (unspec:V2DI [(match_operand:IEEE128 1
>>> "altivec_register_operand" "v")]
>>> +    UNSPEC_VSX_SXEXPDP))]
>>> +  "TARGET_P9_VECTOR"
>>> +  "xsxexpqp %0,%1"
>>> +  [(set_attr "type" "vecmove")])
>>
>> I think you can just merge this into the existing xsxexpqp_<mode> by
>> extending
>> with one mode like xsxexpqp_<IEEE128:mode>_<xxx:mode> for unspec:xxx.
>>
>> xxx can be one mode iterator for DI and V2DI.
> 
> 
> OK, I played with this, it is rather sophisticated syntax.  I didn't
> seem to be able to get the syntax correct and thus get all this to
> work.  It also seemed to require updating some case statements for the
> instructions in various files.  I opted to not change this and just
> keep it simple.  Not sure if that is OK or not?
> 

I meant something like:

(define_mode_iterator VSEEQP_DI  [V2DI DI])

;; VSX Scalar Extract Exponent Quad-Precision
(define_insn "xsxexpqp_<IEEE128:mode>_<VSEEQP_DI:mode>"
  [(set (match_operand:VSEEQP_DI 0 "altivec_register_operand" "=v")
        (unspec:VSEEQP_DI
          [(match_operand:IEEE128 1 "altivec_register_operand" "v")]
          UNSPEC_VSX_SXEXPDP))]
  "TARGET_P9_VECTOR"
  "xsxexpqp %0,%1"
  [(set_attr "type" "vecmove")])

since what you need just one different mode from DI for the operand 0,
using one mode iterator can make you not need to duplicate it with
a different mode.  With above, you can use xsxexpqp_kf_di for the
previous xsxexpqp_kf, and xsxexpqp_kf_v2di for your proposed
__builtin_scalar_extract_exp_to_vec.

BR,
Kewen

Re: [PATCH] rs6000: Add builtins for IEEE 128-bit floating point values

Reply via email to