Re: [PATCH 1/3] expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format [PR112993]

Richard Sandiford Mon, 08 Jul 2024 04:14:58 -0700

"Kewen.Lin" <li...@linux.ibm.com> writes:
> Hi Richard,
>
> Thanks for the review comments!
>
> on 2024/7/4 23:58, Richard Sandiford wrote:
>> "Kewen.Lin" <li...@linux.ibm.com> writes:
>>> Hi,
>>>
>>> With some historical reasons, rs6000 defines KFmode, TFmode
>>> and IFmode to have different mode precision, but it causes
>>> some issues and needs some workarounds such as r14-6478 for
>>> PR112788.  So we are going to make all rs6000 128 bit scalar
>>> FP modes have 128 bit precision.  Be prepared for that, this
>>> patch is to make function convert_mode_scalar allow same
>>> precision FP modes conversion if their underlying formats are
>>> ibm_extended_format and ieee_quad_format respectively, just
>>> like the existing special treatment on arm_bfloat_half_format
>>> <-> ieee_half_format.  It also factors out all the relevant
>>> checks into a lambda function.
>>>
>>> Bootstrapped and regtested on x86_64-redhat-linux and
>>> powerpc64{,le}-linux-gnu.
>>>
>>> Is it ok for trunk?
>>>
>>> BR,
>>> Kewen
>>> -----
>>>     PR target/112993
>>>
>>> gcc/ChangeLog:
>>>
>>>     * expr.cc (convert_mode_scalar): Allow same precision conversion
>>>     between scalar floating point modes if whose underlying format is
>>>     ibm_extended_format or ieee_quad_format, and refactor assertion
>>>     with new lambda function acceptable_same_precision_modes.
>>> ---
>>>  gcc/expr.cc | 30 ++++++++++++++++++++++++------
>>>  1 file changed, 24 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/gcc/expr.cc b/gcc/expr.cc
>>> index ffbac513692..eac4dcc982e 100644
>>> --- a/gcc/expr.cc
>>> +++ b/gcc/expr.cc
>>> @@ -338,6 +338,29 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
>>>    enum rtx_code equiv_code = (unsignedp < 0 ? UNKNOWN
>>>                           : (unsignedp ? ZERO_EXTEND : SIGN_EXTEND));
>>>
>>> +  auto acceptable_same_precision_modes
>>> +    = [] (scalar_mode from_mode, scalar_mode to_mode) -> bool
>>> +    {
>>> +      if (DECIMAL_FLOAT_MODE_P (from_mode) != DECIMAL_FLOAT_MODE_P 
>>> (to_mode))
>>> +   return true;
>>> +
>>> +      /* arm_bfloat_half_format <-> ieee_half_format */
>>> +      if ((REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
>>> +      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
>>> +     || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
>>> +         && REAL_MODE_FORMAT (from_mode) == &ieee_half_format))
>>> +   return true;
>>> +
>>> +      /* ibm_extended_format <-> ieee_quad_format */
>>> +      if ((REAL_MODE_FORMAT (from_mode) == &ibm_extended_format
>>> +      && REAL_MODE_FORMAT (to_mode) == &ieee_quad_format)
>>> +     || (REAL_MODE_FORMAT (from_mode) == &ieee_quad_format
>>> +         && REAL_MODE_FORMAT (to_mode) == &ibm_extended_format))
>>> +   return true;
>>> +
>>> +      return false;
>>> +    };
>>> +
>>>    if (to_real)
>>>      {
>>>        rtx value;
>>> @@ -346,12 +369,7 @@ convert_mode_scalar (rtx to, rtx from, int unsignedp)
>>>
>>>        gcc_assert ((GET_MODE_PRECISION (from_mode)
>>>                != GET_MODE_PRECISION (to_mode))
>>> -             || (DECIMAL_FLOAT_MODE_P (from_mode)
>>> -                 != DECIMAL_FLOAT_MODE_P (to_mode))
>>> -             || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
>>> -                 && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
>>> -             || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
>>> -                 && REAL_MODE_FORMAT (from_mode) == &ieee_half_format));
>>> +             || acceptable_same_precision_modes (from_mode, to_mode));
>>>
>>>        if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>>>     {
>>> --
>>> 2.39.1
>> 
>> This part looks good to me FWIW, but what's the correct behaviour of:
>> 
>>       if (GET_MODE_PRECISION (from_mode) == GET_MODE_PRECISION (to_mode))
>>      {
>>        if (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
>>            && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)
>>          /* libgcc implements just __trunchfbf2, not __extendhfbf2.  */
>>          tab = trunc_optab;
>>        else
>>          /* Conversion between decimal float and binary float, same
>>             size.  */
>>          tab = DECIMAL_FLOAT_MODE_P (from_mode) ? trunc_optab : sext_optab;
>> 
>> for the new pairing?  The intent for bfloat/half seems to be that bfloat
>> is treated as arbitrarily “lesser than” half, so half->bfloat is a
>> truncation and bfloat->half is an extension.  It seems like it would be
>> good to do something similar for the new pair, so that the float modes
>> still form a total order in terms of extension & truncation.
>
> Good question!  If I read it right, this special handling for half->bfloat is
> to align with the previous implementation in libgcc and easy for backporting
> , but isn't to keep the modes to form a total order, as Jakub's comments
> PR114907 #c6 and #c8.  Similar to half vs. bfloat, neither of ibm128 nor 
> ieee128
> is a subset or superset of the other, the current behavior for this new 
> paring is
> that:
>   1) define sext_optab for any two of TF/KF/IF (also bi-direction), since 
> generic
>      code above adopts sext_optab for same size conversion.


But before your patch, that code only expected equal precisions if:

                  || (DECIMAL_FLOAT_MODE_P (from_mode)
                      != DECIMAL_FLOAT_MODE_P (to_mode))
                  || (REAL_MODE_FORMAT (from_mode) == &arm_bfloat_half_format
                      && REAL_MODE_FORMAT (to_mode) == &ieee_half_format)
                  || (REAL_MODE_FORMAT (to_mode) == &arm_bfloat_half_format
                      && REAL_MODE_FORMAT (from_mode) == &ieee_half_format)

So the effect was:

binary->decimal: extend
decimal->binary: truncate
bfloat->IEEE: extend
IEEE->bfloat: truncate

AFAICT there was no X and Y for which X->Y and Y->X are both extensions.

>   2) for each define_expand in 1), it actually calls 
> rs6000_expand_float128_convert
>      which is one existing helper to handle conversions to/from __float128 and
>      __ibm128, it will take care of the remaining (generate __extend* or 
> __trunc*).
>      Similar to half vs. bfloat which has extend and trunc libgcc functions, 
> we
>      have some extend and trunc libgcc functions for __float128 vs. __ibm128.
>
> We can add some special treatment for the new pairing like what's for half 
> vs. bfloat,
> but since extend and truncate can be arbitrary for them, I thought we can 
> just define
> sext_optab to align with what generic code does and wants, then do the actual 
> handling
> in the corresponding expander (someone may argue that sext_optab is expected 
> to be end
> with libcall __extend*, it's surprising to end up with __trunc*, but 
> considering
> the extend/truncate here is arbitrary, guessing it's acceptable?).

Yeah, using a trunc libgcc function for sext_optab seems surprising to me.

Like Jakub said in #c8 above, both extension and truncation are conceptually
wrong.  But given that both are wrong, we can't use the literal meaning to
choose between them, and need to find something else instead.

Making the sext/trunc choice mirror the current libgcc functions seems
cleaner.  It also avoids the (IMO) odd situation that you can construct
a valid and infinitely long chain of extensions from a finite set of types.
IMO the types should be well ordered if possible.

Thanks,
Richard

Re: [PATCH 1/3] expr: Allow same precision modes conversion between {ibm_extended, ieee_quad}_format [PR112993]

Reply via email to