On Mon, Oct 24, 2016 at 02:44:37PM +0100, James Greenhalgh wrote:
> 
> Hi,
> 
> I'm adapting this patch from work started by Matthew Wahab.
> 
> Conversions from double precision floats to the ARM __fp16 are required
> to round only once. A conversion function for double to __fp16 to
> support this on soft-fp targets. This and the following patch add this
> conversion function by reusing the exising float to __fp16 function
> config/arm/fp16.c:__gnu_f2h_internal.
> 
> This patch generalizes __gnu_f2h_internal by adding a specification of
> the source format and reworking the code to make use of it. Initially,
> only the binary32 format is supported.
> 
> Bootstrapped on an ARMv8-A machine with no issues, and cross-tested with
> a reasonable multi-lib range.
> 
> 2016-10-24  James Greenhalgh  <james.greenha...@arm.com>
>           Matthew Wahab  <matthew.wa...@arm.com>
> 
>       * config/arm/fp16.c (struct format): New.
>       (binary32): New.
>       (__gnu_float2h_internal): New.  Body moved from
>       __gnu_f2h_internal and generalize.
>       (_gnu_f2h_internal): Move body to function __gnu_float2h_internal.
>       Call it with binary32.

Looking at it carefully, this patch has a bug in the way it handles
rounding for doubles. The relevant code looks like this:

  if (aexp < -14)
    {
      mask = point | (point - 1);
      /* Minimum exponent for half-precision is 2^-24.  */
      if (aexp >= -25)
        mask >>= 25 + aexp;
    }
  else
    mask = 0x00001fff;

  /* Round.  */
  if (mantissa & mask)
    {
      increment = (mask + 1) >> 1;
      if ((mantissa & mask) == increment)
        increment = mantissa & (increment << 1);
      mantissa += increment;
      if (mantissa >= (point << 1))
        {
          mantissa >>= 1;
          aexp++;
        }
    }

But clearly that mask is not going to be sufficient when we are deciding
whether we ought to round a double or not as there are many more bits for
us to look at.

I'll work on a new spin of this patch.

I'm still hopeful that both this and the AArch64 (and generic bits and
pieces) support for _Float16 will make it for GCC 7.

Thanks,
James

Reply via email to