[Bug libfortran/93871] COTAN is slow for complex types

sgk at troutmask dot apl.washington.edu Mon, 02 Mar 2020 13:00:31 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871

--- Comment #36 from Steve Kargl <sgk at troutmask dot apl.washington.edu> ---
On Mon, Mar 02, 2020 at 11:20:56AM +0000, thenlich at gcc dot gnu.org wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=93871
> 
> --- Comment #35 from Thomas Henlich <thenlich at gcc dot gnu.org> ---
> (In reply to Steve Kargl from comment #34)
> > Even this appears to have some irregularities as my exhaustive
> > test in the interval [1.e-8,1) with direct call to sinf() yields
> > 
> 
> This looks like a job for FMA: "Correctly rounded multiplication
> by arbitrary precision constants"
> (http://perso.ens-lyon.fr/jean-michel.muller/MultConstant.html)
> show that it can be proven to always work.
> 

It can also be show to be slow. :-)

I spent the last few days looking into an implementation.  Here is
a summary of my findings.  Someone, who does numerical analysis for
a living and is smarter than I, may be able to do better.

Exhaustive testing in the indicated intervals for "float sindf(float x)",
"float cosdf(float x)", and "float tandf(float x)".  The reference values
are taken to be that of double (e.g., "double sindd(double x)".

All tests were done with GCC 9.2.0 with FreeBSD's libm on amd64.

The methods are described below.  It is noted the selected algorithm
should be used for all precision to minimize maintenance.  The choice
seems to come down to method 2 or method 4.  Those methods are likely
slower than method 5.

The domain [1,360] is folded into [0,45] by symmetries in sine and
cosine, and the computation is done by selecting sinf() or cosf().
I've only started to consider tandf(), so have only looked at [0,90]
with [0,45] shown below as [45,90] get messes with tanf(x)->inf
for x->90.

sindf in [1, 360]
----------+----------+----------+----------+----------+----------+
          | Method 1 | Method 2 | Method 3 | Method 4 | Method 5 | 
----------+----------+----------+----------+----------+----------+
    count | 70516737 | 70516737 | 70516737 | 70516737 | 70516737 |
ulp > 1.0 | 101589   | 101586   | 279027   | 101599   | 303168   |
ulp > 1.5 | 0        | 0        | 682      | 0        | 79       |
ulp > 1.6 | 0        | 0        | 9        | 0        | 18       |
ulp > 1.7 | 0        | 0        | 0        | 0        | 3        |
max ulp   | 1.485345 | 1.485345 | 1.621218 | 1.485345 | 1.717592 |
x max ulp | 7.173496 | 7.173496 | 1.790705 | 7.173496 | 1.790707 |
----------+----------+----------+----------+----------+----------+

cosdf in [1, 360]
----------+-----------+-----------+-----------+-----------+-----------+
          | Method 1  | Method 2  | Method 3  | Method 4  | Method 5  |
----------+-----------+-----------+-----------+-----------+-----------+
    count | 70516737  | 70516737  | 70516737  | 70516737  | 70516737  |
ulp > 1.0 | 59680     | 59681     | 94071     | 59682     | 70892     |
ulp > 1.5 | 0         | 0         | 191       | 0         | 8         |
ulp > 1.6 | 0         | 0         | 0         | 0         | 0         |
ulp > 1.7 | 0         | 0         | 0         | 0         | 0         |
max ulp   | 1.475156  | 1.475156  | 1.587572  | 1.475156  | 1.571779  |
x max ulp | 88.209297 | 88.209297 | 82.834091 | 88.209297 | 86.417046 |
----------+-----------+-----------+-----------+-----------+-----------+

tandf in [1, 45]
----------+-----------+-----------+-----------+-----------+-----------+
          | Method 1  | Method 2  | Method 3  | Method 4  | Method 5  |
----------+-----------+-----------+-----------+-----------+-----------+
    count | 45350913  | 45350913  | 45350913  | 45350913  | 45350913  |
ulp > 1.0 | 779426    | 779588    | 1317335   | 779423    | 981293    |
ulp > 1.5 | 4945      | 4946      | 14704     | 4947      | 5017      |
ulp > 1.6 | 796       | 794       | 4905      | 796       | 820       |
ulp > 1.7 | 25        | 25        | 812       | 25        | 33        |
max ulp   | 1.748144  | 1.748144  | 1.947416  | 1.748144  | 1.771557  |
x max ulp | 44.998692 | 44.998692 | 44.998466 | 44.998692 | 44.998627 |
----------+-----------+-----------+-----------+-----------+-----------+

Method 1:
   Split x = xhi + xlo and (pi/180) = shi + slo into high and low parts.
   For 32-bit float, the splitting has 12 bits in high parts shi and xhi,
   and low parts contain the trailing 12 and 24 bits in xlo and slo.
   The conversion from degrees to radians, is then 

   f = slo * xlo + slo * xhi + shi * xlo + shi * xhi.

Method 2:
   Do the multiplication of x*(pi/180) is a higher precision type.  This
   works for 32-bit float with 53-bit double, because the multiplication
   provides the needed 48 bits.  For higher precisions, the methods fails
   or will depend of software extended precision.

Method 3:
   Break x and (pi/180) into their significands and exponents with frexpf,
   then multiply significands followed by scaling with scalbnf.

Method 4:
   Split (pi/180) = shi + slo into high and low parts, then use
   fmaf(x, shi, slo).

Method 5:
   Use a naive splitting of x into an integer and remainder via 
   n = x - (int)x and r = x - n. Split (pi/180) = shi + slo into high and
   low parts.  The conversion is then

   f = slo * r + slo * n + shi * r + shi * n.

   This method while slightly worse with ULP is likely the fastest!

[Bug libfortran/93871] COTAN is slow for complex types

Reply via email to