I see your point about other floating point formats.
According to the C standard, DOUBLE precision must
have a DBL_MAX of at least 1E+37. That is (slightly)
greater than 0x1.0p+63.

Would
#define RMIN2 (0x1.0p-53)
#define RMINSCAL (0x1.0p+51)
be acceptable?
Those will be in range for any architecture that supports the C standard.

I estimate using those values will cause a typical complex divide to run
somewhat slower for the full exponent dataset because the test for
whether fabs(d) < RMIN2 [or fabs(c) < RMIN2 ] will be taken more often,
adding four scaling multiplies to the overall operation.

On the positive side, using the new scaling factor significantly reduces
the frequency of 2 ulp or larger inaccuracies for the full range dataset.
The alternative would be to remove the RMIN2 test altogether.
That provides a modest reduction in overhead but increases
the inaccuracy for the 8 ulp case to 0.04147%.

Errors   Full Dataset
gtr eq     current   RMIN2=-512 RMIN2=-53   no RMIN2
======    ========   ========   ========   ========
 1 ulp      1.87%    0.19755%   0.16788%   0.22169%
 2 ulp      1.70%    0.03857%   0.01007%   0.06182%
 8 ulp      1.61%    0.02394%   0.00199%   0.04147%
16 ulp      1.51%    0.01548%   0.00087%   0.02727%
24 ulp      1.41%    0.00945%   0.00042%   0.01694%
52 ulp      1.13%    0.00001%   0.00001%   0.00001%
===================================================

The RMIN2=-53% option greatly reduces 8 ulp or larger errors.
The "No RMIN2" option has about double the error rate
of the initial proposal of RMIN2=-512, but still has a error
rate at the 8 ulp level 40 times lower than the current code.

I welcome any thoughts from the community about the relative importance
of accuracy vs performance tradeoffs at these levels.

- Patrick McGehearty


On 6/4/2020 6:45 PM, Joseph Myers wrote:
On Fri, 5 Jun 2020, Patrick McGehearty wrote:

diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index e0a9fd7..2a1d3dc 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2036,26 +2036,77 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE 
d)
  CTYPE
  CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
  {
+#define RBIG   ((DBL_MAX)/2.0)
+#define RMIN   (DBL_MIN)
+#define RMIN2  (0x1.0p-512)
+#define RMINSCAL (0x1.0p+510)
This code is used for many different machine modes and floating-point
formats (the type and format corresponding to a particular machine mode
may depend on the target for which GCC is configured).  You can't hardcode
particular values specific to DFmode (and to DFmode being IEEE binary64)
here.


Reply via email to