I see your point about other floating point formats.
According to the C standard, DOUBLE precision must
have a DBL_MAX of at least 1E+37. That is (slightly)
greater than 0x1.0p+63.
Would
#define RMIN2 (0x1.0p-53)
#define RMINSCAL (0x1.0p+51)
be acceptable?
Those will be in range for any architecture that supports the C standard.
I estimate using those values will cause a typical complex divide to run
somewhat slower for the full exponent dataset because the test for
whether fabs(d) < RMIN2 [or fabs(c) < RMIN2 ] will be taken more often,
adding four scaling multiplies to the overall operation.
On the positive side, using the new scaling factor significantly reduces
the frequency of 2 ulp or larger inaccuracies for the full range dataset.
The alternative would be to remove the RMIN2 test altogether.
That provides a modest reduction in overhead but increases
the inaccuracy for the 8 ulp case to 0.04147%.
Errors Full Dataset
gtr eq current RMIN2=-512 RMIN2=-53 no RMIN2
====== ======== ======== ======== ========
1 ulp 1.87% 0.19755% 0.16788% 0.22169%
2 ulp 1.70% 0.03857% 0.01007% 0.06182%
8 ulp 1.61% 0.02394% 0.00199% 0.04147%
16 ulp 1.51% 0.01548% 0.00087% 0.02727%
24 ulp 1.41% 0.00945% 0.00042% 0.01694%
52 ulp 1.13% 0.00001% 0.00001% 0.00001%
===================================================
The RMIN2=-53% option greatly reduces 8 ulp or larger errors.
The "No RMIN2" option has about double the error rate
of the initial proposal of RMIN2=-512, but still has a error
rate at the 8 ulp level 40 times lower than the current code.
I welcome any thoughts from the community about the relative importance
of accuracy vs performance tradeoffs at these levels.
- Patrick McGehearty
On 6/4/2020 6:45 PM, Joseph Myers wrote:
On Fri, 5 Jun 2020, Patrick McGehearty wrote:
diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
index e0a9fd7..2a1d3dc 100644
--- a/libgcc/libgcc2.c
+++ b/libgcc/libgcc2.c
@@ -2036,26 +2036,77 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE
d)
CTYPE
CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
{
+#define RBIG ((DBL_MAX)/2.0)
+#define RMIN (DBL_MIN)
+#define RMIN2 (0x1.0p-512)
+#define RMINSCAL (0x1.0p+510)
This code is used for many different machine modes and floating-point
formats (the type and format corresponding to a particular machine mode
may depend on the target for which GCC is configured). You can't hardcode
particular values specific to DFmode (and to DFmode being IEEE binary64)
here.