On Tue, Sep 8, 2020 at 8:50 PM Patrick McGehearty via Gcc-patches
<gcc-patches@gcc.gnu.org> wrote:
>
> (Version 4)
>
> (Added in version 4)
> Fixed Changelog entry to include __divsc3, __divdc3, __divxc3, __divtc3.
> Revised description to avoid incorrect use of "ulp (units last place)".
> Modified float precison case to use double precision when double
> precision hardware is available. Otherwise float uses the new algorithm.
> Added code to scale subnormal numerator arguments when appropriate.
> This change reduces 16 bit errors in double precision by a factor of 140.
> Revised results charts to match current version of code.
> Added background of tuning approach.
>
> Summary of Purpose
>
> The following patch to libgcc/libgcc2.c __divdc3 provides an
> opportunity to gain important improvements to the quality of answers
> for the default complex divide routine (half, float, double, extended,
> long double precisions) when dealing with very large or very small exponents.
>
> The current code correctly implements Smith's method (1962) [2]
> further modified by c99's requirements for dealing with NaN (not a
> number) results. When working with input values where the exponents
> are greater than *_MAX_EXP/2 or less than -(*_MAX_EXP)/2, results are
> substantially different from the answers provided by quad precision
> more than 1% of the time. This error rate may be unacceptable for many
> applications that cannot a priori restrict their computations to the
> safe range. The proposed method reduces the frequency of
> "substantially different" answers by more than 99% for double
> precision at a modest cost of performance.
>
> Differences between current gcc methods and the new method will be
> described. Then accuracy and performance differences will be discussed.
>
> Background
>
> This project started with an investigation related to
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
> provided an overview of past and recent practice for computing complex
> divide. The current glibc implementation is based on Robert Smith's
> algorithm [2] from 1962.  A google search found the paper by Baudin
> and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
> proposed patch [4] is based on that paper.
>
> I developed two sets of test set by randomly distributing values over
> a restricted range and the full range of input values. The current
> complex divide handled the restricted range well enough, but failed on
> the full range more than 1% of the time. Baudin and Smith's primary
> test for "ratio" equals zero reduced the cases with 16 or more error
> bits by a factor of 5, but still left too many flawed answers. Adding
> debug print out to cases with substantial errors allowed me to see the
> intermediate calculations for test values that failed. I noted that
> for many of the failures, "ratio" was a subnormal. Changing the
> "ratio" test from check for zero to check for subnormal reduced the 16
> bit error rate by another factor of 12. This single modified test
> provides the greatest benefit for the least cost, but the percentage
> of cases with greater than 16 bit errors (double precision data) is
> still greater than 0.027% (2.7 in 10,000).
>
> Continued examination of remaining errors and their intermediate
> computations led to the various tests of input value tests and scaling
> to avoid under/overflow. The current patch does not handle some of the
> rarest and most extreme combinations of input values, but the random
> test data is only showing 1 case in 10 million that has an error of
> greater than 12 bits. That case has 18 bits of error and is due to
> subtraction cancellation. These results are significantly better
> than the results reported by Baudin and Smith.
>
> Support for half, float, double, extended, and long double precision
> is included as all are handled with suitable preprocessor symbols in a
> single source routine. Since half precision is computed with float
> precision as per current libgcc practice, the enhanced algorithm
> provides no benefit for half precision and would cost performance.
> Therefore half precision is left unchanged.
>
> The existing constants for each precision:
> float: FLT_MAX, FLT_MIN;
> double: DBL_MAX, DBL_MIN;
> extended and/or long double: LDBL_MAX, LDBL_MIN
> are used for avoiding the more common overflow/underflow cases.
>
> Testing for when both parts of the denominator had exponents roughly
> small enough to allow shifting any subnormal values to normal values,
> all input values could be scaled up without risking unnecessary
> overflow and gaining a clear improvement in accuracy. Similarly, when
> either numerator was subnormal and the other numerator and both
> denominator values were not too large, scaling could be used to reduce
> risk of computing with subnormals.  The test and scaling values used
> all fit within the allowed exponent range for each precision required
> by the C standard.
>
> Float precision has even more difficulty with getting correct answers
> than double precision. When hardware for double precision floating
> point operations is available, float precision is now handled in
> double precision intermediate calculations with the original Smith
> algorithm (i.e. the current approach). Using the higher precision
> yields exact results for all tested input values (64-bit double,
> 32-bit float) with the only performance cost being the requirement to
> convert the four input values from float to double. If double
> precision hardware is not available, then float complex divide will
> use the same algorithm as the other precisions with similar
> decrease in performance.
>
> Further Improvement
>
> The most common remaining substantial errors are due to accuracy loss
> when subtracting nearly equal values. This patch makes no attempt to
> improve that situation.
>
> NOTATION
>
> For all of the following, the notation is:
> Input complex values:
>   a+bi  (a= real part, b= imaginary part)
>   c+di
> Output complex value:
>   e+fi = (a+bi)/(c+di)
>
> For the result tables:
> current = current method (SMITH)
> b1div = method proposed by Elen Kalda
> b2div = alternate method considered by Elen Kalda
> new = new method proposed by this patch
>
> DESCRIPTIONS of different complex divide methods:
>
> NAIVE COMPUTATION (-fcx-limited-range):
>   e = (a*c + b*d)/(c*c + d*d)
>   f = (b*c - a*d)/(c*c + d*d)
>
> Note that c*c and d*d will overflow or underflow if either
> c or d is outside the range 2^-538 to 2^512.
>
> This method is available in gcc when the switch -fcx-limited-range is
> used. That switch is also enabled by -ffast-math. Only one who has a
> clear understanding of the maximum range of all intermediate values
> generated by an application should consider using this switch.
>
> SMITH's METHOD (current libgcc):
>   if(fabs(c)<fabs(d) {
>     r = c/d;
>     denom = (c*r) + d;
>     e = (a*r + b) / denom;
>     f = (b*r - a) / denom;
>   } else {
>     r = d/c;
>     denom = c + (d*r);
>     e = (a + b*r) / denom;
>     f = (b - a*r) / denom;
>   }
>
> Smith's method is the current default method available with __divdc3.
>
> Elen Kalda's METHOD
>
> Elen Kalda proposed a patch about a year ago, also based on Baudin and
> Smith, but not including tests for subnormals:
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html [4]
> It is compared here for accuracy with this patch.
>
> This method applies the most significant part of the algorithm
> proposed by Baudin&Smith (2012) in the paper "A Robust Complex
> Division in Scilab" [3]. Elen's method also replaces two divides by
> one divide and two multiplies due to the high cost of divide on
> aarch64. In the comparison sections, this method will be labeled
> b1div. A variation discussed in that patch which does not replace the
> two divides will be labeled b2div.
>
>   inline void improved_internal (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
>   {
>     r = d/c;
>     t = 1.0 / (c + (d * r));
>     if (r != 0) {
>         x = (a + (b * r)) * t;
>         y = (b - (a * r)) * t;
>     }  else {
>     /* Changing the order of operations avoids the underflow of r impacting
>      the result. */
>         x = (a + (d * (b / c))) * t;
>         y = (b - (d * (a / c))) * t;
>     }
>   }
>
>   if (FABS (d) < FABS (c)) {
>       improved_internal (a, b, c, d);
>   } else {
>       improved_internal (b, a, d, c);
>       y = -y;
>   }
>
> NEW METHOD (proposed by patch) to replace the current default method:
>
> The proposed method starts with an algorithm proposed by Baudin&Smith
> (2012) in the paper "A Robust Complex Division in Scilab" [3]. The
> patch makes additional modifications to that method for further
> reductions in the error rate. The following code shows the #define
> values for double precision. See the patch for #define values used
> for other precisions.
>
>   #define RBIG ((DBL_MAX)/2.0)
>   #define RMIN (DBL_MIN)
>   #define RMIN2 (0x1.0p-53)
>   #define RMINSCAL (0x1.0p+51)
>   #define RMAX2  ((RBIG)*(RMIN2))
>
>   if (FABS(c) < FABS(d)) {
>   /* prevent overflow when arguments are near max representable */
>   if ((FABS (d) > RBIG) || (FABS (a) > RBIG) || (FABS (b) > RBIG) ) {
>       a = a * 0.5;
>       b = b * 0.5;
>       c = c * 0.5;
>       d = d * 0.5;
>   }
>   /* minimize overflow/underflow issues when c and d are small */
>   else if (FABS (d) < RMIN2) {
>       a = a * RMINSCAL;
>       b = b * RMINSCAL;
>       c = c * RMINSCAL;
>       d = d * RMINSCAL;
>   }
>   else {
>     if(((FABS (a) < RMIN) && (FABS (b) < RMAX2) && (FABS (d) < RMAX2)) ||
>        ((FABS (b) < RMIN) && (FABS (a) < RMAX2) && (FABS (d) < RMAX2))) {
>         a = a * RMINSCAL;
>         b = b * RMINSCAL;
>         c = c * RMINSCAL;
>         d = d * RMINSCAL;
>     }
>   }
>   r = c/d; denom = (c*r) + d;
>   if( r > RMIN ) {
>       e = (a*r + b) / denom   ;
>       f = (b*r - a) / denom
>   } else {
>       e = (c * (a/d) + b) / denom;
>       f = (c * (b/d) - a) / denom;
>   }
>   }
> [ only presenting the fabs(c) < fabs(d) case here, full code in patch. ]
>
> Before any computation of the answer, the code checks for any input
> values near maximum to allow down scaling to avoid overflow.  These
> scalings almost never harm the accuracy since they are by 2. Values that
> are over RBIG are relatively rare but it is easy to test for them and
> allow aviodance of overflows.
>
> Testing for RMIN2 reveals when both c and d are less than 2^-53 (for
> double precision, see patch for other values).  By scaling all values
> by 2^51, the code avoids many underflows in intermediate computations
> that otherwise might occur. If scaling a and b by 2^51 causes either
> to overflow, then the computation will overflow whatever method is
> used.
>
> Finally, we test for either a or b being subnormal (RMIN) and if so,
> for the other three values being small enough to allow scaling.  We
> only need to test a single denominator value since we have already
> determined which of c and d is larger.
>
> Next, r (the ratio of c to d) is checked for being near zero. Baudin
> and Smith checked r for zero. This code improves that approach by
> checking for values less than DBL_MIN (subnormal) covers roughly 12
> times as many cases and substantially improves overall accuracy. If r
> is too small, then when it is used in a multiplication, there is a
> high chance that the result will underflow to zero, losing significant
> accuracy. That underflow is avoided by reordering the computation.
> When r is subnormal, the code replaces a*r (= a*(c/d)) with ((a/d)*c)
> which is mathematically the same but avoids the unnecessary underflow.
>
> TEST Data
>
> Two sets of data are presented to test these methods. Both sets
> contain 10 million pairs of complex values.  The exponents and
> mantissas are generated using multiple calls to random() and then
> combining the results. Only values which give results to complex
> divide that are representable in the appropriate precision after
> being computed in quad precision are used.
>
> The first data set is labeled "moderate exponents".
> The exponent range is limited to -DBL_MAX_EXP/2 to DBL_MAX_EXP/2
> for Double Precision (use FLT_MAX_EXP or LDBL_MAX_EXP for the
> appropriate precisions.
> The second data set is labeled "full exponents".
> The exponent range for these cases is the full exponent range
> including subnormals for a given precision.
>
> ACCURACY Test results:
>
> Note: The following accuracy tests are based on IEEE-754 arithmetic.
>
> Note: All results reporteed are based on use of fused multiply-add. If
> fused multiply-add is not used, the error rate increases, giving more
> 1 and 2 bit errors for both current and new complex divide.
> Differences between using fused multiply and not using it that are
> greater than 2 bits are less than 1 in a million.
>
> The complex divide methods are evaluated by determining the percentage
> of values that exceed differences in low order bits.  If a "2 bit"
> test results show 1%, that would mean that 1% of 10,000,000 values
> (100,000) have either a real or imaginary part that differs from the
> quad precision result by more than the last 2 bits.
>
> Results are reported for differences greater than or equal to 1 bit, 2
> bits, 8 bits, 16 bits, 24 bits, and 52 bits for double precision.  Even
> when the patch avoids overflows and underflows, some input values are
> expected to have errors due to the potential for catastrophic roundoff
> from floating point subtraction. For example, when b*c and a*d are
> nearly equal, the result of subtraction may lose several places of
> accuracy. This patch does not attempt to detect or minimize this type
> of error, but neither does it increase them.
>
> I only show the results for Elen Kalda's method (with both 1 and
> 2 divides) and the new method for only 1 divide in the double
> precision table.
>
> In the following charts, lower values are better.
>
> current - current complex divide in libgcc
> b1div - Elen Kalda's method from Baudin & Smith with one divide
> b2div - Elen Kalda's method from Baudin & Smith with two divides
> new   - This patch which uses 2 divides
>
> ===================================================
> Errors   Moderate Dataset
> gtr eq     current    b1div      b2div        new
> ======    ========   ========   ========   ========
>  1 bit    0.24707%   0.92986%   0.24707%   0.24707%
>  2 bits   0.01762%   0.01770%   0.01762%   0.01762%
>  8 bits   0.00026%   0.00026%   0.00026%   0.00026%
> 16 bits   0.00000%   0.00000%   0.00000%   0.00000%
> 24 bits         0%         0%         0%         0%
> 52 bits         0%         0%         0%         0%
> ===================================================
> Table 1: Errors with Moderate Dataset (Double Precision)
>
> Note in Table 1 that both the old and new methods give identical error
> rates for data with moderate exponents. Errors exceeding 16 bits are
> exceedingly rare. There are substantial increases in the 1 bit error
> rates for b1div (the 1 divide/2 multiplys method) as compared to b2div
> (the 2 divides method). These differences are minimal for 2 bits and
> larger error measurements.
>
> ===================================================
> Errors   Full Dataset
> gtr eq     current    b1div      b2div        new
> ======    ========   ========   ========   ========
>  1 bit      2.05%   1.23842%    0.67130%   0.16664%
>  2 bits     1.88%   0.51615%    0.50354%   0.00900%
>  8 bits     1.77%   0.42856%    0.42168%   0.00011%
> 16 bits     1.63%   0.33840%    0.32879%   0.00001%
> 24 bits     1.51%   0.25583%    0.24405%   0.00000%
> 52 bits     1.13%   0.01886%    0.00350%   0.00000%
> ===================================================
> Table 2: Errors with Full Dataset (Double Precision)
>
> Table 2 shows significant differences in error rates. First, the
> difference between b1div and b2div show a significantly higher error
> rate for the b1div method both for single bit errros and well
> beyond. Even for 52 bits, we see the b1div method gets completely
> wrong answers more than 5 times as often as b2div. To retain
> comparable accuracy with current complex divide results for small
> exponents and due to the increase in errors for large exponents, I
> choose to use the more accurate method of two divides.
>
> The current method has more 1.6% of cases where it is getting results
> where the low 24 bits of the mantissa differ from the correct
> answer. More than 1.1% of cases where the answer is completely wrong.
> The new method shows less than one case in 10,000 with greater than
> two bits of error and only one case in 10 million with greater than
> 16 bits of errors. The new patch reduces 8 bit errors by
> a factor of 16,000 and virtually eliminates completely wrong
> answers.
>
> As noted above, for architectures with double precision
> hardware, the new method uses that hardware for the
> intermediate calculations before returning the
> result in float precision. Testing of the new patch
> has shown zero errors found as seen in Tables 3 and 4.
>
> Correctness for float
> =============================
> Errors   Moderate Dataset
> gtr eq     current     new
> ======    ========   ========
>  1 bit   28.68070%         0%
>  2 bits   0.64386%         0%
>  8 bits   0.00401%         0%
> 16 bits   0.00001%         0%
> 24 bits         0%         0%
> =============================
> Table 3: Errors with Moderate Dataset (float)
>
> =============================
> Errors   Full Dataset
> gtr eq     current     new
> ======    ========   ========
>  1 bit     19.97%         0%
>  2 bits     3.20%         0%
>  8 bits     1.90%         0%
> 16 bits     1.03%         0%
> 24 bits     0.52%         0%
> =============================
> Table 4: Errors with Full Dataset (float)
>
> As before, the current method shows an troubling rate of extreme
> errors.
>
> There is no change in accuracy for half-precision since the code is
> unchanged. libgcc computes half-precision functions in float precision
> allowing the existing methods to avoid overflow/underflow issues
> for the allowed range of exponents for half-precision.
>
> Extended precision (using x87 80-bit format on x86) and Long double
> (using IEEE-754 128-bit on x86 and aarch64) both have 15-bit exponents
> as compared to 11-bit exponents in double precision. We note that the
> C standard also allows Long Double to be implemented in the equivalent
> range of Double. The RMIN2 and RMINSCAL constants are selected to work
> within the Double range as well as with extended and 128-bit ranges.
> We will limit our performance and accurancy discussions to the 80-bit
> and 128-bit formats as seen on x86 here.
>
> The extended and long double precision investigations were more
> limited. Aarch64 does not support extended precision but does support
> the software implementation of 128-bit long double precision. For x86,
> long double defaults to the 80-bit precision but using the
> -mlong-double-128 flag switches to using the software implementation
> of 128-bit precision. Both 80-bit and 128-bit precisions have the same
> exponent range, with the 128-bit precision has extended mantissas.
> Since this change is only aimed at avoiding underflow/overflow for
> extreme exponents, I studied the extended precision results on x86 for
> 100,000 values. The limited exponent dataset showed no differences.
> For the dataset with full exponent range, the current and new values
> showed major differences (greater than 32 bits) in 567 cases out of
> 100,000 (0.56%). In every one of these cases, the ratio of c/d or d/c
> (as appropriate) was zero or subnormal, indicating the advantage of
> the new method and its continued correctness where needed.
>
> PERFORMANCE Test results
>
> In order for a library change to be practical, it is necessary to show
> the slowdown is tolerable. The slowdowns observed are much less than
> would be seen by (for example) switching from hardware double precison
> to a software quad precision, which on the tested machines causes a
> slowdown of around 100x).
>
> The actual slowdown depends on the machine architecture. It also
> depends on the nature of the input data. If underflow/overflow is
> rare, then implementations that have strong branch prediction will
> only slowdown by a few cycles. If underflow/overflow is common, then
> the branch predictors will be less accurate and the cost will be
> higher.
>
> Results from two machines are presented as examples of the overhead
> for the new method. The one labeled x86 is a 5 year old Intel x86
> processor and the one labeled aarch64 is a 3 year old arm64 processor.
>
> In the following chart, the times are averaged over a one million
> value data set. All values are scaled to set the time of the current
> method to be 1.0. Lower values are better. A value of less than 1.0
> would be faster than the current method and a value greater than 1.0
> would be slower than the current method.
>
> ================================================
>                Moderate set          full set
>                x86  aarch64        x86  aarch64
> ========     ===============     ===============
> float         0.68    1.26        0.79    1.26
> double        1.08    1.33        1.47    1.76
> long double   1.08    1.23        1.08    1.24
> ================================================
> Table 5: Performance Comparisons (ratio new/current)
>
> The above tables omit the timing for the 1 divide and 2 multiply
> comparison with the 2 divide approach.
>
> For the proposed change, the moderate dataset shows less overhead for
> the newer methods than the full dataset. That's because the moderate
> dataset does not ever take the new branches which protect from
> under/overflow. The better the branch predictor, the lower the cost
> for these untaken branches. Both platforms are somewhat dated, with
> the x86 having a better branch predictor which reduces the cost of the
> additional branches in the new code. Of course, the relative slowdown
> may be greater for some architectures, especially those with limited
> branch prediction combined with a high cost of misprediction.
>
> Special note for x86 float: On the particular model of x86 used for
> these tests, float complex divide runs slower than double complex
> divide. While the issue has not been investigated, I suspect
> the issue to be floating point register assignment which results
> in false sharing being detected by the hardware. A combination
> of HW/compiler without the glitch would likely show something
> like 10-20% slowdown for the new method.
>
> The observed cost for all precisions is claimed to be tolerable on the
> grounds that:
>
> (a) the cost is worthwhile considering the accuracy improvement shown.
> (b) most applications will only spend a small fraction of their time
>     calculating complex divide.
> (c) it is much less than the cost of extended precision
> (d) users are not forced to use it (as described below)
>
> Those users who find this degree of slowdown unsatisfactory may use
> the gcc switch -fcx-fortran-rules which does not use the library
> routine, instead inlining Smith's method without the C99 requirement
> for dealing with NaN results. The proposed patch for libgcc complex
> divide does not affect the code generated by -fcx-fortran-rules.
>
> SUMMARY
>
> When input data to complex divide has exponents whose absolute value
> is less than half of *_MAX_EXP, this patch makes no changes in
> accuracy and has only a modest effect on performance.  When input data
> contains values outside those ranges, the patch eliminates more than
> 99.9% of major errors with a tolerable cost in performance.
>
> In comparison to Elen Kalda's method, this patch introduces more
> performance overhead but reduces major errors by a factor of
> greater than 4000.

Thanks for working on this.  Speaking about performance and
accuracy I spot a few opportunities to use FMAs [and eventually
vectorization] - do FMAs change anything on the accuracy analysis
(is there the chance they'd make it worse?).  We might want to use
IFUNCs in libgcc to version for ISA variants (with/without FMA)?

Thanks,
Richard.

> REFERENCES
>
> [1] Nelson H.F. Beebe, "The Mathematical-Function Computation Handbook.
> Springer International Publishing AG, 2017.
>
> [2] Robert L. Smith. Algorithm 116: Complex division.  Commun. ACM,
>  5(8):435, 1962.
>
> [3] Michael Baudin and Robert L. Smith. "A robust complex division in
> Scilab," October 2012, available at http://arxiv.org/abs/1210.4539.
>
> [4] Elen Kalda: Complex division improvements in libgcc
> https://gcc.gnu.org/legacy-ml/gcc-patches/2019-08/msg01629.html
> ---
>  gcc/c-family/c-cppbuiltin.c |   5 ++
>  libgcc/ChangeLog            |   7 ++
>  libgcc/libgcc2.c            | 178 
> ++++++++++++++++++++++++++++++++++++++++++--
>  3 files changed, 182 insertions(+), 8 deletions(-)
>
> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 74ecca8..02c06d8 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -1343,6 +1343,11 @@ c_cpp_builtins (cpp_reader *pfile)
>        builtin_define_with_value ("__LIBGCC_INIT_SECTION_ASM_OP__",
>                                  INIT_SECTION_ASM_OP, 1);
>  #endif
> +      /* For libgcc float/double optimization */
> +#ifdef HAVE_adddf3
> +      builtin_define_with_int_value ("__LIBGCC_HAVE_HWDBL__",
> +                                HAVE_adddf3);
> +#endif
>  #ifdef INIT_ARRAY_SECTION_ASM_OP
>        /* Despite the name of this target macro, the expansion is not
>          actually used, and may be empty rather than a string
> diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
> index ccfd6f6..8bd66c5 100644
> --- a/libgcc/ChangeLog
> +++ b/libgcc/ChangeLog
> @@ -1,3 +1,10 @@
> +2020-08-27  Patrick McGehearty <patrick.mcgehea...@oracle.com>
> +
> +       * libgcc2.c (__divsc3, __divdc3, __divxc3, __divtc3): Enhance
> +       accuracy of complex divide by avoiding underflow/overflow when
> +       ratio underflows or when arguments have very large or very
> +       small exponents.
> +
>  2020-08-26  Jozef Lawrynowicz  <joze...@mittosystems.com>
>
>         * config/msp430/slli.S (__gnu_mspabi_sllp): New.
> diff --git a/libgcc/libgcc2.c b/libgcc/libgcc2.c
> index e0a9fd7..a9866f3 100644
> --- a/libgcc/libgcc2.c
> +++ b/libgcc/libgcc2.c
> @@ -2036,27 +2036,189 @@ CONCAT3(__mul,MODE,3) (MTYPE a, MTYPE b, MTYPE c, 
> MTYPE d)
>  CTYPE
>  CONCAT3(__div,MODE,3) (MTYPE a, MTYPE b, MTYPE c, MTYPE d)
>  {
> +#if defined(L_divsc3)
> +#define RBIG   ((FLT_MAX)/2.0)
> +#define RMIN   (FLT_MIN)
> +#define RMIN2  (0x1.0p-21)
> +#define RMINSCAL (0x1.0p+19)
> +#define RMAX2  ((RBIG)*(RMIN2))
> +#endif
> +
> +#if defined(L_divdc3)
> +#define RBIG   ((DBL_MAX)/2.0)
> +#define RMIN   (DBL_MIN)
> +#define RMIN2  (0x1.0p-53)
> +#define RMINSCAL (0x1.0p+51)
> +#define RMAX2  ((RBIG)*(RMIN2))
> +#endif
> +
> +#if (defined(L_divxc3) || defined(L_divtc3))
> +#define RBIG   ((LDBL_MAX)/2.0)
> +#define RMIN   (LDBL_MIN)
> +#define RMIN2  (0x1.0p-53)
> +#define RMINSCAL (0x1.0p+51)
> +#define RMAX2  ((RBIG)*(RMIN2))
> +#endif
> +
> +#if defined(L_divhc3)
> +  /* half precision is handled with float precision.
> +     no extra measures are needed to avoid overflow/underflow */
> +
> +  float aa, bb, cc, dd;
> +  float denom, ratio, x, y;
> +  CTYPE res;
> +  aa = a;
> +  bb = b;
> +  cc = c;
> +  dd = d;
> +
> +  /* Scale by max(c,d) to reduce chances of denominator overflowing */
> +  if (FABS (c) < FABS (d))
> +    {
> +      ratio = cc / dd;
> +      denom = (cc * ratio) + dd;
> +      x = ((aa * ratio) + bb) / denom;
> +      y = ((bb * ratio) - aa) / denom;
> +    }
> +  else
> +    {
> +      ratio = dd / cc;
> +      denom = (dd * ratio) + cc;
> +      x = ((bb * ratio) + aa) / denom;
> +      y = (bb - (aa * ratio)) / denom;
> +    }
> +
> +#elif (defined(L_divsc3) && \
> +       (defined(__LIBGCC_HAVE_HWDBL__) && __LIBGCC_HAVE_HWDBL__ == 1))
> +
> +  /* float is handled with double precision,
> +     no extra measures are needed to avoid overflow/underflow */
> +  double aa, bb, cc, dd;
> +  double denom, ratio, x, y;
> +  CTYPE res;
> +
> +  aa = a;
> +  bb = b;
> +  cc = c;
> +  dd = d;
> +  /* Scale by max(c,d) to reduce chances of denominator overflowing */
> +  if (FABS (c) < FABS (d))
> +    {
> +      ratio = cc / dd;
> +      denom = (cc * ratio) + dd;
> +      x = ((aa * ratio) + bb) / denom;
> +      y = ((bb * ratio) - aa) / denom;
> +    }
> +  else
> +    {
> +      ratio = dd / cc;
> +      denom = (dd * ratio) + cc;
> +      x = ((bb * ratio) + aa) / denom;
> +      y = (bb - (aa * ratio)) / denom;
> +    }
> +
> +#else
>    MTYPE denom, ratio, x, y;
>    CTYPE res;
>
> -  /* ??? We can get better behavior from logarithmic scaling instead of
> -     the division.  But that would mean starting to link libgcc against
> -     libm.  We could implement something akin to ldexp/frexp as gcc builtins
> -     fairly easily...  */
> +  /* double, extended, long double have significant potential
> +    underflow/overflow errors than can be greatly reduced with
> +    a limited number of adjustments.
> +    float is handled the same way when no HW double is available
> +  */
> +
> +  /* Scale by max(c,d) to reduce chances of denominator overflowing */
>    if (FABS (c) < FABS (d))
>      {
> +      /* prevent underflow when denominator is near max representable */
> +      if (FABS (d) >= RBIG)
> +       {
> +         a = a * 0.5;
> +         b = b * 0.5;
> +         c = c * 0.5;
> +         d = d * 0.5;
> +       }
> +      /* avoid overflow/underflow issues when c and d are small */
> +      /* scaling up helps avoid some underflows */
> +      /* No new overflow possible since c&d < RMIN2 */
> +      if (FABS (d) < RMIN2)
> +       {
> +         a = a * RMINSCAL;
> +         b = b * RMINSCAL;
> +         c = c * RMINSCAL;
> +         d = d * RMINSCAL;
> +       }
> +      else
> +       {
> +         if(((FABS (a) < RMIN) && (FABS (b) < RMAX2) && (FABS (d) < RMAX2)) 
> ||
> +            ((FABS (b) < RMIN) && (FABS (a) < RMAX2) && (FABS (d) < RMAX2)))
> +           {
> +             a = a * RMINSCAL;
> +             b = b * RMINSCAL;
> +             c = c * RMINSCAL;
> +             d = d * RMINSCAL;
> +           }
> +       }
>        ratio = c / d;
>        denom = (c * ratio) + d;
> -      x = ((a * ratio) + b) / denom;
> -      y = ((b * ratio) - a) / denom;
> +      /* choose alternate order of computation if ratio is subnormal */
> +      if (FABS (ratio) > RMIN)
> +       {
> +         x = ((a * ratio) + b) / denom;
> +         y = ((b * ratio) - a) / denom;
> +       }
> +      else
> +       {
> +         x = ((c * (a / d)) + b) / denom;
> +         y = ((c * (b / d)) - a) / denom;
> +       }
>      }
>    else
>      {
> +      /* prevent underflow when denominator is near max representable */
> +      if (FABS(c) >= RBIG)
> +       {
> +         a = a * 0.5;
> +         b = b * 0.5;
> +         c = c * 0.5;
> +         d = d * 0.5;
> +       }
> +      /* avoid overflow/underflow issues when both c and d are small */
> +      /* scaling up helps avoid some underflows */
> +      /* No new overflow possible since both c&d are less than RMIN2 */
> +      if (FABS(c) < RMIN2)
> +       {
> +         a = a * RMINSCAL;
> +         b = b * RMINSCAL;
> +         c = c * RMINSCAL;
> +         d = d * RMINSCAL;
> +       }
> +      else
> +       {
> +         if(((FABS(a) < RMIN) && (FABS(b) < RMAX2) && (FABS(c) < RMAX2)) ||
> +            ((FABS(b) < RMIN) && (FABS(a) < RMAX2) && (FABS(c) < RMAX2)))
> +           {
> +             a = a * RMINSCAL;
> +             b = b * RMINSCAL;
> +             c = c * RMINSCAL;
> +             d = d * RMINSCAL;
> +           }
> +       }
>        ratio = d / c;
>        denom = (d * ratio) + c;
> -      x = ((b * ratio) + a) / denom;
> -      y = (b - (a * ratio)) / denom;
> +      /* choose alternate order of computation if ratio is subnormal */
> +      if (FABS(ratio) > RMIN)
> +       {
> +         x = ((b * ratio) + a) / denom;
> +         y = (b - (a * ratio)) / denom;
> +       }
> +      else
> +       {
> +         x = (a + (d * (b / c))) / denom;
> +         y = (b - (d * (a / c))) / denom;
> +       }
>      }
> +#endif
>
>    /* Recover infinities and zeros that computed as NaN+iNaN; the only cases
>       are nonzero/zero, infinite/finite, and finite/infinite.  */
> --
> 1.8.3.1
>

Reply via email to