Re: [PATCH] Practical Improvement to libgcc Complex Divide

Patrick McGehearty via Gcc-patches Tue, 17 Nov 2020 08:59:34 -0800

Joseph, thank you for your detailed review and comments.


I will get to work on the necessary revisions as well
as find for a suitable place for sharing my random number
generating tests.

- patrick

On 11/16/2020 8:34 PM, Joseph Myers wrote:

On Tue, 8 Sep 2020, Patrick McGehearty via Gcc-patches wrote:

This project started with an investigation related to
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=59714.  Study of Beebe[1]
provided an overview of past and recent practice for computing complex
divide. The current glibc implementation is based on Robert Smith's
algorithm [2] from 1962.  A google search found the paper by Baudin
and Smith [3] (same Robert Smith) published in 2012. Elen Kalda's
proposed patch [4] is based on that paper.

Thanks, I've now read Baudin and Smith so can review the patch properly.
I'm fine with the overall algorithm, so my comments generally relate to
how the code should best be integrated into libgcc while keeping it
properly machine-mode-generic as far as possible.

I developed two sets of test set by randomly distributing values over
a restricted range and the full range of input values. The current

Are these tests available somewhere?

Support for half, float, double, extended, and long double precision
is included as all are handled with suitable preprocessor symbols in a
single source routine. Since half precision is computed with float
precision as per current libgcc practice, the enhanced algorithm
provides no benefit for half precision and would cost performance.
Therefore half precision is left unchanged.

The existing constants for each precision:
float: FLT_MAX, FLT_MIN;
double: DBL_MAX, DBL_MIN;
extended and/or long double: LDBL_MAX, LDBL_MIN
are used for avoiding the more common overflow/underflow cases.

In general, libgcc code works with modes, not types; hardcoding references
to a particular mapping between modes and types is problematic.  Rather,
the existing code in c-cppbuiltin.c that has a loop over modes should be
extended to provide whatever information is needed, as macros defined for
each machine mode.

   /* For libgcc-internal use only.  */
   if (flag_building_libgcc)
     {
       /* Properties of floating-point modes for libgcc2.c.  */
       opt_scalar_float_mode mode_iter;
       FOR_EACH_MODE_IN_CLASS (mode_iter, MODE_FLOAT)
         {
[...]

For example, that defines macros such as __LIBGCC_DF_FUNC_EXT__ and
__LIBGCC_DF_MANT_DIG__.  The _FUNC_EXT__ definition involves that code
computing a mapping to types.

I'd suggest defining additional macros such as __LIBGCC_DF_MAX__ in the
same code - for each supported floating-point mode.  They can be defined
to __FLT_MAX__ etc. (the predefined macros rather than the ones in
float.h) - the existing code that computes a suffix for functions can be
adjusted so it also computes the string such as "FLT", "DBL", "LDBL",
"FLT128" etc.

(I suggest defining to __FLT_MAX__ rather than to the expansion of
__FLT_MAX__ because that avoids any tricky interactions with the logic to
compute such expansions lazily.  I suggest __FLT_MAX__ rather than the
FLT_MAX name from float.h because that way you avoid any need to define
feature test macros to access names such as FLT128_MAX.)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 74ecca8..02c06d8 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1343,6 +1343,11 @@ c_cpp_builtins (cpp_reader *pfile)
        builtin_define_with_value ("__LIBGCC_INIT_SECTION_ASM_OP__",
                                 INIT_SECTION_ASM_OP, 1);
  #endif
+      /* For libgcc float/double optimization */
+#ifdef HAVE_adddf3
+      builtin_define_with_int_value ("__LIBGCC_HAVE_HWDBL__",
+                                HAVE_adddf3);
+#endif

This is another thing to handle more generically - possibly with something
like the mode_has_fma function, and defining a macro for each mode, named
after the mode, rather than only for DFmode.  For an alternative, see the
discussion below.

diff --git a/libgcc/ChangeLog b/libgcc/ChangeLog
index ccfd6f6..8bd66c5 100644
--- a/libgcc/ChangeLog
+++ b/libgcc/ChangeLog
@@ -1,3 +1,10 @@
+2020-08-27  Patrick McGehearty <patrick.mcgehea...@oracle.com>
+
+       * libgcc2.c (__divsc3, __divdc3, __divxc3, __divtc3): Enhance
+       accuracy of complex divide by avoiding underflow/overflow when
+       ratio underflows or when arguments have very large or very
+       small exponents.

Note that diffs to ChangeLog files should not now be included in patches;
the ChangeLog content needs to be included in the proposed commit message
instead for automatic ChangeLog generation.

+#if defined(L_divsc3)
+#define RBIG   ((FLT_MAX)/2.0)
+#define RMIN   (FLT_MIN)
+#define RMIN2  (0x1.0p-21)
+#define RMINSCAL (0x1.0p+19)
+#define RMAX2  ((RBIG)*(RMIN2))
+#endif

I'd expect all of these to use generic macros based on the mode.  For the
division by 2.0, probably also divide by integer 2 not 2.0 to avoid
unwanted conversions to/from double.

+#if defined(L_divdc3)
+#define RBIG   ((DBL_MAX)/2.0)
+#define RMIN   (DBL_MIN)
+#define RMIN2  (0x1.0p-53)
+#define RMINSCAL (0x1.0p+51)

A plausible definition of RMIN2 might be based on the EPSILON macro for
the mode (with RMINSCAL then being defined in terms of RMIN2).  But you're
defining RMIN2 for SCmode to a value equal to 4 *
FLT_EPSILON, while for DCmode you're using DBL_EPSILON / 2.  Why the
difference?

+#if defined(L_divhc3)
+  /* half precision is handled with float precision.
+     no extra measures are needed to avoid overflow/underflow */

Note for comment style that each sentence should start with an uppercase
letter (unless that would be incorrect, e.g. for "float" meaning the C
type) and comments should end with ".  " (two spaces after '.').

You're duplicating essentially the same code for HCmode, and for SCmode
when hardware DFmode is available.  I think that's a bad idea.  Rather, in
one place you should conditionally define a macro for "wider-range
floating-point type to use for computations in these functions" (the
relevant thing being that it has at least twice (plus a bit) the exponent
range, rather than extra precision).  This would use types such as SFtype
or DFtype, not float or double, as elsewhere in libgcc.  And if that macro
is defined, then use this simple implementation instead of the more
complicate one.

That leaves the question of how to define the macro.  The simple approach
is certainly to define it to SFtype in the HCmode case (if there is
hardware SFmode) and to DFtype in the SCmode case (if there is hardware
DFmode).  A more complicated approach involves using various macros to
examine properties of various possible wider-range modes to attempt to
identity one that could be used (probably new macros from c-cppbuiltin.c
rather than following the existing code elsewhere in libgcc2.c using
AVOID_FP_TYPE_CONVERSION, given the incomplete coverage of definitions of
WIDEST_HARDWARE_FP_SIZE).  But hardcoding two cases would reduce the risk
of unexpected results for now.

In any case, note libgcc/config/rs6000/_divkc3.c should have similar
changes to those in libgcc2.c (to implement the more complicated
algorithm, as no wider mode is available there).

+  /* Scale by max(c,d) to reduce chances of denominator overflowing */
+  if (FABS (c) < FABS (d))

The comment seems questionable in the code using the simpler algorithm.
And do you need this "if" and two cases at all there?

+      /* prevent underflow when denominator is near max representable */
+      if (FABS (d) >= RBIG)
+       {
+         a = a * 0.5;
+         b = b * 0.5;
+         c = c * 0.5;
+         d = d * 0.5;

This sort of thing might best use "/ 2" (integer 2) to avoid unwanted use
of double.

+       {
+         if(((FABS (a) < RMIN) && (FABS (b) < RMAX2) && (FABS (d) < RMAX2)) ||
+            ((FABS (b) < RMIN) && (FABS (a) < RMAX2) && (FABS (d) < RMAX2)))

GNU style breaks lines before operators such as "||", not after.  Missing
space after "if".

I think the patch should add some testcases to the GCC testsuite that
verify expected results for particular inputs to within +/- a few ulp (and
that are checked to fail before and pass after the patch), to illustrate
the sort of cases the patch improves.  Those will need to take the inputs
from volatile variables to ensure the evaluation is at runtime, and it may
be easiest to write such tests only for the cases of _Complex float and
_Complex double to avoid complications making them portable when testing
other types.

Re: [PATCH] Practical Improvement to libgcc Complex Divide

Reply via email to