Hi Richard and Joseph,

Replies for both inline:
I wrote:
>> Both the inputs and outputs must be flushed to zero in HSAIL's 'ftz'
>> semantics.
>> FTZ operations were previously always "explicit" in the BRIG FE output,
>> like you propose here; there were builtin calls injected for all inputs
>> and the output of 'ftz'-marked float HSAIL instructions. This is still
>> provided as a fallback for targets which do not support a CPU mode flag.

On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> I see. But how does making them implicit fix cases in the conformance
> testsuite? That is, isn't the error in the runtime implementation of
> __hsail_ftz_*? I'd have used a "simple" [...]

There are two parts to the story here:

1) Making FTZ/DAZ "the default": no builtin calls or similar are emitted
to flush the operands/results; instead we rely on the runtime flipping on
the FTZ/DAZ CPU flags before executing this code. This is purely a
performance optimization, because the FTZ/DAZ builtin calls (three per
HSAIL instruction) ruin performance for multiple reasons. We have already
implemented this optimization in our staging branch of the BRIG FE.

2) Ensuring GCC does not perform certain compile-time optimizations under
the assumption that FTZ/DAZ is optional, and instead making it assume that
flushing must happen for correctness. The proposed patch addresses this
part on the compiler side by disabling the currently known optimizations
that would otherwise remove operations whose operands/results should be
flushed at runtime when "ftz denorm math" is desired.

>> The problem with a special FTZ 'operation' of some kind in the generic
>> output is that the basic optimizations get confused by a new operation
>> and we'd need to add knowledge of the 'FTZ' operation to a bunch of
>> existing optimizer code, which seems unnecessary to support this case,
>> as the optimizations typically apply also under the 'FTZ semantics'
>> when the FTZ/DAZ flag is on.
>
> Apart from the exceptions you needed to guard ... do you have an example of
> a transform that is confused by explicit FTZ and that would be valid if that
> FTZ were implicit? An explicit FTZ should be much safer. I think the builtins
> should also be CONST and not only PURE.

Explicit builtin calls ruin many optimizations, starting from simple common
subexpression elimination, when the passes do not understand what the
builtin returns for any given operand. Thus, the builtin function's code
would have to be inlined first, and a lot of code would be inlined due to
the abundance of ftz calls required; not all of it can be eliminated,
because at compile time you do not know whether the operand is a denormal
or not.

Another approach would be to introduce special cases to the affected
optimizations so that they understand the FTZ builtin and might be able to
remove the useless calls. This potentially touches _a lot_ of code. And in
the end, if the CPU can flush denormals efficiently in hardware (FTZ in HW
is typically faster than gradual underflow, so this is likely the case),
any builtin call that cannot be optimized away presents additional,
possibly major, runtime overhead.

We tested whether a simple common subexpression elimination case works with
the ftz builtins, and it did not. CONST did not help here. However, I
understand your concern that there might be optimizations that still break
the FTZ semantics if there are no explicit builtin calls, but we are
prepared to fix them case by case if/when they appear. The attached updated
patch fixes a few additional cases we noticed, e.g. it disables several
constant folding cases.
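To make the CSE point concrete, here is a minimal C sketch of the kind of
code an explicit-FTZ lowering produces. The __hsail_ftz_f32 helper and its
float-to-float signature are assumptions for illustration only, not the
exact BRIG FE output:

/* Hypothetical explicit-FTZ lowering: every operand and the result of an
   'ftz'-marked HSAIL instruction goes through a flush helper.  */
extern float __hsail_ftz_f32 (float);   /* assumed signature */

float
mad_explicit_ftz (float a, float b, float c, float d)
{
  /* Two occurrences of a * b.  With each product wrapped in calls, the
     optimizers no longer see two identical expressions unless they know
     the helper is a pure value mapping; as noted above, in our tests CSE
     did not happen even with the builtin marked CONST.  */
  float t1 = __hsail_ftz_f32 (__hsail_ftz_f32 (a) * __hsail_ftz_f32 (b));
  float t2 = __hsail_ftz_f32 (__hsail_ftz_f32 (a) * __hsail_ftz_f32 (b));
  return __hsail_ftz_f32 (t1 + c) + __hsail_ftz_f32 (t2 + d);
}

/* With -fftz-math the same code is emitted without the calls and the
   runtime enables the FTZ/DAZ CPU mode instead, so the common product
   CSEs normally:

     float t = a * b;
     return (t + c) + (t + d);  */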
On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers <jos...@codesourcery.com> wrote:
> Presumably this means that constant folding needs to know about those
> semantics, both for operations with a subnormal floating-point argument
> (whether or not the output is floating point, or floating point in the
> same format), and those with such a result?
> Can assignments copy subnormals without converting them to zero? Should
> comparisons flush input subnormals to zero before comparing? Should
> conversions e.g. from float to double convert a float subnormal input to
> zero?

I can answer yes to all of these questions.

BR,
Pekka
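For concreteness, a small sketch of what those answers mean in practice,
assuming an x86-64 target where the runtime has already enabled the FTZ
and DAZ bits (the SSE-intrinsic setup below mirrors what the attached test
does; the exact setup is an assumption, the expected results follow the
answers above):

#include <math.h>
#include <xmmintrin.h>
#include <pmmintrin.h>

int
main (void)
{
  /* Runtime side of the scheme: flip the FTZ/DAZ CPU flags once before
     running code compiled with -fftz-math.  */
  _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
  _MM_SET_DENORMALS_ZERO_MODE (_MM_DENORMALS_ZERO_ON);

  volatile float sub = 2.87e-42f;   /* single-precision subnormal */
  float copy = sub;                 /* assignment copies the subnormal */

  int copy_kept = (fpclassify (copy) == FP_SUBNORMAL);  /* yes */
  int compares_as_zero = (copy == -copy);  /* comparison flushes inputs */
  double widened = copy;            /* float -> double conversion flushes */
  int converted_to_zero = (widened == 0.0);

  return !(copy_kept && compares_as_zero && converted_to_zero);
}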
From 0b97ccde3ec837329b4c551ccd7f98c074ca7a7b Mon Sep 17 00:00:00 2001
From: Henry Linjamäki <henry.linjam...@parmance.com>
Date: Mon, 24 Jul 2017 09:28:00 +0300
Subject: [PATCH] Add common -fftz-math flag

With the flag set, the compiler assumes that floating-point operations
must flush received and resulting subnormal floating-point values to
zero.
---
 gcc/common.opt                  |   5 +
 gcc/doc/invoke.texi             |  11 ++
 gcc/fold-const-call.c           |   9 +-
 gcc/fold-const.c                |  22 +++
 gcc/match.pd                    |  14 +-
 gcc/simplify-rtx.c              |  30 +++-
 gcc/testsuite/gcc.dg/ftz-math.c | 330 ++++++++++++++++++++++++++++++++++++++++
 7 files changed, 405 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ftz-math.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 13305558d2d..fd77d00d814 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2266,6 +2266,11 @@ fsingle-precision-constant
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations treat floating-point operations as if they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a6ce483d890..c3da6c8ebe3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9330,6 +9330,17 @@ The default is @option{-fno-signaling-nans}.
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental.  With this flag on, GCC treats
+floating-point operations (except abs, classify, copysign and
+negation) as if they must flush subnormal input operands and results
+to zero (FTZ).  The FTZ rules are derived from the HSA Programmer's
+Reference Manual for the base profile.  This alters optimizations
+that would break the rules, for example the X * 1 -> X simplification.
+The option assumes the target supports FTZ in hardware and has it
+enabled, either by default or as set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 381cb7fd290..21715f090da 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1049,7 +1049,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg)
   if (real_cst_p (arg))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg_mode));
-      if (mode == arg_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg_mode && !flag_ftz_math)
 	{
 	  /* real -> real.  */
 	  REAL_VALUE_TYPE result;
@@ -1299,7 +1300,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg0, tree arg1)
       && real_cst_p (arg1))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg0_mode));
-      if (mode == arg0_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg0_mode && !flag_ftz_math)
 	{
 	  /* real, real -> real.  */
 	  REAL_VALUE_TYPE result;
@@ -1494,7 +1496,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
       && real_cst_p (arg2))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg0_mode));
-      if (mode == arg0_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg0_mode && !flag_ftz_math)
 	{
 	  /* real, real, real -> real.  */
 	  REAL_VALUE_TYPE result;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index f6d5af43b33..1b19bc93248 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1152,6 +1152,11 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
       bool inexact;
       tree t, type;
 
+      /* For ftz-math disable all floating point constant folding for
+	 now.  */
+      if (flag_ftz_math)
+	return NULL_TREE;
+
       /* The following codes are handled by real_arithmetic.  */
       switch (code)
 	{
@@ -2000,6 +2005,10 @@ fold_convert_const_real_from_real (tree type, const_tree arg1)
       && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
     return NULL_TREE;
 
+  /* For ftz-math constant folding is disabled for now.  */
+  if (flag_ftz_math)
+    return NULL_TREE;
+
   real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
   t = build_real (type, value);
 
@@ -6479,6 +6488,10 @@ fold_real_zero_addition_p (const_tree type, const_tree addend, int negate)
   if (!real_zerop (addend))
     return false;
 
+  /* X +/- 0 flushes subnormals to zero but plain X does not.  */
+  if (flag_ftz_math)
+    return false;
+
   /* Don't allow the fold with -fsignaling-nans.  */
   if (HONOR_SNANS (element_mode (type)))
     return false;
@@ -9117,6 +9130,11 @@ fold_binary_loc (location_t loc,
   arg0 = op0;
   arg1 = op1;
 
+  /* For ftz-math disable all floating point constant folding for
+     now.  */
+  if (flag_ftz_math && FLOAT_TYPE_P (type))
+    return NULL_TREE;
+
   /* Strip any conversions that don't change the mode.  This is
      safe for every expression, except for a comparison expression
     because its signedness is derived from its operands.  So, in
@@ -13831,6 +13849,10 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
   if (TREE_CODE (op0) == REAL_CST && TREE_CODE (op1) == REAL_CST)
     {
+      /* For ftz-math disable all constant folding for now.  */
+      if (flag_ftz_math)
+	return NULL_TREE;
+
       const REAL_VALUE_TYPE *c0 = TREE_REAL_CST_PTR (op0);
       const REAL_VALUE_TYPE *c1 = TREE_REAL_CST_PTR (op1);
diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba3d23..c4e8eefe0c1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -129,6 +129,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
	   || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -137,6 +138,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (mult @0 real_minus_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
	   || !COMPLEX_FLOAT_TYPE_P (type)))
   (negate @0)))
@@ -240,13 +242,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
@@ -1394,7 +1396,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for minmax (min max FMIN FMAX)
  (simplify
   (minmax @0 @0)
-  @0))
+  (if (FLOAT_TYPE_P (type) && !flag_ftz_math)
+   @0)))
 /* min(max(x,y),y) -> y.  */
 (simplify
  (min:c (max:c @0 @1) @1)
@@ -1853,7 +1856,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	   || (GENERIC
	       && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (inside_type)))
       && (((inter_int || inter_ptr) && final_int)
-	   || (inter_float && final_float))
+	   || (inter_float && final_float && !flag_ftz_math))
       && inter_prec >= final_prec)
    (ocvt @0))
@@ -1862,7 +1865,8 @@
      former is wider than the latter and doesn't change the signedness
      (for integers).  Avoid this if the final type is a pointer since
      then we sometimes need the middle conversion.  */
-   (if (((inter_int && inside_int) || (inter_float && inside_float))
+   (if (((inter_int && inside_int) || (inter_float && inside_float
+					&& !flag_ftz_math))
	&& (final_int || final_float)
	&& inter_prec >= inside_prec
	&& (inter_float || inter_unsignedp == inside_unsignedp))
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 7cab26a0e34..3c904cdefd6 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1240,8 +1240,12 @@ simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
       if (DECIMAL_FLOAT_MODE_P (mode))
	break;
 
-      /* (float_truncate:SF (float_extend:DF foo:SF)) = foo:SF.  */
-      if (GET_CODE (op) == FLOAT_EXTEND
+      /* (float_truncate:SF (float_extend:DF foo:SF)) = foo:SF, except
+	 for -fftz-math with subnormal input.  Simplifications like
+	 this must be prevented as they no longer perform
+	 flush-to-zero as required by the semantics of the -fftz-math
+	 flag.  */
+      if (!flag_ftz_math && GET_CODE (op) == FLOAT_EXTEND
	  && GET_MODE (XEXP (op, 0)) == mode)
	return XEXP (op, 0);
@@ -1891,14 +1895,16 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
	case FLOAT_TRUNCATE:
	  /* Don't perform the operation if flag_signaling_nans is on
	     and the operand is a signaling NaN.  */
-	  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	  if ((HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	      || flag_ftz_math)
	    return NULL_RTX;
	  d = real_value_truncate (mode, d);
	  break;
	case FLOAT_EXTEND:
	  /* Don't perform the operation if flag_signaling_nans is on
	     and the operand is a signaling NaN.  */
-	  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	  if ((HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	      || flag_ftz_math)
	    return NULL_RTX;
	  /* All this does is change the mode, unless changing
	     mode class.  */
@@ -2137,7 +2143,8 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
	 when x is NaN, infinite, or finite and nonzero.
	 They aren't when x is -0 and the rounding mode is not
	 towards -infinity, since (-0) + 0 is then 0.  */
-      if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
+      if (!HONOR_SIGNED_ZEROS (mode) && !flag_ftz_math
+	  && trueop1 == CONST0_RTX (mode))
	return op0;
 
       /* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
@@ -2342,8 +2349,9 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
       /* Subtracting 0 has no effect unless the mode has signed zeros
	 and supports rounding towards -infinity.  In such a case,
	 0 - 0 is -0.  */
-      if (!(HONOR_SIGNED_ZEROS (mode)
-	    && HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+      if (!((HONOR_SIGNED_ZEROS (mode)
+	     && HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+	    || flag_ftz_math)
	  && trueop1 == CONST0_RTX (mode))
	return op0;
@@ -2558,6 +2566,7 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
       /* In IEEE floating point, x*1 is not equivalent to x for
	 signalling NaNs.  */
       if (!HONOR_SNANS (mode)
+	  && (FLOAT_MODE_P (mode) && !flag_ftz_math)
	  && trueop1 == CONST1_RTX (mode))
	return op0;
@@ -4001,6 +4010,10 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
       const REAL_VALUE_TYPE *opr0, *opr1;
       bool inexact;
 
+      /* Subnormals are not handled correctly with -fftz-math.  */
+      if (flag_ftz_math)
+	return 0;
+
       opr0 = CONST_DOUBLE_REAL_VALUE (op0);
       opr1 = CONST_DOUBLE_REAL_VALUE (op1);
@@ -5083,7 +5096,8 @@ simplify_const_relational_operation (enum rtx_code code,
 
   /* If the operands are floating-point constants, see if we can fold
      the result.  */
-  if (CONST_DOUBLE_AS_FLOAT_P (trueop0)
+  if (!flag_ftz_math
+      && CONST_DOUBLE_AS_FLOAT_P (trueop0)
       && CONST_DOUBLE_AS_FLOAT_P (trueop1)
       && SCALAR_FLOAT_MODE_P (GET_MODE (trueop0)))
     {
diff --git a/gcc/testsuite/gcc.dg/ftz-math.c b/gcc/testsuite/gcc.dg/ftz-math.c
new file mode 100644
index 00000000000..f782515a044
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ftz-math.c
@@ -0,0 +1,330 @@
+/* Tests the -fftz-math flag.  */
+/* { dg-do run { target x86_64-*-* } } */
+/* { dg-options "-O2 -fftz-math" } */
+
+#include <math.h>
+
+/* #define DEBUG_TEST */
+#ifdef DEBUG_TEST
+# include <stdio.h>
+#endif
+
+#include "xmmintrin.h"
+#include "pmmintrin.h"
+
+union uf
+{
+  unsigned int u;
+  float f;
+};
+
+union ud
+{
+  unsigned long long u;
+  double d;
+};
+
+static unsigned int
+f2u (float v)
+{
+  union uf u;
+  u.f = v;
+  return u.u;
+}
+
+static unsigned long long
+d2u (double v)
+{
+  union ud u;
+  u.d = v;
+  return u.u;
+}
+
+static void
+enable_ftz_mode ()
+{
+  _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
+  _MM_SET_DENORMALS_ZERO_MODE (_MM_DENORMALS_ZERO_ON);
+}
+
+static int
+test_sf_is_zero (float x)
+{
+  /* FTZ mode is on, must do bitwise ops for the zero test.  */
+  return ((f2u (x) & 0x7fffffffu) == 0u);
+}
+
+static int
+test_df_is_zero (double x)
+{
+  /* FTZ mode is on, must do bitwise ops for the zero test.  */
+  return ((d2u (x) & 0x7fffffffffffffffull) == 0ull);
+}
+
+static int
+test_sf_is_subnormal (float x)
+{
+  unsigned int u = f2u (x);
+  if (u & 0x7f800000u)
+    return 0;
+  return (u & 0x007fffffu);
+}
+
+static int
+test_df_is_subnormal (double x)
+{
+  unsigned long long u = d2u (x);
+  if (u & 0x7ff0000000000000ull)
+    return 0;
+  return (u & 0x000fffffffffffffull);
+}
+
+#ifdef DEBUG_TEST
+void err_print (unsigned line, const char* expr)
+{
+  printf ("Line %d: FAIL: %s\n", line, expr);
+  abort ();
+}
+# define TEST_SF_IS_ZERO(expr) \
+  if (!test_sf_is_zero (expr)) err_print (__LINE__, #expr)
+# define TEST_SF_IS_SUBNORMAL(expr) \
+  if (!test_sf_is_subnormal (expr)) err_print (__LINE__, #expr)
+# define TEST_DF_IS_ZERO(expr) \
+  if (!test_df_is_zero (expr)) err_print (__LINE__, #expr)
+# define TEST_DF_IS_SUBNORMAL(expr) \
+  if (!test_df_is_subnormal (expr)) err_print (__LINE__, #expr)
+# define TEST_TRUE(expr) if (!(expr)) err_print (__LINE__, #expr)
+#else
+# define TEST_SF_IS_ZERO(expr) if (!test_sf_is_zero (expr)) abort ()
+# define TEST_SF_IS_SUBNORMAL(expr) if (!test_sf_is_subnormal (expr)) abort ()
+# define TEST_DF_IS_ZERO(expr) if (!test_df_is_zero (expr)) abort ()
+# define TEST_DF_IS_SUBNORMAL(expr) if (!test_df_is_subnormal (expr)) abort ()
+# define TEST_TRUE(expr) if (!(expr)) abort ()
+#endif
+
+volatile float sf;
+volatile double df;
+
+int
+main ()
+{
+  enable_ftz_mode ();
+
+  /* Circulate through a volatile to avoid constant folding.  */
+  sf = 2.87E-42f; /* = subnormal */
+  float x = sf;
+
+  TEST_SF_IS_SUBNORMAL (x); /* Store/load should not flush.  */
+  TEST_TRUE (!isnormal (x));
+  TEST_TRUE (fpclassify (x) == FP_SUBNORMAL);
+
+  TEST_DF_IS_ZERO ((double) x);
+
+  /* Test that the expression is not simplified to plain x, thus leaking
+     the subnormal.  */
+  TEST_SF_IS_ZERO (x * 1);
+  TEST_SF_IS_ZERO (x * -1);
+  TEST_SF_IS_ZERO (x * 0.5);
+
+  TEST_SF_IS_ZERO (x / 1);
+  TEST_SF_IS_ZERO (x / -1);
+  TEST_SF_IS_ZERO (x / 2);
+
+  TEST_SF_IS_ZERO (fminf (x, x));
+  TEST_SF_IS_ZERO (fminf (x, -x));
+  TEST_SF_IS_ZERO (fmaxf (x, x));
+  TEST_SF_IS_ZERO (fmaxf (x, -x));
+
+  TEST_SF_IS_ZERO (x + 0);
+  TEST_SF_IS_ZERO (0 - x);
+  TEST_SF_IS_ZERO (x - 0);
+  TEST_SF_IS_ZERO (x + x);
+
+  TEST_SF_IS_ZERO (x + 0.0f);
+  TEST_SF_IS_ZERO (0.0f - x);
+  TEST_SF_IS_ZERO (x - 0.0f);
+  TEST_SF_IS_ZERO (x + x);
+
+  TEST_SF_IS_ZERO (x * copysignf (1.0f, x));
+  TEST_SF_IS_ZERO (x * copysignf (1.0f, -x));
+
+  float y = sf;
+  TEST_SF_IS_ZERO (fminf (fmaxf (x, y), y));
+
+  TEST_SF_IS_SUBNORMAL (x == y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x != y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x >= y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x > y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x <= y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x < y ? x : y);
+
+  /* FP ops that should not flush.  */
+  TEST_SF_IS_SUBNORMAL (fabsf (x));
+  TEST_SF_IS_SUBNORMAL (x < 0 ? -x : x);
+  TEST_SF_IS_SUBNORMAL (-x);
+  TEST_SF_IS_SUBNORMAL (copysignf (x, -1.0));
+
+  /* Test constant folding with subnormal values.  */
+  TEST_TRUE (!isnormal (2.87E-42f));
+
+  TEST_SF_IS_SUBNORMAL (-(2.87E-42f));
+  TEST_SF_IS_SUBNORMAL (fabsf (-2.87E-42f));
+  TEST_SF_IS_SUBNORMAL (copysignf (2.87E-42f, -1.0));
+
+  TEST_SF_IS_ZERO (fminf (2.87E-42f, 2.87E-42f));
+  TEST_SF_IS_ZERO (fminf (2.87E-42f, -5.74E-42f));
+  TEST_SF_IS_ZERO (fmaxf (2.87E-42f, 2.87E-42f));
+  TEST_SF_IS_ZERO (fmaxf (2.87E-42f, -5.74E-42f));
+
+  TEST_SF_IS_ZERO (floorf (-2.87E-42f));
+  TEST_SF_IS_ZERO (ceilf (2.87E-42f));
+
+  TEST_SF_IS_ZERO (sqrtf (2.82E-42f));
+
+  TEST_SF_IS_ZERO (2.87E-42f + 0.0f);
+  TEST_SF_IS_ZERO (2.87E-42f + 5.74E-42f);
+  TEST_SF_IS_ZERO (2.87E-42f - 0.0f);
+  TEST_SF_IS_ZERO (0.0f - 2.87E-42f);
+  TEST_SF_IS_ZERO (2.87E-42f * 1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f * -1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f * 12.3f);
+  TEST_SF_IS_ZERO (2.87E-42f / 1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f / 12.3f);
+
+  TEST_TRUE (2.87E-42f == -5.74E-42f);
+  TEST_TRUE (2.87E-42f == -5.74E-42f ? 1 : 0);
+
+  TEST_TRUE (2.87E-42f == -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f != -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f >= -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f >= 5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f > -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f > 5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f <= -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f <= 5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f < -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f < 5.74E-42f ? 0 : 1);
+
+  /* A < B ? A : B -> min (B, A) must not happen (min flushes to zero).  */
+  TEST_SF_IS_SUBNORMAL (2.87E-42f < -5.74E-42f ? 2.87E-42f : -5.74E-42f);
+
+  /* Normal and subnormal input.  */
+  TEST_TRUE ((2.87E-42f + 1.1754944E-38f) == 1.1754944E-38f);
+  TEST_TRUE ((1.1754944E-38f - 2.87E-42f) == 1.1754944E-38f);
+
+  /* Expression with normal numbers.  Result of the subtraction is
+     subnormal.  */
+  float sf_tmp = (1.469368E-38f - 1.1754944E-38f) + 1.1754944E-38f;
+  TEST_TRUE (sf_tmp == 1.1754944E-38f);
+
+
+  /*** Test with double precision.  ***/
+  df = 5.06E-321;
+  double dx = df;
+
+  TEST_DF_IS_SUBNORMAL (dx);
+  TEST_TRUE (!isnormal (dx));
+  TEST_TRUE (fpclassify (dx) == FP_SUBNORMAL);
+
+  TEST_SF_IS_ZERO ((float) dx);
+
+  TEST_DF_IS_ZERO (dx * 1);
+  TEST_DF_IS_ZERO (dx * -1);
+  TEST_DF_IS_ZERO (dx * 0.5);
+
+  TEST_DF_IS_ZERO (dx / 1);
+  TEST_DF_IS_ZERO (dx / -1);
+  TEST_DF_IS_ZERO (dx / 2);
+
+  TEST_DF_IS_ZERO (fmin (dx, dx));
+  TEST_DF_IS_ZERO (fmin (dx, -dx));
+  TEST_DF_IS_ZERO (fmax (dx, dx));
+  TEST_DF_IS_ZERO (fmax (dx, -dx));
+
+  TEST_DF_IS_ZERO (dx + 0);
+  TEST_DF_IS_ZERO (0 - dx);
+  TEST_DF_IS_ZERO (dx - 0);
+  TEST_DF_IS_ZERO (dx + dx);
+
+  TEST_DF_IS_ZERO (dx + 0.0);
+  TEST_DF_IS_ZERO (0.0 - dx);
+  TEST_DF_IS_ZERO (dx - 0.0);
+
+  TEST_DF_IS_ZERO (dx * copysign (1.0, dx));
+  TEST_DF_IS_ZERO (dx * copysign (1.0, -dx));
+
+  df = -1.61895E-319;
+  double dy = df;
+  TEST_SF_IS_ZERO (fmin (fmax (dx, dy), dy));
+
+  TEST_DF_IS_SUBNORMAL (dx == dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx != dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx >= dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx > dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx <= dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx < dy ? dx : dy);
+
+  /* FP ops that should not flush.  */
+  TEST_DF_IS_SUBNORMAL (fabs (dx));
+  TEST_DF_IS_SUBNORMAL (dx < 0 ? -dx : dx);
+  TEST_DF_IS_SUBNORMAL (-dx);
+  TEST_DF_IS_SUBNORMAL (copysign (dx, -1.0));
+
+  /* Test constant folding with subnormal values.  */
+
+  TEST_TRUE (!isnormal (5.06E-321));
+  TEST_TRUE (fpclassify (5.06E-321) == FP_SUBNORMAL);
+
+  TEST_DF_IS_SUBNORMAL (-(5.06E-321));
+  TEST_DF_IS_SUBNORMAL (fabs (-5.06E-321));
+  TEST_DF_IS_SUBNORMAL (copysign (5.06E-321, -1.0));
+
+  TEST_DF_IS_ZERO (fmin (5.06E-321, 5.06E-321));
+  TEST_DF_IS_ZERO (fmin (5.06E-321, -1.61895E-319));
+  TEST_DF_IS_ZERO (fmax (5.06E-321, 5.06E-321));
+  TEST_DF_IS_ZERO (fmax (5.06E-321, -1.61895E-319));
+
+  TEST_DF_IS_ZERO (floor (-5.06E-321));
+  TEST_DF_IS_ZERO (ceil (5.06E-321));
+  TEST_DF_IS_ZERO (sqrt (2.82E-42f));
+
+  TEST_DF_IS_ZERO (5.06E-321 + 0.0);
+  TEST_DF_IS_ZERO (5.06E-321 + 1.61895E-319);
+  TEST_DF_IS_ZERO (5.06E-321 - 0.0);
+  TEST_DF_IS_ZERO (0.0 - 5.06E-321);
+  TEST_DF_IS_ZERO (5.06E-321 * 1.0);
+  TEST_DF_IS_ZERO (5.06E-321 * -1.0);
+  TEST_DF_IS_ZERO (5.06E-321 * 12.3);
+  TEST_DF_IS_ZERO (5.06E-321 / 1.0);
+  TEST_DF_IS_ZERO (5.06E-321 / 12.3);
+
+  TEST_TRUE (5.06E-321 == -1.61895E-319);
+
+  TEST_TRUE (5.06E-321 == -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 == -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 != -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 >= -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 >= 1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 > -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 > 1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 <= -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 <= 1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 < -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 < 1.61895E-319 ? 0 : 1);
+
+  /* A < B ? A : B -> min (B, A) must not happen (min flushes to zero).  */
+  TEST_DF_IS_SUBNORMAL (5.06E-321 < -1.61895E-319 ? 5.06E-321 : -1.61895E-319);
+
+  /* Normal and subnormal input.  */
+  TEST_TRUE ((5.06E-321 + 3.33E-308) == 3.33E-308);
+  TEST_TRUE ((3.33E-308 - 5.06E-321) == 3.33E-308);
+
+  /* Expression with normal numbers.  Result of the subtraction is
+     subnormal.  */
+  double df_a = 3.33E-308;
+  double df_b = 2.78E-308;
+  double df_tmp = (df_a - df_b) + df_b;
+  TEST_TRUE (df_tmp == df_b);
+
+  return 0;
+}