Hi Richard and Joseph,

Replies for both inline:
I wrote:
>> Both the inputs and outputs must be flushed to zero in HSAIL's 'ftz'
>> semantics.
>> FTZ operations were previously always "explicit" in the BRIG FE output,
>> like you propose here; there were builtin calls injected for all inputs
>> and the output of 'ftz'-marked float HSAIL instructions. This is still
>> provided as a fallback for targets which do not support a CPU mode flag.

On Mon, Aug 14, 2017 at 1:17 PM, Richard Biener <richard.guent...@gmail.com> wrote:
> I see. But how does making them implicit fix cases in the conformance
> testsuite? That is, isn't the error in the runtime implementation of
> __hsail_ftz_*? I'd have used a "simple" [...]

There are two parts to the story here:

1) Making FTZ/DAZ "the default": no builtin calls or similar are emitted
to flush the operands/results; instead we rely on the runtime flipping on
the FTZ/DAZ CPU flags before executing this code. This is purely a
performance optimization, because the FTZ/DAZ builtin calls (three per
HSAIL instruction) ruin performance for multiple reasons. We have already
implemented this optimization in our staging branch of the BRIG FE.

2) Ensuring GCC does not perform certain compile-time optimizations under
the assumption that FTZ/DAZ is optional, and instead making it assume that
flushing must happen for correctness. The proposed patch addresses this
part on the compiler side by disabling the currently known optimizations
that would otherwise remove operations whose operands/results should be
flushed at runtime when "ftz denorm math" is desired.

>> The problem with a special FTZ 'operation' of some kind in the generic
>> output is that the basic optimizations get confused by a new operation
>> and we'd need to add knowledge of the 'FTZ' operation to a bunch of
>> existing optimizer code, which seems unnecessary to support this case,
>> as the optimizations typically apply also under the 'FTZ semantics'
>> when the FTZ/DAZ flag is on.
>
> Apart from the exceptions you needed to guard ... do you have an example of
> a transform that is confused by explicit FTZ and that would be valid if that
> FTZ were implicit? An explicit FTZ should be much safer. I think the builtins
> should also be CONST and not only PURE.

Explicit builtin calls ruin many optimizations, starting from simple common
subexpression elimination, when the passes do not understand what the
builtin returns for any given operand. Thus, the builtin function's code
would have to be inlined first, and a lot of code would be inlined due to
the abundance of ftz calls required; not all of it can be eliminated,
because at compile time you do not know whether the operand is a denormal
or not.

Another approach would be to introduce special cases to the affected
optimizations so that they understand the FTZ builtin and might be able to
remove the useless calls. This potentially touches _a lot_ of code. And in
the end, if the CPU can flush denormals efficiently in hardware (FTZ in HW
is typically faster than gradual underflow, so this is likely the case),
any builtin call that cannot be optimized away presents additional,
possibly major, runtime overhead.

We tested whether a simple common subexpression elimination case works with
the ftz builtins, and it did not. CONST did not help here. However, I
understand your concern that there might be optimizations that still break
the FTZ semantics if there are no explicit builtin calls, but we are
prepared to fix them case by case if/when they appear. The attached updated
patch fixes a few additional cases we noticed, e.g. it disables several
constant folding cases.
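To make the CSE point concrete, here is a minimal C sketch of the kind of
code an explicit-FTZ lowering produces. The __hsail_ftz_f32 helper and its
float-to-float signature are assumptions for illustration only, not the
exact BRIG FE output:

/* Hypothetical explicit-FTZ lowering: every operand and the result of an
   'ftz'-marked HSAIL instruction goes through a flush helper.  */
extern float __hsail_ftz_f32 (float);   /* assumed signature */

float
mad_explicit_ftz (float a, float b, float c, float d)
{
  /* Two occurrences of a * b.  With each product wrapped in calls, the
     optimizers no longer see two identical expressions unless they know
     the helper is a pure value mapping; as noted above, in our tests CSE
     did not happen even with the builtin marked CONST.  */
  float t1 = __hsail_ftz_f32 (__hsail_ftz_f32 (a) * __hsail_ftz_f32 (b));
  float t2 = __hsail_ftz_f32 (__hsail_ftz_f32 (a) * __hsail_ftz_f32 (b));
  return __hsail_ftz_f32 (t1 + c) + __hsail_ftz_f32 (t2 + d);
}

/* With -fftz-math the same code is emitted without the calls and the
   runtime enables the FTZ/DAZ CPU mode instead, so the common product
   CSEs normally:

     float t = a * b;
     return (t + c) + (t + d);  */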
On Mon, Aug 14, 2017 at 2:30 PM, Joseph Myers <jos...@codesourcery.com> wrote:
> Presumably this means that constant folding needs to know about those
> semantics, both for operations with a subnormal floating-point argument
> (whether or not the output is floating point, or floating point in the
> same format), and those with such a result?
> Can assignments copy subnormals without converting them to zero? Should
> comparisons flush input subnormals to zero before comparing? Should
> conversions e.g. from float to double convert a float subnormal input to
> zero?

I can answer yes to all of these questions.

BR,
Pekka
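For concreteness, a small sketch of what those answers mean in practice,
assuming an x86-64 target where the runtime has already enabled the FTZ
and DAZ bits (the SSE-intrinsic setup below mirrors what the attached test
does; the exact setup is an assumption, the expected results follow the
answers above):

#include <math.h>
#include <xmmintrin.h>
#include <pmmintrin.h>

int
main (void)
{
  /* Runtime side of the scheme: flip the FTZ/DAZ CPU flags once before
     running code compiled with -fftz-math.  */
  _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
  _MM_SET_DENORMALS_ZERO_MODE (_MM_DENORMALS_ZERO_ON);

  volatile float sub = 2.87e-42f;   /* single-precision subnormal */
  float copy = sub;                 /* assignment copies the subnormal */

  int copy_kept = (fpclassify (copy) == FP_SUBNORMAL);  /* yes */
  int compares_as_zero = (copy == -copy);  /* comparison flushes inputs */
  double widened = copy;            /* float -> double conversion flushes */
  int converted_to_zero = (widened == 0.0);

  return !(copy_kept && compares_as_zero && converted_to_zero);
}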
From 0b97ccde3ec837329b4c551ccd7f98c074ca7a7b Mon Sep 17 00:00:00 2001
From: Henry Linjamäki <henry.linjam...@parmance.com>
Date: Mon, 24 Jul 2017 09:28:00 +0300
Subject: [PATCH] Add common -fftz-math flag

With the flag set, the compiler assumes that floating-point operations
must flush received and resulting subnormal floating-point values to
zero.
---
 gcc/common.opt                  |   5 +
 gcc/doc/invoke.texi             |  11 ++
 gcc/fold-const-call.c           |   9 +-
 gcc/fold-const.c                |  22 +++
 gcc/match.pd                    |  14 +-
 gcc/simplify-rtx.c              |  30 +++-
 gcc/testsuite/gcc.dg/ftz-math.c | 330 ++++++++++++++++++++++++++++++++++++++++
 7 files changed, 405 insertions(+), 16 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ftz-math.c

diff --git a/gcc/common.opt b/gcc/common.opt
index 13305558d2d..fd77d00d814 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2266,6 +2266,11 @@ fsingle-precision-constant
 Common Report Var(flag_single_precision_constant) Optimization
 Convert floating point constants to single precision constants.
 
+fftz-math
+Common Report Var(flag_ftz_math) Optimization
+Optimizations treat floating-point operations as if they must flush
+subnormal floating-point values to zero.
+
 fsplit-ivs-in-unroller
 Common Report Var(flag_split_ivs_in_unroller) Init(1) Optimization
 Split lifetimes of induction variables when loops are unrolled.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a6ce483d890..c3da6c8ebe3 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -9330,6 +9330,17 @@ The default is @option{-fno-signaling-nans}.
 This option is experimental and does not currently guarantee to
 disable all GCC optimizations that affect signaling NaN behavior.
 
+@item -fftz-math
+@opindex ftz-math
+This option is experimental.  With this flag on, GCC treats
+floating-point operations (except abs, classify, copysign and
+negation) as if they must flush subnormal input operands and results
+to zero (FTZ).  The FTZ rules are derived from the HSA Programmer's
+Reference Manual for the base profile.  This alters optimizations
+that would break the rules, for example the X * 1 -> X simplification.
+The option assumes the target supports FTZ in hardware and has it
+enabled, either by default or as set by the user.
+
 @item -fno-fp-int-builtin-inexact
 @opindex fno-fp-int-builtin-inexact
 Do not allow the built-in functions @code{ceil}, @code{floor},
diff --git a/gcc/fold-const-call.c b/gcc/fold-const-call.c
index 381cb7fd290..21715f090da 100644
--- a/gcc/fold-const-call.c
+++ b/gcc/fold-const-call.c
@@ -1049,7 +1049,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg)
   if (real_cst_p (arg))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg_mode));
-      if (mode == arg_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg_mode && !flag_ftz_math)
 	{
 	  /* real -> real.  */
 	  REAL_VALUE_TYPE result;
@@ -1299,7 +1300,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg0, tree arg1)
       && real_cst_p (arg1))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg0_mode));
-      if (mode == arg0_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg0_mode && !flag_ftz_math)
 	{
 	  /* real, real -> real.  */
 	  REAL_VALUE_TYPE result;
@@ -1494,7 +1496,8 @@ fold_const_call_1 (combined_fn fn, tree type, tree arg0, tree arg1, tree arg2)
       && real_cst_p (arg2))
     {
       gcc_checking_assert (SCALAR_FLOAT_MODE_P (arg0_mode));
-      if (mode == arg0_mode)
+      /* For -fftz-math subnormals are not folded correctly.  */
+      if (mode == arg0_mode && !flag_ftz_math)
 	{
 	  /* real, real, real -> real.  */
 	  REAL_VALUE_TYPE result;
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index f6d5af43b33..1b19bc93248 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -1152,6 +1152,11 @@ const_binop (enum tree_code code, tree arg1, tree arg2)
       bool inexact;
       tree t, type;
 
+      /* For ftz-math disable all floating point constant folding for
+	 now.  */
+      if (flag_ftz_math)
+	return NULL_TREE;
+
       /* The following codes are handled by real_arithmetic.  */
       switch (code)
 	{
@@ -2000,6 +2005,10 @@ fold_convert_const_real_from_real (tree type, const_tree arg1)
       && REAL_VALUE_ISSIGNALING_NAN (TREE_REAL_CST (arg1)))
     return NULL_TREE;
 
+  /* For ftz-math constant folding is disabled for now.  */
+  if (flag_ftz_math)
+    return NULL_TREE;
+
   real_convert (&value, TYPE_MODE (type), &TREE_REAL_CST (arg1));
   t = build_real (type, value);
 
@@ -6479,6 +6488,10 @@ fold_real_zero_addition_p (const_tree type, const_tree addend, int negate)
   if (!real_zerop (addend))
     return false;
 
+  /* X +/- 0 flushes subnormals to zero but plain X does not.  */
+  if (flag_ftz_math)
+    return false;
+
   /* Don't allow the fold with -fsignaling-nans.  */
   if (HONOR_SNANS (element_mode (type)))
     return false;
@@ -9117,6 +9130,11 @@ fold_binary_loc (location_t loc,
   arg0 = op0;
   arg1 = op1;
 
+  /* For ftz-math disable all floating point constant folding for
+     now.  */
+  if (flag_ftz_math && FLOAT_TYPE_P (type))
+    return NULL_TREE;
+
   /* Strip any conversions that don't change the mode.  This is
      safe for every expression, except for a comparison expression
     because its signedness is derived from its operands.  So, in
@@ -13831,6 +13849,10 @@ fold_relational_const (enum tree_code code, tree type, tree op0, tree op1)
   if (TREE_CODE (op0) == REAL_CST && TREE_CODE (op1) == REAL_CST)
     {
+      /* For ftz-math disable all constant folding for now.  */
+      if (flag_ftz_math)
+	return NULL_TREE;
+
       const REAL_VALUE_TYPE *c0 = TREE_REAL_CST_PTR (op0);
       const REAL_VALUE_TYPE *c1 = TREE_REAL_CST_PTR (op1);
diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba3d23..c4e8eefe0c1 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -129,6 +129,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (mult @0 real_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
	   || !COMPLEX_FLOAT_TYPE_P (type)))
   (non_lvalue @0)))
@@ -137,6 +138,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (simplify
  (mult @0 real_minus_onep)
  (if (!HONOR_SNANS (type)
+      && !flag_ftz_math
       && (!HONOR_SIGNED_ZEROS (type)
	   || !COMPLEX_FLOAT_TYPE_P (type)))
   (negate @0)))
@@ -240,13 +242,13 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 /* In IEEE floating point, x/1 is not equivalent to x for snans.  */
 (simplify
  (rdiv @0 real_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (non_lvalue @0)))
 
 /* In IEEE floating point, x/-1 is not equivalent to -x for snans.  */
 (simplify
  (rdiv @0 real_minus_onep)
- (if (!HONOR_SNANS (type))
+ (if (!HONOR_SNANS (type) && !flag_ftz_math)
   (negate @0)))
 
 (if (flag_reciprocal_math)
@@ -1394,7 +1396,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (for minmax (min max FMIN FMAX)
  (simplify
   (minmax @0 @0)
-  @0))
+  (if (FLOAT_TYPE_P (type) && !flag_ftz_math)
+   @0)))
 /* min(max(x,y),y) -> y.  */
 (simplify
  (min:c (max:c @0 @1) @1)
@@ -1853,7 +1856,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 	   || (GENERIC
	       && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (inside_type)))
       && (((inter_int || inter_ptr) && final_int)
-	   || (inter_float && final_float))
+	   || (inter_float && final_float && !flag_ftz_math))
       && inter_prec >= final_prec)
    (ocvt @0))
@@ -1862,7 +1865,8 @@
      former is wider than the latter and doesn't change the signedness
      (for integers).  Avoid this if the final type is a pointer since
      then we sometimes need the middle conversion.  */
-   (if (((inter_int && inside_int) || (inter_float && inside_float))
+   (if (((inter_int && inside_int) || (inter_float && inside_float
+					&& !flag_ftz_math))
	&& (final_int || final_float)
	&& inter_prec >= inside_prec
	&& (inter_float || inter_unsignedp == inside_unsignedp))
diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index 7cab26a0e34..3c904cdefd6 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -1240,8 +1240,12 @@ simplify_unary_operation_1 (enum rtx_code code, machine_mode mode, rtx op)
       if (DECIMAL_FLOAT_MODE_P (mode))
	break;
 
-      /* (float_truncate:SF (float_extend:DF foo:SF)) = foo:SF.  */
-      if (GET_CODE (op) == FLOAT_EXTEND
+      /* (float_truncate:SF (float_extend:DF foo:SF)) = foo:SF, except
+	 for -fftz-math with subnormal input.  Simplifications like
+	 this must be prevented as they no longer perform
+	 flush-to-zero as required by the semantics of the -fftz-math
+	 flag.  */
+      if (!flag_ftz_math && GET_CODE (op) == FLOAT_EXTEND
	  && GET_MODE (XEXP (op, 0)) == mode)
	return XEXP (op, 0);
@@ -1891,14 +1895,16 @@ simplify_const_unary_operation (enum rtx_code code, machine_mode mode,
	case FLOAT_TRUNCATE:
	  /* Don't perform the operation if flag_signaling_nans is on
	     and the operand is a signaling NaN.  */
-	  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	  if ((HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	      || flag_ftz_math)
	    return NULL_RTX;
	  d = real_value_truncate (mode, d);
	  break;
	case FLOAT_EXTEND:
	  /* Don't perform the operation if flag_signaling_nans is on
	     and the operand is a signaling NaN.  */
-	  if (HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	  if ((HONOR_SNANS (mode) && REAL_VALUE_ISSIGNALING_NAN (d))
+	      || flag_ftz_math)
	    return NULL_RTX;
	  /* All this does is change the mode, unless changing
	     mode class.  */
@@ -2137,7 +2143,8 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
	 when x is NaN, infinite, or finite and nonzero.
	 They aren't when x is -0 and the rounding mode is not
	 towards -infinity, since (-0) + 0 is then 0.  */
-      if (!HONOR_SIGNED_ZEROS (mode) && trueop1 == CONST0_RTX (mode))
+      if (!HONOR_SIGNED_ZEROS (mode) && !flag_ftz_math
+	  && trueop1 == CONST0_RTX (mode))
	return op0;
 
       /* ((-a) + b) -> (b - a) and similarly for (a + (-b)).  These
@@ -2342,8 +2349,9 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
       /* Subtracting 0 has no effect unless the mode has signed zeros
	 and supports rounding towards -infinity.  In such a case,
	 0 - 0 is -0.  */
-      if (!(HONOR_SIGNED_ZEROS (mode)
-	    && HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+      if (!((HONOR_SIGNED_ZEROS (mode)
+	     && HONOR_SIGN_DEPENDENT_ROUNDING (mode))
+	    || flag_ftz_math)
	  && trueop1 == CONST0_RTX (mode))
	return op0;
@@ -2558,6 +2566,7 @@ simplify_binary_operation_1 (enum rtx_code code, machine_mode mode,
       /* In IEEE floating point, x*1 is not equivalent to x for
	 signalling NaNs.  */
       if (!HONOR_SNANS (mode)
+	  && (FLOAT_MODE_P (mode) && !flag_ftz_math)
	  && trueop1 == CONST1_RTX (mode))
	return op0;
@@ -4001,6 +4010,10 @@ simplify_const_binary_operation (enum rtx_code code, machine_mode mode,
       const REAL_VALUE_TYPE *opr0, *opr1;
       bool inexact;
 
+      /* Subnormals are not handled correctly with -fftz-math.  */
+      if (flag_ftz_math)
+	return 0;
+
       opr0 = CONST_DOUBLE_REAL_VALUE (op0);
       opr1 = CONST_DOUBLE_REAL_VALUE (op1);
@@ -5083,7 +5096,8 @@ simplify_const_relational_operation (enum rtx_code code,
 
   /* If the operands are floating-point constants, see if we can fold
      the result.  */
-  if (CONST_DOUBLE_AS_FLOAT_P (trueop0)
+  if (!flag_ftz_math
+      && CONST_DOUBLE_AS_FLOAT_P (trueop0)
       && CONST_DOUBLE_AS_FLOAT_P (trueop1)
       && SCALAR_FLOAT_MODE_P (GET_MODE (trueop0)))
     {
diff --git a/gcc/testsuite/gcc.dg/ftz-math.c b/gcc/testsuite/gcc.dg/ftz-math.c
new file mode 100644
index 00000000000..f782515a044
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ftz-math.c
@@ -0,0 +1,330 @@
+/* Tests the -fftz-math flag.  */
+/* { dg-do run { target x86_64-*-* } } */
+/* { dg-options "-O2 -fftz-math" } */
+
+#include <math.h>
+
+/* #define DEBUG_TEST */
+#ifdef DEBUG_TEST
+# include <stdio.h>
+#endif
+
+#include "xmmintrin.h"
+#include "pmmintrin.h"
+
+union uf
+{
+  unsigned int u;
+  float f;
+};
+
+union ud
+{
+  unsigned long long u;
+  double d;
+};
+
+static unsigned int
+f2u (float v)
+{
+  union uf u;
+  u.f = v;
+  return u.u;
+}
+
+static unsigned long long
+d2u (double v)
+{
+  union ud u;
+  u.d = v;
+  return u.u;
+}
+
+static void
+enable_ftz_mode ()
+{
+  _MM_SET_FLUSH_ZERO_MODE (_MM_FLUSH_ZERO_ON);
+  _MM_SET_DENORMALS_ZERO_MODE (_MM_DENORMALS_ZERO_ON);
+}
+
+static int
+test_sf_is_zero (float x)
+{
+  /* FTZ mode is on, must do bitwise ops for the zero test.  */
+  return ((f2u (x) & 0x7fffffffu) == 0u);
+}
+
+static int
+test_df_is_zero (double x)
+{
+  /* FTZ mode is on, must do bitwise ops for the zero test.  */
+  return ((d2u (x) & 0x7fffffffffffffffull) == 0ull);
+}
+
+static int
+test_sf_is_subnormal (float x)
+{
+  unsigned int u = f2u (x);
+  if (u & 0x7f800000u)
+    return 0;
+  return (u & 0x007fffffu);
+}
+
+static int
+test_df_is_subnormal (double x)
+{
+  unsigned long long u = d2u (x);
+  if (u & 0x7ff0000000000000ull)
+    return 0;
+  return (u & 0x000fffffffffffffull);
+}
+
+#ifdef DEBUG_TEST
+void err_print (unsigned line, const char* expr)
+{
+  printf ("Line %d: FAIL: %s\n", line, expr);
+  abort ();
+}
+# define TEST_SF_IS_ZERO(expr) \
+  if (!test_sf_is_zero (expr)) err_print (__LINE__, #expr)
+# define TEST_SF_IS_SUBNORMAL(expr) \
+  if (!test_sf_is_subnormal (expr)) err_print (__LINE__, #expr)
+# define TEST_DF_IS_ZERO(expr) \
+  if (!test_df_is_zero (expr)) err_print (__LINE__, #expr)
+# define TEST_DF_IS_SUBNORMAL(expr) \
+  if (!test_df_is_subnormal (expr)) err_print (__LINE__, #expr)
+# define TEST_TRUE(expr) if (!(expr)) err_print (__LINE__, #expr)
+#else
+# define TEST_SF_IS_ZERO(expr) if (!test_sf_is_zero (expr)) abort ()
+# define TEST_SF_IS_SUBNORMAL(expr) if (!test_sf_is_subnormal (expr)) abort ()
+# define TEST_DF_IS_ZERO(expr) if (!test_df_is_zero (expr)) abort ()
+# define TEST_DF_IS_SUBNORMAL(expr) if (!test_df_is_subnormal (expr)) abort ()
+# define TEST_TRUE(expr) if (!(expr)) abort ()
+#endif
+
+volatile float sf;
+volatile double df;
+
+int
+main ()
+{
+  enable_ftz_mode ();
+
+  /* Circulate through a volatile to avoid constant folding.  */
+  sf = 2.87E-42f; /* = subnormal */
+  float x = sf;
+
+  TEST_SF_IS_SUBNORMAL (x); /* Store/load should not flush.  */
+  TEST_TRUE (!isnormal (x));
+  TEST_TRUE (fpclassify (x) == FP_SUBNORMAL);
+
+  TEST_DF_IS_ZERO ((double) x);
+
+  /* Test that the expression is not simplified to plain x, thus leaking
+     the subnormal.  */
+  TEST_SF_IS_ZERO (x * 1);
+  TEST_SF_IS_ZERO (x * -1);
+  TEST_SF_IS_ZERO (x * 0.5);
+
+  TEST_SF_IS_ZERO (x / 1);
+  TEST_SF_IS_ZERO (x / -1);
+  TEST_SF_IS_ZERO (x / 2);
+
+  TEST_SF_IS_ZERO (fminf (x, x));
+  TEST_SF_IS_ZERO (fminf (x, -x));
+  TEST_SF_IS_ZERO (fmaxf (x, x));
+  TEST_SF_IS_ZERO (fmaxf (x, -x));
+
+  TEST_SF_IS_ZERO (x + 0);
+  TEST_SF_IS_ZERO (0 - x);
+  TEST_SF_IS_ZERO (x - 0);
+  TEST_SF_IS_ZERO (x + x);
+
+  TEST_SF_IS_ZERO (x + 0.0f);
+  TEST_SF_IS_ZERO (0.0f - x);
+  TEST_SF_IS_ZERO (x - 0.0f);
+  TEST_SF_IS_ZERO (x + x);
+
+  TEST_SF_IS_ZERO (x * copysignf (1.0f, x));
+  TEST_SF_IS_ZERO (x * copysignf (1.0f, -x));
+
+  float y = sf;
+  TEST_SF_IS_ZERO (fminf (fmaxf (x, y), y));
+
+  TEST_SF_IS_SUBNORMAL (x == y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x != y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x >= y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x > y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x <= y ? x : y);
+  TEST_SF_IS_SUBNORMAL (x < y ? x : y);
+
+  /* FP ops that should not flush.  */
+  TEST_SF_IS_SUBNORMAL (fabsf (x));
+  TEST_SF_IS_SUBNORMAL (x < 0 ? -x : x);
+  TEST_SF_IS_SUBNORMAL (-x);
+  TEST_SF_IS_SUBNORMAL (copysignf (x, -1.0));
+
+  /* Test constant folding with subnormal values.  */
+  TEST_TRUE (!isnormal (2.87E-42f));
+
+  TEST_SF_IS_SUBNORMAL (-(2.87E-42f));
+  TEST_SF_IS_SUBNORMAL (fabsf (-2.87E-42f));
+  TEST_SF_IS_SUBNORMAL (copysignf (2.87E-42f, -1.0));
+
+  TEST_SF_IS_ZERO (fminf (2.87E-42f, 2.87E-42f));
+  TEST_SF_IS_ZERO (fminf (2.87E-42f, -5.74E-42f));
+  TEST_SF_IS_ZERO (fmaxf (2.87E-42f, 2.87E-42f));
+  TEST_SF_IS_ZERO (fmaxf (2.87E-42f, -5.74E-42f));
+
+  TEST_SF_IS_ZERO (floorf (-2.87E-42f));
+  TEST_SF_IS_ZERO (ceilf (2.87E-42f));
+
+  TEST_SF_IS_ZERO (sqrtf (2.82E-42f));
+
+  TEST_SF_IS_ZERO (2.87E-42f + 0.0f);
+  TEST_SF_IS_ZERO (2.87E-42f + 5.74E-42f);
+  TEST_SF_IS_ZERO (2.87E-42f - 0.0f);
+  TEST_SF_IS_ZERO (0.0f - 2.87E-42f);
+  TEST_SF_IS_ZERO (2.87E-42f * 1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f * -1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f * 12.3f);
+  TEST_SF_IS_ZERO (2.87E-42f / 1.0f);
+  TEST_SF_IS_ZERO (2.87E-42f / 12.3f);
+
+  TEST_TRUE (2.87E-42f == -5.74E-42f);
+  TEST_TRUE (2.87E-42f == -5.74E-42f ? 1 : 0);
+
+  TEST_TRUE (2.87E-42f == -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f != -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f >= -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f >= 5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f > -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f > 5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f <= -5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f <= 5.74E-42f ? 1 : 0);
+  TEST_TRUE (2.87E-42f < -5.74E-42f ? 0 : 1);
+  TEST_TRUE (2.87E-42f < 5.74E-42f ? 0 : 1);
+
+  /* A < B ? A : B -> min (B, A) must not happen (min flushes to zero).  */
+  TEST_SF_IS_SUBNORMAL (2.87E-42f < -5.74E-42f ? 2.87E-42f : -5.74E-42f);
+
+  /* Normal and subnormal input.  */
+  TEST_TRUE ((2.87E-42f + 1.1754944E-38f) == 1.1754944E-38f);
+  TEST_TRUE ((1.1754944E-38f - 2.87E-42f) == 1.1754944E-38f);
+
+  /* Expression with normal numbers.  Result of the subtraction is
+     subnormal.  */
+  float sf_tmp = (1.469368E-38f - 1.1754944E-38f) + 1.1754944E-38f;
+  TEST_TRUE (sf_tmp == 1.1754944E-38f);
+
+
+  /*** Test with double precision.  ***/
+  df = 5.06E-321;
+  double dx = df;
+
+  TEST_DF_IS_SUBNORMAL (dx);
+  TEST_TRUE (!isnormal (dx));
+  TEST_TRUE (fpclassify (dx) == FP_SUBNORMAL);
+
+  TEST_SF_IS_ZERO ((float) dx);
+
+  TEST_DF_IS_ZERO (dx * 1);
+  TEST_DF_IS_ZERO (dx * -1);
+  TEST_DF_IS_ZERO (dx * 0.5);
+
+  TEST_DF_IS_ZERO (dx / 1);
+  TEST_DF_IS_ZERO (dx / -1);
+  TEST_DF_IS_ZERO (dx / 2);
+
+  TEST_DF_IS_ZERO (fmin (dx, dx));
+  TEST_DF_IS_ZERO (fmin (dx, -dx));
+  TEST_DF_IS_ZERO (fmax (dx, dx));
+  TEST_DF_IS_ZERO (fmax (dx, -dx));
+
+  TEST_DF_IS_ZERO (dx + 0);
+  TEST_DF_IS_ZERO (0 - dx);
+  TEST_DF_IS_ZERO (dx - 0);
+  TEST_DF_IS_ZERO (dx + dx);
+
+  TEST_DF_IS_ZERO (dx + 0.0);
+  TEST_DF_IS_ZERO (0.0 - dx);
+  TEST_DF_IS_ZERO (dx - 0.0);
+
+  TEST_DF_IS_ZERO (dx * copysign (1.0, dx));
+  TEST_DF_IS_ZERO (dx * copysign (1.0, -dx));
+
+  df = -1.61895E-319;
+  double dy = df;
+  TEST_SF_IS_ZERO (fmin (fmax (dx, dy), dy));
+
+  TEST_DF_IS_SUBNORMAL (dx == dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx != dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx >= dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx > dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx <= dy ? dx : dy);
+  TEST_DF_IS_SUBNORMAL (dx < dy ? dx : dy);
+
+  /* FP ops that should not flush.  */
+  TEST_DF_IS_SUBNORMAL (fabs (dx));
+  TEST_DF_IS_SUBNORMAL (dx < 0 ? -dx : dx);
+  TEST_DF_IS_SUBNORMAL (-dx);
+  TEST_DF_IS_SUBNORMAL (copysign (dx, -1.0));
+
+  /* Test constant folding with subnormal values.  */
+
+  TEST_TRUE (!isnormal (5.06E-321));
+  TEST_TRUE (fpclassify (5.06E-321) == FP_SUBNORMAL);
+
+  TEST_DF_IS_SUBNORMAL (-(5.06E-321));
+  TEST_DF_IS_SUBNORMAL (fabs (-5.06E-321));
+  TEST_DF_IS_SUBNORMAL (copysign (5.06E-321, -1.0));
+
+  TEST_DF_IS_ZERO (fmin (5.06E-321, 5.06E-321));
+  TEST_DF_IS_ZERO (fmin (5.06E-321, -1.61895E-319));
+  TEST_DF_IS_ZERO (fmax (5.06E-321, 5.06E-321));
+  TEST_DF_IS_ZERO (fmax (5.06E-321, -1.61895E-319));
+
+  TEST_DF_IS_ZERO (floor (-5.06E-321));
+  TEST_DF_IS_ZERO (ceil (5.06E-321));
+  TEST_DF_IS_ZERO (sqrt (2.82E-42f));
+
+  TEST_DF_IS_ZERO (5.06E-321 + 0.0);
+  TEST_DF_IS_ZERO (5.06E-321 + 1.61895E-319);
+  TEST_DF_IS_ZERO (5.06E-321 - 0.0);
+  TEST_DF_IS_ZERO (0.0 - 5.06E-321);
+  TEST_DF_IS_ZERO (5.06E-321 * 1.0);
+  TEST_DF_IS_ZERO (5.06E-321 * -1.0);
+  TEST_DF_IS_ZERO (5.06E-321 * 12.3);
+  TEST_DF_IS_ZERO (5.06E-321 / 1.0);
+  TEST_DF_IS_ZERO (5.06E-321 / 12.3);
+
+  TEST_TRUE (5.06E-321 == -1.61895E-319);
+
+  TEST_TRUE (5.06E-321 == -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 == -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 != -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 >= -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 >= 1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 > -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 > 1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 <= -1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 <= 1.61895E-319 ? 1 : 0);
+  TEST_TRUE (5.06E-321 < -1.61895E-319 ? 0 : 1);
+  TEST_TRUE (5.06E-321 < 1.61895E-319 ? 0 : 1);
+
+  /* A < B ? A : B -> min (B, A) must not happen (min flushes to zero).  */
+  TEST_DF_IS_SUBNORMAL (5.06E-321 < -1.61895E-319 ? 5.06E-321 : -1.61895E-319);
+
+  /* Normal and subnormal input.  */
+  TEST_TRUE ((5.06E-321 + 3.33E-308) == 3.33E-308);
+  TEST_TRUE ((3.33E-308 - 5.06E-321) == 3.33E-308);
+
+  /* Expression with normal numbers.  Result of the subtraction is
+     subnormal.  */
+  double df_a = 3.33E-308;
+  double df_b = 2.78E-308;
+  double df_tmp = (df_a - df_b) + df_b;
+  TEST_TRUE (df_tmp == df_b);
+
+  return 0;
+}