On 09/12/2016 10:19 AM, Tamar Christina wrote:
Hi All,
This patch adds an optimized route to the fpclassify builtin
for floating point numbers which are similar to IEEE-754 in format.
The goal is to make it faster by:
1. Trying to determine the most common case first
(e.g. the float is a Normal number) and then the
rest. The amount of code generated at -O2 are
about the same ± 1 instruction, but the code
is much better.
2. Using integer operation in the optimized path.
At a high level, the optimized path uses integer operations
to perform the following:
if (exponent bits aren't all set or unset)
return Normal;
else if (no bits are set on the number after masking out
sign bits then)
return Zero;
else if (exponent has no bits set)
return Subnormal;
else if (mantissa has no bits set)
return Infinite;
else
return NaN;
In case the optimization can't be applied the old
implementation is used as a fall-back.
A limitation with this new approach is that the exponent
of the floating point has to fit in 31 bits and the floating
point has to have an IEEE like format and values for NaN and INF
(e.g. for NaN and INF all bits of the exp must be set).
To determine this IEEE likeness a new boolean was added to real_format.
Regression tests ran on aarch64-none-linux and arm-none-linux-gnueabi
and no regression. x86 uses it's own implementation other than
the fpclassify builtin.
As an example, Aarch64 now generates for classification of doubles:
f:
fmov x1, d0
mov w0, 7
sbfx x2, x1, 52, 11
add w3, w2, 1
tst w3, 0x07FE
bne .L1
mov w0, 13
tst x1, 0x7fffffffffffffff
beq .L1
mov w0, 11
tbz x2, 0, .L1
tst x1, 0xfffffffffffff
mov w0, 3
mov w1, 5
csel w0, w0, w1, ne
.L1:
ret
No new tests as there are existing tests to test functionality.
glibc benchmarks ran against the builtin and this shows a 31.3%
performance gain.
Ok for trunk?
Thanks,
Tamar
PS. I don't have commit rights so if OK can someone apply the patch for me.
gcc/
2016-08-25 Tamar Christina <tamar.christ...@arm.com>
Wilco Dijkstra <wilco.dijks...@arm.com>
* gcc/builtins.c (fold_builtin_fpclassify): Added optimized version.
* gcc/real.h (real_format): Added is_ieee_compatible field.
* gcc/real.c (ieee_single_format): Set is_ieee_compatible flag.
(mips_single_format): Likewise.
(motorola_single_format): Likewise.
(spu_single_format): Likewise.
(ieee_double_format): Likewise.
(mips_double_format): Likewise.
(motorola_double_format): Likewise.
(ieee_extended_motorola_format): Likewise.
(ieee_extended_intel_128_format): Likewise.
(ieee_extended_intel_96_round_53_format): Likewise.
(ibm_extended_format): Likewise.
(mips_extended_format): Likewise.
(ieee_quad_format): Likewise.
(mips_quad_format): Likewise.
(vax_f_format): Likewise.
(vax_d_format): Likewise.
(vax_g_format): Likewise.
(decimal_single_format): Likewise.
(decimal_quad_format): Likewise.
(iee_half_format): Likewise.
(mips_single_format): Likewise.
(arm_half_format): Likewise.
(real_internal_format): Likewise.
gcc-public.patch
diff --git a/gcc/builtins.c b/gcc/builtins.c
index
1073e35b17b1bc1f6974c71c940bd9d82bbbfc0f..58bf129f9a0228659fd3b976d38d021d1d5bd6bb
100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7947,10 +7947,8 @@ static tree
fold_builtin_fpclassify (location_t loc, tree *args, int nargs)
{
tree fp_nan, fp_infinite, fp_normal, fp_subnormal, fp_zero,
- arg, type, res, tmp;
+ arg, type, res;
machine_mode mode;
- REAL_VALUE_TYPE r;
- char buf[128];
/* Verify the required arguments in the original call. */
if (nargs != 6
@@ -7970,14 +7968,143 @@ fold_builtin_fpclassify (location_t loc, tree *args,
int nargs)
arg = args[5];
type = TREE_TYPE (arg);
mode = TYPE_MODE (type);
- arg = builtin_save_expr (fold_build1_loc (loc, ABS_EXPR, type, arg));
+ const real_format *format = REAL_MODE_FORMAT (mode);
+
+ /*
+ For IEEE 754 types:
+
+ fpclassify (x) ->
+ !((exp + 1) & (exp_mask & ~1)) // exponent bits not all set or unset
+ ? (x & sign_mask == 0 ? FP_ZERO :
+ (exp & exp_mask == exp_mask
+ ? (mantisa == 0 ? FP_INFINITE : FP_NAN) :
+ FP_SUBNORMAL)):
+ FP_NORMAL.
+
+ Otherwise
+
+ fpclassify (x) ->
+ isnan (x) ? FP_NAN :
+ (fabs (x) == Inf ? FP_INFINITE :
+ (fabs (x) >= DBL_MIN ? FP_NORMAL :
+ (x == 0 ? FP_ZERO : FP_SUBNORMAL))).
+ */
+
+ /* Check if the number that is being classified is close enough to IEEE 754
+ format to be able to go in the early exit code. */
+ if (format->is_binary_ieee_compatible)
+ {
+ gcc_assert (format->b == 2);
+
+ const tree int_type = integer_type_node;
+ const int exp_bits = (GET_MODE_SIZE (mode) * BITS_PER_UNIT) - format->p;
+ const int exp_mask = (1 << exp_bits) - 1;
+
+ tree exp, specials, exp_bitfield,
+ const_arg0, const_arg1, const0, const1,
+ not_sign_mask, zero_check, mantissa_mask,
+ mantissa_any_set, exp_lsb_set, mask_check;
+ tree int_arg_type, int_arg;
Style nit. Just use
tree exp, specials, exp_bitfield;
tree const_arg0, const_arg1, etc etc.
+
+ /* Re-interpret the float as an unsigned integer type
+ with equal precision. */
+ int_arg_type = build_nonstandard_integer_type (TYPE_PRECISION (type), 0);
+ int_arg = fold_build1_loc (loc, INDIRECT_REF, int_arg_type,
+ fold_build1_loc (loc, NOP_EXPR,
+ build_pointer_type (int_arg_type),
+ fold_build1_loc (loc, ADDR_EXPR,
+ build_pointer_type (type), arg)));
Doesn't this make ARG addressable? Which in turn means ARG won't be
exposed to the gimple/ssa optimizers. Or is it the case that when
fpclassify is used its argument is already in memory (and thus addressable?)
+ /* ~(1 << location_sign_bit).
+ This creates a mask that can be used to mask out the sign bit. */
+ not_sign_mask = fold_build1_loc (loc, BIT_NOT_EXPR, int_arg_type,
+ fold_build2_loc (loc, LSHIFT_EXPR, int_arg_type,
+ const_arg1,
+ build_int_cst (int_arg_type, format->signbit_rw)));
Formatting nits. When you have to wrap a call, the arguments are
formatted like this
foo (arg, arg, arg, ...
arg, arg
Given you've got calls to fold_build2_loc, build_int_cst, etc embedded
inside other calls to fold_build2_loc, I'd just create some temporaries
to hold the results of the inner calls. That'll clean up the formatting
significantly.
+ exp, const1));
+
+ /* Combine the values together. */
+ specials = fold_build3_loc (loc, COND_EXPR, int_type, zero_check,
fp_zero,
+ fold_build3_loc (loc, COND_EXPR, int_type, exp_lsb_set,
+ fold_build3_loc (loc, COND_EXPR, int_type, mantissa_any_set,
+ HONOR_NANS (mode) ? fp_nan : fp_normal,
+ HONOR_INFINITIES (mode) ? fp_infinite : fp_normal),
+ fp_subnormal));
So this implies you're running on generic, not gimple, right? Otherwise
you can't generate these kinds of expressions.
diff --git a/gcc/real.h b/gcc/real.h
index
59af580e78f2637be84f71b98b45ec6611053222..36ded57cf4db7c30c935bdb24219a167480f39c8
100644
--- a/gcc/real.h
+++ b/gcc/real.h
@@ -161,6 +161,15 @@ struct real_format
bool has_signed_zero;
bool qnan_msb_set;
bool canonical_nan_lsbs_set;
+
+ /* This flag indicates whether the format can be used in the optimized
+ code paths for the __builtin_fpclassify function and friends.
+ The format has to have the same NaN and INF representation as normal
+ IEEE floats (e.g. exp must have all bits set), most significant bit must
be
+ sign bit, followed by exp bits of at most 32 bits. Lastly the floating
+ point number must be representable as an integer. The base of the number
+ also must be base 2. */
+ bool is_binary_ieee_compatible;
const char *name;
};
I think Joseph has already commented on the contents of the initializer
and a few more cases were we can use the optimized paths.
However, I do have a general question. There are some targets which
have FPUs that are basically IEEE, but don't support certain IEEE
features like NaNs, denorms, etc.
Presumably all that's needed is for those targets to define a hook to
describe which checks will always be false and you can check the hook's
return value. Right?
Can you please include some tests to verify you're getting the initial
code generation you want? Ideally there'd be execution tests too where
you generate one of the special nodes, then call the __builtin and
verify that you get the expected results back. The latter in particular
are key since it'll allow us to catch problems much earlier across the
wide variety of targets GCC supports.
I think you already had plans to post an updated patch. Please include
the fixes noted above in that update.
And just to be clear, I like where this is going, I just think we're
going to need a couple iterations to iron everything out.
Jeff