On 27/10/15 15:57, Marc Glisse wrote:
On Tue, 27 Oct 2015, Kyrill Tkachov wrote:
Hi Marc,
On 30/08/15 08:57, Marc Glisse wrote:
Hello,
just trying to shrink fold-const.c a bit more.
initializer_zerop is close to what I was looking for with zerop, but I
wasn't sure if it would be safe (it accepts some CONSTRUCTOR and
STRING_CST). At some point I tried using sign_bit_p, but using the return
of that function in the simplification confused the machinery too much. I
added an "overload" of element_precision like the one in element_mode, for
convenience.
Bootstrap+testsuite on ppc64le-redhat-linux.
2015-08-31 Marc Glisse <marc.gli...@inria.fr>
gcc/
* tree.h (zerop): New function.
* tree.c (zerop): Likewise.
(element_precision): Handle expressions.
* match.pd (define_predicates): Add zerop.
(x <= +Inf): Fix comment.
(abs (x) == 0, A & C == C, A & C != 0): Converted from ...
* fold-const.c (fold_binary_loc): ... here. Remove.
gcc/testsuite/
* gcc.dg/tree-ssa/cmp-1.c: New file.
+/* If we have (A & C) != 0 where C is the sign bit of A, convert
+ this into A < 0. Similarly for (A & C) == 0 into A >= 0. */
+(for cmp (eq ne)
+ ncmp (ge lt)
+ (simplify
+ (cmp (bit_and (convert?@2 @0) integer_pow2p@1) integer_zerop)
+ (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+ && (TYPE_PRECISION (TREE_TYPE (@0))
+ == GET_MODE_PRECISION (TYPE_MODE (TREE_TYPE (@0))))
+ && element_precision (@2) >= element_precision (@0)
+ && wi::only_sign_bit_p (@1, element_precision (@0)))
This condition is a bit strict when @0 is signed: for an int32_t i, (i & (5LL
<< 42)) != 0 would work as well, thanks to sign extension.
+ (with { tree stype = signed_type_for (TREE_TYPE (@0)); }
+ (ncmp (convert:stype @0) { build_zero_cst (stype); })))))
+
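To make the transformation concrete, here is roughly what it does in C
terms (a hand-written sketch, not part of the patch; it assumes 32-bit
int and two's complement, and the function names are made up):

#include <stdint.h>

/* Before: an explicit test of the sign bit.  */
int before (int32_t a)
{
  return (a & INT32_MIN) != 0;
}

/* After the pattern: a signed comparison against zero.  */
int after (int32_t a)
{
  return a < 0;
}

/* The case from the remark above: promoting int32_t to long long
   sign-extends, so bits 42 and 44 of (long long) a are copies of the
   sign bit, and this too is equivalent to a < 0 -- even though the
   posted condition on @1 does not accept it.  */
int wide_mask (int32_t a)
{
  return (a & (5LL << 42)) != 0;
}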
With this patch, and this pattern in particular, I've seen some code
quality regressions on aarch64.
I'm still trying to reduce a testcase to demonstrate the issue, but it
seems to involve introducing extra conversions from unsigned to signed
values. If I gate this pattern on !TYPE_UNSIGNED (TREE_TYPE (@0)) the
codegen seems to improve.
Any thoughts?
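For reference, the unsigned shape I mean looks like this in C (a
hand-written illustration, not the reduced testcase; assuming 32-bit
int, with made-up function names):

/* Source form: sign-bit test of an unsigned value.  */
int before (unsigned int a)
{
  return (a & 0x80000000u) != 0;
}

/* After the pattern: a < 0 would be trivially false for unsigned a,
   so a conversion to the corresponding signed type is introduced.
   (Implementation-defined in source C, but GCC defines the conversion
   as modular, so it just reinterprets the bits.)  */
int after (unsigned int a)
{
  return (int) a < 0;
}

Gating on !TYPE_UNSIGNED (TREE_TYPE (@0)) would simply stop the pattern
from firing on this shape.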
Hmm, my first thoughts would be:
* conversion from unsigned to signed is a NOP,
* if (a & signbit) != 0 is faster to compute than a < 0, that's how
a < 0 should be expanded by the target,
* so the problem is probably something else: maybe the bit_and combined
better with another operation, or the cast obfuscates things enough to
confuse a later pass. Or maybe something related to @2.
Thanks,
So here the types are shorts and unsigned shorts. On aarch64 these are
HImode values, and there are no direct arithmetic operations on them, so
they have to be extended to SImode and truncated back.
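Something of this shape, I believe (a hand-written sketch of the
situation, not the testcase I'm still reducing; function names are
made up):

/* unsigned short is HImode on aarch64.  The bit_and happens on the
   promoted SImode value, so it can be a single bit test.  */
int before (unsigned short a)
{
  return (a & 0x8000) != 0;
}

/* After the pattern: a signed HImode compare.  Since there are no
   HImode compare instructions, this may need an explicit sign
   extension (e.g. sxth) before an SImode compare, which is where the
   extra conversions can show up.  */
int after (unsigned short a)
{
  return (short) a < 0;
}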
An example would help understand what we are talking about...
Working on it. I'm trying to introduce gcc_unreachable calls into the
compiler when the bad situation happens and to reduce the original file,
but I think I'm not capturing the conditions that trigger this behaviour
exactly right :(
Kyrill