> -----Original Message-----
> From: Richard Biener <rguent...@suse.de>
> Sent: Friday, October 15, 2021 10:07 AM
> To: Tamar Christina <tamar.christ...@arm.com>
> Cc: Richard Earnshaw <richard.earns...@foss.arm.com>; gcc-patc...@gcc.gnu.org; nd <n...@arm.com>
> Subject: RE: [PATCH]middle-end convert negate + right shift into compare greater.
>
> On Fri, 15 Oct 2021, Tamar Christina wrote:
>
> > > > > +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> > > > > +(for cst (INTEGER_CST VECTOR_CST)
> > > > > + (simplify
> > > > > +  (rshift (negate:s @0) cst@1)
> > > > > +  (if (!flag_wrapv)
> > >
> > > Don't test flag_wrapv directly, instead use the appropriate
> > > TYPE_OVERFLOW_{UNDEFINED,WRAPS} predicates.  But I'm not sure what
> > > we are protecting against?  Right-shift of signed integers is
> > > implementation-defined and GCC treats it as you'd expect,
> > > sign-extending the result.
> >
> > It's protecting against the overflow of the negate on INT_MIN.  When
> > wrapping overflows are enabled the results would be wrong.
>
> But -INT_MIN == INT_MIN in two's complement so I fail to see the wrong
> result?  That is, both -INT_MIN >> 31 and INT_MIN >> 31 are -1.
>
> > > > > + (with { tree ctype = TREE_TYPE (@0);
> > > > > +	tree stype = TREE_TYPE (@1);
> > > > > +	tree bt = truth_type_for (ctype); }
> > > > > +  (switch
> > > > > +   /* Handle scalar case.  */
> > > > > +   (if (INTEGRAL_TYPE_P (ctype)
> > > > > +	&& !VECTOR_TYPE_P (ctype)
> > > > > +	&& !TYPE_UNSIGNED (ctype)
> > > > > +	&& canonicalize_math_after_vectorization_p ()
> > > > > +	&& wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> > > > > +    (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> > >
> > > I'm not sure why the result is of type 'bt' rather than the original
> > > type of the expression?
> >
> > That was to satisfy some RTL check that expected results of
> > comparisons to always be a Boolean, though for scalar that logically
> > always is the case, I just added it for consistency.
> > >
> > > In that regard for non-vectors we'd have to add the sign extension
> > > from unsigned bool, in the vector case we'd hope the type of the
> > > comparison is correct.  I think in both cases it might be convenient
> > > to use
> > >
> > >  (cond (gt:bt @0 { build_zero_cst (ctype); })
> > >   { build_all_ones_cst (ctype); } { build_zero_cst (ctype); })
> > >
> > > to compute the correct result and rely on (cond ..) simplifications
> > > to simplify that if possible.
> > >
> > > Btw, 'stype' should be irrelevant - you need to look at the
> > > precision of 'ctype', no?
> >
> > I was working under the assumption that both input types must have the
> > same precision, but turns out that assumption doesn't need to hold.
> >
> > New version attached.
> >
> > Bootstrapped Regtested on aarch64-none-linux-gnu, x86_64-pc-linux-gnu
> > and no regressions.
> >
> > Ok for master?
> >
> > Thanks,
> > Tamar
> >
> > gcc/ChangeLog:
> >
> > 	* match.pd: New negate+shift pattern.
> >
> > gcc/testsuite/ChangeLog:
> >
> > 	* gcc.dg/signbit-2.c: New test.
> > 	* gcc.dg/signbit-3.c: New test.
> > 	* gcc.target/aarch64/signbit-1.c: New test.
> >
> > --- inline copy of patch ---
> >
> > diff --git a/gcc/match.pd b/gcc/match.pd
> > index 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..9532cae582e152cae6e22fcce95a9744a844e3c2 100644
> > --- a/gcc/match.pd
> > +++ b/gcc/match.pd
> > @@ -38,7 +38,8 @@ along with GCC; see the file COPYING3.  If not see
> >     uniform_integer_cst_p
> >     HONOR_NANS
> >     uniform_vector_p
> > -   bitmask_inv_cst_vector_p)
> > +   bitmask_inv_cst_vector_p
> > +   expand_vec_cmp_expr_p)
> >
> >  /* Operator lists.  */
> >  (define_operator_list tcc_comparison
> > @@ -826,6 +827,42 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >      { tree utype = unsigned_type_for (type); }
> >      (convert (rshift (lshift (convert:utype @0) @2) @3))))))
> >
> > +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> > +(for cst (INTEGER_CST VECTOR_CST)
> > + (simplify
> > +  (rshift (negate:s @0) cst@1)
> > +  (if (!TYPE_OVERFLOW_WRAPS (type))
>
> as said, I don't think that's necessary but at least it's now written
> correctly ;)
>
> > +   (with { tree ctype = TREE_TYPE (@0);
>
> Instead of 'ctype' you can use 'type' since the type of the expression
> is the same as that of @0.
>
> > +	   tree stype = TREE_TYPE (@1);
> > +	   tree bt = truth_type_for (ctype);
> > +	   tree zeros = build_zero_cst (ctype); }
> > +    (switch
> > +     /* Handle scalar case.  */
> > +     (if (INTEGRAL_TYPE_P (ctype)
> > +	  && !VECTOR_TYPE_P (ctype)
>
> INTEGRAL_TYPE_P does not include VECTOR_TYPE_P.
>
> > +	  && !TYPE_UNSIGNED (ctype)
> > +	  && canonicalize_math_after_vectorization_p ()
> > +	  && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (ctype) - 1))
> > +      (cond (gt:bt @0 { zeros; }) { build_all_ones_cst (ctype); } { zeros; }))
> > +     /* Handle vector case with a scalar immediate.  */
> > +     (if (VECTOR_INTEGER_TYPE_P (ctype)
> > +	  && !VECTOR_TYPE_P (stype)
> > +	  && !TYPE_UNSIGNED (ctype)
> > +	  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> > +      (with { HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
> > +       (if (wi::eq_p (wi::to_wide (@1), bits - 1))
>
> You can use element_precision (@0) - 1 in both the scalar and vector case.
>
> > +	(convert:bt (gt:bt @0 { zeros; })))))
> > +     /* Handle vector case with a vector immediate.  */
> > +     (if (VECTOR_INTEGER_TYPE_P (ctype)
> > +	  && VECTOR_TYPE_P (stype)
> > +	  && !TYPE_UNSIGNED (ctype)
> > +	  && uniform_vector_p (@1)
> > +	  && expand_vec_cmp_expr_p (ctype, ctype, { GT_EXPR }))
> > +      (with { tree cst = vector_cst_elt (@1, 0);
> > +	      HOST_WIDE_INT bits = GET_MODE_UNIT_BITSIZE (TYPE_MODE (ctype)); }
> > +       (if (wi::eq_p (wi::to_wide (cst), bits - 1))
> > +	(convert:bt (gt:bt @0 { zeros; }))))))))))
>
> The (convert:bt are not necessary.  Note that 'ctype' is not a valid
> mask type but we eventually will be forgiving enough here.
> You are computing truth_type_for, asking for gt:bt but checking
> expand_vec_cmp_expr_p (ctype, ctype, ...), that's inconsistent and will
> lead to problems on targets with AVX512VL + AVX2 which will have an AVX2
> compare of say V4SImode producing a V4SImode mask but truth_type_for
> will give you a QImode vector type.
>
Right, what I wanted to check here was that the target has the same mode
for the mask as the result, which looking at expand_vec_cmp_expr_p I
thought it would give me.  The problem with checking expand_vec_cmp_expr_p
with bt as the mask is that it crashes with SVE because the comparison
becomes a predicate which is applied on an operation.  So I guess there
are three different possibilities here.

> As said the most general transform would be to
>
>  (cond (gt:bt @0 { zeros; }) { build_zero_cst (ctype); }
>   { build_all_ones_cst (ctype); })
>

I tried this before but it doesn't optimize the cond away.  So with this
it produces:

  vect__6.10_7 = vect__4.8_9 > { 0, 0, 0, 0 } ? { 0, 0, 0, 0 } : { -1, -1, -1, -1 };
  MEM <vector(4) int> [(int32_t *)_2] = vect__6.10_7;

Which I suppose is technically correct, but it'll crash at expand time
because this form doesn't look to be expected:

0x1262ac1 prepare_cmp_insn
	/src/gcc/gcc/optabs.c:4532
0x1262f80 emit_cmp_and_jump_insns(rtx_def*, rtx_def*, rtx_code, rtx_def*, machine_mode, int, rtx_def*, profile_probability)
	/src/gcc/gcc/optabs.c:4677
0xdad85d do_compare_rtx_and_jump(rtx_def*, rtx_def*, rtx_code, int, machine_mode, rtx_def*, rtx_code_label*, rtx_code_label*, profile_probability)
	/src/gcc/gcc/dojump.c:1220
0xdadf0c do_compare_and_jump
	/src/gcc/gcc/dojump.c:1294
0xdab0d3 do_jump_1
	/src/gcc/gcc/dojump.c:253
0xdabaa2 do_jump
	/src/gcc/gcc/dojump.c:500
0xdacdd7 jumpifnot(tree_node*, rtx_code_label*, profile_probability)
	/src/gcc/gcc/dojump.c:942
0xeba6f1 expand_expr_real_2(separate_ops*, rtx_def*, machine_mode, expand_modifier)
	/src/gcc/gcc/expr.c:10219
0xebb99d expand_expr_real_1(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	/src/gcc/gcc/expr.c:10480
0xeb363a expand_expr_real(tree_node*, rtx_def*, machine_mode, expand_modifier, rtx_def**, bool)
	/src/gcc/gcc/expr.c:8713
0xe9116d expand_expr
	/src/gcc/gcc/expr.h:301
0xea3ce5 expand_assignment(tree_node*, tree_node*, bool)
	/src/gcc/gcc/expr.c:5345
0xd01860 expand_gimple_stmt_1
	/src/gcc/gcc/cfgexpand.c:3942
0xd01c4f expand_gimple_stmt
	/src/gcc/gcc/cfgexpand.c:4040
0xd0a9b7 expand_gimple_basic_block
	/src/gcc/gcc/cfgexpand.c:6082
0xd0ce45 execute
	/src/gcc/gcc/cfgexpand.c:6808

> if you think that's not better than the negate + shift but only when the
> compare will directly produce the desired result then I think you have to
> either drop the truth_type_for and manually build the truth type or check
> that the mode of the truth type and the value type match.  And you need
> to produce
>

But how do I figure out what the target's format is?  Is there a
hook/property I can use for this?

Thanks,
Tamar

> (view_convert:ctype (gt:bt @0 { zeros; }))
>
> then.
>
> > +
> >  /* Fold (C1/X)*C2 into (C1*C2)/X.  */
> >  (simplify
> >   (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
> > diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..fc0157cbc5c7996b481f2998bc30176c96a669bb
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> > @@ -0,0 +1,19 @@
> > +/* { dg-do assemble } */
> > +/* { dg-options "-O3 --save-temps -fdump-tree-optimized" } */
> > +
> > +#include <stdint.h>
> > +
> > +void fun1(int32_t *x, int n)
> > +{
> > +    for (int i = 0; i < (n & -16); i++)
> > +      x[i] = (-x[i]) >> 31;
> > +}
> > +
> > +void fun2(int32_t *x, int n)
> > +{
> > +    for (int i = 0; i < (n & -16); i++)
> > +      x[i] = (-x[i]) >> 30;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times {\s+>\s+\{ 0, 0, 0, 0 \}} 1 optimized } } */
> > +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> > diff --git a/gcc/testsuite/gcc.dg/signbit-3.c b/gcc/testsuite/gcc.dg/signbit-3.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..19e9c06c349b3287610f817628f00938ece60bf7
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.dg/signbit-3.c
> > @@ -0,0 +1,13 @@
> > +/* { dg-do assemble } */
> > +/* { dg-options "-O1 --save-temps -fdump-tree-optimized" } */
> > +
> > +#include <stdint.h>
> > +
> > +void fun1(int32_t *x, int n)
> > +{
> > +    for (int i = 0; i < (n & -16); i++)
> > +      x[i] = (-x[i]) >> 31;
> > +}
> > +
> > +/* { dg-final { scan-tree-dump-times {\s+>\s+0;} 1 optimized } } */
> > +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> > diff --git a/gcc/testsuite/gcc.target/aarch64/signbit-1.c b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> > new file mode 100644
> > index 0000000000000000000000000000000000000000..3ebfb0586f37de29cf58635b27fe48503714447e
> > --- /dev/null
> > +++ b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> > @@ -0,0 +1,18 @@
> > +/* { dg-do assemble } */
> > +/* { dg-options "-O3 --save-temps" } */
> > +
> > +#include <stdint.h>
> > +
> > +void fun1(int32_t *x, int n)
> > +{
> > +    for (int i = 0; i < (n & -16); i++)
> > +      x[i] = (-x[i]) >> 31;
> > +}
> > +
> > +void fun2(int32_t *x, int n)
> > +{
> > +    for (int i = 0; i < (n & -16); i++)
> > +      x[i] = (-x[i]) >> 30;
> > +}
> > +
> > +/* { dg-final { scan-assembler-times {\tcmgt\t} 1 } } */
>
> --
> Richard Biener <rguent...@suse.de>
> SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)