> -----Original Message-----
> From: Richard Earnshaw <richard.earns...@foss.arm.com>
> Sent: Tuesday, October 5, 2021 2:52 PM
> To: Tamar Christina <tamar.christ...@arm.com>; gcc-patches@gcc.gnu.org
> Cc: nd <n...@arm.com>; rguent...@suse.de
> Subject: Re: [PATCH]middle-end convert negate + right shift into compare
> greater.
>
> On 05/10/2021 14:49, Tamar Christina wrote:
> >> -----Original Message-----
> >> From: Richard Earnshaw <richard.earns...@foss.arm.com>
> >> Sent: Tuesday, October 5, 2021 2:34 PM
> >> To: Tamar Christina <tamar.christ...@arm.com>;
> >> gcc-patches@gcc.gnu.org
> >> Cc: nd <n...@arm.com>; rguent...@suse.de
> >> Subject: Re: [PATCH]middle-end convert negate + right shift into
> >> compare greater.
> >>
> >> On 05/10/2021 14:30, Tamar Christina wrote:
> >>>
> >>>> -----Original Message-----
> >>>> From: Richard Earnshaw <richard.earns...@foss.arm.com>
> >>>> Sent: Tuesday, October 5, 2021 1:56 PM
> >>>> To: Tamar Christina <tamar.christ...@arm.com>;
> >>>> gcc-patches@gcc.gnu.org
> >>>> Cc: nd <n...@arm.com>; rguent...@suse.de
> >>>> Subject: Re: [PATCH]middle-end convert negate + right shift into
> >>>> compare greater.
> >>>>
> >>>> On 05/10/2021 13:50, Tamar Christina via Gcc-patches wrote:
> >>>>> Hi All,
> >>>>>
> >>>>> This turns an inversion of the sign bit + arithmetic right shift
> >>>>> into a comparison with 0.
> >>>>>
> >>>>> i.e.
> >>>>>
> >>>>> void fun1(int32_t *x, int n)
> >>>>> {
> >>>>>     for (int i = 0; i < (n & -16); i++)
> >>>>>       x[i] = (-x[i]) >> 31;
> >>>>> }
> >>>>>
> >>>> Notwithstanding that I think shifting a negative value right is
> >>>> unspecified behaviour, I don't think this generates the same result
> >>>> when x[i] is INT_MIN either, although negating that is also
> >>>> unspecified since it can't be represented in an int.
> >>>>
> >>>
> >>> You're right that they are implementation defined, but I think most
> >>> ISAs do have a sane implementation of these two cases. At least both
> >>> x86 and AArch64 just replicate the sign bit and for negate do two's
> >>> complement negation. So INT_MIN works as expected and results in 0.
> >>
> >> Which is not what the original code produces if you have wrapping
> >> ints, because -INT_MIN is INT_MIN, and thus still negative.
> >>
> >
> > True, but then you have a signed overflow which is undefined behaviour
> > and not implementation defined
> >
> > "If an exceptional condition occurs during the evaluation of an
> > expression (that is, if the result is not mathematically defined or not
> > in the range of representable values for its type), the behavior is
> > undefined."
> >
> > So it should still be acceptable to do in this case.
>
> -fwrapv
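
For reference, here is a minimal C sketch of the INT_MIN case discussed above. It is not part of the patch. The helper names are made up for illustration, negation is modelled with unsigned arithmetic to mimic -fwrapv without relying on undefined behaviour, and the arithmetic shift by 31 is modelled as replicating the sign bit.

```c
#include <stdint.h>

/* Hypothetical helper: (-x) >> 31 with wrapping negation and an
   arithmetic shift.  Unsigned arithmetic keeps this well defined.  */
int32_t neg_then_shift31(int32_t x)
{
    uint32_t negated = 0u - (uint32_t)x;   /* two's complement negation */
    return (negated >> 31) ? -1 : 0;       /* replicate the sign bit */
}

/* Hypothetical helper: the proposed replacement, x > 0, expressed as
   the all-ones / all-zeros mask a vector compare would produce.  */
int32_t gt_zero_mask31(int32_t x)
{
    return x > 0 ? -1 : 0;
}
```

The two helpers agree for every input except INT32_MIN, where the wrapped negation is still negative: neg_then_shift31 gives -1 and gt_zero_mask31 gives 0.
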
If I understand correctly, you're happy with this if I guard it on !flag_wrapv?

Regards,
Tamar

>
> R.
>
> >
> >> R.
> >>
> >>>
> >>> But I'm happy to guard this behind some sort of target guard.
> >>>
> >>> Regards,
> >>> Tamar
> >>>
> >>>> R.
> >>>>
> >>>>> now generates:
> >>>>>
> >>>>> .L3:
> >>>>>         ldr     q0, [x0]
> >>>>>         cmgt    v0.4s, v0.4s, #0
> >>>>>         str     q0, [x0], 16
> >>>>>         cmp     x0, x1
> >>>>>         bne     .L3
> >>>>>
> >>>>> instead of:
> >>>>>
> >>>>> .L3:
> >>>>>         ldr     q0, [x0]
> >>>>>         neg     v0.4s, v0.4s
> >>>>>         sshr    v0.4s, v0.4s, 31
> >>>>>         str     q0, [x0], 16
> >>>>>         cmp     x0, x1
> >>>>>         bne     .L3
> >>>>>
> >>>>> Bootstrapped Regtested on aarch64-none-linux-gnu,
> >>>>> x86_64-pc-linux-gnu and no regressions.
> >>>>>
> >>>>> Ok for master?
> >>>>>
> >>>>> Thanks,
> >>>>> Tamar
> >>>>>
> >>>>> gcc/ChangeLog:
> >>>>>
> >>>>>     * match.pd: New negate+shift pattern.
> >>>>>
> >>>>> gcc/testsuite/ChangeLog:
> >>>>>
> >>>>>     * gcc.dg/signbit-2.c: New test.
> >>>>>     * gcc.dg/signbit-3.c: New test.
> >>>>>     * gcc.target/aarch64/signbit-1.c: New test.
> >>>>>
> >>>>> --- inline copy of patch --
> >>>>> diff --git a/gcc/match.pd b/gcc/match.pd
> >>>>> index 7d2a24dbc5e9644a09968f877e12a824d8ba1caa..581436fe36dbacdcb0c2720b7190c96d14398143 100644
> >>>>> --- a/gcc/match.pd
> >>>>> +++ b/gcc/match.pd
> >>>>> @@ -826,6 +826,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>>>>      { tree utype = unsigned_type_for (type); }
> >>>>>      (convert (rshift (lshift (convert:utype @0) @2) @3))))))
> >>>>>
> >>>>> +/* Fold (-x >> C) into x > 0 where C = precision(type) - 1.  */
> >>>>> +(for cst (INTEGER_CST VECTOR_CST)
> >>>>> + (simplify
> >>>>> +  (rshift (negate:s @0) cst@1)
> >>>>> +   (with { tree ctype = TREE_TYPE (@0);
> >>>>> +           tree stype = TREE_TYPE (@1);
> >>>>> +           tree bt = truth_type_for (ctype); }
> >>>>> +    (switch
> >>>>> +      /* Handle scalar case.  */
> >>>>> +      (if (INTEGRAL_TYPE_P (ctype)
> >>>>> +           && !VECTOR_TYPE_P (ctype)
> >>>>> +           && !TYPE_UNSIGNED (ctype)
> >>>>> +           && canonicalize_math_after_vectorization_p ()
> >>>>> +           && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> >>>>> +       (convert:bt (gt:bt @0 { build_zero_cst (stype); })))
> >>>>> +      /* Handle vector case with a scalar immediate.  */
> >>>>> +      (if (VECTOR_INTEGER_TYPE_P (ctype)
> >>>>> +           && !VECTOR_TYPE_P (stype)
> >>>>> +           && !TYPE_UNSIGNED (ctype)
> >>>>> +           && wi::eq_p (wi::to_wide (@1), TYPE_PRECISION (stype) - 1))
> >>>>> +       (convert:bt (gt:bt @0 { build_zero_cst (ctype); })))
> >>>>> +      /* Handle vector case with a vector immediate.  */
> >>>>> +      (if (VECTOR_INTEGER_TYPE_P (ctype)
> >>>>> +           && VECTOR_TYPE_P (stype)
> >>>>> +           && !TYPE_UNSIGNED (ctype)
> >>>>> +           && uniform_vector_p (@1))
> >>>>> +       (with { tree cst = vector_cst_elt (@1, 0);
> >>>>> +               tree t = TREE_TYPE (cst); }
> >>>>> +        (if (wi::eq_p (wi::to_wide (cst), TYPE_PRECISION (t) - 1))
> >>>>> +         (convert:bt (gt:bt @0 { build_zero_cst (ctype); })))))))))
> >>>>> +
> >>>>>  /* Fold (C1/X)*C2 into (C1*C2)/X.  */
> >>>>>  (simplify
> >>>>>   (mult (rdiv@3 REAL_CST@0 @1) REAL_CST@2)
> >>>>> diff --git a/gcc/testsuite/gcc.dg/signbit-2.c b/gcc/testsuite/gcc.dg/signbit-2.c
> >>>>> new file mode 100644
> >>>>> index 0000000000000000000000000000000000000000..fc0157cbc5c7996b481f2998bc30176c96a669bb
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.dg/signbit-2.c
> >>>>> @@ -0,0 +1,19 @@
> >>>>> +/* { dg-do assemble } */
> >>>>> +/* { dg-options "-O3 --save-temps -fdump-tree-optimized" } */
> >>>>> +
> >>>>> +#include <stdint.h>
> >>>>> +
> >>>>> +void fun1(int32_t *x, int n)
> >>>>> +{
> >>>>> +  for (int i = 0; i < (n & -16); i++)
> >>>>> +    x[i] = (-x[i]) >> 31;
> >>>>> +}
> >>>>> +
> >>>>> +void fun2(int32_t *x, int n)
> >>>>> +{
> >>>>> +  for (int i = 0; i < (n & -16); i++)
> >>>>> +    x[i] = (-x[i]) >> 30;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-tree-dump-times {\s+>\s+\{ 0, 0, 0, 0 \}} 1 optimized } } */
> >>>>> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> >>>>> diff --git a/gcc/testsuite/gcc.dg/signbit-3.c b/gcc/testsuite/gcc.dg/signbit-3.c
> >>>>> new file mode 100644
> >>>>> index 0000000000000000000000000000000000000000..19e9c06c349b3287610f817628f00938ece60bf7
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.dg/signbit-3.c
> >>>>> @@ -0,0 +1,13 @@
> >>>>> +/* { dg-do assemble } */
> >>>>> +/* { dg-options "-O1 --save-temps -fdump-tree-optimized" } */
> >>>>> +
> >>>>> +#include <stdint.h>
> >>>>> +
> >>>>> +void fun1(int32_t *x, int n)
> >>>>> +{
> >>>>> +  for (int i = 0; i < (n & -16); i++)
> >>>>> +    x[i] = (-x[i]) >> 31;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-tree-dump-times {\s+>\s+0;} 1 optimized } } */
> >>>>> +/* { dg-final { scan-tree-dump-not {\s+>>\s+31} optimized } } */
> >>>>> diff --git a/gcc/testsuite/gcc.target/aarch64/signbit-1.c b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> >>>>> new file mode 100644
> >>>>> index 0000000000000000000000000000000000000000..3ebfb0586f37de29cf58635b27fe48503714447e
> >>>>> --- /dev/null
> >>>>> +++ b/gcc/testsuite/gcc.target/aarch64/signbit-1.c
> >>>>> @@ -0,0 +1,18 @@
> >>>>> +/* { dg-do assemble } */
> >>>>> +/* { dg-options "-O3 --save-temps" } */
> >>>>> +
> >>>>> +#include <stdint.h>
> >>>>> +
> >>>>> +void fun1(int32_t *x, int n)
> >>>>> +{
> >>>>> +  for (int i = 0; i < (n & -16); i++)
> >>>>> +    x[i] = (-x[i]) >> 31;
> >>>>> +}
> >>>>> +
> >>>>> +void fun2(int32_t *x, int n)
> >>>>> +{
> >>>>> +  for (int i = 0; i < (n & -16); i++)
> >>>>> +    x[i] = (-x[i]) >> 30;
> >>>>> +}
> >>>>> +
> >>>>> +/* { dg-final { scan-assembler-times {\tcmgt\t} 1 } } */
> >>>>>
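
As a side note on why fun2 in the quoted tests (shift by 30) is expected to stay untransformed, the following C sketch compares (-x) >> n against the mask form of x > 0. The helper names are hypothetical, x is assumed never to be INT32_MIN so the negation cannot overflow, and the shift is written so that only non-negative values are shifted, which avoids implementation-defined behaviour.

```c
#include <stdint.h>

/* Arithmetic right shift by n (0 <= n <= 31) with floor semantics.
   Negative inputs are mapped to non-negative ones before shifting.  */
int32_t asr32(int32_t v, int n)
{
    return v >= 0 ? v >> n : -1 - ((-1 - v) >> n);
}

/* (-x) >> n.  Assumption: x != INT32_MIN, so -x does not overflow.  */
int32_t neg_then_asr(int32_t x, int n)
{
    return asr32(-x, n);
}

/* x > 0 as an all-ones / all-zeros mask, as a vector compare yields.  */
int32_t gt_zero_mask(int32_t x)
{
    return x > 0 ? -1 : 0;
}
```

With n = 31 the two forms agree for every x other than INT32_MIN. With n = 30 they do not: for x = -1073741824 the shift form gives 1 while the mask is 0, and for x = 1073741825 the shift form gives -2 while the mask is -1.
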