On Thu, 30 Sep 2021, Tamar Christina wrote:

> > -----Original Message-----
> > From: Richard Biener <rguent...@suse.de>
> > Sent: Thursday, September 30, 2021 7:18 AM
> > To: Tamar Christina <tamar.christ...@arm.com>
> > Cc: gcc-patches@gcc.gnu.org; nd <n...@arm.com>
> > Subject: Re: [PATCH 5/7]middle-end Convert bitclear <imm> + cmp<cc> #0
> > into cm<cc2> <imm2>
> > 
> > On Wed, 29 Sep 2021, Tamar Christina wrote:
> > 
> > > Hi All,
> > >
> > > This optimizes the case where a mask Y which fulfills ~Y + 1 == pow2
> > > is used to clear some bits and the result is then compared against 0.
> > > It becomes a single compare against a different immediate, without the
> > > masking.
> > >
> > > We can do this for all unsigned compares and for signed we can do it
> > > for comparisons of EQ and NE:
> > >
> > > (x & (~255)) == 0 becomes x <= 255, which leaves it to the target
> > > to optimally deal with the comparison.
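A minimal sketch of the mask condition described above, assuming 32-bit
unsigned elements.  The helper names (is_pow2_u32, mask_qualifies,
check_mask_examples) are hypothetical and are not part of the patch; the
sample masks are the ones used in the bic-bitmask tests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when exactly one bit of v is set.  */
bool is_pow2_u32(uint32_t v)
{
    return v != 0 && (v & (v - 1u)) == 0;
}

/* True when the mask y satisfies ~y + 1 == 2^k, that is, when y clears
   only a contiguous run of low bits (hypothetical helper).  */
bool mask_qualifies(uint32_t y)
{
    return is_pow2_u32((uint32_t)(~y + 1u));
}

/* Checks the sample masks and the unsigned rewrite for ~255.  */
void check_mask_examples(void)
{
    assert(mask_qualifies(~UINT32_C(255)));  /* 255 + 1 == 256 */
    assert(mask_qualifies(~UINT32_C(1)));    /* 1 + 1 == 2 */
    assert(!mask_qualifies(~UINT32_C(5)));   /* 5 + 1 == 6, not a power of 2 */

    /* For a qualifying mask, (x & mask) == 0 matches x <= ~mask.  */
    for (uint32_t x = 0; x < 1024; x++)
        assert(((x & ~UINT32_C(255)) == 0) == (x <= UINT32_C(255)));
}
```

The ~5 case is the one bic-bitmask-9.c expects to be left untransformed.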
> > >
> > > This transformation has to be done in the mid-end because in RTL you
> > > don't have the signs of the comparison operands and if the target
> > > needs an immediate this should be floated outside of the loop.
> > >
> > > The RTL loop invariant hoisting is done before split1.
> > >
> > > i.e.
> > >
> > > void fun1(int32_t *x, int n)
> > > {
> > >     for (int i = 0; i < (n & -16); i++)
> > >       x[i] = (x[i]&(~255)) == 0;
> > > }
> > >
> > > now generates:
> > >
> > > .L3:
> > >         ldr     q0, [x0]
> > >         cmhs    v0.4s, v2.4s, v0.4s
> > >         and     v0.16b, v1.16b, v0.16b
> > >         str     q0, [x0], 16
> > >         cmp     x0, x1
> > >         bne     .L3
> > >
> > > and floats the immediate out of the loop.
> > >
> > > instead of:
> > >
> > > .L3:
> > >         ldr     q0, [x0]
> > >         bic     v0.4s, #255
> > >         cmeq    v0.4s, v0.4s, #0
> > >         and     v0.16b, v1.16b, v0.16b
> > >         str     q0, [x0], 16
> > >         cmp     x0, x1
> > >         bne     .L3
> > >
> > > Bootstrapped and regtested on aarch64-none-linux-gnu and
> > > x86_64-pc-linux-gnu with no issues.
> > >
> > > Ok for master?
> > >
> > > Thanks,
> > > Tamar
> > >
> > > gcc/ChangeLog:
> > >
> > >   * match.pd: New bitmask compare pattern.
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > >   * gcc.dg/bic-bitmask-10.c: New test.
> > >   * gcc.dg/bic-bitmask-11.c: New test.
> > >   * gcc.dg/bic-bitmask-12.c: New test.
> > >   * gcc.dg/bic-bitmask-2.c: New test.
> > >   * gcc.dg/bic-bitmask-3.c: New test.
> > >   * gcc.dg/bic-bitmask-4.c: New test.
> > >   * gcc.dg/bic-bitmask-5.c: New test.
> > >   * gcc.dg/bic-bitmask-6.c: New test.
> > >   * gcc.dg/bic-bitmask-7.c: New test.
> > >   * gcc.dg/bic-bitmask-8.c: New test.
> > >   * gcc.dg/bic-bitmask-9.c: New test.
> > >   * gcc.dg/bic-bitmask.h: New test.
> > >   * gcc.target/aarch64/bic-bitmask-1.c: New test.
> > >
> > > --- inline copy of patch --
> > > diff --git a/gcc/match.pd b/gcc/match.pd
> > > index 0fcfd0ea62c043dc217d0d560ce5b7e569b70e7d..df9212cb27d172856b9d43b0875262f96e8993c4 100644
> > > --- a/gcc/match.pd
> > > +++ b/gcc/match.pd
> > > @@ -4288,6 +4288,56 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >      (if (ic == ncmp)
> > >       (ncmp @0 @1))))))
> > >
> > > +/* Transform comparisons of the form (X & Y) CMP 0 to X CMP2 Z
> > > +   where ~Y + 1 == pow2 and Z = ~Y.  */
> > > +(for cmp (simple_comparison)
> > > +(simplify
> > > +  (cmp (bit_and:c @0 VECTOR_CST@1) integer_zerop)
> > 
> > Why not for INTEGER_CST as well?  We do have a related folding (only for
> > INTEGER_CST) that does
> > 
> 
> Because of a slight concern about de-optimizing what targets currently
> generate for the flag-setting variants.
> So for example AArch64 generates worse code for foo than it does for bar:
> 
> int foo (int x)
> {
>     if (x <= 0xFFFF)
>       return 1;
> 
>     return 0;
> }
> 
> int bar (int x)
> {
>     if (x & ~0xFFFF)
>       return 1;
> 
>     return 0;
> }
> 
> Because the flag-setting bitmask form was optimized more.  I can of course
> do this and fix AArch64, but other targets may have the same issue.  For
> vectors this was less of a concern since there is no flag setting there.
> 
> Do you still want the scalar version?

Yes, the simplification result is simpler and thus more canonical on
GIMPLE.  On x86 we generate

        xorl    %eax, %eax
        cmpl    $65535, %edi
        setle   %al
        ret

vs

        xorl    %eax, %eax
        andl    $-65536, %edi
        setne   %al
        ret

which are equivalent I think (and could easily be transformed
using a peephole if required).
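As a hedged illustration of the two forms, the sketch below checks that
the masked test and the plain compare agree when the operand is treated
as unsigned.  The function names are hypothetical, and the use of unsigned
operands is an assumption: foo and bar above take a signed int, and for a
signed input the compare only matches the mask test after converting to
the unsigned type, which is what the patch does with convert:uvtype.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Masked form: true when any bit above the low 16 is set.  */
bool high_bits_masked(uint32_t x)
{
    return (x & ~UINT32_C(0xFFFF)) != 0;
}

/* Compare form of the same predicate on an unsigned operand.  */
bool high_bits_compare(uint32_t x)
{
    return x > UINT32_C(0xFFFF);
}

/* Signed input: convert to unsigned first, otherwise negative values
   (which have high bits set) would compare as small.  */
bool high_bits_compare_signed(int32_t x)
{
    return (uint32_t)x > UINT32_C(0xFFFF);
}

/* Checks that all three forms agree on boundary values.  */
void check_forms_agree(void)
{
    static const int32_t samples[] = {
        0, 1, 65535, 65536, 65537, INT32_MAX, -1, -65536, INT32_MIN
    };
    for (size_t i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        uint32_t u = (uint32_t)samples[i];
        assert(high_bits_masked(u) == high_bits_compare(u));
        assert(high_bits_masked(u) == high_bits_compare_signed(samples[i]));
    }
}
```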

Richard.

> Thanks,
> Tamar
> 
> > /* A & (2**N - 1) <= 2**K - 1 -> A & (2**N - 2**K) == 0
> >    A & (2**N - 1) >  2**K - 1 -> A & (2**N - 2**K) != 0
> > 
> > which could be extended for integer vectors.  That said, can you please
> > place the pattern next to the above?
> > 
> > Why does the transform only work for uniform vector constants?  (I see that
> > the implementation becomes simpler, but then you should also handle the
> > INTEGER_CST case at least)
> > 
> > > +   (if (VECTOR_INTEGER_TYPE_P (TREE_TYPE (@1))
> > > + && uniform_vector_p (@1))
> > > +    (with { tree elt = vector_cst_elt (@1, 0); }
> > > +     (switch
> > > +      (if (TYPE_UNSIGNED (TREE_TYPE (@1)) && tree_fits_uhwi_p (elt))
> > 
> > avoid tree_fits_uhwi_p and use wide_int here
> > 
> > > + (with { unsigned HOST_WIDE_INT diff = tree_to_uhwi (elt);
> > > +         tree tdiff = wide_int_to_tree (TREE_TYPE (elt), (~diff) + 1);
> > > +         tree newval = wide_int_to_tree (TREE_TYPE (elt), ~diff);
> > > +         tree newmask = build_uniform_cst (TREE_TYPE (@1), newval); }
> > > +  (if (integer_pow2p (tdiff))
> > 
> > You don't seem to use 'tdiff' so please do this check in wide_int
> > 
> > > +   (switch
> > > +    /* ((mask & x) < 0) -> 0.  */
> > > +    (if (cmp == LT_EXPR)
> > > +     { build_zero_cst (TREE_TYPE (@1)); })
> > > +    /* ((mask & x) <= 0) -> x < mask.  */
> > > +    (if (cmp == LE_EXPR)
> > > +     (lt @0 { newmask; }))
> > > +    /* ((mask & x) == 0) -> x <= ~mask.  */
> > > +    (if (cmp == EQ_EXPR)
> > > +     (le @0 { newmask; }))
> > > +    /* ((mask & x) != 0) -> x > ~mask.  */
> > > +    (if (cmp == NE_EXPR)
> > > +     (gt @0 { newmask; }))
> > > +    /* ((mask & x) >= 0) -> x <= mask.  */
> > > +    (if (cmp == GE_EXPR)
> > > +     (le @0 { newmask; }))
> > > +     /* ((mask & x) > 0) -> x < mask.  */
> > > +    (if (cmp == GT_EXPR)
> > > +     (lt @0 { newmask; }))))))
> > 
> > You can avoid this switch with a lock-step 'for' that maps 'cmp' to the
> > result comparison code.  For simplicity you can either keep the LT_EXPR
> > special case or transform to an always true condition which will be
> > simplified.
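The lock-step suggestion amounts to a fixed table from the original
comparison code to the rewritten one.  The sketch below is an assumption
about what such a table would contain for unsigned operands: it is derived
from the arithmetic and checked by brute force, not copied from the patch
as posted, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

enum cmp_code { CMP_LT, CMP_LE, CMP_EQ, CMP_NE, CMP_GE, CMP_GT, CMP_COUNT };
enum result_kind { RES_FALSE, RES_TRUE, RES_LE, RES_GT };

/* Lock-step table: (x & mask) CMP 0 rewritten as a test on x alone.  */
static const enum result_kind rewritten_as[CMP_COUNT] = {
    [CMP_LT] = RES_FALSE, /* unsigned value is never below zero */
    [CMP_LE] = RES_LE,    /* same as == 0 */
    [CMP_EQ] = RES_LE,    /* x <= ~mask */
    [CMP_NE] = RES_GT,    /* x > ~mask */
    [CMP_GE] = RES_TRUE,  /* unsigned value is always >= 0 */
    [CMP_GT] = RES_GT,    /* same as != 0 */
};

/* Evaluate a CMP b on unsigned operands.  */
bool eval_cmp(enum cmp_code c, uint32_t a, uint32_t b)
{
    switch (c) {
    case CMP_LT: return a < b;
    case CMP_LE: return a <= b;
    case CMP_EQ: return a == b;
    case CMP_NE: return a != b;
    case CMP_GE: return a >= b;
    case CMP_GT: return a > b;
    default:     return false;
    }
}

/* Evaluate the rewritten form of comparison c against bound = ~mask.  */
bool eval_rewritten(enum cmp_code c, uint32_t x, uint32_t bound)
{
    switch (rewritten_as[c]) {
    case RES_FALSE: return false;
    case RES_TRUE:  return true;
    case RES_LE:    return x <= bound;
    default:        return x > bound; /* RES_GT */
    }
}

/* True when the table agrees with the masked form on all sampled x.  */
bool table_holds(uint32_t mask)
{
    static const uint32_t high[] = {
        0x7FFFFFFFu, 0x80000000u, 0xFFFFFF00u, 0xFFFFFFFFu
    };
    for (int c = 0; c < CMP_COUNT; c++) {
        for (uint32_t x = 0; x < 1024; x++)
            if (eval_cmp((enum cmp_code)c, x & mask, 0)
                != eval_rewritten((enum cmp_code)c, x, ~mask))
                return false;
        for (size_t i = 0; i < sizeof high / sizeof high[0]; i++)
            if (eval_cmp((enum cmp_code)c, high[i] & mask, 0)
                != eval_rewritten((enum cmp_code)c, high[i], ~mask))
                return false;
    }
    return true;
}

/* Qualifying masks satisfy the table; ~5 does not.  */
void check_table(void)
{
    assert(table_holds(~UINT32_C(255)));
    assert(table_holds(~UINT32_C(1)));
    assert(!table_holds(~UINT32_C(5)));
}
```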
> > 
> > > +      (if (!TYPE_UNSIGNED (TREE_TYPE (@1)) && tree_fits_shwi_p (elt))
> > > + (with { unsigned HOST_WIDE_INT diff = tree_to_shwi (elt);
> > > +         tree ustype = unsigned_type_for (TREE_TYPE (elt));
> > > +         tree uvtype = unsigned_type_for (TREE_TYPE (@1));
> > > +         tree tdiff = wide_int_to_tree (ustype, (~diff) + 1);
> > > +         tree udiff = wide_int_to_tree (ustype, ~diff);
> > > +         tree cst = build_uniform_cst (uvtype, udiff); }
> > > +  (if (integer_pow2p (tdiff))
> > > +   (switch
> > > +     /* ((mask & x) == 0) -> (unsigned) x <= ~mask.  */
> > > +     (if (cmp == EQ_EXPR)
> > > +      (le (convert:uvtype @0) { cst; }))
> > > +     /* ((mask & x) != 0) -> (unsigned) x > ~mask.  */
> > > +     (if (cmp == NE_EXPR)
> > > +      (gt (convert:uvtype @0) { cst; })))))))))))
> > > +
> > >  /* Transform comparisons of the form X - Y CMP 0 to X CMP Y.
> > >     ??? The transformation is valid for the other operators if overflow
> > >     is undefined for the type, but performing it here badly interacts
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..76a22a2313137a2a75dd711c2c
> > 15
> > > c2d3a34e15aa
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-10.c
> > > @@ -0,0 +1,26 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(int32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(int32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +#define TYPE int32_t
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump {<=\s*.+\{ 255,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..32553d7ba2f823f7a212374519
> > 90
> > > d0a216d2f912
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-11.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) != 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) != 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump {>\s*.+\{ 255,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-12.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-12.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..e10cbf7fabe2dbf7ce436cdf37
> > b0
> > > f8b207c58408
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-12.c
> > > @@ -0,0 +1,17 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-options "-O3 -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +typedef unsigned int v4si __attribute__ ((vector_size (16)));
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun(v4si *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +/* { dg-final { scan-tree-dump {<=\s*.+\{ 255,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967290,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-2.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-2.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..da30fad89f6c8239baa4395b3f
> > fa
> > > ec0be577e13f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-2.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {<=\s*.+\{ 255,.+\}} 1 dce7 } }
> > > +*/
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-3.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-3.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..da30fad89f6c8239baa4395b3f
> > fa
> > > ec0be577e13f
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-3.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) == 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {<=\s*.+\{ 255,.+\}} 1 dce7 } }
> > > +*/
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-4.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-4.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..1bcf23ccf1447d6c8c999ed1eb
> > 25
> > > ba0a450028e1
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-4.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) >= 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) >= 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {=\s*.+\{ 1,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-5.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-5.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..6e5a2fca9992efbc01f8dbbc6f
> > 95
> > > 936e86643028
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-5.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) > 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) > 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {>\s*.+\{ 255,.+\}} 1 dce7 } }
> > > +*/
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-6.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-6.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..018e7a4348c9fc461106c3d9d0
> > 12
> > > 91325d3406c2
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-6.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) <= 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~255)) <= 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {<=\s*.+\{ 255,.+\}} 1 dce7 } }
> > > +*/
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967040,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-7.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-7.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..798678fb7555052c93abc4ca34
> > f6
> > > 17d640f73bb4
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-7.c
> > > @@ -0,0 +1,24 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~1)) < 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~1)) < 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {__builtin_memset} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-8.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-8.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..1dabe834ed57dfa0be48c1dc3
> > dbb
> > > 226092c79a1a
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-8.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~1)) != 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~1)) != 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-times {>\s*.+\{ 1,.+\}} 1 dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {&\s*.+\{ 4294967294,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask-9.c
> > > b/gcc/testsuite/gcc.dg/bic-bitmask-9.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..9c1f8ee0adfc45d1b9fc212138
> > ea
> > > 26bb6b693e49
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask-9.c
> > > @@ -0,0 +1,25 @@
> > > +/* { dg-do run } */
> > > +/* { dg-options "-O3 -save-temps -fdump-tree-dce" } */
> > > +
> > > +#include <stdint.h>
> > > +
> > > +__attribute__((noinline, noipa))
> > > +void fun1(uint32_t *x, int n)
> > > +{
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~5)) == 0;
> > > +}
> > > +
> > > +__attribute__((noinline, noipa, optimize("O1"))) void fun2(uint32_t
> > > +*x, int n) {
> > > +    for (int i = 0; i < (n & -16); i++)
> > > +      x[i] = (x[i]&(~5)) == 0;
> > > +}
> > > +
> > > +#include "bic-bitmask.h"
> > > +
> > > +/* { dg-final { scan-tree-dump-not {<=\s*.+\{ 4294967289,.+\}} dce7 }
> > > +} */
> > > +/* { dg-final { scan-tree-dump {&\s*.+\{ 4294967290,.+\}} dce7 } } */
> > > +/* { dg-final { scan-tree-dump-not {\s+bic\s+} dce7 { target {
> > > +aarch64*-*-* } } } } */
> > > +
> > > diff --git a/gcc/testsuite/gcc.dg/bic-bitmask.h
> > > b/gcc/testsuite/gcc.dg/bic-bitmask.h
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..2b94065c025e0cbf71a21ac9b9
> > d6
> > > 314e24b0c2d9
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.dg/bic-bitmask.h
> > > @@ -0,0 +1,43 @@
> > > +#include <stdio.h>
> > > +
> > > +#ifndef N
> > > +#define N 50
> > > +#endif
> > > +
> > > +#ifndef TYPE
> > > +#define TYPE uint32_t
> > > +#endif
> > > +
> > > +#ifndef DEBUG
> > > +#define DEBUG 0
> > > +#endif
> > > +
> > > +#define BASE ((TYPE) -1 < 0 ? -126 : 4)
> > > +
> > > +int main ()
> > > +{
> > > +  TYPE a[N];
> > > +  TYPE b[N];
> > > +
> > > +  for (int i = 0; i < N; ++i)
> > > +    {
> > > +      a[i] = BASE + i * 13;
> > > +      b[i] = BASE + i * 13;
> > > +      if (DEBUG)
> > > +        printf ("%d: 0x%x\n", i, a[i]);
> > > +    }
> > > +
> > > +  fun1 (a, N);
> > > +  fun2 (b, N);
> > > +
> > > +  for (int i = 0; i < N; ++i)
> > > +    {
> > > +      if (DEBUG)
> > > +        printf ("%d = 0x%x == 0x%x\n", i, a[i], b[i]);
> > > +
> > > +      if (a[i] != b[i])
> > > +        __builtin_abort ();
> > > +    }
> > > +  return 0;
> > > +}
> > > +
> > > diff --git a/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c
> > > b/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c
> > > new file mode 100644
> > > index
> > >
> > 0000000000000000000000000000000000000000..568c1ffc8bc4148efaeeba7a45
> > a7
> > > 5ecbd3a7a3dd
> > > --- /dev/null
> > > +++ b/gcc/testsuite/gcc.target/aarch64/bic-bitmask-1.c
> > > @@ -0,0 +1,13 @@
> > > +/* { dg-do assemble } */
> > > +/* { dg-options "-O2 -save-temps" } */
> > > +
> > > +#include <arm_neon.h>
> > > +
> > > +uint32x4_t foo (int32x4_t a)
> > > +{
> > > +  int32x4_t cst = vdupq_n_s32 (255);
> > > +  int32x4_t zero = vdupq_n_s32 (0);
> > > +  return vceqq_s32 (vbicq_s32 (a, cst), zero);
> > > +}
> > > +
> > > +/* { dg-final { scan-assembler-not {\tbic\t} { xfail { aarch64*-*-* }
> > > +} } } */
> > >
> > >
> > >
> > 
> > --
> > Richard Biener <rguent...@suse.de>
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener <rguent...@suse.de>
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
