On Wed, May 18, 2022 at 4:45 AM Hongtao Liu <crazy...@gmail.com> wrote: > > On Fri, May 13, 2022 at 7:16 PM Richard Biener > <richard.guent...@gmail.com> wrote: > > > > On Fri, May 13, 2022 at 5:37 AM Hongtao Liu <crazy...@gmail.com> wrote: > > > > > > On Wed, May 11, 2022 at 4:45 PM Richard Biener via Gcc-patches > > > <gcc-patches@gcc.gnu.org> wrote: > > > > > > > > On Mon, May 9, 2022 at 7:19 AM liuhongt <hongtao....@intel.com> wrote: > > > > > > > > > > This patch will enable below optimization: > > > > > > > > > > { > > > > > - int bit; > > > > > - long long unsigned int _1; > > > > > - long long unsigned int _2; > > > > > - > > > > > <bb 2> [local count: 46707768]: > > > > > - > > > > > - <bb 3> [local count: 1027034057]: > > > > > - # tmp_11 = PHI <tmp_8(3), tmp_6(D)(2)> > > > > > - # bit_13 = PHI <bit_9(3), 63(2)> > > > > > - _1 = 1 << bit_13; > > > > > - _2 = ~_1; > > > > > - tmp_8 = _2 & tmp_11; > > > > > - bit_9 = bit_13 + -3; > > > > > - if (bit_9 != -3(OVF)) > > > > > - goto <bb 3>; [95.65%] > > > > > - else > > > > > - goto <bb 4>; [4.35%] > > > > > - > > > > > - <bb 4> [local count: 46707768]: > > > > > - return tmp_8; > > > > > + tmp_12 = tmp_6(D) & 7905747460161236406; > > > > > + return tmp_12; > > > > > > > > > > } > > > > > > > > > > > > > > > Boostrapped and regtested on x86_64-pc-linux-gnu{-m32,} > > > > > Ok for trunk? > > > > > > > > > > gcc/ChangeLog: > > > > > > > > > > PR middle-end/103462 > > > > > * match.pd (bitwise_induction_p): New match. > > > > > * tree-scalar-evolution.c (gimple_bitwise_induction_p): > > > > > Declare. > > > > > (analyze_and_compute_bitwise_induction_effect): New function. > > > > > (enum bit_op_kind): New enum. > > > > > (final_value_replacement_loop): Enhanced to handle bitwise > > > > > induction. > > > > > > > > > > gcc/testsuite/ChangeLog: > > > > > > > > > > * gcc.target/i386/pr103462-1.c: New test. > > > > > * gcc.target/i386/pr103462-2.c: New test. > > > > > * gcc.target/i386/pr103462-3.c: New test. > > > > > * gcc.target/i386/pr103462-4.c: New test. > > > > > * gcc.target/i386/pr103462-5.c: New test. > > > > > * gcc.target/i386/pr103462-6.c: New test. > > > > > --- > > > > > gcc/match.pd | 7 + > > > > > gcc/testsuite/gcc.target/i386/pr103462-1.c | 111 +++++++++++++ > > > > > gcc/testsuite/gcc.target/i386/pr103462-2.c | 45 ++++++ > > > > > gcc/testsuite/gcc.target/i386/pr103462-3.c | 111 +++++++++++++ > > > > > gcc/testsuite/gcc.target/i386/pr103462-4.c | 46 ++++++ > > > > > gcc/testsuite/gcc.target/i386/pr103462-5.c | 111 +++++++++++++ > > > > > gcc/testsuite/gcc.target/i386/pr103462-6.c | 46 ++++++ > > > > > gcc/tree-scalar-evolution.cc | 178 > > > > > ++++++++++++++++++++- > > > > > 8 files changed, 654 insertions(+), 1 deletion(-) > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-1.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-2.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-3.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-4.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-5.c > > > > > create mode 100644 gcc/testsuite/gcc.target/i386/pr103462-6.c > > > > > > > > > > diff --git a/gcc/match.pd b/gcc/match.pd > > > > > index 6d691d302b3..24ff5f9e6a8 100644 > > > > > --- a/gcc/match.pd > > > > > +++ b/gcc/match.pd > > > > > @@ -7746,3 +7746,10 @@ and, > > > > > == TYPE_UNSIGNED (TREE_TYPE (@3)))) > > > > > && single_use (@4) > > > > > && single_use (@5)))) > > > > > + > > > > > +(for bit_op (bit_and bit_ior bit_xor) > > > > > + (match (bitwise_induction_p @0 @2 @3) > > > > > + (bit_op:c (nop_convert1? (bit_not2?@0 (convert3? (lshift > > > > > integer_onep@1 @2)))) @3))) > > > > > + > > > > > +(match (bitwise_induction_p @0 @2 @3) > > > > > + (bit_not (nop_convert1? (bit_xor@0 (convert2? (lshift > > > > > integer_onep@1 @2)) @3)))) > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-1.c > > > > > b/gcc/testsuite/gcc.target/i386/pr103462-1.c > > > > > new file mode 100644 > > > > > index 00000000000..1dc4c2acad6 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-1.c > > > > > @@ -0,0 +1,111 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */ > > > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 12 > > > > > "sccp" } } */ > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 64; bit += 3) > > > > > + tmp &= ~(1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo1 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 63; bit >= 0; bit -= 3) > > > > > + tmp &= ~(1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo2 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 64; bit += 3) > > > > > + tmp &= (1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo3 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 63; bit >= 0; bit -= 3) > > > > > + tmp &= (1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo4 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 64; bit += 3) > > > > > + tmp |= ~(1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo5 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 63; bit >= 0; bit -= 3) > > > > > + tmp |= ~(1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo6 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 64; bit += 3) > > > > > + tmp |= (1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo7 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 63; bit >= 0; bit -= 3) > > > > > + tmp |= (1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo8 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 64; bit += 3) > > > > > + tmp ^= ~(1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo9 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 63; bit >= 0; bit -= 3) > > > > > + tmp ^= ~(1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo10 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 64; bit += 3) > > > > > + tmp ^= (1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned long long > > > > > +__attribute__((noipa)) > > > > > +foo11 (unsigned long long tmp) > > > > > +{ > > > > > + for (int bit = 63; bit >= 0; bit -= 3) > > > > > + tmp ^= (1ULL << bit); > > > > > + return tmp; > > > > > +} > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-2.c > > > > > b/gcc/testsuite/gcc.target/i386/pr103462-2.c > > > > > new file mode 100644 > > > > > index 00000000000..bc375cb78d4 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-2.c > > > > > @@ -0,0 +1,45 @@ > > > > > +/* { dg-do run } */ > > > > > +/* { dg-options "-O1" } */ > > > > > + > > > > > +#include "pr103462-1.c" > > > > > + > > > > > +int main() > > > > > +{ > > > > > + unsigned long long tmp = 0x1111111111111111ULL; > > > > > + if (foo (tmp) != 0x110110110110110ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo1 (tmp) != 0x110110110110110ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo2 (tmp) != 0x0ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo3 (tmp) != 0x0ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo4 (tmp) != 0xffffffffffffffffULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo5 (tmp) != 0xffffffffffffffffULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo6 (tmp) != 0x9359359359359359ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo7 (tmp) != 0x9359359359359359ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo8 (tmp) != 0x8358358358358358ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo9 (tmp) != 0x8358358358358358ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo10 (tmp) != 0x8358358358358358ULL) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo11 (tmp) != 0x8358358358358358ULL) > > > > > + __builtin_abort (); > > > > > +} > > > > > + > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-3.c > > > > > b/gcc/testsuite/gcc.target/i386/pr103462-3.c > > > > > new file mode 100644 > > > > > index 00000000000..4ba248a4872 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-3.c > > > > > @@ -0,0 +1,111 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */ > > > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 12 > > > > > "sccp" } } */ > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 32; bit += 3) > > > > > + tmp &= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo1 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 31; bit >= 0; bit -= 3) > > > > > + tmp &= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo2 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 32; bit += 3) > > > > > + tmp &= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo3 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 31; bit >= 0; bit -= 3) > > > > > + tmp &= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo4 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 32; bit += 3) > > > > > + tmp |= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo5 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 31; bit >= 0; bit -= 3) > > > > > + tmp |= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo6 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 32; bit += 3) > > > > > + tmp |= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo7 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 31; bit >= 0; bit -= 3) > > > > > + tmp |= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo8 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 32; bit += 3) > > > > > + tmp ^= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo9 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 31; bit >= 0; bit -= 3) > > > > > + tmp ^= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo10 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 32; bit += 3) > > > > > + tmp ^= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned int > > > > > +__attribute__((noipa)) > > > > > +foo11 (unsigned int tmp) > > > > > +{ > > > > > + for (int bit = 31; bit >= 0; bit -= 3) > > > > > + tmp ^= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-4.c > > > > > b/gcc/testsuite/gcc.target/i386/pr103462-4.c > > > > > new file mode 100644 > > > > > index 00000000000..e2f4056f525 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-4.c > > > > > @@ -0,0 +1,46 @@ > > > > > +/* { dg-do run } */ > > > > > +/* { dg-options "-O1" } */ > > > > > + > > > > > +#include "pr103462-3.c" > > > > > + > > > > > +int main() > > > > > +{ > > > > > + unsigned int tmp = 0x11111111U; > > > > > + > > > > > + if (foo (tmp) != 0x10110110U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo1 (tmp) != 0x1101101U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo2 (tmp) != 0x0U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo3 (tmp) != 0x0U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo4 (tmp) != 0xffffffffU) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo5 (tmp) != 0xffffffffU) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo6 (tmp) != 0x59359359U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo7 (tmp) != 0x93593593U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo8 (tmp) != 0xa7ca7ca7U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo9 (tmp) != 0x7ca7ca7cU) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo10 (tmp) != 0x58358358U) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo11 (tmp) != 0x83583583U) > > > > > + __builtin_abort (); > > > > > +} > > > > > + > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-5.c > > > > > b/gcc/testsuite/gcc.target/i386/pr103462-5.c > > > > > new file mode 100644 > > > > > index 00000000000..1f4ffa34b48 > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-5.c > > > > > @@ -0,0 +1,111 @@ > > > > > +/* { dg-do compile } */ > > > > > +/* { dg-options "-O1 -fdump-tree-sccp-details" } */ > > > > > +/* { dg-final { scan-tree-dump-times {final value replacement} 12 > > > > > "sccp" } } */ > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 16; bit += 3) > > > > > + tmp &= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo1 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 15; bit >= 0; bit -= 3) > > > > > + tmp &= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo2 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 16; bit += 3) > > > > > + tmp &= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo3 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 15; bit >= 0; bit -= 3) > > > > > + tmp &= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo4 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 16; bit += 3) > > > > > + tmp |= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo5 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 15; bit >= 0; bit -= 3) > > > > > + tmp |= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo6 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 16; bit += 3) > > > > > + tmp |= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo7 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 15; bit >= 0; bit -= 3) > > > > > + tmp |= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo8 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 16; bit += 3) > > > > > + tmp ^= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo9 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 15; bit >= 0; bit -= 3) > > > > > + tmp ^= ~(1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo10 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 0; bit < 16; bit += 3) > > > > > + tmp ^= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > + > > > > > +unsigned short > > > > > +__attribute__((noipa)) > > > > > +foo11 (unsigned short tmp) > > > > > +{ > > > > > + for (int bit = 15; bit >= 0; bit -= 3) > > > > > + tmp ^= (1U << bit); > > > > > + return tmp; > > > > > +} > > > > > diff --git a/gcc/testsuite/gcc.target/i386/pr103462-6.c > > > > > b/gcc/testsuite/gcc.target/i386/pr103462-6.c > > > > > new file mode 100644 > > > > > index 00000000000..65426d71efe > > > > > --- /dev/null > > > > > +++ b/gcc/testsuite/gcc.target/i386/pr103462-6.c > > > > > @@ -0,0 +1,46 @@ > > > > > +/* { dg-do run } */ > > > > > +/* { dg-options "-O1" } */ > > > > > + > > > > > +#include "pr103462-5.c" > > > > > + > > > > > +int main() > > > > > +{ > > > > > + unsigned short tmp = 0x1111U; > > > > > + > > > > > + if (foo (tmp) != 0x110) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo1 (tmp) != 0x110) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo2 (tmp) != 0x0) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo3 (tmp) != 0x0) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo4 (tmp) != 0xffff) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo5 (tmp) != 0xffff) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo6 (tmp) != 0x9359) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo7 (tmp) != 0x9359) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo8 (tmp) != 0x8358) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo9 (tmp) != 0x8358) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo10 (tmp) != 0x8358) > > > > > + __builtin_abort (); > > > > > + > > > > > + if (foo11 (tmp) != 0x8358) > > > > > + __builtin_abort (); > > > > > +} > > > > > + > > > > > diff --git a/gcc/tree-scalar-evolution.cc > > > > > b/gcc/tree-scalar-evolution.cc > > > > > index 72ceb4001e3..9b0aec4fd09 100644 > > > > > --- a/gcc/tree-scalar-evolution.cc > > > > > +++ b/gcc/tree-scalar-evolution.cc > > > > > @@ -3487,6 +3487,164 @@ expression_expensive_p (tree expr) > > > > > || expanded_size > cache.elements ()); > > > > > } > > > > > > > > > > +/* Match.pd function to match bitwise inductive expression. > > > > > + .i.e. > > > > > + _2 = 1 << _1; > > > > > + _3 = ~_2; > > > > > + tmp_9 = _3 & tmp_12; */ > > > > > +extern bool gimple_bitwise_induction_p (tree, tree *, tree > > > > > (*)(tree)); > > > > > + > > > > > +/* Return the inductive expression of bitwise operation if possible, > > > > > + otherwise returns DEF. */ > > > > > +static tree > > > > > +analyze_and_compute_bitwise_induction_effect (class loop* ex_loop, > > > > > + class loop* loop, > > > > > + tree phidef, > > > > > + tree def, > > > > > + tree niter) > > > > > +{ > > > > > + tree match_op[3],inv, bitwise_scev, bitwise_res; > > > > > + bool folded_casts = false; > > > > > + tree type = TREE_TYPE (phidef); > > > > > + gimple* header_phi = NULL; > > > > > > > > use a gphi * > > > > > > > Changed. > > > > > + > > > > > + /* Match things like op2(MATCH_OP[2]), op1(MATCH_OP[1]), > > > > > phidef(PHIDEF) > > > > > + > > > > > + op2 = PHI <phidef, inv> > > > > > + _1 = (int) bit_17; > > > > > + _3 = 1 << _1; > > > > > + op1 = ~_3; > > > > > + phidef = op1 & op2; */ > > > > > + if (!gimple_bitwise_induction_p (phidef, &match_op[0], NULL) > > > > > + || TREE_CODE (match_op[2]) != SSA_NAME > > > > > + || gimple_code (SSA_NAME_DEF_STMT (match_op[2])) != GIMPLE_PHI > > > > > > > > || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT (match_op[2]))) > > > > > > > > > + || gimple_phi_num_args (SSA_NAME_DEF_STMT (match_op[2])) != 2) > > > > > > > > and then use header_phi here, if you pass in the PHI to use you'll > > > > always have 2 args > > > changed. > > > > > > > > > + return def; > > > > > + > > > > > + header_phi = SSA_NAME_DEF_STMT (match_op[2]); > > > > > + if (gimple_bb (header_phi) != loop->header) > > > > > + return def; > > > > > > > > I think passing in the PHI node and the exit edge instead of phidef > > > > would make the above more clear > > > Remove the upper condition. > > > > > > > > > + if (PHI_ARG_DEF_FROM_EDGE (header_phi, loop_latch_edge (loop)) != > > > > > phidef) > > > > > + return def; > > > > > + > > > > > + inv = gimple_phi_arg_def (header_phi, 0); > > > > > + if (inv == phidef) > > > > > + inv = gimple_phi_arg_def (header_phi, 1); > > > > > > > > inv = gimple_phi_arg_def_from_edge (header_phi, loop_preheader_edge > > > > (loop)); > > > > > > > > where is this used? > > > Move to the place where it's used. > > > > > > > > > + > > > > > + bitwise_scev = analyze_scalar_evolution_in_loop (ex_loop, loop, > > > > > + match_op[1], > > > > > + &folded_casts); > > > > > > > > you want > > > > > > > > bitwise_scev = analyze_scalar_evolution (loop, match_op[1]); > > > > bitwise_scev = instantiate_parameters (loop, bitwise_scev); > > > > > > > > instead since you are interested in in-loop behavior? > > > Yes, changed. > > > > > > > > > + > > > > > + /* Make sure bits is in range of type precision. */ > > > > > + if (TREE_CODE (bitwise_scev) != POLYNOMIAL_CHREC > > > > > > > > I think you want to test || !INTEGRAL_TYPE_P (bitwise_scev) since you > > > > do not rule out pointers or vector types. > > > Changed. > > > > > > > > > + || !tree_fits_uhwi_p (CHREC_LEFT (bitwise_scev)) > > > > > + || tree_to_uhwi (CHREC_LEFT (bitwise_scev)) >= TYPE_PRECISION > > > > > (type) > > > > > + || !tree_fits_shwi_p (CHREC_RIGHT (bitwise_scev))) > > > > > + return def; > > > > > + > > > > > + bitwise_res = compute_overall_effect_of_inner_loop (ex_loop, > > > > > bitwise_scev); > > > > > + if (!tree_fits_uhwi_p (bitwise_res) > > > > > + || tree_to_uhwi (bitwise_res) >= TYPE_PRECISION (type)) > > > > > + return def; > > > > > > > > Hmm, so I think that bitwise_res is the final value of the 'bit' IV, > > > > correct? > > > > So name it bit_final? > > > Changed. > > > > > > > > > + > > > > > +enum bit_op_kind > > > > > + { > > > > > + INDUCTION_BIT_CLEAR, > > > > > + INDUCTION_BIT_IOR, > > > > > + INDUCTION_BIT_XOR, > > > > > + INDUCTION_BIT_RESET, > > > > > + INDUCTION_ZERO, > > > > > + INDUCTION_ALL > > > > > + }; > > > > > + > > > > > + enum bit_op_kind induction_kind; > > > > > + enum tree_code code1 > > > > > + = gimple_assign_rhs_code (SSA_NAME_DEF_STMT (phidef)); > > > > > + enum tree_code code2 > > > > > + = gimple_assign_rhs_code (SSA_NAME_DEF_STMT (match_op[0])); > > > > > + > > > > > + /* BIT_CLEAR: A &= ~(1 << bit) > > > > > + BIT_RESET: A ^= (1 << bit). > > > > > + BIT_IOR: A |= (1 << bit) > > > > > + BIT_ZERO: A &= (1 << bit) > > > > > + BIT_ALL: A |= ~(1 << bit) > > > > > + BIT_XOR: A ^= ~(1 << bit). > > > > > + bit is induction variable. */ > > > > > + switch (code1) > > > > > + { > > > > > + case BIT_AND_EXPR: > > > > > + induction_kind = code2 == BIT_NOT_EXPR > > > > > + ? INDUCTION_BIT_CLEAR > > > > > + : INDUCTION_ZERO; > > > > > + break; > > > > > + case BIT_IOR_EXPR: > > > > > + induction_kind = code2 == BIT_NOT_EXPR > > > > > + ? INDUCTION_ALL > > > > > + : INDUCTION_BIT_IOR; > > > > > + break; > > > > > + case BIT_XOR_EXPR: > > > > > + induction_kind = code2 == BIT_NOT_EXPR > > > > > + ? INDUCTION_BIT_XOR > > > > > + : INDUCTION_BIT_RESET; > > > > > + break; > > > > > + /* A ^ ~(1 << bit) is equal to ~(A ^ (1 << bit)). */ > > > > > + case BIT_NOT_EXPR: > > > > > + gcc_assert (code2 == BIT_XOR_EXPR); > > > > > + induction_kind = INDUCTION_BIT_XOR; > > > > > + break; > > > > > + default: > > > > > + gcc_unreachable(); > > > > > + } > > > > > + > > > > > + if (induction_kind == INDUCTION_ZERO) > > > > > + return build_zero_cst (type); > > > > > > > > is that true even when the loop doesn't iterate? > > > > > > > > > + if (induction_kind == INDUCTION_ALL) > > > > > + return build_all_ones_cst (type); > > > > > > > > Likewise? > > > the niter is known > 0 and < TYPE_PRECISION (TYPE_SIZE (type), so it > > > must be iterated. > > > > > > > > > + > > > > > + wide_int bits = wi::zero (TYPE_PRECISION (type)); > > > > > + HOST_WIDE_INT start = tree_to_shwi (CHREC_LEFT (bitwise_scev)); > > > > > > > > you checked fits_uhwi above, name it bit_start? > > > > > > > > > + HOST_WIDE_INT step = tree_to_shwi (CHREC_RIGHT (bitwise_scev)); > > > > > + unsigned HOST_WIDE_INT num_iter = tree_to_uhwi (niter); > > > > > + > > > > > + /* Loop tripcount should be num_iter + 1. */ > > > > > + for (unsigned i = 0; i != num_iter + 1; i++) > > > > > + { > > > > > + bits = wi::bit_or (bits, > > > > > + wi::lshift (wi::one (TYPE_PRECISION (type)), > > > > > + start)); > > > > > > > > you can use bits = wi::set_bit (bits, start); > > > Changed. > > > > > > > > > + start += step; > > > > > + } > > > > > > > > Uh. You don't limit 'num_iter' but you limit bit_final. So supposed > > > > that > > > > the IV wraps and is unsigned char and you have a 256 bit integer the > > > > above could run quite a long time. Please limit num_iter to > > > > TYPE_PRECISION > > > > as well (do you have a testcase with a wrapping IV? try some != CST > > > > exit > > > > condition and a large increment) > > > Limit to > 0 and < TYPE_SIZE. > > > > > > > > > + > > > > > + bool inverted = false; > > > > > + switch (induction_kind) > > > > > + { > > > > > + case INDUCTION_BIT_CLEAR: > > > > > + code1 = BIT_AND_EXPR; > > > > > + inverted = true; > > > > > + break; > > > > > + case INDUCTION_BIT_IOR: > > > > > + code1 = BIT_IOR_EXPR; > > > > > + break; > > > > > + case INDUCTION_BIT_RESET: > > > > > + code1 = BIT_XOR_EXPR; > > > > > + break; > > > > > + /* A ^= ~(1 << bit) is special, when loop tripcount is even, > > > > > + it's equal to A ^= bits, else A ^= ~bits. */ > > > > > + case INDUCTION_BIT_XOR: > > > > > + code1 = BIT_XOR_EXPR; > > > > > + if (num_iter % 2 == 0) > > > > > + inverted = true; > > > > > + break; > > > > > + default: > > > > > + gcc_unreachable (); > > > > > + } > > > > > + > > > > > + if (inverted) > > > > > + bits = wi::bit_not (bits); > > > > > + return fold_build2 (code1, type, inv, wide_int_to_tree (type, > > > > > bits)); > > > > > +} > > > > > + > > > > > /* Do final value replacement for LOOP, return true if we did > > > > > anything. */ > > > > > > > > > > bool > > > > > @@ -3519,7 +3677,8 @@ final_value_replacement_loop (class loop *loop) > > > > > { > > > > > gphi *phi = psi.phi (); > > > > > tree rslt = PHI_RESULT (phi); > > > > > - tree def = PHI_ARG_DEF_FROM_EDGE (phi, exit); > > > > > + tree phidef = PHI_ARG_DEF_FROM_EDGE (phi, exit); > > > > > + tree def = phidef; > > > > > if (virtual_operand_p (def)) > > > > > { > > > > > gsi_next (&psi); > > > > > @@ -3537,6 +3696,23 @@ final_value_replacement_loop (class loop *loop) > > > > > def = analyze_scalar_evolution_in_loop (ex_loop, loop, def, > > > > > &folded_casts); > > > > > def = compute_overall_effect_of_inner_loop (ex_loop, def); > > > > > + > > > > > + /* Handle bitwise induction expression. > > > > > + > > > > > + .i.e. > > > > > + for (int i = 0; i != 64; i+=3) > > > > > + res &= ~(1UL << i); > > > > > + > > > > > + RES can't be analyzed out by SCEV because it is not > > > > > polynomially > > > > > + expressible, but in fact final value of RES can be replaced > > > > > by > > > > > + RES & CONSTANT where CONSTANT all ones with bit {0,3,6,9,... > > > > > ,63} > > > > > + being cleared, similar for BIT_IOR_EXPR/BIT_XOR_EXPR. */ > > > > > + if (tree_fits_uhwi_p (niter) > > > > > + && tree_to_uhwi (niter) > > > > > > > > that's redundant - ah, it's the not iterating case that's excluded. > > > > please write it > > > > as tree_to_uhwi (niter) != 0, also consider assigning to an unsigned > > > > HOST_WIDE_INT > > > > and pass it that way to analyze_and_compute_bitwise_induction_effect > > > Changed. > > > > > > > > > + && tree_to_uhwi (niter) != HOST_WIDE_INT_M1U) > > > > > + def = analyze_and_compute_bitwise_induction_effect (ex_loop, > > > > > loop, > > > > > + phidef, > > > > > def, niter); > > > > > + > > > > > > > > it seems you overwrite the SCEV analysis result unconditionally here, > > > > please > > > > move this inside of the following if () so it is done as fallback only > > > > (or add it as && !(def = ...) to the condition) > > > original def will be returned when bitwise induction fails which means > > > rewriting is ok, but i think it's better to not pass original def and > > > use your suggestion. > > > Changed. > > > > > > > > > if (!tree_does_not_contain_chrecs (def) > > > > > || chrec_contains_symbols_defined_in_loop (def, > > > > > ex_loop->num) > > > > > /* Moving the computation from the loop may prolong life > > > > > range > > > > > -- > > > > > 2.18.1 > > > > > > > > > > > Here's an updated V2 patch. > > > > + || !(header_phi = dyn_cast <gphi *> (SSA_NAME_DEF_STMT > > (match_op[2]))) > > + || header_phi->nargs != 2) > > > > please use gimple_phi_num_args (header_phi) != 2 > Changed. > > > > + bitwise_scev = analyze_scalar_evolution (loop, match_op[1]); > > + bitwise_scev = instantiate_parameters (loop, bitwise_scev); > > + > > > > so that's now OK to get the evolution of the bit IV > > > > + bit_final = compute_overall_effect_of_inner_loop (ex_loop, bitwise_scev); > > > > but this might not, for the final value you probably want to re-analyze the > > SCEV wrt the outer loop using your original analyze_scalar_evolution_in_loop > > query. Of course the final value could be also computed with > > CHREC_LEFT + niter * CHREC_RIGHT, so it might be more convenient > > to remove the above and re-formulate the check it is used in after > > > > + wide_int bits = wi::zero (TYPE_PRECISION (type)); > > + HOST_WIDE_INT bit_start = tree_to_shwi (CHREC_LEFT (bitwise_scev)); > > + HOST_WIDE_INT step = tree_to_shwi (CHREC_RIGHT (bitwise_scev)); > > > > ? > > > How about > + HOST_WIDE_INT bit_final = bit_start + step * niter; > + > + /* bit_start, bit_final in range of [0,TYPE_PRECISION) > + implies all bits are set in range. */ > + if (bit_final >= TYPE_PRECISION (type) > + || bit_final < 0) > + return NULL_TREE; > > Also remove ex_loop from the parameter.
OK. Thanks, Richard. > Updated patch. > > OK with that changes. > > > > Thanks, > > Richard. > > > > > > > > > > > -- > > > BR, > > > Hongtao > > > > -- > BR, > Hongtao