Hi Jeff,
On 3 September 2017 at 16:44, Jeff Law <l...@redhat.com> wrote:
> On 01/13/2016 05:30 AM, Richard Biener wrote:
>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <l...@redhat.com> wrote:
>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>>
>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <l...@redhat.com> wrote:
>>>>>
>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>
>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>
>>>>>>> Plus doing it in TER is almost certainly more complex than getting it
>>>>>>> right in reassoc to begin with.
>>>>>>
>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>
>>>>> What's best for that expression would depend on factors like whether or
>>>>> not the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>>> parallelism while (((a & b) & c) & 1) is not good for parallelism, but
>>>>> does expose the bit test.
>>>>>
>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>>> there's no ILP or chance of creating a bit test.  My patch shuffles
>>>>> things around, but still doesn't expose the ILP or bit test in the
>>>>> 4 operand case.  Based on the comments in reassoc, it didn't seem like
>>>>> the author thought anything beyond the 3-operand case was worth
>>>>> handling.  So my patch just handles the 3-operand case.
>>>>>
>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>
>>>>> It just wants the constant last in the sequence.  That exposes bit
>>>>> clear, set, flip, test, etc idioms.
>>>>
>>>> But those don't feed another bit operation, right?  Thus we'd like to
>>>> see ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the
>>>> instructions are designed to feed conditionals (aka CC consuming ops)?
>>>
>>> At the gimple level they could feed a conditional, or be part of a series
>>> of ops on an SSA_NAME that eventually gets stored to memory, etc.  At the
>>> RTL level they'll feed CC consumers and bit manipulations of pseudos or
>>> memory.
>>>
>>> For the 3-op case, we always want the constant last.  For the 4-op case
>>> it's less clear.  Though ((a & b) & c) & 1 is certainly better than
>>> ((a & b) & 1) & c.
>>
>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would
>> be to sort constants at the opposite end.
>>
>> That might be too invasive right now but there is another "convenient" place:
>>
>>           /* If the operand vector is now empty, all operands were
>>              consumed by the __builtin_powi optimization.  */
>>           ...
>>           else
>>             {
>>               machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>               int ops_num = ops.length ();
>>               int width = get_reassociation_width (ops_num, rhs_code, mode);
>>               tree new_lhs = lhs;
>>
>>               if (dump_file && (dump_flags & TDF_DETAILS))
>>                 fprintf (dump_file,
>>                          "Width = %d was chosen for reassociation\n", width);
>>
>> at this point you can check rhs_code and move the (single) constant
>> entry in ops (if there is any constant entry) from .last () to the beginning.
>>
>> That'll get the 4 operand case correct as well and properly models
>> "constant last" for the specified operation kind.
> Resurrecting an old thread...  Just now getting around to flushing this
> out of the queue.
>
> To recap, given something like (x & y) & C reassociation will turn that
> into (x & C) & y.  It's functionally equivalent, but it will inhibit
> generation of bit test instructions.
>
> I originally hacked up swap_ops_for_binary_stmt.  You requested that
> change be made in reassociate_bb so that it would apply to cases where
> there are more than 3 args.
>
> So that's what this patch does.  OK for the trunk now?
>
> Bootstrapped and regression tested on x86_64.
> Also tested the new testcase on m68k.
>
>
> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
> Author: Jeff Law <l...@torsion.usersys.redhat.com>
> Date:   Sun Sep 3 10:42:30 2017 -0400
>
>     2017-09-03  Jeff Law  <l...@redhat.com>
>
>             PR tree-optimization/64910
>             * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>             swap the first and last operand if the last is a constant.
>
>             PR tree-optimization/64910
>             * gcc.dg/tree-ssa/pr64910-2.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3f632ca31c2..2c9a8c8265a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2017-09-03  Jeff Law  <l...@redhat.com>
> +
> +       PR tree-optimization/64910
> +       * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
> +       swap the first and last operand if the last is a constant.
> +
>  2017-09-01  Segher Boessenkool  <seg...@kernel.crashing.org>
>
>         PR rtl-optimization/82024
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 4ead57edfa2..766677c899b 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,8 @@
> +2017-09-03  Jeff Law  <l...@redhat.com>
> +
> +       PR tree-optimization/64910
> +       * gcc.dg/tree-ssa/pr64910-2.c: New test.
> +
>  2017-09-01  Jakub Jelinek  <ja...@redhat.com>
>
>         PR target/81766
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> new file mode 100644
> index 00000000000..2e3d6790776
> --- /dev/null
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
> @@ -0,0 +1,85 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
> +
> +/* We want to make sure that we reassociate in a way that has the
> +   constant last.  With the constant last, it's more likely to result
> +   in a bitfield test on targets with such capabilities.
> +   */
> +
> +extern void boo ();
> +
> +int b2b_uc (unsigned char u, unsigned char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_us (unsigned short u, unsigned short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ui (unsigned int u, unsigned int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ul (unsigned long u, unsigned long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_ull (unsigned long long u, unsigned long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_sc (signed char u, signed char w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_ss (signed short u, signed short w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +int b2b_si (signed int u, signed int w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sl (signed long u, signed long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +int b2b_sll (signed long long u, signed long long w)
> +{
> +  if ((u & w) & 0x20)
> +    boo ();
> +}
> +
> +/* The AND of U & W should go into a temporary, which is then ANDed
> +   with the constant.
> +
> +   First verify that we have the right number of ANDs between U and W.  */
> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
> +
> +/* Then verify that we have the right number of ANDs between a temporary
> +   and the constant.  */
> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */

This part of the new testcase fails on aarch64 & arm:

    FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1 "_[0-9]+ & 32;" 10

Christophe

> +
> +/* Each function has one AND.  It will have either a second AND or TEST.  So
> +   we can count the number of AND and TEST instructions.  They must be 2X
> +   the number of test functions in this file.  */
> +/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
> +
> +/* Similarly on the m68k.
> +   The code for the long long tests is suboptimal, which we catch via the
> +   second pattern and xfail.  */
> +/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
> +/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
> +
> diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
> index 561acea4dcc..76048196b27 100644
> --- a/gcc/tree-ssa-reassoc.c
> +++ b/gcc/tree-ssa-reassoc.c
> @@ -5762,6 +5762,18 @@ reassociate_bb (basic_block bb)
>                 fprintf (dump_file,
>                          "Width = %d was chosen for reassociation\n", width);
>
> +
> +             /* For binary bit operations, if the last operand in
> +                OPS is a constant, move it to the front.  This
> +                helps ensure that we generate (X & Y) & C rather
> +                than (X & C) & Y.  The former will often match
> +                a canonical bit test when we get to RTL.  */
> +             if ((rhs_code == BIT_AND_EXPR
> +                  || rhs_code == BIT_IOR_EXPR
> +                  || rhs_code == BIT_XOR_EXPR)
> +                 && TREE_CODE (ops.last ()->op) == INTEGER_CST)
> +               std::swap (*ops[0], *ops[ops_num - 1]);
> +
>               if (width > 1
>                   && ops.length () > 3)
>                 rewrite_expr_tree_parallel (as_a <gassign *> (stmt),
>