On 01/13/2016 05:30 AM, Richard Biener wrote:
> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <l...@redhat.com> wrote:
>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <l...@redhat.com> wrote:
>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>
>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>
>>>>>> Plus doing it in TER is almost certainly more complex than getting
>>>>>> it right in reassoc to begin with.
>>>>>
>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>
>>>> What's best for that expression would depend on factors like whether
>>>> or not the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>> parallelism while (((a & b) & c) & 1) is not good for parallelism,
>>>> but does expose the bit test.
>>>>
>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>> there's no ILP or chance of creating a bit test.  My patch shuffles
>>>> things around, but still doesn't expose the ILP or bit test in the
>>>> 4 operand case.  Based on the comments in reassoc, it didn't seem
>>>> like the author thought anything beyond the 3-operand case was worth
>>>> handling.  So my patch just handles the 3-operand case.
>>>>
>>>>> So I'm not sure what pattern the backend is looking for?
>>>>
>>>> It just wants the constant last in the sequence.  That exposes bit
>>>> clear, set, flip, test, etc idioms.
>>>
>>> But those don't feed another bit operation, right?  Thus we'd like to
>>> see ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the
>>> instructions are designed to feed conditionals (aka CC consuming ops)?
>>
>> At the gimple level they could feed a conditional, or be part of a
>> series of ops on an SSA_NAME that eventually gets stored to memory,
>> etc.  At the RTL level they'll feed CC consumers and bit manipulations
>> of pseudos or memory.
>>
>> For the 3-op case, we always want the constant last.  For the 4-op
>> case it's less clear.  Though ((a & b) & c) & 1 is certainly better
>> than ((a & b) & 1) & c.
>
> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient
> place to special-case bitwise ops.  The "real" fix to the sorting
> heuristic would be to sort constants at the opposite end.
>
> That might be too invasive right now but there is another "convenient"
> place:
>
>           /* If the operand vector is now empty, all operands were
>              consumed by the __builtin_powi optimization.  */
>           ...
>           else
>             {
>               machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>               int ops_num = ops.length ();
>               int width = get_reassociation_width (ops_num, rhs_code, mode);
>               tree new_lhs = lhs;
>
>               if (dump_file && (dump_flags & TDF_DETAILS))
>                 fprintf (dump_file,
>                          "Width = %d was chosen for reassociation\n", width);
>
> at this point you can check rhs_code and move the (single) constant
> entry in ops (if there is any constant entry) from .last () to the
> beginning.
>
> That'll get the 4 operand case correct as well and properly models
> "constant last" for the specified operation kind.

Resurrecting an old thread...  Just now getting around to flushing this
out of the queue.
To recap, given something like (x & y) & C, reassociation will turn that
into (x & C) & y.  The two forms are functionally equivalent, but the
latter inhibits generation of bit test instructions.

I originally hacked up swap_ops_for_binary_stmt.  You requested the
change be made in reassociate_bb instead, so that it would also apply to
cases with more than 3 operands.  That is what this patch does.

OK for the trunk now?  Bootstrapped and regression tested on x86_64.
Also tested the new testcase on m68k.
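For the archives, here is a minimal illustration of the shape in
question (the function name is mine, and the dump excerpts are only
roughly what -fdump-tree-reassoc1 prints):

  /* Before the patch, reassoc1 rewrites the AND chain with the
     constant in the middle, roughly:
         _1 = u & 32;
         _2 = _1 & w;
     which hides the bit test.  With the constant last we instead get:
         _1 = u & w;
         _2 = _1 & 32;
     which RTL expansion can match as a bit-test idiom (e.g. btst on
     the m68k, test on x86).  */
  int
  bit_set_p (unsigned int u, unsigned int w)
  {
    return (u & w) & 0x20;
  }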
commit c10ae0339674c27c89a1fa1904217a55bf530cb3
Author: Jeff Law <l...@torsion.usersys.redhat.com>
Date:   Sun Sep 3 10:42:30 2017 -0400

    2017-09-03  Jeff Law  <l...@redhat.com>

            PR tree-optimization/64910
            * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
            swap the first and last operand if the last is a constant.

            PR tree-optimization/64910
            * gcc.dg/tree-ssa/pr64910-2.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3f632ca31c2..2c9a8c8265a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-09-03  Jeff Law  <l...@redhat.com>
+
+	PR tree-optimization/64910
+	* tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
+	swap the first and last operand if the last is a constant.
+
 2017-09-01  Segher Boessenkool  <seg...@kernel.crashing.org>
 
 	PR rtl-optimization/82024
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4ead57edfa2..766677c899b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-09-03  Jeff Law  <l...@redhat.com>
+
+	PR tree-optimization/64910
+	* gcc.dg/tree-ssa/pr64910-2.c: New test.
+
 2017-09-01  Jakub Jelinek  <ja...@redhat.com>
 
 	PR target/81766
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
new file mode 100644
index 00000000000..2e3d6790776
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1" } */
+
+/* We want to make sure that we reassociate in a way that has the
+   constant last.  With the constant last, it's more likely to result
+   in a bitfield test on targets with such capabilities.  */
+
+extern void boo ();
+
+int b2b_uc (unsigned char u, unsigned char w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_us (unsigned short u, unsigned short w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_ui (unsigned int u, unsigned int w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_ul (unsigned long u, unsigned long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_ull (unsigned long long u, unsigned long long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_sc (signed char u, signed char w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_ss (signed short u, signed short w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_si (signed int u, signed int w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_sl (signed long u, signed long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_sll (signed long long u, signed long long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+/* The AND of U & W should go into a temporary, which is then ANDed
+   with the constant.
+
+   First verify that we have the right number of ANDs between U and W.  */
+/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
+
+/* Then verify that we have the right number of ANDS between a temporary
+   and the constant.  */
+/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
+
+/* Each function has one AND.  It will have either a second AND or TEST.  So
+   we can count the number of AND and TEST instructions.  They must be 2X
+   the number of test functions in this file.  */
+/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
+
+/* Similarly on the m68k.  The code for the long long tests is suboptimal,
+   which we catch via the second pattern and xfail.  */
+/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
+/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
+
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 561acea4dcc..76048196b27 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5762,6 +5762,18 @@ reassociate_bb (basic_block bb)
 	    fprintf (dump_file,
 		     "Width = %d was chosen for reassociation\n", width);
+
+	  /* For binary bit operations, if the last operand in
+	     OPS is a constant, move it to the front.  This
+	     helps ensure that we generate (X & Y) & C rather
+	     than (X & C) & Y.  The former will often match
+	     a canonical bit test when we get to RTL.  */
+	  if ((rhs_code == BIT_AND_EXPR
+	       || rhs_code == BIT_IOR_EXPR
+	       || rhs_code == BIT_XOR_EXPR)
+	      && TREE_CODE (ops.last ()->op) == INTEGER_CST)
+	    std::swap (*ops[0], *ops[ops_num - 1]);
+
 	  if (width > 1
 	      && ops.length () > 3)
 	    rewrite_expr_tree_parallel (as_a <gassign *> (stmt),
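For completeness, a sketch of the 4-operand case that doing this in
reassociate_bb (rather than swap_ops_for_binary_stmt) is meant to cover;
the function is illustrative only and not part of the testsuite change:

  /* This is the ((a & 1) & b) & c shape from the discussion above.
     With the swap done in reassociate_bb, the chain should be
     rewritten with the constant last, roughly:
         _1 = a & b;
         _2 = _1 & c;
         _3 = _2 & 1;
     which exposes the bit test in the 4-operand case as well.  */
  int
  four_op (int a, int b, int c)
  {
    return ((a & 1) & b) & c;
  }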