On 01/13/2016 05:30 AM, Richard Biener wrote:
> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <l...@redhat.com> wrote:
>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>
>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <l...@redhat.com> wrote:
>>>>
>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>
>>>>>
>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>
>>>>>> Plus doing it in TER is almost certainly more complex than getting it
>>>>>> right
>>>>>> in reassoc to begin with.
>>>>>
>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>
>>>>
>>>> What's best for that expression would depend on factors like whether or
>>>> not
>>>> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>> parallelism
>>>> while (((a & b) & c) & 1) is not good for parallelism, but does expose
>>>> the
>>>> bit test.
>>>>
>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>> there's
>>>> no ILP or chance of creating a bit test.  My patch shuffles things
>>>> around,
>>>> but still doesn't expose the ILP or bit test in the 4 operand case.
>>>> Based
>>>> on the comments in reassoc, it didn't seem like the author thought
>>>> anything
>>>> beyond the 3-operand case was worth handling. So my patch just handles
>>>> the
>>>> 3-operand case.
>>>>
>>>>
>>>>
>>>>>
>>>>> So I'm not sure what pattern the backend is looking for?
>>>>
>>>>
>>>> It just wants the constant last in the sequence.  That exposes bit clear,
>>>> set, flip, test, etc idioms.
>>>
>>>
>>> But those don't feed another bit operation, right?  Thus we'd like to see
>>> ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
>>> are designed to feed conditionals (aka CC consuming ops)?
>>
>> At the gimple level they could feed a conditional, or be part of a series of
>> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
>> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>>
>> For the 3-op case, we always want the constant last.  For the 4-op case it's
>> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1)
>> & c.
> 
> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
> to special-case bitwise ops.  The "real" fix to the sorting heuristic would be
> to sort constants at the opposite end.
> 
> That might be too invasive right now but there is another "convenient" place:
> 
>               /* If the operand vector is now empty, all operands were
>                  consumed by the __builtin_powi optimization.  */
> ...
>               else
>                 {
>                   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>                   int ops_num = ops.length ();
>                   int width = get_reassociation_width (ops_num, rhs_code, mode);
>                   tree new_lhs = lhs;
> 
>                   if (dump_file && (dump_flags & TDF_DETAILS))
>                     fprintf (dump_file,
>                              "Width = %d was chosen for reassociation\n", width);
> 
> at this point you can check rhs_code and move the (single) constant
> entry in ops (if there is any constant entry) from .last () to the beginning.
> 
> That'll get the 4 operand case correct as well and properly models
> "constant last" for the specified operation kind.
Resurrecting an old thread...  Just now getting around to flushing this
out of the queue.

To recap, given something like (x & y) & C, reassociation will turn that
into (x & C) & y.  The result is functionally equivalent, but it inhibits
generation of bit test instructions.
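
To make that concrete, here is the shape of the problem in C (an
illustrative example; the before/after forms in the comment restate the
description above and are not dump output):

  int
  bit_set_p (int x, int y)
  {
    /* reassoc used to rewrite the chain as (x & 0x20) & y, hiding the
       bit test idiom.  With the constant kept last, (x & y) & 0x20,
       the RTL matchers can use a single bit test (e.g. test on x86,
       btst on m68k).  */
    return (x & y) & 0x20;
  }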

I originally hacked up swap_ops_for_binary_stmt.  You requested that
change be made in reassociate_bb so that it would apply to cases where
there are more than 3 args.

So that's what this patch does.   OK for the trunk now?

Bootstrapped and regression tested on x86_64.  Also tested the new
testcase on m68k.
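
For reference, here is a minimal standalone sketch of the reordering the
patch performs (hypothetical types and names, not GCC internals):
reassoc's sorted operand vector ends with the constant, and the patch
simply swaps it with the first entry so the rewritten chain applies the
constant last.

  #include <stdio.h>

  struct op_entry { const char *name; int is_constant; };

  /* Mirrors the std::swap in the patch: if the last operand is the
     (single) constant, exchange it with the first entry.  */
  static void
  move_constant_first (struct op_entry *ops, int n)
  {
    if (n > 1 && ops[n - 1].is_constant)
      {
        struct op_entry tmp = ops[0];
        ops[0] = ops[n - 1];
        ops[n - 1] = tmp;
      }
  }

  int
  main (void)
  {
    struct op_entry ops[] = { { "x", 0 }, { "y", 0 }, { "0x20", 1 } };
    move_constant_first (ops, 3);
    for (int i = 0; i < 3; i++)
      printf ("%s ", ops[i].name);  /* prints: 0x20 y x  */
    printf ("\n");
    return 0;
  }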

commit c10ae0339674c27c89a1fa1904217a55bf530cb3
Author: Jeff Law <l...@torsion.usersys.redhat.com>
Date:   Sun Sep 3 10:42:30 2017 -0400

    2017-09-03  Jeff Law  <l...@redhat.com>
    
            PR tree-optimization/64910
            * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
            swap the first and last operand if the last is a constant.
    
            PR tree-optimization/64910
            * gcc.dg/tree-ssa/pr64910-2.c: New test.

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3f632ca31c2..2c9a8c8265a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,9 @@
+2017-09-03  Jeff Law  <l...@redhat.com>
+
+       PR tree-optimization/64910
+       * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
+       swap the first and last operand if the last is a constant.
+
 2017-09-01  Segher Boessenkool  <seg...@kernel.crashing.org>
 
        PR rtl-optimization/82024
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 4ead57edfa2..766677c899b 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,3 +1,8 @@
+2017-09-03  Jeff Law  <l...@redhat.com>
+
+       PR tree-optimization/64910
+       * gcc.dg/tree-ssa/pr64910-2.c: New test.
+
 2017-09-01  Jakub Jelinek  <ja...@redhat.com>
 
        PR target/81766
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
new file mode 100644
index 00000000000..2e3d6790776
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
@@ -0,0 +1,85 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-reassoc1" } */
+
+/* We want to make sure that we reassociate in a way that has the
+   constant last.  With the constant last, it's more likely to result
+   in a bitfield test on targets with such capabilities.  */
+
+extern void boo ();
+
+int b2b_uc (unsigned char u, unsigned char w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_us (unsigned short u, unsigned short w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_ui (unsigned int u, unsigned int w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_ul (unsigned long u, unsigned long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_ull (unsigned long long u, unsigned long long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_sc (signed char u, signed char w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_ss (signed short u, signed short w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+int b2b_si (signed int u, signed int w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_sl (signed long u, signed long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+int b2b_sll (signed long long u, signed long long w)
+{
+  if ((u & w) & 0x20)
+    boo ();
+}
+
+/* The AND of U & W should go into a temporary, which is then ANDed
+   with the constant.
+
+   First verify that we have the right number of ANDs between U and W.  */
+/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
+
+/* Then verify that we have the right number of ANDs between a temporary
+   and the constant.  */
+/* { dg-final { scan-tree-dump-times "_\[0-9\]+ \& 32;" 10 "reassoc1"} } */
+
+/* Each function has one AND.  It will have either a second AND or a TEST.
+   So we can count the number of AND and TEST instructions; they must be 2X
+   the number of test functions in this file.  */
+/* { dg-final { scan-assembler-times "and|test" 20 { target { i?86-*-* x86_64-*-*} } } } */
+
+/* Similarly on the m68k.  The code for the long long tests is suboptimal;
+   we catch that via the second pattern and xfail it.  */
+/* { dg-final { scan-assembler-times "and|btst" 20 { target { m68k-*-* } } } } */
+/* { dg-final { scan-assembler-not "or" { target { m68k-*-* } xfail { *-*-* } } } } */
+
diff --git a/gcc/tree-ssa-reassoc.c b/gcc/tree-ssa-reassoc.c
index 561acea4dcc..76048196b27 100644
--- a/gcc/tree-ssa-reassoc.c
+++ b/gcc/tree-ssa-reassoc.c
@@ -5762,6 +5762,18 @@ reassociate_bb (basic_block bb)
                    fprintf (dump_file,
                             "Width = %d was chosen for reassociation\n", 
width);
 
+
+                 /* For binary bit operations, if the last operand in
+                    OPS is a constant, move it to the front.  This
+                    helps ensure that we generate (X & Y) & C rather
+                    than (X & C) & Y.  The former will often match
+                    a canonical bit test when we get to RTL.  */
+                 if ((rhs_code == BIT_AND_EXPR
+                      || rhs_code == BIT_IOR_EXPR
+                      || rhs_code == BIT_XOR_EXPR)
+                     && TREE_CODE (ops.last ()->op) == INTEGER_CST)
+                   std::swap (*ops[0], *ops[ops_num - 1]);
+
                  if (width > 1
                      && ops.length () > 3)
                    rewrite_expr_tree_parallel (as_a <gassign *> (stmt),
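
For reference, the reassoc1 dump lines that the testcase's scan patterns
match should look roughly like this for each function (a hand-written
sketch of the expected shape, not actual dump output):

  _1 = u_2(D) & w_3(D);
  _4 = _1 & 32;
  if (_4 != 0)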
