On 09/05/2017 12:38 AM, Christophe Lyon wrote:
> Hi Jeff,
>
> On 3 September 2017 at 16:44, Jeff Law <l...@redhat.com> wrote:
>> On 01/13/2016 05:30 AM, Richard Biener wrote:
>>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law <l...@redhat.com> wrote:
>>>> On 01/12/2016 08:11 AM, Richard Biener wrote:
>>>>> On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law <l...@redhat.com> wrote:
>>>>>> On 01/11/2016 03:32 AM, Richard Biener wrote:
>>>>>>>
>>>>>>> Yeah, reassoc is largely about canonicalization.
>>>>>>>
>>>>>>>> Plus doing it in TER is almost certainly more complex than getting
>>>>>>>> it right in reassoc to begin with.
>>>>>>>
>>>>>>> I guess canonicalizing differently is ok but you'll still create
>>>>>>> ((a & b) & 1) & c then if you only change the above place.
>>>>>>
>>>>>> What's best for that expression would depend on factors like whether
>>>>>> or not the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
>>>>>> parallelism while (((a & b) & c) & 1) is not good for parallelism,
>>>>>> but does expose the bit test.
>>>>>>
>>>>>> reassoc currently generates ((a & 1) & b) & c which is dreadful as
>>>>>> there's no ILP or chance of creating a bit test.  My patch shuffles
>>>>>> things around, but still doesn't expose the ILP or bit test in the
>>>>>> 4 operand case.  Based on the comments in reassoc, it didn't seem
>>>>>> like the author thought anything beyond the 3-operand case was worth
>>>>>> handling.  So my patch just handles the 3-operand case.
>>>>>>
>>>>>>> So I'm not sure what pattern the backend is looking for?
>>>>>>
>>>>>> It just wants the constant last in the sequence.  That exposes bit
>>>>>> clear, set, flip, test, etc. idioms.
>>>>>
>>>>> But those don't feed another bit operation, right?  Thus we'd like to
>>>>> see ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the
>>>>> instructions are designed to feed conditionals (aka CC consuming ops)?
>>>>
>>>> At the gimple level they could feed a conditional, or be part of a
>>>> series of ops on an SSA_NAME that eventually gets stored to memory,
>>>> etc.  At the RTL level they'll feed CC consumers and bit manipulations
>>>> of pseudos or memory.
>>>>
>>>> For the 3-op case, we always want the constant last.  For the 4-op
>>>> case it's less clear.  Though ((a & b) & c) & 1 is certainly better
>>>> than ((a & b) & 1) & c.
>>>
>>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient
>>> place to special-case bitwise ops.  The "real" fix to the sorting
>>> heuristic would be to sort constants at the opposite end.
>>>
>>> That might be too invasive right now, but there is another "convenient"
>>> place:
>>>
>>>           /* If the operand vector is now empty, all operands were
>>>              consumed by the __builtin_powi optimization.  */
>>> ...
>>>           else
>>>             {
>>>               machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>>               int ops_num = ops.length ();
>>>               int width = get_reassociation_width (ops_num, rhs_code,
>>>                                                    mode);
>>>               tree new_lhs = lhs;
>>>
>>>               if (dump_file && (dump_flags & TDF_DETAILS))
>>>                 fprintf (dump_file,
>>>                          "Width = %d was chosen for reassociation\n",
>>>                          width);
>>>
>>> at this point you can check rhs_code and move the (single) constant
>>> entry in ops (if there is any constant entry) from .last () to the
>>> beginning.
>>>
>>> That'll get the 4 operand case correct as well and properly models
>>> "constant last" for the specified operation kind.
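To make that concrete: the tree-ssa-reassoc.c hunk itself is not quoted in
this thread, but based on the description above and the ChangeLog in the
follow-up below, the reassociate_bb change would amount to something like
the sketch that follows, dropped in right after the quoted fprintf.  The
rhs_code guard and the use of std::swap are illustrative assumptions, not
the actual posted patch; ops, ops_num and operand_entry's op field are the
ones already visible in the quoted snippet.

              /* Sketch only: for a bitwise binary op whose last operand
                 entry is the (single) constant, swap the first and last
                 entries so the rewritten chain applies the constant last,
                 i.e. (x & y) & C rather than (x & C) & y.  */
              if ((rhs_code == BIT_AND_EXPR
                   || rhs_code == BIT_IOR_EXPR
                   || rhs_code == BIT_XOR_EXPR)
                  && TREE_CODE (ops.last ()->op) == INTEGER_CST)
                std::swap (*ops[0], *ops[ops_num - 1]);
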
>> Resurrecting an old thread...  Just now getting around to flushing this
>> out of the queue.
>>
>> To recap, given something like (x & y) & C reassociation will turn that
>> into (x & C) & y.  It's functionally equivalent, but it will inhibit
>> generation of bit test instructions.
>>
>> I originally hacked up swap_ops_for_binary_stmt.  You requested that
>> change be made in reassociate_bb so that it would apply to cases where
>> there are more than 3 args.
>>
>> So that's what this patch does.  OK for the trunk now?
>>
>> Bootstrapped and regression tested on x86_64.  Also tested the new
>> testcase on m68k.
>>
>>
>> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
>> Author: Jeff Law <l...@torsion.usersys.redhat.com>
>> Date:   Sun Sep 3 10:42:30 2017 -0400
>>
>>     2017-09-03  Jeff Law  <l...@redhat.com>
>>
>>             PR tree-optimization/64910
>>             * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>>             swap the first and last operand if the last is a constant.
>>
>>             PR tree-optimization/64910
>>             * gcc.dg/tree-ssa/pr64910-2.c: New test.
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 3f632ca31c2..2c9a8c8265a 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,9 @@
>> +2017-09-03  Jeff Law  <l...@redhat.com>
>> +
>> +        PR tree-optimization/64910
>> +        * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
>> +        swap the first and last operand if the last is a constant.
>> +
>>  2017-09-01  Segher Boessenkool  <seg...@kernel.crashing.org>
>>
>>          PR rtl-optimization/82024
>> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
>> index 4ead57edfa2..766677c899b 100644
>> --- a/gcc/testsuite/ChangeLog
>> +++ b/gcc/testsuite/ChangeLog
>> @@ -1,3 +1,8 @@
>> +2017-09-03  Jeff Law  <l...@redhat.com>
>> +
>> +        PR tree-optimization/64910
>> +        * gcc.dg/tree-ssa/pr64910-2.c: New test.
>> +
>>  2017-09-01  Jakub Jelinek  <ja...@redhat.com>
>>
>>          PR target/81766
>> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
>> new file mode 100644
>> index 00000000000..2e3d6790776
>> --- /dev/null
>> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr64910-2.c
>> @@ -0,0 +1,85 @@
>> +/* { dg-do compile } */
>> +/* { dg-options "-O2 -fdump-tree-reassoc1" } */
>> +
>> +/* We want to make sure that we reassociate in a way that has the
>> +   constant last.  With the constant last, it's more likely to result
>> +   in a bitfield test on targets with such capabilities.  */
>> +
>> +extern void boo ();
>> +
>> +int b2b_uc (unsigned char u, unsigned char w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_us (unsigned short u, unsigned short w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_ui (unsigned int u, unsigned int w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_ul (unsigned long u, unsigned long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_ull (unsigned long long u, unsigned long long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_sc (signed char u, signed char w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_ss (signed short u, signed short w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +int b2b_si (signed int u, signed int w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_sl (signed long u, signed long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +int b2b_sll (signed long long u, signed long long w)
>> +{
>> +  if ((u & w) & 0x20)
>> +    boo ();
>> +}
>> +
>> +/* The AND of U & W should go into a temporary, which is then ANDed
>> +   with the constant.
>> +
>> +   First verify that we have the right number of ANDs between U and W.  */
>> +/* { dg-final { scan-tree-dump-times "\[uw\]_\[0-9\]+.D. \& \[uw\]_\[0-9\]+.D.;" 10 "reassoc1"} } */
>> +
>> +/* Then verify that we have the right number of ANDs between a temporary
>> +   and the constant.  */
>> +/* { dg-final { scan-tree-dump-times "_\[0-9]+ \& 32;" 10 "reassoc1"} } */
>
> This part of the new testcase fails on aarch64 & arm:
>   FAIL: gcc.dg/tree-ssa/pr64910-2.c scan-tree-dump-times reassoc1 "_[0-9]+ & 32;" 10

Thanks.  I'm investigating.
Jeff
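
For anyone reproducing the aarch64/arm failure, here is a self-contained
distillation of one of the new test's functions.  The gimple shown in the
comment is only the shape the two scan patterns above expect, with
illustrative SSA numbers rather than an actual dump:

  /* Reduced from gcc.dg/tree-ssa/pr64910-2.c.  Compile at -O2 with
     -fdump-tree-reassoc1 and inspect the .reassoc1 dump.  Per the
     dg-final patterns, the AND of the two variables should feed a
     separate AND with the constant:

       _1 = u_3(D) & w_4(D);   matched by "[uw]_[0-9]+.D. & [uw]_[0-9]+.D.;"
       _2 = _1 & 32;           matched by "_[0-9]+ & 32;"

     rather than a form that ANDs one input with 32 first.  */

  extern void boo (void);

  void
  b2b_ui (unsigned int u, unsigned int w)
  {
    if ((u & w) & 0x20)
      boo ();
  }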