On Wed, 4 Nov 2020 at 10:59, Richard Biener <rguent...@suse.de> wrote: > > On Wed, 4 Nov 2020, Jakub Jelinek wrote: > > > Hi! > > > > The following patch generalizes the x ? 1 : 0 -> (int) x optimization > > to handle also left shifts by constant. > > > > During x86_64-linux and i686-linux bootstraps + regtests it triggered > > in 1514 unique non-LTO -m64 cases (sort -u on log mentioning > > filename, function name and shift count) and 1866 -m32 cases. > > > > Unfortunately, the patch regresses: > > +FAIL: gcc.dg/tree-ssa/ssa-ccp-11.c scan-tree-dump-times optimized "if " 0 > > +FAIL: gcc.dg/vect/bb-slp-pattern-2.c -flto -ffat-lto-objects > > scan-tree-dump-times slp1 "optimized: basic block" 1 > > +FAIL: gcc.dg/vect/bb-slp-pattern-2.c scan-tree-dump-times slp1 "optimized: > > basic block" 1 > > and in both cases it actually results in worse code. > > > > In ssa-ccp-11.c since phiopt2 it results in smaller IL due to the > > optimization, e.g. > > - if (_1 != 0) > > - goto <bb 5>; [21.72%] > > - else > > - goto <bb 6>; [78.28%] > > - > > - <bb 5> [local count: 233216728]: > > - > > - <bb 6> [local count: 1073741824]: > > - # _4 = PHI <2(5), 0(4)> > > - return _4; > > + _7 = (int) _1; > > + _8 = _7 << 1; > > + return _8; > > but dom2 actually manages to optimize it only without this optimization: > > - # a_7 = PHI <0(3), 1(2)> > > - # b_8 = PHI <1(3), 0(2)> > > - _9 = a_7 & b_8; > > - return 0; > > + # a_2 = PHI <1(2), 0(3)> > > + # b_3 = PHI <0(2), 1(3)> > > + _1 = a_2 & b_3; > > + _7 = (int) _1; > > + _8 = _7 << 1; > > + return _8; > > We'd need some optimization that would go through all PHI edges and > > compute if some use of the phi results don't actually compute a constant > > across all the PHI edges - 1 & 0 and 0 & 1 is always 0. > > PRE should do this, IMHO only optimizing it at -O2 is fine. Can you > check? > > > Similarly in the > > other function > > + # a_1 = PHI <3(2), 2(3)> > > + # b_2 = PHI <2(2), 3(3)> > > + c_5 = a_1 + b_2; > > is always c_5 = 5; > > Similarly, in the slp vectorization test there is: > > a[0] = b[0] ? 1 : 7; > > note this, carefully avoiding the already "optimized" b[0] ? 1 : 0 ... > > > a[1] = b[1] ? 2 : 0; > > a[2] = b[2] ? 3 : 0; > > a[3] = b[3] ? 4 : 0; > > a[4] = b[4] ? 5 : 0; > > a[5] = b[5] ? 6 : 0; > > a[6] = b[6] ? 7 : 0; > > a[7] = b[7] ? 8 : 0; > > and obviously if the ? 2 : 0 and ? 4 : 0 and ? 8 : 0 are optimized > > into shifts, it doesn't match anymore. > > So the option is to put : 7 in the 2, 4 an 8 case as well. The testcase > wasn't added for any real-world case but is artificial I guess for > COND_EXPR handling of invariants. > > > So, I wonder if we e.g. shouldn't perform this optimization only in the last > > phiopt pass (i.e. change the bool early argument to int late where it would > > be 0 (early), 1 (late) and 2 (very late) and perform this only if very late. > > Well, we always have the issue that a more "complex" expression might > be more easily canonical. But removing control flow is important > and if we decide that we want to preserve it it more "canonical" > (general) form then we should consider replacing > > if (_1 != 0) > > # _2 = PHI <0, 1> > > with > > _2 = _1 ? 0 : 1; > > in general and doing fancy expansion late. But we're already doing > the other thing so ... > > But yeah, for things like SLP it means we eventually have to > implement reverse transforms for all of this to make the lanes > matching. But that's true anyway for things like x + 1 vs. x + 0 > or x / 3 vs. x / 2 or other simplifications we do. > > > Thoughts on this? > > OK with the FAILing testcases adjusted (use -O2 / different constants). >
The patch introduced a regression on aarch64: FAIL: gcc.target/aarch64/sve/vcond_3.c -march=armv8.2-a+sve scan-assembler-times \\tmov\\tz[0-9]+\\.[hsd], p[0-7]/z, #-32768\\n 3 FAIL: gcc.target/aarch64/sve/vcond_3.c -march=armv8.2-a+sve scan-assembler-times \\tmov\\tz[0-9]+\\.[hsd], p[0-7]/z, #256\\n 3 FAIL: gcc.target/aarch64/sve/vcond_3.c -march=armv8.2-a+sve scan-assembler-times \\tmov\\tz[0-9]+\\.[hsd], p[0-7]/z, #2\\n 3 FAIL: gcc.target/aarch64/sve/vcond_3.c -march=armv8.2-a+sve scan-assembler-times \\tmov\\tz[0-9]+\\.b, p[0-7]/z, #-128\\n 1 FAIL: gcc.target/aarch64/sve/vcond_3.c -march=armv8.2-a+sve scan-assembler-times \\tmov\\tz[0-9]+\\.b, p[0-7]/z, #2\\n 1 Christophe > Thanks, > Richard. > > > 2020-11-03 Jakub Jelinek <ja...@redhat.com> > > > > PR tree-optimization/97690 > > * tree-ssa-phiopt.c (conditional_replacement): Also optimize > > cond ? pow2p_cst : 0 as ((type) cond) << cst. > > > > * gcc.dg/tree-ssa/phi-opt-22.c: New test. > > > > --- gcc/tree-ssa-phiopt.c.jj 2020-10-22 09:36:25.602484491 +0200 > > +++ gcc/tree-ssa-phiopt.c 2020-11-03 17:59:18.133662581 +0100 > > @@ -752,7 +752,9 @@ conditional_replacement (basic_block con > > gimple_stmt_iterator gsi; > > edge true_edge, false_edge; > > tree new_var, new_var2; > > - bool neg; > > + bool neg = false; > > + int shift = 0; > > + tree nonzero_arg; > > > > /* FIXME: Gimplification of complex type is too hard for now. */ > > /* We aren't prepared to handle vectors either (and it is a question > > @@ -763,14 +765,22 @@ conditional_replacement (basic_block con > > || POINTER_TYPE_P (TREE_TYPE (arg1)))) > > return false; > > > > - /* The PHI arguments have the constants 0 and 1, or 0 and -1, then > > - convert it to the conditional. */ > > - if ((integer_zerop (arg0) && integer_onep (arg1)) > > - || (integer_zerop (arg1) && integer_onep (arg0))) > > - neg = false; > > - else if ((integer_zerop (arg0) && integer_all_onesp (arg1)) > > - || (integer_zerop (arg1) && integer_all_onesp (arg0))) > > + /* The PHI arguments have the constants 0 and 1, or 0 and -1 or > > + 0 and (1 << cst), then convert it to the conditional. */ > > + if (integer_zerop (arg0)) > > + nonzero_arg = arg1; > > + else if (integer_zerop (arg1)) > > + nonzero_arg = arg0; > > + else > > + return false; > > + if (integer_all_onesp (nonzero_arg)) > > neg = true; > > + else if (integer_pow2p (nonzero_arg)) > > + { > > + shift = tree_log2 (nonzero_arg); > > + if (shift && POINTER_TYPE_P (TREE_TYPE (nonzero_arg))) > > + return false; > > + } > > else > > return false; > > > > @@ -782,12 +792,12 @@ conditional_replacement (basic_block con > > falls through into BB. > > > > There is a single PHI node at the join point (BB) and its arguments > > - are constants (0, 1) or (0, -1). > > + are constants (0, 1) or (0, -1) or (0, (1 << shift)). > > > > So, given the condition COND, and the two PHI arguments, we can > > rewrite this PHI into non-branching code: > > > > - dest = (COND) or dest = COND' > > + dest = (COND) or dest = COND' or dest = (COND) << shift > > > > We use the condition as-is if the argument associated with the > > true edge has the value one or the argument associated with the > > @@ -822,6 +832,14 @@ conditional_replacement (basic_block con > > cond = fold_build1_loc (gimple_location (stmt), > > NEGATE_EXPR, TREE_TYPE (cond), cond); > > } > > + else if (shift) > > + { > > + cond = fold_convert_loc (gimple_location (stmt), > > + TREE_TYPE (result), cond); > > + cond = fold_build2_loc (gimple_location (stmt), > > + LSHIFT_EXPR, TREE_TYPE (cond), cond, > > + build_int_cst (integer_type_node, shift)); > > + } > > > > /* Insert our new statements at the end of conditional block before the > > COND_STMT. */ > > --- gcc/testsuite/gcc.dg/tree-ssa/phi-opt-22.c.jj 2020-11-03 > > 18:22:19.756124543 +0100 > > +++ gcc/testsuite/gcc.dg/tree-ssa/phi-opt-22.c 2020-11-03 > > 18:25:12.795176885 +0100 > > @@ -0,0 +1,11 @@ > > +/* PR tree-optimization/97690 */ > > +/* { dg-do compile } */ > > +/* { dg-options "-O2 -fdump-tree-phiopt2" } */ > > + > > +int foo (_Bool d) { return d ? 2 : 0; } > > +int bar (_Bool d) { return d ? 1 : 0; } > > +int baz (_Bool d) { return d ? -__INT_MAX__ - 1 : 0; } > > +int qux (_Bool d) { return d ? 1024 : 0; } > > + > > +/* { dg-final { scan-tree-dump-not "if" "phiopt2" } } */ > > +/* { dg-final { scan-tree-dump-times " << " 3 "phiopt2" } } */ > > > > Jakub > > > > > > -- > Richard Biener <rguent...@suse.de> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg, > Germany; GF: Felix Imend