Hi Richard, Many thanks for the review. Here's the final version that I've committed, including your suggested improvements, following another make bootstrap and make -k check on x86_64-pc-linux-gnu with no new failures. Thanks again.
2021-12-01 Roger Sayle <ro...@nextmovesoftware.com> Richard Biener <rguent...@suse.de> gcc/ChangeLog * tree-ssa-loop-niter.c (number_of_iterations_until_wrap): Check if simplify_using_initial_conditions allows us to simplify the expression for may_be_zero. * match.pd (X != C ? -X : -C -> -X): New transform. (X != C ? ~X : ~C -> ~X): Likewise. ((X+1) > Y ? -X : 1 -> X >= Y ? -X : 1): Likewise. gcc/testsuite/ChangeLog * gcc.dg/fold-condneg-1.c: New test case. * gcc.dg/fold-condneg-2.c: New test case. * gcc.dg/fold-condnot-1.c: New test case. * gcc.dg/pr101145-1.c: New test case. * gcc.dg/pr101145-2.c: New test case. Roger -- > -----Original Message----- > From: Richard Biener <richard.guent...@gmail.com> > Sent: 30 November 2021 10:00 > To: Roger Sayle <ro...@nextmovesoftware.com> > Cc: GCC Patches <gcc-patches@gcc.gnu.org> > Subject: Re: [PATCH] Final value replacement improvements for until-wrap > loops. > > On Mon, Nov 29, 2021 at 10:07 AM Roger Sayle > <ro...@nextmovesoftware.com> wrote: > > > > > > This middle-end patch is inspired by Richard Biener's until-wrap loop > > example in PR tree-optimization/101145. > > > > unsigned foo(unsigned val, unsigned start) { > > unsigned cnt = 0; > > for (unsigned i = start; i > val; ++i) > > cnt++; > > return cnt; > > } > > > > For this loop, the tree optimizers currently generate: > > > > unsigned int foo (unsigned int val, unsigned int start) { > > unsigned int cnt; > > unsigned int _1; > > unsigned int _5; > > > > <bb 2> [local count: 118111600]: > > if (start_3(D) > val_4(D)) > > goto <bb 3>; [89.00%] > > else > > goto <bb 4>; [11.00%] > > > > <bb 3> [local count: 105119324]: > > _1 = start_3(D) + 1; > > _5 = -start_3(D); > > cnt_2 = _1 > val_4(D) ? _5 : 1; > > > > <bb 4> [local count: 118111600]: > > # cnt_11 = PHI <cnt_2(3), 0(2)> > > return cnt_11; > > } > > > > or perhaps slightly easier to read: > > > > if (start > val) { > > cnt = (start+1) > val ? -start : 1; > > } else cnt = 0; > > > > In this snippet, if we know start > val, then (start+1) > val unless > > start+1 overflows, i.e. (start+1) == 0 and start == ~0. > > We can use this (loop header) context to simplify the ternary > > expression to "(start != -1) ? -start : 1", which with a little help > > from match.pd can be folded to -start. Hence the optimal final value > > replacement should be: > > > > cnt = (start > val) ? -start : 0; > > > > Or as now generated by this patch: > > > > unsigned int foo (unsigned int val, unsigned int start) { > > unsigned int cnt; > > > > <bb 2> [local count: 118111600]: > > if (start_3(D) > val_4(D)) > > goto <bb 3>; [89.00%] > > else > > goto <bb 4>; [11.00%] > > > > <bb 3> [local count: 105119324]: > > cnt_2 = -start_3(D); > > > > <bb 4> [local count: 118111600]: > > # cnt_11 = PHI <cnt_2(3), 0(2)> > > return cnt_11; > > } > > > > > > We can also improve until-wrap loops that don't have a (suitable) loop > > header, as determined by simplify_using_initial_conditions. > > > > unsigned bar(unsigned val, unsigned start) { > > unsigned cnt = 0; > > unsigned i = start; > > do { > > cnt++; > > i++; > > } while (i > val); > > return cnt; > > } > > > > which is currently optimized to: > > > > unsigned int foo (unsigned int val, unsigned int start) { > > unsigned int cnt; > > unsigned int _9; > > unsigned int _10; > > > > <bb 2> [local count: 118111600]: > > _9 = start_4(D) + 1; > > _10 = -start_4(D); > > cnt_3 = val_7(D) < _9 ? _10 : 1; > > return cnt_3; > > } > > > > Here we have "val < (start+1) ? -start : 1", which again with the help > > of match.pd can be slightly simplified to "val <= start ? -start : 1" > > when dealing with unsigned types, because at the complicating value > > where start == ~0, we fortunately have -start == 1, hence it doesn't > > matter whether the second or third operand of the ternary operator is > returned. > > > > To summarize, this patch (in addition to tweaking may_be_zero in > > number_of_iterations_until_wrap) adds three new constant folding > > transforms to match.pd. > > > > X != C1 ? -X : C2 simplifies to -X when -C1 == C2. > > which is the generalized form of the simplification above. > > > > X != C1 ? ~X : C2 simplifies to ~X when ~C1 == C2. > > which is the BIT_NOT_EXPR analog of the NEGATE_EXPR case. > > > > and the "until-wrap final value replacement without context": > > > > (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when X is unsigned, > > as when X + 1 overflows, X is -1, so -X == 1. > > > > > > This patch has been tested on x86_64-pc-linux-gnu with make bootstrap > > and make -k check with no new failures. Ok for mainline? > > +/* X != C1 ? -X : C2 simplifies to -X when -C1 == C2. */ (simplify > +(cond (ne @0 INTEGER_CST@1) (negate@3 @0) INTEGER_CST@2) (if > +((!wi::only_sign_bit_p (wi::to_wide (@1)) > + || TYPE_UNSIGNED (type) > + || TYPE_OVERFLOW_WRAPS (type) > + || (!TYPE_OVERFLOW_SANITIZED (type) > + && !TYPE_OVERFLOW_TRAPS (type) > + && !TYPE_SATURATING (type))) > > I'm wondering about TYPE_UNSIGNED && TYPE_SATURATING. > Also I think unsigned cannot trap or be sanitized and with > TYPE_OVERFLOW_WRAPS sanitizing or trapping cannot be true either. Also > unsigned types always wrap on overflow. So maybe > > (if (!TYPE_SATURATING (type) > && (TYPE_OVERFLOW_WRAPS (type) > || !wi::only_sign_bit_p (..))) > > ? > > + && wi::eq_p (wi::neg (wi::to_wide (@1)), wi::to_wide (@2))) > + @3)) > > +/* (X + 1) > Y ? -X : 1 simplifies to X >= Y ? -X : 1 when > + X is unsigned, as when X + 1 overflows, X is -1, so -X == 1. */ > +(simplify (cond (gt (plus @0 integer_onep) @1) (negate @0) > +integer_onep@2) (if (TYPE_UNSIGNED (type) > + && TYPE_UNSIGNED (TREE_TYPE (@0))) > > I think the second test is redundant since @0 participates in both the > comparison > and the condition value. > > + (cond (ge @0 @1) (negate @0) @2))) > > Otherwise looks OK to me. > > Thanks, > Richard. > > > > > 2021-11-29 Roger Sayle <ro...@nextmovesoftware.com> > > > > gcc/ChangeLog > > * tree-ssa-loop-niter.c (number_of_iterations_until_wrap): > > Check if simplify_using_initial_conditions allows us to > > simplify the expression for may_be_zero. > > * match.pd (X != C ? -X : -C -> -X): New transform. > > (X != C ? ~X : ~C -> ~X): Likewise. > > ((X+1) > Y ? -X : 1 -> X >= Y ? -X : 1): Likewise. > > > > gcc/testsuite/ChangeLog > > * gcc.dg/fold-condneg-1.c: New test case. > > * gcc.dg/fold-condneg-2.c: New test case. > > * gcc.dg/fold-condnot-1.c: New test case. > > * gcc.dg/pr101145-1.c: New test case. > > * gcc.dg/pr101145-2.c: New test case. > > > > > > Thanks in advance, > > Roger > > -- > >