On Thu, Feb 18, 2021 at 12:35 AM Hans-Peter Nilsson via Gcc-patches <gcc-patches@gcc.gnu.org> wrote: > > If we're not going to eliminate the clz, it's better for the > comparison to use that result than its input, so we don't > extend the lifetime of the input. Also, an additional use > of the result is more likely cheaper than a compare of the > input, in particular considering that the clz may have made > available a non-zero condition matching the original use.
Of course the non-zero compare can execute concurrently with the clz if we use the input which can result in faster operation. It's one of the many cases where local folding doesn't tell us enough. One of the usual issues with single_use checks is that the IL is not fully DCEd while we fold it and thus dead uses can make sth !single_use even though it really is single_use. Also if there's a use of the input live at the point of the compare we want to replace it's likely better to do the transform. > The "s" modifier doesn't stop this situation, as the > transformation wouldn't result in "an expression with more > than one operator"; a gating single_use condition on the > result is necessary. Note that :s was done this way for the use in VN where we can this way CSE c != 0 with an earlier a >= 0. It might be an interesting idea to allow match.pd consumers to "switch" behavior of :s according to their needs. > Regtested cris-elf and x86_64-linux. > Ok to commit? OK. Thanks, Richard. > gcc: > PR tree-optimization/99142 > * match.pd (clz cmp 0): Gate replacement on single_use of clz result. > > gcc/testsuite: > PR tree-optimization/99142 > * testsuite/gcc.dg/tree-ssa/pr99142.c: New test. > --- > gcc/match.pd | 4 ++-- > gcc/testsuite/gcc.dg/tree-ssa/pr99142.c | 14 ++++++++++++++ > 2 files changed, 16 insertions(+), 2 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/tree-ssa/pr99142.c > > diff --git a/gcc/match.pd b/gcc/match.pd > index e14f69744d7e..760f773cf1b1 100644 > --- a/gcc/match.pd > +++ b/gcc/match.pd > @@ -6315,8 +6315,8 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT) > (for op (eq ne) > cmp (lt ge) > (simplify > - (op (clz:s @0) INTEGER_CST@1) > - (if (integer_zerop (@1)) > + (op (clz:s@2 @0) INTEGER_CST@1) > + (if (integer_zerop (@1) && single_use (@2)) > /* clz(X) == 0 is (int)X < 0 and clz(X) != 0 is (int)X >= 0. */ > (with { tree stype = signed_type_for (TREE_TYPE (@0)); > HOST_WIDE_INT val = 0; > diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c > b/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c > new file mode 100644 > index 000000000000..1781a89de32a > --- /dev/null > +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr99142.c > @@ -0,0 +1,14 @@ > +/* { dg-do compile } */ > +/* { dg-options "-O2 -fdump-tree-optimized" } */ > +/* { dg-final { scan-tree-dump-not " >= 0\\)" "optimized" } } */ > +int f(int a, int *b, int *d) > +{ > + int c = __builtin_clz(a); > + > + *b = c; > + > + if (c != 0) > + *d = c; > + > + return c; > +} > -- > 2.11.0 >