On Tue, Jun 21, 2022 at 11:12:15AM -0700, Noah Goldstein wrote:
> This patch allows for strchr(x, c) to the replace with memchr(x, c,
> strlen(x) + 1) if strlen(x) has already been computed earlier in the
> tree.
> 
> Handles PR95821: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95821
> 
> Since memchr doesn't need to re-find the null terminator it is faster
> than strchr.
> 
> bootstrapped and tested on x86_64-linux.
> 
>       PR tree-optimization/95821
> 
> gcc/
> 
>       * tree-ssa-strlen.cc (strlen_pass::handle_builtin_strchr): Emit
>       memchr instead of strchr if strlen already computed.
> 
> gcc/testsuite/
> 
>       * c-c++-common/pr95821-1.c: New test.
>       * c-c++-common/pr95821-2.c: New test.
>       * c-c++-common/pr95821-3.c: New test.
>       * c-c++-common/pr95821-4.c: New test.
>       * c-c++-common/pr95821-5.c: New test.
>       * c-c++-common/pr95821-6.c: New test.
>       * c-c++-common/pr95821-7.c: New test.
>       * c-c++-common/pr95821-8.c: New test.

Sorry for the delay.

> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pr95821-1.c
> @@ -0,0 +1,15 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler "memchr" } } */

Please don't scan the assembler output: whether memchr ends up as a
call or is expanded inline is not known.
Better use "-O2 -fdump-tree-optimized" in dg-options
and scan the optimized dump for "memchr \\\(".
Ditto for the other tests.
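
For illustration, the start of such a test could look roughly like the
sketch below.  The directives are the point; the function body is only a
placeholder (find_after_strlen is a made-up name, not taken from the
patch), and whether the pass really rewrites it has to be checked
against the dump.

```c
/* { dg-do compile } */
/* { dg-options "-O2 -fdump-tree-optimized" } */
/* { dg-final { scan-tree-dump "memchr \\\(" "optimized" } } */

#include <string.h>

/* Hypothetical test body: strlen (s) is computed first, so the later
   strchr on the same string is a candidate for the memchr rewrite.  */
char *
find_after_strlen (char *s, int c)
{
  size_t slen = strlen (s);
  if (slen < 4)
    return 0;
  return strchr (s, c);
}
```

Scanning the optimized dump checks the GIMPLE call itself, independent
of how the target later expands memchr.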

> @@ -2452,32 +2459,96 @@ strlen_pass::handle_builtin_strchr ()
>             fprintf (dump_file, "Optimizing: ");
>             print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
>           }
> -       if (si != NULL && si->endptr != NULL_TREE)
> +       /* Three potential optimizations assume t=strlen (s) has already been
> +          computed:
> +             1. strchr (s, chr) where chr is known to be zero -> t

-> s + t
rather than
-> t
actually.
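
To spell out the difference: strchr (s, 0) returns the address of the
terminator, which is s + strlen (s), not the length.  A small standalone
sketch of that and of the two memchr forms listed in the quoted comment,
using only plain library calls (the helper names are made up for
illustration, they are not from the patch):

```c
#include <string.h>

/* Case 1: chr known to be zero.  strchr returns a pointer to the
   terminating NUL, i.e. s + t, not the length t.  */
const char *
strchr_zero (const char *s, size_t t)
{
  return s + t;
}

/* Case 2: chr known to be non-zero.  The terminator can never match,
   so scanning the t bytes before it is enough.  */
const char *
strchr_nonzero (const char *s, int chr, size_t t)
{
  return (const char *) memchr (s, chr, t);
}

/* Case 3: nothing known about chr.  Include the terminator so that a
   chr whose low byte is zero still finds it.  */
const char *
strchr_unknown (const char *s, int chr, size_t t)
{
  return (const char *) memchr (s, chr, t + 1);
}
```

In all three helpers t is assumed to be strlen (s) computed beforehand.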

> +             2. strchr (s, chr) where chr is known not to be zero ->
> +                memchr (s, chr, t)
> +             3. strchr (s, chr) where chr is not known to be zero or

nor instead of or?

> +                non-zero -> memchr (s, chr, t + 1).  */
> +       if (!is_strchr_zerop)
>           {
> -           rhs = unshare_expr (si->endptr);
> -           if (!useless_type_conversion_p (TREE_TYPE (lhs),
> -                                           TREE_TYPE (rhs)))
> -             rhs = fold_convert_loc (loc, TREE_TYPE (lhs), rhs);
> +           /* If its not strchr (s, zerop) then try and convert to
> +              memchr since strlen has already been computed.  */
> +           tree fn = builtin_decl_explicit (BUILT_IN_MEMCHR);
> +
> +           /* Only need to check length strlen (s) + 1 if chr may be zero.
> +              Otherwise the last chr (which is known to be zero) can never
> +              be a match.  */
> +           bool chr_nonzero = false;
> +           if (TREE_CODE (chr) == INTEGER_CST
> +               && integer_nonzerop (fold_convert (char_type_node, chr)))
> +             chr_nonzero = true;
> +           else if (TREE_CODE (chr) == SSA_NAME
> +                    && CHAR_TYPE_SIZE < INT_TYPE_SIZE)
> +             {
> +               value_range r;
> +               /* Try to determine using ranges if (char) chr must
> +                  be always 0.  That is true e.g. if all the subranges

must always be non-zero?

> +                  have the INT_TYPE_SIZE - CHAR_TYPE_SIZE bits
> +                  the same on lower and upper bounds.  */

That is actually not enough, see below.

> +               if (get_range_query (cfun)->range_of_expr (r, chr, stmt)
> +                   && r.kind () == VR_RANGE)
> +                 {
> +                   wide_int mask
> +                       = wi::mask (CHAR_TYPE_SIZE, true, INT_TYPE_SIZE);

Wrong indentation, the = should be indented 2 columns relative to
wide_int, not 4.

> +                   for (unsigned i = 0; i < r.num_pairs (); ++i)
> +                     if ((r.lower_bound (i) & mask)
> +                         != (r.upper_bound (i) & mask))
> +                       {
> +                         chr_nonzero = false;
> +                         break;
> +                       }

This else if can't actually do what it intends to, because
chr_nonzero is initialized to false at the start and the loop
also only ever sets it to false, so it is always false.
You need to add chr_nonzero = true; before the for loop above.
With that, all the above test proves is that there is no range like
[15, 257] where it would include 256 in the middle of the range or
at the end.  But the above doesn't clear chr_nonzero on ranges like
[0, 32] or [256, 511] where (char) chr can still be zero.
So, the test should be:
                        if ((r.lower_bound (i) & mask)
                            != (r.upper_bound (i) & mask)
                            || (r.lower_bound (i) & ~mask) == 0)
or so; that also rules out the above ranges, and if one just has ranges
like:
[1, 32] U [48, 56] U [257, 511]
all is fine, (char) chr is non-zero.
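
A standalone model of the corrected loop, in case it helps to play with
it outside the compiler.  This is only a sketch: it assumes an 8-bit
char, uses plain unsigned bounds instead of wide_int, and struct subrange
and chr_known_nonzero are hypothetical names standing in for the
value_range pairs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for one subrange [lo, hi] of chr.  */
struct subrange
{
  unsigned lo;
  unsigned hi;
};

/* Return true only if no value in any subrange has a zero low byte.  */
bool
chr_known_nonzero (const struct subrange *r, size_t n)
{
  const unsigned mask = ~0xffu;	/* Bits above the low byte.  */
  bool chr_nonzero = true;	/* Must start out true.  */

  for (size_t i = 0; i < n; ++i)
    if ((r[i].lo & mask) != (r[i].hi & mask)	/* Crosses a multiple of 256.  */
	|| (r[i].lo & ~mask) == 0)		/* Starts on a multiple of 256.  */
      {
	chr_nonzero = false;
	break;
      }
  return chr_nonzero;
}
```

With this model [15, 257], [0, 32] and [256, 511] each give false, while
[1, 32] U [48, 56] U [257, 511] gives true.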

But this also shows that the testsuite coverage is insufficient because
nothing caught this.
I see almost no tests where the second argument to strchr would be
constant (ideally check for all of 0, ~0 & ~(unsigned char) ~0, ' ',
(~0 & ~(unsigned char) ~0) + ' ') - I see you have one test with
if (c != 0x100) return else strchr which effectively is strchr (, 0x100)
and one if (c != 0) return else strchr which has c range of ~[0, 0] with
which you can't do much (just can verify that we don't treat that as
(char) c can't be zero).  Beyond the tests with constant strchr arguments
(and I think you want to check in each case if there is
"= slen\[a-zA-Z.0-9_]* \\\+ 1;"
or not (and how many times if you e.g. stick more tests into one source
file, ideally all where you want the + 1 and in another one all that should
not have it)) it would be nice to have at least some tests where you
test the above problematic cases, say something like:
  if (c < 256)
    {
      if (c < 1 || c > 64)
        return ...;
    }
  else
    {
      if (c < 257 || c > 511)
        return ...;
    }
...
  strchr (..., c);
c above should be (needs to be verified in the debugger)
[1, 64] U [257, 511] and so chr_nonzero.  Similarly construct cases like
[1, 32] U [48, 56] U [257, 511] (chr_nonzero) or [0, 32] U [256, 511]
(unknown whether c is zero or non-zero) or [15, 257] (unknown too).
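
A compilable variant of the range sketch, together with the constant
arguments mentioned earlier, might look as follows.  Everything that is
not in the sketch above is an assumption: the names find_ranged,
HIGH_BITS and low_byte are made up, the rejected paths return a null
pointer only because something has to be returned, and the strlen use is
just one way to have the length computed first.  The resulting range of c
still needs to be verified.

```c
#include <string.h>

/* All bits above the low byte set, i.e. ~0 & ~(unsigned char) ~0.  */
#define HIGH_BITS (~0 & ~(unsigned char) ~0)

/* Low byte of C, i.e. the part of the argument strchr compares.  */
int
low_byte (int c)
{
  return (unsigned char) c;
}

/* Hypothetical test function.  The guards are meant to leave c in
   [1, 64] U [257, 511], so (char) c should be known non-zero.  */
char *
find_ranged (char *s, int c)
{
  size_t slen = strlen (s);
  if (slen == 0)
    return 0;

  if (c < 256)
    {
      if (c < 1 || c > 64)
	return 0;
    }
  else
    {
      if (c < 257 || c > 511)
	return 0;
    }
  return strchr (s, c);
}
```

Of the constants, 0 and HIGH_BITS have a zero low byte, while ' ' and
HIGH_BITS + ' ' both have the low byte of ' ', so the first two belong
with the (char) chr == 0 cases and the last two with the non-zero ones.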

        Jakub
