Hi, 

This is the first time I am asking for a design review for fixing a GCC 
enhancement request, Let me know if I need to send this email to other mailing 
list as well.

I have been studying PR 78809 for some time
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809 
<https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809>

with a lot of help from Wilco and other people, and detailed study about the 
previous discussion and current GCC behavior, I was able to come up with the 
following writeup
(basically serve as a design doc), and ready for implementation. 

Please take a look at it, and let me know any comments and suggestions:

thanks a lot.

Qing

========================================
str(n)cmp and memcmp optimization in gcc
-- A design document for PR78809

11/01/2017

Qing Zhao
===========================================================

0. Summary:

   For PR 78809 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=78809,
   Will add the following str(n)cmp and memcmp optimizations into GCC 8:

   A. for strncmp (s1, s2, n) 
      if one of "s1" or "s2" is a constant string, "n" is a constant, and 
larger than the length of the constant string:
      change strncmp (s1, s2, n) to strcmp (s1, s2);

   B. for strncmp (s1, s2, n) (!)= 0 or strcmp (s1, s2) (!)= 0
      if the result is ONLY used to do a simple equality test against zero, one 
of "s1" or "s2" is a small constant string, n is a constant, and the other 
non-constant string is guaranteed to not read beyond the end of the string:
      change strncmp (s1, s2, n) or strcmp (s1, s2) to corresponding memcmp 
(s1, s2, n); 

      (NOTE, currently, memcmp(s1, s2, N) (!)=0 has been optimized to a simple 
sequence to access all bytes and accumulate the overall result in GCC by 
       https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171
      ) 
      as a result, such str(n)cmp call would like to be replaced by simple 
sequences to access all types and accumulate the overall results. 

   C. for strcmp (s1, s2), strncmp (s1, s2, n), and memcmp (s1, s2, n)
      if the result is NOT used to do simple equality test against zero, one of 
"s1" or "s2" is a small constant string, n is a constant, and the Min value of 
the length of the constant string and "n" is smaller than a predefined 
threshold T, 
      inline the call by a byte-to-byte comparision sequence to avoid calling 
overhead. 

   A would like to be put in gimple fold phase (in routine 
"gimple_fold_builtin_string_compare" of gimple-fold.c)
   B would like to be put in strlen phase (add new "handle_builtin_str(n)cmp" 
routines in tree-ssa-strlen.c)
   C would like to be put in expand phase (tree-ssa-strlen.c or builtins.c): 
run-time performance testing is needed to decide the predefined threshold T. 

   The reasons to put the optimizations in the above order are:

   * A needs very simple information from source level, and A should be BEFORE 
B to simplify algorithm in B, gimple-fold phase should be a good place to do 
it. (Currently, similar transformation such as replacing strncat with strcat is 
put in this phase);
   * B needs type information (size and alignment) for the safety checking, and 
B should be BEFORE expand phase to utilize the available memcmp optimization in 
expand phase, strlen phase is a good place to do it. 
   * C would like to replace the call to byte-to-byte comparision, since we 
want some high-level optimization been applied first on the calls, the inlining 
of the call might be better to be done in the late stage of the tree 
optimization stage.  expand phase is a good place to do it. 
 
   These 3 optimization can be implemented seperated, 3 patches might be 
needed. 
 
the following are details to explain the above.

1. some background:

   #include <string.h>
   int strcmp(const char *s1, const char *s2);
   int strncmp(const char *s1, const char *s2, size_t n);
   int memcmp(const void *s1, const void *s2, size_t n);

   • strcmp compares null-terminated C strings
   • strncmp compares at most N characters of null-terminated C strings
   • memcmp compares binary byte buffers of N bytes.

The major common part among these three is:

   * they all return an integer less than, equal to, or greater than zero if s1 
is found, respectively, to be less than, to match, or be greater than s2.

The major different part among these three is:

   * both strcmp and strncmp might early stop at NULL terminator of the 
compared strings. but memcmp will NOT early stop, it means to compare exactly N 
bytes of both buffers.
   * strcmp compare the whole string, but strncmp only compare the first n 
chars (or fewer, if the string ends sooner) of the string.  

So, when optimizing memcmp and str(n)cmp, we need to consider the following:

   * The compiler can compare multiple bytes at the same time and doesn't have 
to worry about beyond the end of a string for memcmp, but have to worry about 
read beyond the end of a string for str(n)cmp when comparing multiple bytes at 
the same time because it might early stop at Null terminator. 
   * when the "n" of strncmp is sufficiently large, strncmp means to compare 
the whole strings, its behavior is same as strcmp. however, strncmp is likely 
to be moderately slower because it has to keep track of a counter. therefore, 
using strcmp to replace strncmp when "n" is big enough might be better for the 
performance.


2. Optimization:

   Three possible optimization applied on calls to str(n)cmp and memcmp:

   A.  When the "n" of strncmp is larger than the length of constant string 
"s1" or "s2", it can be replaced with corresponding strcmp. 

   B.  When the result of the str(n)cmp or memcmp is ONLY used to do a simple 
equality test against zero, the compiler is allowed to generate a simpler 
sequence to access all bytes and accumulate the overall result when guarantee 
that no read beyond the end of a string.
 
   C.  When the result of the str(n)cmp or memcmp is NOT used to do simple 
equality test against zero, if one of the s1 or s2 is a small constant string, 
it might be better for the compiler to inline the call by a byte-to-byte 
comparision sequence to avoid calling overhead. 

3. The current situation of GCC:

   For 2.A strncmp is replaced by strcmp when "n" is large enough:
    * strncmp(p, "fishi", 6)
    * __builtin_strncmp("fishi", q, 6)

   GLIBC used to replace the strncmp with strcmp when "n" is large enough, but 
this optimization was removed in: 
   https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2

   The reasons to delete it from glibc and move to GCC are:
      ** when doing it in glibc, the user cannot disable it by using 
-fno-builtin-strcmp/strncmp;
      ** __builtin_strncmp cannot be replaced similarly;
      ** when compile C++ or not using glibc, the transformation is not applied.

   For 2.B when the result is used to do equality test against zero:

    * memcmp (!)= 0  
      is currently optimized by GCC to use a simple sequcence to access all 
bytes and accumulate the overall result. 

      all the following patterns are optimized:
      memcmp (pa, qa, 5) != 0
      __builtin_memcmp (pa, qa, 29) != 0
      memcmp (p, "fishi", 5) != 0
      __builtin_memcmp (p, "fishiiii", 8) != 0

    * strncmp (!)= 0 
      the following pattersn are optimized ONLY when size is 1
      strncmp (pa, qa, 1) != 0
      strncmp (p, "f", 1) != 0
      __builtin_strncmp (pa, qa, 1) != 0
      __builtin_strncmp (p, "f", 1) != 0
     
    * strcmp (!)== 0
      the following patterns is optimized Only when size is 0 (by gimple)
      __builtin_strcmp (p, "") != 0

      the following patterns is optimized when size <= 3 (by glibc), this is 
for 2.B.
      strcmp (p, "abc") != 0

See https://gcc.gnu.org/bugzilla/show_bug.cgi?id=52171

   For 2.C when the result is Not used to do equality test against zero, but 
one of the s1 or s2 is small constant string, it might be better to inline the 
call by a byte-to-byte comparision sequence to avoid calling overhead.

    * memcmp
      is currently optimized by glibc ONLY when size is 1;

    * strncmp
      is currently optimized by gimple-fold phase ONLY when size is 1;

    * strcmp
      was inlined by byte-to-byte comparison by glibc when size <= 3, but this 
functionality was removed from the recent Glibc in:
      https://sourceware.org/git/?p=glibc.git;a=commit;h=f7db120f67d853e0cfa2

      the reasons to delete it from glibc and move to GCC are:
      ** when doing it in glibc, the user cannot disable it by using 
-fno-builtin-strcmp/strncmp;
      ** __builtin_str(n)cmp cannot be expanded similarly;
      ** when compile C++ or not using glibc, the expansion is not applied.

4. Missing optimization for str(n)cmp and memcmp:

   For 2.A, we miss the whole optimization in GCC.

   For 2.B, we miss the following optimization:

   str(n)cmp compare with constant string, when the result is used to do 
equality test against zero, if satisfy safety checking to read beyond the end 
of the string, we can transform them to corresponding memcmp, which has been 
optimized well in GCC.

   For 2.C, we miss the whole optimization in GCC. 

5. Implementation:
    * for 2.A strncmp is replaced by strcmp when "n" is large enough

      strncmp (s, STR, C) in which, s is a pointer to a string, STR is a 
constant string, C is a constant.

      if (strlen(STR) < C)
      {
        it can be safely transformed to
        strcmp (s, STR)
      }

    * for 2.B str(n)cmp (!)= 0 is replaced by memcmp (!)= 0 when safety 
checking is passed. 

      strcmp (s, STR) (!)= 0  in which, s is a pointer to a string, STR is a 
constant string.

      if  (max(sizeof_array(s), alignments(s)) > strlen(STR))
      {
        it can be safely transformed to
        memcmp (s, STR, strlen(STR)+1) (!)= 0
      } 

      strncmp (s, STR, C) (!)= 0  in which, s is a pointer to a string, STR is 
a constant string, C is a constant.

      if (C <= strlen(STR) && max(sizeof_array(s), alignments(s)) > C) 
      {
        it can be safely transformed to 
        memcmp (s, STR, C) (!)= 0
      }

      (when C > strlen(STR), the strncmp has been replaced with corresponding 
strcmp already).

    * for 2.C strcmp is expanded when n is small than threshold T. 

      strcmp (s, STR) in which, s is a pointer to a string, STR is a constant 
string

      if ((n = strlen(STR)) <= T)   /* T might be 3, we can do experiment to 
decide it */
      {
        result = s[0] - STR[0];
        if (result == 0) 
        {
          result = s[1] - STR[1];
          ...
          ...
          if (result == 0) 
            result = s[n];
        }
      }

      strncmp (s, STR, C) in which, s is a pointer to a string, STR is a 
constant string, C is a constant.

      if ((C <= strlen(STR) && C <= T)  /* T might be 3, do experiments to 
decide it */
      {
        result = s[0] - STR[0];
        if (result == 0) 
        {
          result = s[1] - STR[1];
          ...
          ...
          if (result == 0)
            result = s[C];
        }
      }

      memcmp (s, STR, C) in which, s is a pointer to a string, STR is a 
constant string, C is a constant
      similar as strncmp. 

Reply via email to