On Thu, Nov 3, 2016 at 5:21 AM, Benjamin Coutu <ben.co...@zeyos.com> wrote:
> Hello everyone,
>
> I think there are a few improvements that we could make to some fundamental 
> algorithms.
> I'd like to point out one particular for now, to see if there is any interest 
> in these kind of low level code optimizations from the community.
>
> Please consider the following code snippet extracted from
>
> https://github.com/php/php-src/blob/49412756df244d94a217853395d15e96cb60e18f/Zend/zend_operators.h#L193
>
> static zend_always_inline const void *zend_memrchr(const void *s, int c, 
> size_t n)
> {
>   const unsigned char *e;
>   if (n <= 0) {
>     return NULL;
>   }
>
>   for (e = (const unsigned char *)s + n - 1; e >= (const unsigned char *)s; 
> e--) {
>     if (*e == (const unsigned char)c) {
>       return (const void *)e;
>     }
>   }
>   return NULL;
> }
>
> The issue with this is that on each loop we have to do two comparison tests, 
> one against the length boundary and one to check for the actual occurrence of 
> the search character.
>
> We can avoid checking the length boundary if we use a sentinel by setting the 
> last possible element to the character we are searching for, whereby we 
> guarantee that we will find it and no longer need the test inside the loop. 
> After the search we reset that character and then recheck the last iteration 
> step to see if we have aborted the loop because of reaching the end (false 
> positive) or because of an actual positive.
>
> A patch could look like this:
>
> static zend_always_inline const void *zend_memrchr(const void *s, int c, 
> size_t n)
> {
>   const unsigned char *b;
>   const unsigned char *e;
>   const unsigned char t;
>
>   if (n > 0) {
>     /* Initialize boundary pointers */
>     b = (const unsigned char *)s;
>     e = b + n - 1;
>
>     /* Set sentinel */
>     t = *b;
>     *b = (const unsigned char)c;
>
>     /* Search */
>     while (*e != (const unsigned char)c) {
>       --e;
>     }
>
>     /* Restore sentinel */
>     *b = t;
>
>     /* Recheck */
>     if (e != b || t == (const unsigned char)c) {
>       return (const void *)e;
>     }
>   }
>
>   return NULL;
> }
>
> For the above example we could go one step further and test an entire 
> longword at a time with alignment on longword boundaries, just like some 
> modern C library implementations do. Which begs the question: Is there a 
> reason we are not using standard memrchr here?
>
> This kind of approach might be applicable to other string handling functions 
> as well. Should I look into this further?
>
> Cheers,
>
> Benjamin Coutu
>
> --
>
> Bejamin Coutu
> ben.co...@zeyos.com
>
> ZeyOS, Inc.
> http://www.zeyos.com
>
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php

An obvious issue with this is that not all strings are writable. The
signature of the function also specifies it is a pointer to a const so
you shouldn't be able to do this anyway.

--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Reply via email to