Re: [PHP-DEV] Algorithmic efficiency of zend_memrchr

Benjamin Coutu Thu, 03 Nov 2016 06:35:12 -0700

The type qualifier is just copy paste relic. We'd of course have to remove 
const form the signature (and the function body) for this to work. Quickly 
looking at zend_memrchr's call sites I don't see that it would be a huge issue 
to modify that or am I missing something?


========== Original ==========
From: Levi Morrison <le...@php.net>
To: Benjamin Coutu <ben.co...@zeyos.com>
Date: Thu, 03 Nov 2016 13:23:23 +0100
Subject: Re: [PHP-DEV] Algorithmic efficiency of zend_memrchr

> 
> 
> On Thu, Nov 3, 2016 at 5:21 AM, Benjamin Coutu <ben.co...@zeyos.com> wrote:
> > Hello everyone,
> >
> > I think there are a few improvements that we could make to some fundamental 
> > algorithms.
> > I'd like to point out one particular for now, to see if there is any 
> > interest in these kind of low level code optimizations from the community.
> >
> > Please consider the following code snippet extracted from
> >
> > https://github.com/php/php-src/blob/49412756df244d94a217853395d15e96cb60e18f/Zend/zend_operators.h#L193
> >
> > static zend_always_inline const void *zend_memrchr(const void *s, int c, 
> > size_t n)
> > {
> >   const unsigned char *e;
> >   if (n <= 0) {
> >     return NULL;
> >   }
> >
> >   for (e = (const unsigned char *)s + n - 1; e >= (const unsigned char *)s; 
> > e--) {
> >     if (*e == (const unsigned char)c) {
> >       return (const void *)e;
> >     }
> >   }
> >   return NULL;
> > }
> >
> > The issue with this is that on each loop we have to do two comparison 
> > tests, one against the length boundary and one to check for the actual 
> > occurrence of the search character.
> >
> > We can avoid checking the length boundary if we use a sentinel by setting 
> > the last possible element to the character we are searching for, whereby we 
> > guarantee that we will find it and no longer need the test inside the loop. 
> > After the search we reset that character and then recheck the last 
> > iteration step to see if we have aborted the loop because of reaching the 
> > end (false positive) or because of an actual positive.
> >
> > A patch could look like this:
> >
> > static zend_always_inline const void *zend_memrchr(const void *s, int c, 
> > size_t n)
> > {
> >   const unsigned char *b;
> >   const unsigned char *e;
> >   const unsigned char t;
> >
> >   if (n > 0) {
> >     /* Initialize boundary pointers */
> >     b = (const unsigned char *)s;
> >     e = b + n - 1;
> >
> >     /* Set sentinel */
> >     t = *b;
> >     *b = (const unsigned char)c;
> >
> >     /* Search */
> >     while (*e != (const unsigned char)c) {
> >       --e;
> >     }
> >
> >     /* Restore sentinel */
> >     *b = t;
> >
> >     /* Recheck */
> >     if (e != b || t == (const unsigned char)c) {
> >       return (const void *)e;
> >     }
> >   }
> >
> >   return NULL;
> > }
> >
> > For the above example we could go one step further and test an entire 
> > longword at a time with alignment on longword boundaries, just like some 
> > modern C library implementations do. Which begs the question: Is there a 
> > reason we are not using standard memrchr here?
> >
> > This kind of approach might be applicable to other string handling 
> > functions as well. Should I look into this further?
> >
> > Cheers,
> >
> > Benjamin Coutu
> >
> > --
> >
> > Bejamin Coutu
> > ben.co...@zeyos.com
> >
> > ZeyOS, Inc.
> > http://www.zeyos.com
> >
> >
> > --
> > PHP Internals - PHP Runtime Development Mailing List
> > To unsubscribe, visit: http://www.php.net/unsub.php
> 
> An obvious issue with this is that not all strings are writable. The
> signature of the function also specifies it is a pointer to a const so
> you shouldn't be able to do this anyway.


--
PHP Internals - PHP Runtime Development Mailing List
To unsubscribe, visit: http://www.php.net/unsub.php

Re: [PHP-DEV] Algorithmic efficiency of zend_memrchr

Reply via email to