Re: [PHP-DEV] Algorithm Optimizations - string search

2008-07-09 Thread Michal Dziemianko
Hello, I was again in bed for few days:/// Hope this time I have fully recovered:/ Anyways the patch for today is for auxiliary php_charmask function used by trim, str_word_count and addcslashes. It is basically just bunch of if-statements. But it can sped up by reordering them. Original n

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-20 Thread Michal Dziemianko
Hello I am doing benchmarking all the time - not only of the patches, but also some other code that might be useful later. Here are some values for the STRRPOS patch: http://212.85.117.53/gsoc/ index.php?option=com_content&view=article&id=48:strrpos-patch- proposal&catid=36:patches&Itemid=56

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-20 Thread Nuno Lopes
. People use PHP for other things than regular web scripting.. Nuno - Original Message - From: "Michal Dziemianko" <[EMAIL PROTECTED]> To: Sent: Friday, June 20, 2008 9:36 AM Subject: Re: [PHP-DEV] Algorithm Optimizations - string search Hello, Here goes first d

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-20 Thread Michal Dziemianko
Hello, Here goes first diff - to keep it simple and avoid confusion I will post them one-by-one after the previous is accepted/rejected. Optimization of STRRPOS/STRRIPOS for PHP_5_3. 2 things are changed: * implementation of search loop from theta(NM) to o(N), omega(NM) - it is now the same

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-19 Thread Michal Dziemianko
Hi, I am also sorry for delay - got ill recently and spend a day in bed after night at emergency. I am working on other things now, and hope to post some patches soon. I will create patch for zend_memnstr as you suggest and post it here probably tomorrow. I have some ideas/ implementations/

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-17 Thread Nuno Lopes
Hi, Sorry for taking so long to answer, but I'm trying to catch up last stuff. It's known that usually to optimize things for longer inputs you usually end up making things for short inputs worst. So IMHO, I think you should have the len==1 optimization and then use the KMP algorithm. Your impl

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-13 Thread Michal Dziemianko
Hello again I have setup small page where I am publishing all the profiling and evaluation results. It is here: http://83.168.205.202/~michal/ standard15/ So far I have put there function usage profile, zend_memnstr analysis, stripos and strrpos analysis including some charts etc. CVS diffs

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Michal Dziemianko
I;ve just noticed that the diff I posted is form my test-bed, not the actual final version. the code should be: if (needle_len == 1){ return (char *)memchr(p, *needle, (end-p)); } instead of + if (needle_len == 1){ + return (char *)memchr(p, *needle, (end-p+1))

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Michal Dziemianko
Hello again On Wed, Jun 11, 2008 at 9:47 AM, Stanislav Malyshev <[EMAIL PROTECTED]> wrote: > I'm not sure very big haystacks really worth the trouble - how many of them > are used? It may be interesting to see medians instead of averages for that. > But len=1 I think worth having special case. >

RE: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Texin, Tex
m: Johannes Schlüter [mailto:[EMAIL PROTECTED] > Sent: Wednesday, June 11, 2008 5:32 AM > To: Texin, Tex > Cc: Scott MacVicar; Nuno Lopes; internals@lists.php.net; > Michal Dziemianko > Subject: RE: [PHP-DEV] Algorithm Optimizations - string search > > Hi, > > On Wed,

RE: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Johannes Schlüter
Hi, On Wed, 2008-06-11 at 01:01 -0700, Texin, Tex wrote: > When I looked at the code, I assumed that it wasn't intended for > international use > I'll have to go back and look to give you details, but it doesn't work for > international use or unicode. > It would be ok for 8859-1. That's the d

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Stanislav Malyshev
Hi! Here are some statistics: - average haystack length: 624.2 - average needle length: 1.9 ! -> 63% of needles of length 1 - avg length of haystacks shorter than avg: 41.0 -> 85% of all haystacks - avg length of haystacks longer than avg: 5685.11 I think it would be interesting to see same e

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Michal Dziemianko
code. > It would be ok for 8859-1. > > > -Original Message- > > From: Scott MacVicar [mailto:[EMAIL PROTECTED] > > Sent: Monday, June 09, 2008 6:58 AM > > To: Nuno Lopes > > Cc: internals@lists.php.net; Michal Dziemianko > > Subject: Re: [PHP-DEV]

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Michal Dziemianko
Hello everybody, I have done small comparison of the 3 algorithms: the original zend_memnstr implementation, KMP, and Boyer-Moore (inefficient, "book" implementation). The first part - real life data. I have collected the data about the parameters passed to ZEND_MEMNSTR (not only while it is calle

RE: [PHP-DEV] Algorithm Optimizations - string search

2008-06-11 Thread Texin, Tex
ilto:[EMAIL PROTECTED] > Sent: Monday, June 09, 2008 6:58 AM > To: Nuno Lopes > Cc: internals@lists.php.net; Michal Dziemianko > Subject: Re: [PHP-DEV] Algorithm Optimizations - string search > > There is rabin-karp too but its worse case is O(nm) so that > might not be idea

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-10 Thread Michal Dziemianko
On 2008-06-09, at 14:43, Antony Dovgal wrote: On 09.06.2008 15:39, Michal Dziemianko wrote: Hello, Here: http://212.85.117.53/DIFF.txt is small patch that will speed up following functions: strpos, stripos, strrpos strripos, and probably some others (all that use zend_memnstr/php_memnstr

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-10 Thread Michal Dziemianko
Hello On 2008-06-09, at 14:58, Scott MacVicar wrote: There is rabin-karp too but its worse case is O(nm) so that might not be ideal, perhaps we should try to compare all of them. OK. I think RK will not be much better than original piece of code which runs in time o(n + m) i most cases

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-09 Thread Scott MacVicar
There is rabin-karp too but its worse case is O(nm) so that might not be ideal, perhaps we should try to compare all of them. Scott Nuno Lopes wrote: Hi, So some comments: - you have some problems with the indentation. We only use tabs, so please stick to that. Also, there are some lines tha

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-09 Thread Nuno Lopes
Hi, So some comments: - you have some problems with the indentation. We only use tabs, so please stick to that. Also, there are some lines that are not indented correctly - Have you considered the Boyer-Moore algorithm? I think it's a little faster than KMP (take a look at e.g. http://www.cs.u

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-09 Thread Antony Dovgal
On 09.06.2008 15:39, Michal Dziemianko wrote: Hello, Here: http://212.85.117.53/DIFF.txt is small patch that will speed up following functions: strpos, stripos, strrpos strripos, and probably some others (all that use zend_memnstr/php_memnstr function) There is also another thing - zend_m

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-09 Thread Antony Dovgal
On 09.06.2008 15:39, Michal Dziemianko wrote: Hello, Here: http://212.85.117.53/DIFF.txt is small patch that will speed up following functions: strpos, stripos, strrpos strripos, and probably some others (all that use zend_memnstr/php_memnstr function) The code below is definitely wrong.

Re: [PHP-DEV] Algorithm Optimizations - string search

2008-06-09 Thread Scott MacVicar
Hi Michal, Everything looks fine here and it applies cleanly, do you think you could make a patch against HEAD with this as well? I suspect it will be different due to Unicode. Scott Michal Dziemianko wrote: Hello, Here: http://212.85.117.53/DIFF.txt is small patch that will speed up foll