Hi Michal,
The GSOC program lets us share practical advice that we've learnt from
experience.
Looking at
http://212.85.117.53/gsoc/index.php?
option=com_content&view=article&id=49:strripos-possible-
patches&catid=36:patches&Itemid=56
How do the algorithms work in practice? I think timing results should
be given.
Also the profiling results aren't explained well: it took me a while
to work out that your profiling was not time using the first suggested
algorithm, but was occurence of a situation. I know you mentioned
this before, but I'd forgotten. That's exactly where project
documents (like your page) should help the busy reader by being very
explicit.
My experience is that project specifications need to be self
contained. Also they can often been improved by stating what hasn't
or won't be covered - this leave no room for ambiguity. For example,
if you don't plan to do any benchmarks, say so.
Keep up the good work,
Chris
Michal Dziemianko wrote:
> Hello,
> Here goes first diff - to keep it simple and avoid confusion I
will post
> them one-by-one after the previous is accepted/rejected.
> Optimization of STRRPOS/STRRIPOS for PHP_5_3.
>
> 2 things are changed:
> * implementation of search loop from theta(NM) to o(N), omega(NM)
- it
> is now the same as original zend_memnstr - using zend_memrchr
instead of
> memchr function. Here is comparison of performance:
> http://212.85.117.53/gsoc/index.php?
option=com_content&view=article&id=48:strrpos-patch-
proposal&catid=36:patches&Itemid=56
>
> This implementation is faster than original for any reasonable
input (a
> little slower for "aaab" in "aaaaaaaab", but that is really bad
case),
> and I think in this case we should not bother with KMP/BM as I
haven't
> found any call to this functions with haystack longer than 100
(NOTE: on
> MY server, over ~1h - this can be different for wikipedia
guys;))). In
> case somebody needs KMP - I have it implemented to work
"backwards" thus
> suitable for strrpos.
>
> * second thing is pre-lowering the haystack in strripos in case of
> needle len=1 - why is explained here:
> http://212.85.117.53/gsoc/index.php?
option=com_content&view=article&id=49:strripos-possible-
patches&catid=36:patches&Itemid=56
>
>
> Here is the diff:
> --------------------------------------------------
> diff -u -d -r1.445.2.14.2.69.2.22 string.c
> --- ext/standard/string.c 27 May 2008 10:29:33 -0000
> 1.445.2.14.2.69.2.22
> +++ ext/standard/string.c 20 Jun 2008 08:08:34 -0000
> @@ -1876,18 +1876,19 @@
>
> if (needle_len == 1) {
> /* Single character search can shortcut memcmps */
> - while (e >= p) {
> - if (*e == *needle) {
> - RETURN_LONG(e - p + (offset > 0 ?
offset
> : 0));
> - }
> - e--;
> - }
> + if ((e = (char *)zend_memrchr(p, *needle, e - p
+ 1))
> != NULL) {
> + RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
> + }
> RETURN_FALSE;
> }
>
> while (e >= p) {
> - if (memcmp(e, needle, needle_len) == 0) {
> - RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
> + if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) !=
> NULL) {
> + if (!memcmp(needle, e, needle_len)) {
> + RETURN_LONG(e - p + (offset > 0 ?
offset
> : 0));
> + }
> + } else {
> + RETURN_FALSE;
> }
> e--;
> }
> @@ -1925,6 +1926,11 @@
> if ((haystack_len == 0) || (needle_len == 0)) {
> RETURN_FALSE;
> }
> +
> + haystack_dup = estrndup(haystack, haystack_len);
> + php_strtolower(haystack_dup, haystack_len);
> + needle_dup = estrndup(needle, needle_len);
> + php_strtolower(needle_dup, needle_len);
>
> if (needle_len == 1) {
> /* Single character search can shortcut memcmps
> @@ -1932,34 +1938,36 @@
> if (offset >= 0) {
> if (offset > haystack_len) {
> php_error_docref(NULL TSRMLS_CC,
> E_NOTICE, "Offset is greater than the length of haystack string");
> + efree(needle_dup);
> + efree(haystack_dup);
> RETURN_FALSE;
> }
> - p = haystack + offset;
> - e = haystack + haystack_len - 1;
> + p = haystack_dup + offset;
> + e = haystack_dup + haystack_len - 1;
> } else {
> - p = haystack;
> + p = haystack_dup;
> if (-offset > haystack_len || offset < -
INT_MAX) {
> php_error_docref(NULL TSRMLS_CC,
> E_NOTICE, "Offset is greater than the length of haystack string");
> + efree(needle_dup);
> + efree(haystack_dup);
> RETURN_FALSE;
> }
> - e = haystack + haystack_len + offset;
> + e = haystack_dup + haystack_len + offset;
> }
> - /* Borrow that ord_needle buffer to avoid repeatedly
> tolower()ing needle */
> - *ord_needle = tolower(*needle);
> +
> while (e >= p) {
> - if (tolower(*e) == *ord_needle) {
> + if (*e == *needle_dup) {
> + efree(needle_dup);
> + efree(haystack_dup);
> RETURN_LONG(e - p + (offset > 0 ?
offset
> : 0));
> }
> e--;
> }
> + efree(needle_dup);
> + efree(haystack_dup);
> RETURN_FALSE;
> }
>
> - needle_dup = estrndup(needle, needle_len);
> - php_strtolower(needle_dup, needle_len);
> - haystack_dup = estrndup(haystack, haystack_len);
> - php_strtolower(haystack_dup, haystack_len);
> -
> if (offset >= 0) {
> if (offset > haystack_len) {
> efree(needle_dup);
> @@ -1985,11 +1993,13 @@
> }
>
> while (e >= p) {
> - if (memcmp(e, needle_dup, needle_len) == 0) {
> - efree(haystack_dup);
> - efree(needle_dup);
> - RETURN_LONG(e - p + (offset > 0 ?
offset : 0));
> - }
> + if ((e = (char *)zend_memrchr(p, *needle, e - p +
1)) !=
> NULL) {
> + if (!memcmp(needle, e, needle_len)) {
> + efree(haystack_dup);
> + efree(needle_dup);
> + RETURN_LONG(e - p + (offset > 0 ?
offset
> : 0));
> + }
> + }
> e--;
> }
> ----------------------------------------------
>
> Tested and PASS. Valgrind does not show any memory leaks, but if
> somebody more competent can confirm it, than I will really
appreciate it.
> Cheers,
> Michal
>
> PS If somebody can take a look at that:
> http://212.85.117.53/gsoc/index.php?
option=com_content&view=article&id=52:small-
buginconsistence&catid=38:other&Itemid=58
> it would be also great. If it should be the same in both
functions than
> I have patch and test cases for it prepared.
>
>
--
Christopher Jones, Oracle
Email: [EMAIL PROTECTED] Tel: +1 650 506 8630
Blog: http://blogs.oracle.com/opal/ Free PHP Book: http://
tinyurl.com/f8jad