You probably saw that I have committed an initial implementation of
TextIterator. The impetus for this is that direct indexing of Unicode
strings via the [] operator is slow, very slow, at least currently. The
reason is that [] cannot simply perform random-offset indexing into
UChar* strings. It needs to start from the beginning of the string and
iterate forward until it reaches the desired offset, because our
default unit is a codepoint, which can take up one or two UChars (a
codepoint outside the BMP is stored as a UTF-16 surrogate pair).
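A minimal C sketch of the lookup [] has to perform, assuming the string is stored as UTF-16 code units like ICU's UChar. The names `uchar16` and `cp_to_unit_offset` are hypothetical and chosen for illustration. This is not the engine's actual code, only a model of why finding codepoint N costs O(N):

```c
#include <stddef.h>
#include <stdint.h>

/* UTF-16 code unit, analogous to ICU's UChar (assumption: 16-bit unsigned). */
typedef uint16_t uchar16;

/* High (lead) surrogate: first half of a pair encoding a non-BMP codepoint. */
static int is_lead_surrogate(uchar16 u) {
    return u >= 0xD800 && u <= 0xDBFF;
}

/* Low (trail) surrogate: second half of such a pair. */
static int is_trail_surrogate(uchar16 u) {
    return u >= 0xDC00 && u <= 0xDFFF;
}

/*
 * Hypothetical helper: translate a codepoint index into a code-unit offset.
 * Since a codepoint may occupy 1 or 2 units, the only way to locate it is
 * to walk forward from the start, one codepoint at a time.
 * Returns -1 if cp_index is past the end of the string.
 */
static long cp_to_unit_offset(const uchar16 *s, size_t len, size_t cp_index) {
    size_t unit = 0;
    for (size_t cp = 0; cp < cp_index; cp++) {
        if (unit >= len) {
            return -1;
        }
        /* A lead surrogate followed by a trail surrogate is one codepoint. */
        if (is_lead_surrogate(s[unit]) && unit + 1 < len &&
            is_trail_surrogate(s[unit + 1])) {
            unit += 2;
        } else {
            unit += 1;
        }
    }
    return unit < len ? (long)unit : -1;
}
```

For the test string in the script below, "a\U010201bcß" encodes as the six units 0x0061 0xD800 0xDE01 0x0062 0x0063 0x00DF, so codepoint 2 ('b') starts at unit offset 3, not 2.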
So here are some (rough) numbers on the relative performance of
TextIterator vs. []. The script I used is a simple one (attached after
the signature). Each test was 10000 runs over a 500-character string.
[] operator: 27.16373 s
TextIterator: 1.89697 s (!)
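For comparison, running the same [] operator test on a 500-character
binary (old-style) string gives me 9.11334 s. Quite interesting, I'd
say: the iterator over a Unicode string beats [] even on a binary
string, where no forward scan is needed.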
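To make the cost difference concrete, here is a rough C model that counts codepoint steps rather than time. It assumes the same UTF-16 layout as ICU's UChar; `build_sample`, `steps_indexed` and `steps_iterator` are hypothetical names, and the model ignores every per-access cost except the forward scan:

```c
#include <stddef.h>
#include <stdint.h>

typedef uint16_t uchar16; /* assumption: UTF-16 code unit, like ICU's UChar */

/* Fill out with 'reps' copies of "a\U010201bcß" in UTF-16 (6 units, 5 codepoints). */
static size_t build_sample(uchar16 *out, size_t reps) {
    static const uchar16 pattern[] = {0x0061, 0xD800, 0xDE01, 0x0062, 0x0063, 0x00DF};
    size_t n = 0;
    for (size_t r = 0; r < reps; r++) {
        for (size_t k = 0; k < 6; k++) {
            out[n++] = pattern[k];
        }
    }
    return n;
}

/* Advance past one codepoint starting at unit offset i; returns the new offset. */
static size_t next_cp(const uchar16 *s, size_t len, size_t i) {
    if (i + 1 < len && s[i] >= 0xD800 && s[i] <= 0xDBFF &&
        s[i + 1] >= 0xDC00 && s[i + 1] <= 0xDFFF) {
        return i + 2; /* surrogate pair: one codepoint, two units */
    }
    return i + 1;
}

/* []-style loop: every access rescans from the start of the string.
 * Returns the total number of codepoint steps taken. */
static unsigned long steps_indexed(const uchar16 *s, size_t len, size_t ncp) {
    unsigned long steps = 0;
    for (size_t target = 0; target < ncp; target++) {
        size_t i = 0;
        for (size_t cp = 0; cp < target; cp++) {
            i = next_cp(s, len, i);
            steps++;
        }
        /* i now indexes the first unit of codepoint 'target'. */
    }
    return steps;
}

/* Iterator-style loop: keep the current offset and advance once per codepoint. */
static unsigned long steps_iterator(const uchar16 *s, size_t len) {
    unsigned long steps = 0;
    for (size_t i = 0; i < len; i = next_cp(s, len, i)) {
        steps++;
    }
    return steps;
}
```

With 100 copies of the pattern (500 codepoints, 600 units), the indexed loop takes 0 + 1 + ... + 499 = 124,750 steps per pass versus 500 for the iterator, i.e. quadratic versus linear growth in string length. These are step counts only and do not by themselves predict the timings above.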
I am not sure how we can optimize [] to be faster than the iterator
approach. Food for thought?
- Andrei
<?php
$a = str_repeat("a\U010201bcß", 100); /* 5 codepoints x 100 = 500; \U needs double quotes */
var_dump($a);
/* warm up the engine */
for ($x = 0; $x < 100; $x++) {
foreach (new TextIterator($a) as $c) {
}
}
/* measure []: each $a[$i] scans forward from the start of the string */
$start = microtime(true);
for ($x = 0; $x < 10000; $x++) {
$len = strlen($a);
for ($i = 0; $i < $len; $i++) {
$c = $a[$i];
}
}
$end = microtime(true);
printf("[] run time: %.5f\n", $end - $start);
/* measure iterator */
$start = microtime(true);
for ($x = 0; $x < 10000; $x++) {
foreach (new TextIterator($a) as $c) {
}
}
$end = microtime(true);
printf("iterator run time: %.5f\n", $end - $start);
?>
--
PHP Internals - PHP Runtime Development Mailing List