Benjamin Goldberg writes:
> I *really* *really* want string iterators.  The current API for
> iterating through the characters of a string is, IMHO, vastly
> insufficient.

Not only is the current API inconvenient; iterators are also essential
for doing pattern matching efficiently on some multibyte encodings, most
notably UTF-8.

> The following are what I want for string iterators:
>
> 5/ It should take O(n) time to advance an iterator n characters
> (either forwards or backwards).  It would be nice if it took O(1) time,
> but it's not necessary.

O(1) is also impossible for certain encodings.  O(n) should suffice.

> 6/ It should take O(1) time to decode whatever characters are at the
> iterator.
>
> 7/ If two iterators are N characters apart, it should take O(N) time
> to measure that distance.
>
> 8/ The encoding/iterator API should be sufficiently complete to allow
> someone to write a character-rope string type, and have it work
> seamlessly with other strings.

More importantly, it should be possible to write a lazy string, perhaps
generated from a filehandle.

There are downsides to this, however.  The more indirection we put in,
especially in the form of function calls, the more efficiency we lose
when things need to be tight and local.  This is particularly true if
we're going to do pattern matching at the bytecode level as opposed to
the op level.

> 9/ New ops which provide access to the string iterator API.

Yes.  What is going to be used to store an iterator?  An I reg or a P
reg?  If it's a PMC, would it be possible to just implement the iterator
itself as a PMC, and use the standard iterator vtable methods (which
are?) for motion and dereferencing?  Again, that involves vtable
overhead and doesn't lend itself to JIT very well (which is very, very
important).

> 10/ Add methods to PerlString to make it compatible with Iterator.
>
> 11/ Any string_ function which takes a character index as a
> parameter, should be able to take a string iterator.
>
> 12/ The rx engine should use the new ops.
>
> 12a/ We should be able to use the rx engine to "match" a stream of
> values from an Iterator PMC.  Whether this Iterator is crawling over a
> PerlString, or PerlArray, or something else, shouldn't matter to the rx
> engine.

Luke

> --
> $a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca
> );{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED]
> ]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}
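
For concreteness, here is a rough sketch of what points 5 through 7
could look like for UTF-8.  It is written in Python purely for
illustration.  The names Utf8Iter, advance, decode and distance are
hypothetical and are not part of any existing string API.  It also
assumes the buffer holds valid UTF-8.

```python
def _is_continuation(byte: int) -> bool:
    """True for UTF-8 continuation bytes (bit pattern 10xxxxxx)."""
    return (byte & 0xC0) == 0x80


class Utf8Iter:
    """Hypothetical iterator: a byte offset into a UTF-8 buffer.

    Invariant: pos is on the lead byte of a character, or at len(buf).
    """

    def __init__(self, buf: bytes, pos: int = 0):
        self.buf = buf
        self.pos = pos

    def advance(self, n: int) -> None:
        """Move n characters forward (n > 0) or backward (n < 0).

        Each character is at most 4 bytes, so this is O(|n|).
        """
        for _ in range(abs(n)):
            if n > 0:
                if self.pos >= len(self.buf):
                    raise IndexError("advance past end of string")
                self.pos += 1
                while (self.pos < len(self.buf)
                       and _is_continuation(self.buf[self.pos])):
                    self.pos += 1
            else:
                if self.pos <= 0:
                    raise IndexError("advance before start of string")
                self.pos -= 1
                while self.pos > 0 and _is_continuation(self.buf[self.pos]):
                    self.pos -= 1

    def decode(self) -> str:
        """Decode the character at the iterator: O(1).

        The lead byte alone tells us how many bytes the character uses.
        """
        lead = self.buf[self.pos]
        if lead < 0x80:
            length = 1
        elif lead >> 5 == 0b110:
            length = 2
        elif lead >> 4 == 0b1110:
            length = 3
        else:
            length = 4
        return self.buf[self.pos:self.pos + length].decode("utf-8")


def distance(a: Utf8Iter, b: Utf8Iter) -> int:
    """Signed character distance from a to b over the same buffer.

    Counts lead bytes between the two offsets, so the cost is linear in
    the number of characters between the iterators.
    """
    lo, hi = sorted((a.pos, b.pos))
    count = sum(1 for byte in a.buf[lo:hi] if not _is_continuation(byte))
    return count if a.pos <= b.pos else -count
```

Here the iterator is nothing more than an integer offset plus a
reference to the buffer.  In this sketch that is all it needs, and it
involves no vtable dispatch.  A lazy or rope-backed string would need
more state than this.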