Tim Bunce wrote:
>
> On Tue, Aug 19, 2003 at 12:07:22AM -0400, Benjamin Goldberg wrote:
> > There are a number of shortcomings in the API, which I'd like to
> > address here, and propose improvements for.
>
> Just to be sure people are keeping it in mind, I'll repost this from
> Larry:
>
> On Wed, Jan 30, 2002 at 10:47:36AM -0800, Larry Wall wrote:
> >
> > For various reasons, some of which relate to the sequence-of-integer
> > abstraction, and some of which relate to "infinite" strings and
> > arrays, I think Perl 6 strings are likely to be represented by a list
> > of chunks, where each chunk is a sequence of integers of the same size
> > or representation, but different chunks can have different integer
> > sizes or representations.  The abstract string interface must hide
> > this from any module that wishes to work at the abstract string level.
> > In particular, it must hide this from the regex engine, which works on
> > pure sequences in the abstract.
>
> Tim.

*I* was certainly keeping it in mind ;).

Just for the curious, the *reasoning* behind my proposed requirements
is as follows:

1/ The regex engine uses string_index all over the place.  This is an
O(n) operation for the utf8 encoding.  This is bad.  If we had real
string iterators, then fetching the next character would be an O(1)
operation.  My requirements 4..7 describe the time complexity which
most normal people would expect of iterators.

2/ There's no way for a string to refer to other strings or pmcs,
making all sorts of things (including what Larry mentioned)
impossible.  Or at least, if we tried, there's no way to prevent the
things we're pointing to from being cleaned up out from underneath us,
since we've no way of marking them as alive.  This is my requirement
8, and, to a lesser degree, requirement 3.

/***/

Most of everything else assumes that the fix for shortcoming 1/ of the
current API will actually be a string iterator.

3/ String iterator usage should be *simple*.  Making them pmcs would
mean that we'd need to temporarily anchor them.  Having them as void*
pointers to gc-relocatable memory means that they can become invalid
at unexpected times.  Obviously, if we temporarily disable DOD/GC,
then both problems can be avoided, but that has other drawbacks.
These are my requirements 1 and 2.

I fear that we're going to lose the requirement that iterators be
simple objects, since letting them be pmcs gives us so much more
flexibility.  (Which, I fear, we need, if the encoding uses a
not-so-simple data structure (like a tree, or a lazily concatenated
sequence of substrings)).

This is also the reason for my requirements 9..12: if we're going to
have iterators, use them.

/***/

4/ If we've got fast string iterators, then an Iterator.pmc object for
a PerlString object won't be significantly slower than using iterators
for strings directly.

So... why not make the rx engine work on Iterator.pmc objects?
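
To make point 1/ concrete, here's a rough C sketch of the difference.
None of these names (index_by_scan, str_iter, iter_next, ...) are the
real Parrot API; they're made up for illustration, and the code
assumes well-formed utf8 in one contiguous buffer.  Indexing has to
walk from the start of the string every time, while the iterator only
decodes the character it's sitting on:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; this is NOT Parrot's string API. */

/* Byte length of the utf8 sequence whose lead byte is b
   (assumes well-formed input). */
static size_t utf8_seq_len(unsigned char b)
{
    if (b < 0x80) return 1;
    if ((b & 0xE0) == 0xC0) return 2;
    if ((b & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Decode the code point that starts at p (assumes well-formed input). */
static uint32_t utf8_decode(const unsigned char *p)
{
    size_t n = utf8_seq_len(p[0]);
    uint32_t c;
    size_t i;

    if (n == 1)
        return p[0];
    c = p[0] & (0xFF >> (n + 1));       /* payload bits of the lead byte */
    for (i = 1; i < n; i++)
        c = (c << 6) | (p[i] & 0x3F);   /* 6 payload bits per trailing byte */
    return c;
}

/* Index-style access: skips idx characters from the start, so O(idx).
   Calling this for every position of a match makes the scan quadratic. */
uint32_t index_by_scan(const unsigned char *s, size_t idx)
{
    while (idx-- > 0)
        s += utf8_seq_len(*s);
    return utf8_decode(s);
}

/* Iterator-style access: remembers the byte position between calls. */
typedef struct {
    const unsigned char *cur;   /* next undecoded byte */
    const unsigned char *end;   /* one past the last byte */
} str_iter;

void iter_init(str_iter *it, const unsigned char *s, size_t nbytes)
{
    it->cur = s;
    it->end = s + nbytes;
}

int iter_done(const str_iter *it)
{
    return it->cur >= it->end;
}

/* Return the current character and step past it: O(1) per call. */
uint32_t iter_next(str_iter *it)
{
    uint32_t c;

    assert(it->cur < it->end);
    c = utf8_decode(it->cur);
    it->cur += utf8_seq_len(*it->cur);
    return c;
}
```

Note that a raw-pointer iterator like this one is exactly the kind
that goes stale if the buffer is relocated, which is the problem
described in 3/; the sketch only shows the time complexity, not a
solution to the anchoring question.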

-- 
$a=24;split//,240513;s/\B/ => /for@@=qw(ac ab bc ba cb ca );{push(@b,$a),($a-=6)^=1 for 2..$a/6x--$|;print "[EMAIL PROTECTED] ]\n";((6<=($a-=6))?$a+=$_[$a%6]-$a%6:($a=pop @b))&&redo;}