On Wed, Sep 08, 2004 at 11:00:54PM -0700, Steve Fink wrote:
: I vote for leaving all of these sorts of cases undefined. Well,
: partially defined -- I'd rather we didn't allow ($a = "aaa") =~ s/a/b/g
: to turn $a into "gawrsh". At the very least, define the exact number of
: output and stores for "strict aka slow mode", but have an optional
: optimization flag that explicitly drops those guarantees. It would allow
: for more flexibility in implementations.

I don't claim to follow all this talk about "stores", but from previous Perl experience, your time in general is going to be dominated by copying characters, not by the number of operations that control it (unless by "store" you mean storing individual characters). In a few cases you can do a modification in place, in which case you should, but those cases are rather rare in real life, and getting rarer with Unicode, and often end up being tr/// instead of s/// anyway.

So the main thing you have to avoid is copying all the characters twice. This is not obvious with toy examples, but think in terms of doing s/TTAGGG// on, say, the human genome. In the absence of a chunking data structure, the most efficient general substitution on a straight string is to build the new string once out of bits of the old string and the substitution, then swap that entire string in as the new definition of the string.

And in an example like

    ($a = $genome) ~~ s/TTAGGG//

you'd like to capture the idea that there's already a copy being forced in the context, so it'd be nice to use that copy to do the transformation without inducing an additional copy.

There are also optimization modes where a single substitution like

    $genome ~~ s/TTAGGG//

can be done "mostly in place". If the location to change is in the front half of the string, you only relocate characters before it. If the location is in the back half, you only relocate characters after it. And of course, if the pattern never matches, you should never copy anything (except in the "en passant" case).

I'm probably preaching to the choir here, but I thought this all ought to be made explicit. There's a reason Perl is the language of choice for bioinformatics, and we need to be careful not to throw that away. And indeed, it would be nice to pass those benefits off to other languages running on Parrot.

Larry
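
As a rough illustration of the "mostly in place" case above, here is a minimal C sketch of the deletion form (empty replacement, as in s/TTAGGG//). It is not how Perl or Parrot actually implement this. The StrView type and both function names are hypothetical, and it assumes a plain byte string whose start pointer may advance within a buffer the caller owns.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical string view: `start` points into a caller-owned buffer. */
typedef struct {
    char  *start;  /* first live character */
    size_t len;    /* number of live characters */
} StrView;

/* Plain byte search for the first occurrence of pat.
 * Returns the offset of the match, or (size_t)-1 if there is none. */
static size_t find_first(const StrView *s, const char *pat, size_t plen)
{
    if (plen == 0 || plen > s->len)
        return (size_t)-1;
    for (size_t i = 0; i + plen <= s->len; i++) {
        if (memcmp(s->start + i, pat, plen) == 0)
            return i;
    }
    return (size_t)-1;
}

/* Delete the first occurrence of pat, relocating only the shorter side.
 * Returns 1 on a match and 0 otherwise; *moved receives the number of
 * characters relocated. With no match, nothing is copied at all. */
static int delete_first_in_place(StrView *s, const char *pat, size_t *moved)
{
    size_t plen = strlen(pat);
    size_t pos  = find_first(s, pat, plen);

    *moved = 0;
    if (pos == (size_t)-1)
        return 0;

    size_t before = pos;                 /* characters ahead of the match */
    size_t after  = s->len - pos - plen; /* characters behind the match   */

    if (before <= after) {
        /* Match is in the front half: slide the prefix right over the
         * match and advance the start of the string. */
        memmove(s->start + plen, s->start, before);
        s->start += plen;
        *moved = before;
    } else {
        /* Match is in the back half: slide the suffix left over the match. */
        memmove(s->start + pos, s->start + pos + plen, after);
        *moved = after;
    }
    s->len -= plen;
    return 1;
}
```

Calling delete_first_in_place on a view of "AAAATTAGGGCCCCCCCCCC" with the pattern "TTAGGG" relocates only the four leading characters and advances start by six. Because start can move, a real implementation would also have to remember the original allocation so the buffer can still be freed, and a non-empty replacement of a different length would need slack space at one end or the other.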