On Wed, Sep 08, 2004 at 11:00:54PM -0700, Steve Fink wrote:
: I vote for leaving all of these sorts of cases undefined. Well,
: partially defined -- I'd rather we didn't allow ($a = "aaa") =~ s/a/b/g
: to turn $a into "gawrsh". At the very least, define the exact number of
: output and stores for "strict aka slow mode", but have an optional
: optimization flag that explicitly drops those guarantees. It would allow
: for more flexibility in implementations.

I don't claim to follow all this talk about "stores", but from previous
Perl experience, your time in general is going to be dominated by
copying characters, not by the number of operations that control it
(unless by "store" you mean storing individual characters).  In a few
cases you can do a modification in place, in which case you should,
but those cases are rather rare in real life, and getting rarer with
Unicode, and often end up being tr/// instead of s/// anyway.

So the main thing you have to avoid is copying all the characters
twice.  This is not obvious with toy examples, but think in terms
of doing s/TTAGGG// on, say, the human genome.  In the absence of a
chunking data structure, the most efficient general substitution on
a straight string is to build the new string once out of bits of the
old string and the substitution, then swap that entire string in as
the new definition of the string.  And in an example like

    ($a = $genome) ~~ s/TTAGGG//

you'd like to capture the idea that there's already a copy being forced
in the context, so it'd be nice to use that copy to do the transformation
without inducing an additional copy.
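
As a rough illustration of the "build the new string once" idea above, here
is a minimal sketch in Python rather than Perl. The function name
substitute_single_pass is hypothetical, it works on bytes rather than
Unicode strings, and it replaces every match (as with /g). It is not how
Perl or Parrot actually implement s///; it only shows the shape of the
technique: find the matches, size the output once, then copy each surviving
piece of the old string straight into its final position. The returned
buffer plays the role of the copy that the assignment forces anyway.

```python
import re

def substitute_single_pass(src, pattern, repl):
    """Return a new bytearray with every match of `pattern` replaced by `repl`.

    Hypothetical helper: the output is allocated once at its final size and
    the kept pieces of `src` are written directly into it.
    """
    spans = [m.span() for m in re.finditer(pattern, src)]
    removed = sum(end - start for start, end in spans)
    out = bytearray(len(src) - removed + len(repl) * len(spans))  # sized once
    read = 0    # next unread position in src
    write = 0   # next free position in out
    # Slicing a memoryview does not copy the underlying bytes.
    with memoryview(src) as view, memoryview(out) as dest:
        for start, end in spans:
            keep = start - read
            if keep:
                dest[write:write + keep] = view[read:start]
                write += keep
            if repl:
                dest[write:write + len(repl)] = repl
                write += len(repl)
            read = end
        if read < len(src):
            dest[write:] = view[read:]
    return out
```

For example, substitute_single_pass(b"AATTAGGGCC", rb"TTAGGG", b"") yields a
fresh buffer holding AACC, and the original bytes object is left untouched.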

There are also optimization modes where a single substitution like

    $genome ~~ s/TTAGGG//

can be done "mostly in place".  If the location to change is in the
front half of the string, you only relocate characters before it.
If the location is in the back half, you only relocate characters
after it.
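
The "mostly in place" case can be sketched as follows, again in Python and
only under stated assumptions: the string lives in a mutable bytearray, the
replacement is empty (as in s/TTAGGG//), and the caller is willing to track
an (offset, length) pair for the live region instead of resizing the buffer.
The name delete_span_mostly_in_place is made up for this example.

```python
def delete_span_mostly_in_place(buf, start, end):
    """Delete buf[start:end], moving whichever side of the span is shorter.

    Hypothetical helper. Returns (offset, length): the surviving string is
    buf[offset:offset + length]. The buffer itself is never resized.
    """
    gap = end - start
    suffix_len = len(buf) - end
    if start < suffix_len:
        # Match is in the front half: slide the prefix right, over the match.
        # The right-hand slice is a temporary copy of the prefix only; a C
        # implementation would likely use a single memmove here.
        buf[gap:end] = buf[:start]
        return gap, len(buf) - gap
    # Match is in the back half: slide the suffix left, over the match.
    buf[start:start + suffix_len] = buf[end:]
    return 0, len(buf) - gap
```

With a 16-byte buffer holding ACTTAGGGGTACGTAC and the match at positions 2
to 8, only the two leading bytes move; the ten surviving bytes then start at
offset 6.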

And of course, if the pattern never matches, you should never copy anything
(except in the "en passant" case, where the assignment forces a copy anyway).
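
The no-match rule and its "en passant" exception can be made concrete with
one more small Python sketch. The function substitute_first and its
en_passant flag are hypothetical; the flag stands in for the surrounding
($a = $genome) assignment, which needs a separate copy whether or not the
pattern matches.

```python
import re

def substitute_first(src, pattern, repl, en_passant=False):
    """Replace the first match of `pattern` in `src` (a bytes object).

    Hypothetical helper. On a failed match the very same object is handed
    back, so nothing is copied, unless `en_passant` says the caller's
    assignment requires a fresh copy regardless.
    """
    m = re.search(pattern, src)
    if m is None:
        return bytearray(src) if en_passant else src
    view = memoryview(src)      # slices of a memoryview do not copy
    out = bytearray()
    out += view[:m.start()]     # piece before the match
    out += repl
    out += view[m.end():]       # piece after the match
    return out
```

A caller can check identity to confirm that a failed match cost nothing:
substitute_first(genome, rb"TTAGGG", b"") is genome holds whenever the
pattern is absent and en_passant is left off.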

I'm probably preaching to the choir here, but I thought this all ought
to be made explicit.  There's a reason Perl is the language of choice
for bioinformatics, and we need to be careful not to throw that away.
And indeed, it would be nice to pass those benefits off to other
languages running on Parrot.

Larry
