Re: Semantics for regexes

Steve Fink Wed, 01 Sep 2004 20:24:54 -0700

On Sep-01, Dan Sugalski wrote:
> 
> This is a list of the semantics that I see as needed for a regex 
> engine. When we have 'em, we'll map them to string ops, and may well 
> add in some special-case code for faster access.
> 
> *) extract substring
> *) exact string compare
> *) find string in string
> *) find first character of class X in string
> *) find first character not of class X in string
> *) find boundary between X and not-X
> *) Find boundary defined by arbitrary code (mainly for word breaks)


Huh? What do you mean by "semantics"? The only semantics needed are the
minimum necessary to answer the question "is the fred at offset i equal
to the fred X?" (Sorry, not sure if fred is actually character or
codepoint or whatever, and is probably all of them at different levels.)
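To make that concrete, here is a minimal Python sketch of the single primitive and of an exact compare built on nothing else. The names `fred_equals` and `match_literal` are made up for illustration and are not Parrot ops; a "fred" is simply whatever `s[i]` yields.

```python
def fred_equals(s, i, x):
    """Is the unit at offset i of s equal to x? False if i is out of range."""
    return 0 <= i < len(s) and s[i] == x


def match_literal(s, start, literal):
    """Exact compare of literal against s at offset start, using only fred_equals."""
    return all(fred_equals(s, start + k, x) for k, x in enumerate(literal))


if __name__ == "__main__":
    print(fred_equals("abc", 1, "b"))
    print(match_literal("hello world", 6, "world"))
```

Because `fred_equals` treats an out-of-range offset as a mismatch, `match_literal` needs no separate length check.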

We also almost certainly need to be able to do character class
comparisons, although if you assume that you can always transcode to
what the regex was compiled with, then you don't even need that --
instead, you need to be able to convert to something like a difference
list of numbered freds. But if we're talking about semantics, then yes,
you need the character class manipulation.
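One possible reading of a "difference list of numbered freds" is an inversion list: a sorted list of codepoints at which class membership flips. The Python sketch below assumes that reading, and the class name `CodepointClass` is hypothetical. It is an illustration of the idea, not a description of anything Parrot actually does.

```python
from bisect import bisect_right


class CodepointClass:
    """Character class stored as sorted codepoints where membership toggles."""

    def __init__(self, ranges):
        # ranges: inclusive (lo, hi) codepoint pairs, assumed sorted and
        # non-overlapping. Each range contributes an "on" and an "off" point.
        self.toggles = []
        for lo, hi in ranges:
            self.toggles.extend((lo, hi + 1))

    def __contains__(self, ch):
        # An odd count of toggle points at or below the codepoint means "inside".
        return bisect_right(self.toggles, ord(ch)) % 2 == 1

    def negated(self):
        # Complementing the class just adds or removes a toggle at codepoint 0.
        other = CodepointClass([])
        if self.toggles and self.toggles[0] == 0:
            other.toggles = self.toggles[1:]
        else:
            other.toggles = [0] + self.toggles
        return other


if __name__ == "__main__":
    lower = CodepointClass([(ord("a"), ord("z"))])
    print("m" in lower, "M" in lower, "M" in lower.negated())
```

With this representation a membership test is a binary search over numbers, so the matcher never needs to know what the class meant in the original encoding.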

Everything else in this list sounds like an optimization to me, and
probably not the right one (I don't think it's possible to predict
what will be useful yet).
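As a rough illustration of why the class-based items look like optimizations, the Python sketch below derives "find first in class", "find first not in class" and "find boundary" from nothing but a per-fred membership test. The function names and the `in_class` callable are assumptions for the example; a real engine would presumably use something faster.

```python
def find_first(s, in_class, start=0, want=True):
    """Offset of the first unit at or after start whose membership equals want, else -1."""
    for i in range(start, len(s)):
        if bool(in_class(s[i])) == want:
            return i
    return -1


def find_boundary(s, in_class, start=0):
    """Smallest offset i >= max(start, 1) where s[i-1] and s[i] differ in membership, else -1."""
    for i in range(max(start, 1), len(s)):
        if bool(in_class(s[i - 1])) != bool(in_class(s[i])):
            return i
    return -1


if __name__ == "__main__":
    print(find_first("  abc", str.isalpha))
    print(find_first("abc  ", str.isalpha, want=False))
    print(find_boundary("abc def", str.isalpha))
```

Each function is a plain loop over the one comparison, which is the sense in which the longer list adds speed rather than semantics.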

For the other things that Parrot will be used for, I suspect that the
first three (extract substring, exact string compare, find string in
string) will be needed anyway.

I'm curious as to how you came up with that list; it seems to imply a
particular way of implementing the grammar engine. I would expect all of
that, barring certain optimizations, to be done directly with existing
pasm instructions.

There will be a need for saving a stack of former values of hypothetical
variables, which can also be done with pasm ops but might interact with
overloaded assignment or something wacky like that.
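A minimal Python sketch of such a stack, assuming plain (non-overloaded) assignment: every assignment pushes the previous value onto a trail, and backtracking pops the trail down to a saved mark. The class `HypotheticalVars` and its methods are hypothetical names for the example only.

```python
class HypotheticalVars:
    """Variables whose assignments can be undone when a match attempt backtracks."""

    def __init__(self):
        self.values = {}
        self.trail = []  # entries are (name, had_value, old_value)

    def mark(self):
        # Remember the current trail depth so we can roll back to it later.
        return len(self.trail)

    def assign(self, name, value):
        # Save the former value (or its absence) before overwriting.
        self.trail.append((name, name in self.values, self.values.get(name)))
        self.values[name] = value

    def rollback(self, mark):
        # Undo assignments in reverse order until the trail is back at mark.
        while len(self.trail) > mark:
            name, had_value, old_value = self.trail.pop()
            if had_value:
                self.values[name] = old_value
            else:
                del self.values[name]


if __name__ == "__main__":
    v = HypotheticalVars()
    v.assign("x", 1)
    m = v.mark()
    v.assign("x", 2)
    v.assign("y", 3)
    v.rollback(m)
    print(v.values)
```

The interaction with overloaded assignment is exactly where this gets awkward: if `assign` or the restore step triggered user code, the rollback would no longer be a pure undo.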
