Re: Semantics for regexes

Aaron Sherman Thu, 02 Sep 2004 09:36:08 -0700

On Thu, 2004-09-02 at 11:27, Felix Gallo wrote:

> Although the next regex engine has to deal with the horribly
> crufty new perl6 syntax


Keep in mind that Perl 6 regexen are really just Perl 5 regexen with a
call stack and backtracking control. Absolutely everything else that I
see in P6 is either just a different syntax for the same thing (e.g.
character classes) or unrelated to the actual regex engine itself (e.g.
hypotheticals). There's nothing that I see in this that would slow down
a mundane regexp OTHER than Unicode, and in many respects P5 has already
taken that hit.

Now, that's not to say that:

        rule perl6_program { <perl5_program> | <perl6_statement>(*) }

is supposed to run as fast as a Perl 5 regexp, but that's a WHOLE other
beast, and we don't expect it to be as simple as a regexp.

Under the hood, I would expect that P6 regexen will be broken down into
"matching" and "flow control" parts, and handed off to Parrot
differently. While there might or might not be an op for the matching
part, the flow control part is just code (though code with significant
magic, I will admit).

In other words, you might see:

        rule { a <b> c }

get broken down into:

        rule { {pasm('regexp P0, P1, "a"')} <b> {pasm('regexp P0, P1, "c"')} }
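
To make that split concrete, here is a minimal Python sketch of the same idea. It is only an illustration under stated assumptions: match_literal, call_rule and RULES are hypothetical names, not Parrot or Perl 6 APIs, the subrule <b> is assumed to match a literal "b", and no backtracking is modelled.

```python
# Hypothetical sketch: none of these names are Parrot or Perl 6 APIs.

RULES = {}  # assumed registry mapping rule names to Python functions


def match_literal(text, pos, literal):
    """The 'matching' part: return the new position, or None on failure."""
    if text.startswith(literal, pos):
        return pos + len(literal)
    return None


def call_rule(name, text, pos, stack):
    """The 'flow control' part: push a frame, run the named rule, pop it."""
    stack.append(name)
    try:
        return RULES[name](text, pos, stack)
    finally:
        stack.pop()


def rule_b(text, pos, stack):
    # Assumption for illustration: <b> simply matches the literal "b".
    return match_literal(text, pos, "b")


def rule_top(text, pos, stack):
    # Corresponds to: rule { a <b> c }
    pos = match_literal(text, pos, "a")
    if pos is None:
        return None
    pos = call_rule("b", text, pos, stack)
    if pos is None:
        return None
    return match_literal(text, pos, "c")


RULES["b"] = rule_b
RULES["top"] = rule_top
```

Calling call_rule("top", "abc", 0, []) walks the literal "a", descends into the subrule, then matches "c". The two literal matches are the pieces that could be handed to an op, while the subrule call is ordinary code.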

As Dan points out, "regexp" might not exist, or (this seems more likely)
might just be a callback into a tiny regexp compiler that generates
Parrot bytecode, for the convenience of languages in which regexes are
not a core feature that the compiler would want to get its hands dirty
with.
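
As a rough picture of what such a tiny compiler could look like, the Python sketch below turns a pattern into a list of instructions and then runs them. Everything here is an assumption made for illustration: the instruction names ("char", "any", "match") are a toy set, not Parrot bytecode, and the pattern language covers only literal characters and ".".

```python
# Toy sketch only: the instruction set is invented, not Parrot bytecode.

def compile_pattern(pattern):
    """Compile literal characters and '.' into a list of toy instructions."""
    program = []
    for ch in pattern:
        if ch == ".":
            program.append(("any",))       # match any single character
        else:
            program.append(("char", ch))   # match exactly this character
    program.append(("match",))             # report success
    return program


def run(program, text, pos=0):
    """Execute the program at pos; return the end position or None."""
    for op in program:
        if op[0] == "match":
            return pos
        if pos >= len(text):
            return None                     # ran out of input
        if op[0] == "char" and text[pos] != op[1]:
            return None                     # literal mismatch
        pos += 1
    return None
```

A host language with no regex support of its own would only need to call something like compile_pattern once and then execute the result, which is the convenience described above.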

-- 
781-324-3772
[EMAIL PROTECTED]
http://www.ajs.com/~ajs
