Re: Regex helper opcodes

Dan Sugalski Mon, 05 Nov 2001 12:29:36 -0800

At 11:54 AM 11/5/2001 -0800, Steve Fink wrote:
> > >It's pretty
> > >much functional, including reOneof.  Still, these could be useful
> > >internal functions... *ponder*
> >
> > I was thinking that the places they could come in really handy for were
> > character classes. \w, \s, and \d are potentially a lot faster this way,
> > 'specially if you throw in Unicode support. (The sets get rather a bit
> > larger...) It also may make some character-set independence easier.
>
>But why would you be generating character classes at runtime?


Because someone does:

   while (<>) {
         next unless /[aeiou]/;
   }

and we want that character class to be reasonably fast?
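
To make the speed argument concrete, here is a minimal sketch of the bitmap idea for single-byte characters. It is not Parrot code: the names make_class_bitmap, in_class, and has_vowel are hypothetical, and the 256-entry limit is an assumption that only fits ASCII or iso-8859 style encodings.

```python
# Hypothetical sketch: a character class as a 256-bit bitmap, one bit per
# byte value. The class is built once; each membership test afterwards is
# a shift and a mask.

def make_class_bitmap(chars):
    """Return an int whose bit N is set when byte value N is in the class."""
    bitmap = 0
    for ch in chars:
        code = ord(ch)
        if code > 255:
            raise ValueError("this sketch only covers single-byte characters")
        bitmap |= 1 << code
    return bitmap

def in_class(bitmap, ch):
    """Test one character against a bitmap built by make_class_bitmap."""
    code = ord(ch)
    return code <= 255 and (bitmap >> code) & 1 == 1

# Built once, outside the loop, the way /[aeiou]/ would be compiled once.
VOWELS = make_class_bitmap("aeiou")

def has_vowel(line):
    """Rough equivalent of the /[aeiou]/ match in the loop above."""
    return any(in_class(VOWELS, ch) for ch in line)
```

The point of the sketch is only that the per-character cost does not grow with the size of the class, which is what makes a helper of this kind attractive for \w, \s, and \d as well.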

>For
>ASCII or iso-8859 or whatever regular ol' bytes are properly called, I
>would expect \w \s \d charclasses to be constants. In fact, all
>character classes would be constants. And as Dax mentioned, the
>constructors for those constants would properly be internal functions.

Sure, the predefined ones would be, and they'd get loaded up along with the 
character encoding libraries.

>For UTF-32 etc., I don't know. I was thinking we'd have to have
>something like a multi-level lookup table for character classes. I see
>a character class as a full-blown ADT with operators for
>addition/unions, subtraction/intersections, etc.

Ah, point. A bitmap won't work too well with the full UTF-32 set.

Having a good set of set operations would be useful for the core, though.
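
One way to picture the multi-level table and the set operations together is the sketch below. It is an illustration under stated assumptions, not a proposed Parrot design: the CharClass name, the 256-code-point page size, and the method names are all hypothetical. Only pages that contain at least one member are stored, so a class stays small even though the code-point space is large.

```python
# Hypothetical sketch: a two-level character class. The top level maps a
# page number (code point >> 8) to a 256-bit bitmap for that page.

PAGE_BITS = 8                      # assumed page size: 256 code points
PAGE_MASK = (1 << PAGE_BITS) - 1

class CharClass:
    def __init__(self, chars=""):
        self.pages = {}            # page number -> bitmap (int)
        for ch in chars:
            self.add(ord(ch))

    def add(self, code):
        page = code >> PAGE_BITS
        self.pages[page] = self.pages.get(page, 0) | (1 << (code & PAGE_MASK))

    def __contains__(self, ch):
        code = ord(ch)
        bits = self.pages.get(code >> PAGE_BITS, 0)
        return (bits >> (code & PAGE_MASK)) & 1 == 1

    @classmethod
    def _from_pages(cls, pages):
        result = cls()
        # Drop pages that ended up empty so the table stays sparse.
        result.pages = {p: bits for p, bits in pages.items() if bits}
        return result

    def union(self, other):
        keys = self.pages.keys() | other.pages.keys()
        return self._from_pages(
            {p: self.pages.get(p, 0) | other.pages.get(p, 0) for p in keys})

    def intersection(self, other):
        keys = self.pages.keys() & other.pages.keys()
        return self._from_pages(
            {p: self.pages[p] & other.pages[p] for p in keys})

    def difference(self, other):
        return self._from_pages(
            {p: bits & ~other.pages.get(p, 0) for p, bits in self.pages.items()})
```

For example, CharClass("aeiou").union(CharClass("αβγ")) stores two pages, one for the ASCII range and one for the Greek block, and the set operations work page by page with plain bitwise operators.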

>You aren't thinking that the regular expression _compiler_ needs to be
>written in Parrot opcodes, are you? I assumed you'd reach it through
>some callout mechanism in the same way that eval"" will be handled.

The core of the parser's still a bit up in the air. Larry's leaning towards 
it being in Perl.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
