> >It's pretty much functional, including reOneof. Still, these could be
> >useful internal functions... *ponder*
>
> I was thinking that the places they could come in really handy for were
> character classes. \w, \s, and \d are potentially a lot faster this way,
> 'specially if you throw in Unicode support. (The sets get rather a bit
> larger...) It also may make some character-set independence easier.

But why would you be generating character classes at runtime? For ASCII or ISO-8859 or whatever regular ol' bytes are properly called, I would expect the \w, \s, and \d charclasses to be constants. In fact, all character classes would be constants. And as Dax mentioned, the constructors for those constants would properly be internal functions.

For UTF-32 etc., I don't know. I was thinking we'd have to have something like a multi-level lookup table for character classes. I see a character class as a full-blown ADT with operators for union (addition), difference (subtraction), intersection, etc.

You aren't thinking that the regular expression _compiler_ needs to be written in Parrot opcodes, are you? I assumed you'd reach it through some callout mechanism, the same way that eval "" will be handled.
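
To make the multi-level lookup table idea a bit more concrete, here is a rough sketch in Python (not Parrot code, and not a proposal for the real API). Everything in it is made up for illustration: the CharClass name, the 256-code-point page size, and the ASCII-only definitions of the digit and word classes. The first level maps a page number to a bitmap, the second level is the bitmap itself, and the set operators just combine bitmaps page by page.

```python
# Hypothetical sketch of a two-level character-class table.
# CharClass, PAGE_BITS, DIGIT, ALPHA and WORD are illustrative names only.

PAGE_BITS = 8                    # low 8 bits select a bit within a page
PAGE_MASK = (1 << PAGE_BITS) - 1


class CharClass:
    """A set of code points stored as {page number: 256-bit bitmap}."""

    def __init__(self, ranges=()):
        self.pages = {}          # a missing page means "no members here"
        for lo, hi in ranges:    # inclusive code point ranges
            for cp in range(lo, hi + 1):
                page = cp >> PAGE_BITS
                self.pages[page] = self.pages.get(page, 0) | (1 << (cp & PAGE_MASK))

    @classmethod
    def _from_pages(cls, pages):
        out = cls()
        out.pages = {k: v for k, v in pages.items() if v}  # drop empty pages
        return out

    def __contains__(self, ch):
        cp = ord(ch)
        bitmap = self.pages.get(cp >> PAGE_BITS, 0)        # level 1: page
        return bool((bitmap >> (cp & PAGE_MASK)) & 1)      # level 2: bit

    def __or__(self, other):     # union
        keys = self.pages.keys() | other.pages.keys()
        return self._from_pages(
            {k: self.pages.get(k, 0) | other.pages.get(k, 0) for k in keys})

    def __and__(self, other):    # intersection
        keys = self.pages.keys() & other.pages.keys()
        return self._from_pages(
            {k: self.pages[k] & other.pages[k] for k in keys})

    def __sub__(self, other):    # difference
        return self._from_pages(
            {k: v & ~other.pages.get(k, 0) for k, v in self.pages.items()})


# Built once, then treated as constants (ASCII-only here, as an assumption).
DIGIT = CharClass([(0x30, 0x39)])
ALPHA = CharClass([(0x41, 0x5A), (0x61, 0x7A)])
WORD = DIGIT | ALPHA | CharClass([(0x5F, 0x5F)])
```

With something like this, a membership test such as `"x" in WORD` costs one dictionary lookup plus a shift and mask no matter how large the class gets, and a class like "word characters that aren't digits" is just `WORD - DIGIT`, computed once when the constant is built.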