At 11:54 AM 11/5/2001 -0800, Steve Fink wrote:
> > >It's pretty
> > >much functional, including reOneof. Still, these could be useful
> > >internal functions... *ponder*
> >
> > I was thinking that the places they could come in really handy for were
> > character classes. \w, \s, and \d are potentially a lot faster this way,
> > 'specially if you throw in Unicode support. (The sets get rather a bit
> > larger...) It also may make some character-set independence easier.
>
>But why would you be generating character classes at runtime?
Because someone does:

    while (<>) {
        next unless /[aeiou]/;
    }

and we want that character class to be reasonably fast?

>For
>ASCII or iso-8859 or whatever regular ol' bytes are properly called, I
>would expect \w \s \d charclasses to be constants. In fact, all
>character classes would be constants. And as Dax mentioned, the
>constructors for those constants would properly be internal functions.

Sure, the predefined ones would be, and they'd get loaded up along with
the character encoding libraries.

>For UTF-32 etc., I don't know. I was thinking we'd have to have
>something like a multi-level lookup table for character classes. I see
>a character class as a full-blown ADT with operators for
>addition/unions, subtraction/intersections, etc.

Ah, point. A bitmap won't work too well with the full UTF-32 set. Having a
good set of set operations would be useful for the core, though.

>You aren't thinking that the regular expression _compiler_ needs to be
>written in Parrot opcodes, are you? I assumed you'd reach it through
>some callout mechanism in the same way that eval"" will be handled.

The core of the parser's still a bit up in the air. Larry's leaning
towards it being in Perl.

                                        Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
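Steve's multi-level lookup table and Dan's remark that a flat bitmap won't scale to the full UTF-32 range can be illustrated with a small two-level sketch in C, the language Parrot's core is written in. This is not Parrot's code: the names (`CharClass`, `cc_new`, `cc_add_range`, `cc_contains`, `cc_combine`), the 256-code-point page size, and the restriction to Unicode's 0..0x10FFFF range are hypothetical choices made for illustration. A page gets its own 256-bit bitmap only if the class has members in it, and union, intersection, and difference are computed one page at a time.

```c
/* Hypothetical sketch of a two-level character class; not Parrot's API. */
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define CP_MAX      0x10FFFFu                    /* highest Unicode code point */
#define LEAF_BITS   256u                         /* code points per page (assumed) */
#define LEAF_WORDS  (LEAF_BITS / 64u)            /* 64-bit words per page bitmap */
#define NUM_LEAVES  ((CP_MAX + 1u) / LEAF_BITS)  /* 4352 top-level slots */

/* Second level: one bit per code point in a 256-code-point page. */
typedef struct { uint64_t bits[LEAF_WORDS]; } Leaf;

/* First level: one pointer per page; NULL means "no members in this page". */
typedef struct { Leaf *leaf[NUM_LEAVES]; } CharClass;

typedef enum { CC_UNION, CC_INTERSECT, CC_DIFFERENCE } CcOp;

CharClass *cc_new(void) {
    CharClass *cc = malloc(sizeof *cc);
    if (cc)
        for (size_t i = 0; i < NUM_LEAVES; i++) cc->leaf[i] = NULL;
    return cc;
}

void cc_free(CharClass *cc) {
    if (!cc) return;
    for (size_t i = 0; i < NUM_LEAVES; i++) free(cc->leaf[i]);
    free(cc);
}

bool cc_contains(const CharClass *cc, uint32_t cp) {
    if (cp > CP_MAX) return false;
    const Leaf *l = cc->leaf[cp / LEAF_BITS];
    if (!l) return false;                        /* empty page: fast reject */
    uint32_t off = cp % LEAF_BITS;
    return (l->bits[off / 64] >> (off % 64)) & 1u;
}

/* Add every code point in [lo, hi]; returns false if a page allocation fails. */
bool cc_add_range(CharClass *cc, uint32_t lo, uint32_t hi) {
    if (hi > CP_MAX) hi = CP_MAX;
    for (uint32_t cp = lo; cp <= hi; cp++) {     /* one bit at a time: simple, not fast */
        Leaf **slot = &cc->leaf[cp / LEAF_BITS];
        if (!*slot && !(*slot = calloc(1, sizeof(Leaf)))) return false;
        uint32_t off = cp % LEAF_BITS;
        (*slot)->bits[off / 64] |= (uint64_t)1 << (off % 64);
    }
    return true;
}

/* Build a new class from a and b page by page; returns NULL on allocation failure. */
CharClass *cc_combine(const CharClass *a, const CharClass *b, CcOp op) {
    static const Leaf empty = {{0}};             /* stands in for NULL pages */
    CharClass *out = cc_new();
    if (!out) return NULL;
    for (size_t i = 0; i < NUM_LEAVES; i++) {
        if (!a->leaf[i] && !b->leaf[i]) continue; /* both pages empty: result empty */
        const Leaf *la = a->leaf[i] ? a->leaf[i] : &empty;
        const Leaf *lb = b->leaf[i] ? b->leaf[i] : &empty;
        Leaf tmp;
        bool any = false;
        for (size_t w = 0; w < LEAF_WORDS; w++) {
            uint64_t x = la->bits[w], y = lb->bits[w];
            tmp.bits[w] = op == CC_UNION     ? (x | y)
                        : op == CC_INTERSECT ? (x & y)
                        :                      (x & ~y);  /* CC_DIFFERENCE: a minus b */
            any = any || tmp.bits[w] != 0;
        }
        if (!any) continue;                      /* keep result pages sparse */
        if (!(out->leaf[i] = malloc(sizeof(Leaf)))) { cc_free(out); return NULL; }
        *out->leaf[i] = tmp;
    }
    return out;
}
```

Under these assumptions, `[aeiou]` could be built with five single-code-point `cc_add_range` calls, and "letters but not vowels" with `cc_combine(letters, vowels, CC_DIFFERENCE)`. For plain 8-bit text only `leaf[0]` is ever allocated, so the structure collapses to the single 256-bit bitmap that works fine for ASCII or ISO-8859; the second level only pays off once a class spans the much larger Unicode range.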