At 8:24 PM -0700 9/1/04, Steve Fink wrote:
On Sep-01, Dan Sugalski wrote:

This is a list of the semantics that I see as needed for a regex engine. When we have 'em, we'll map them to string ops, and may well add in some special-case code for faster access.

 *) extract substring
 *) exact string compare
 *) find string in string
 *) find first character of class X in string
 *) find first character not of class X in string
 *) find boundary between X and not-X
 *) find boundary defined by arbitrary code (mainly for word breaks)

Huh? What do you mean by "semantics"?

I mean "What actions does a regular expression engine need to perform," especially in the face of character classes that are potentially opaque, or really annoying, and whose handling gets pushed off to a library. (Unicode is the obvious example: regardless of how much Deep Evil Knowledge you have of it, it's still likely best to push most character class handling off to the Unicode library, especially if locale-specific overrides of some of the classes are in force.)
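
To make the list of primitives concrete, here is a minimal sketch in Python, not Parrot code and not the engine's actual ops. It assumes a character class is an opaque predicate supplied by some library, which is roughly what pushing class handling off to the Unicode library would look like. All function names are hypothetical, and is_letter is only a stand-in class built on Python's unicodedata module.

```python
import unicodedata
from typing import Callable, Optional

# Assumption: a character class is an opaque predicate owned by a library.
CharClass = Callable[[str], bool]

def is_letter(ch: str) -> bool:
    """Stand-in class: any Unicode letter (general category L*)."""
    return unicodedata.category(ch).startswith("L")

def extract_substring(s: str, start: int, length: int) -> str:
    return s[start:start + length]

def exact_compare(a: str, b: str) -> bool:
    return a == b

def find_string(haystack: str, needle: str, start: int = 0) -> Optional[int]:
    i = haystack.find(needle, start)
    return None if i < 0 else i

def find_first_in_class(s: str, cls: CharClass, start: int = 0) -> Optional[int]:
    for i in range(start, len(s)):
        if cls(s[i]):
            return i
    return None

def find_first_not_in_class(s: str, cls: CharClass, start: int = 0) -> Optional[int]:
    return find_first_in_class(s, lambda ch: not cls(ch), start)

def find_class_boundary(s: str, cls: CharClass, start: int = 0) -> Optional[int]:
    """First index i >= max(start, 1) where s[i-1] and s[i] differ in membership."""
    for i in range(max(start, 1), len(s)):
        if cls(s[i - 1]) != cls(s[i]):
            return i
    return None

def find_boundary_by(s: str, is_boundary: Callable[[str, int], bool],
                     start: int = 0) -> Optional[int]:
    """Boundary defined by arbitrary code, e.g. a word-break rule."""
    for i in range(start, len(s) + 1):
        if is_boundary(s, i):
            return i
    return None

def word_break(s: str, i: int) -> bool:
    """Hypothetical word-break rule: letter on exactly one side of position i."""
    left = i > 0 and is_letter(s[i - 1])
    right = i < len(s) and is_letter(s[i])
    return left != right
```

None of the search functions looks inside the class itself. They only call the predicate, so a locale-specific override would swap the predicate without touching the search code.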


Everything else in this list sounds like optimizations to me, and
probably not the right optimizations (I don't think it's possible to
predict what will be useful yet.)

All I did was run through the current regex engine and pull out what I saw as its primitives. The grammar bits being added in for the new grammar engine need some extra functionality, but none of it involves looking in strings for things.


I'm curious as to how you came up with that list; it seems to imply a
particular way of implementing the grammar engine. I would expect all of
that, barring certain optimizations, to be done directly with existing
pasm instructions.

Yes, and some of the initial list already has ops to do those bits, though I fully plan on evil cheating versions for some extra speed.
--
Dan


--------------------------------------it's like this-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                         have teddy bears and even
                                      teddy bears get drunk
