[Tim Peters]
>> [...] The most valuable general technique [Friedl] (eventually ;-)
>> explained he called "unrolling", and consists of writing a regexp in
>> the form:
>>
>>     normal* (?: special normal* )*
>>
>> where the sets of characters with which `normal` and `special` can
>> start are disjoint.  Under most "NFA" implementations, such a regexp
>> is immune to most bad behaviors, whether finding very long matches,
>> or in searching a very long string that happens not to contain any
>> matches.
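For concreteness, here's that shape instantiated for the usual
double-quoted-string-with-escapes example; the snippet is my own
sketch in Python's re, not something quoted from the book:

    import re

    # normal  = [^"\\]  anything except a quote or backslash
    # special = \\.     a backslash escape
    # The two can't start with the same character, so the engine is
    # never left a second way to consume the same text.
    unrolled = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

    text = r'He said "no \"really\" no" and left.'
    print(unrolled.search(text).group())  # "no \"really\" no"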
[Bryan Olson]
> So how general is that?

Between 6 and 7 ;-)

> The book just says "certain common expressions".  I think the real
> answer is that one can unroll exactly the true regular expressions.

I'm sure he didn't have that much in mind; e.g., it's a real strain to
fit a fat alternation of fixed strings into that "pattern"
(cat|dog|wombat|weasel|...).

> The unrolling is essentially resolving the states of a DFA by hand.

Oh yes, for those who've studied the theoretical underpinnings.  Most
people have not.

Much of what he gives as practical advice amounts to ways to "trick"
an "NFA" engine into doing what a "DFA" engine would do, but explained
from the POV of how a backtracking search engine works.  And it makes
/sense/ at that level too, divorced from any knowledge of how a
linear-time automaton might be constructed in theory; e.g., you "want
to" give the search engine only one choice because you want to avoid
the possibility of needing to backtrack later, not because you're
picturing a theoretical linear-time automaton and striving to mimic
its behavior.

That explanation may not work for you, but it's effective for people
who haven't studied the theory here.  Such explanations also extend
naturally to gimmicks like backreferences, which have no counterpart
in CompSci regexps.
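To see the "only one choice" idea in action, compare a loop whose
branches overlap with the unrolled form; the timing harness is my own
illustration, not anything from the book:

    import re
    import time

    # Overlapping branches: [^"] and \\. can both match a backslash,
    # so every escape is a choice point; a failing search revisits
    # exponentially many combinations of them.
    overlapping = re.compile(r'"(?:[^"]|\\.)*"')

    # Unrolled: exactly one branch applies at each position, so the
    # same failing search dies after a single linear pass.
    unrolled = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

    victim = '"' + r'\a' * 22  # opens a "string" that never closes

    for name, pat in [('unrolled', unrolled),
                      ('overlapping', overlapping)]:
        start = time.perf_counter()
        pat.search(victim)
        print(name, time.perf_counter() - start)

The unrolled search should come back at once, while the overlapping
one should roughly double in time for each extra \a you append:
that's the backtracking the one-choice discipline buys you out of.
The backreference point is also easy to make concrete (continuing the
snippet above):

    # \1 must repeat whatever group 1 captured (a doubled word here).
    # No textbook finite automaton can express that.
    print(re.search(r'\b(\w+) \1\b', 'it was the the end').group())

prints 'the the'; remembering an arbitrary captured substring is
exactly what puts backreferences outside the regular languages.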