Tim Peters wrote:
> [...] The most valuable general technique he (eventually ;-)
> explained he called "unrolling", and consists of writing a regexp in
> the form:
>
>     normal* (?: special normal* )*
>
> where the sets of characters with which `normal` and `special` can
> start are disjoint.  Under most "NFA" implementations, such a regexp
> is immune to most bad behaviors, whether in finding very long matches
> or in searching a very long string that happens not to contain any
> matches.
[...]
> As a result, you can use it with reasonably high confidence.  For
> /why/ that's so, read the rest of his book ;-)  In effect, it gives
> the search engine only one possible course of action at each
> character in the target string.  So long as that's "normal", it has
> to stay in the "normal" part of the regexp, and as soon as it's
> "special" it has to leave the "normal" part.  The intent is to
> eliminate any possibility of backtracking, thus ensuring predictable
> runtime and ensuring that no form of internal backtracking stack
> needs to be used (let alone hit its limit).

So how general is that?  The book just says "certain common
expressions".  I think the real answer is that one can unroll exactly
the true regular expressions: patterns that are regular in the
formal-language sense, with no backreferences or similar nonregular
extensions.  The unrolling is essentially resolving the states of a
DFA by hand, as the sketch below tries to show.

--
--Bryan
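To make that concrete, here is a minimal sketch using the classic
quoted-string case: normal = [^"\\] (any character except quote or
backslash) and special = \\. (a backslash-escaped character), so the
two sets of starting characters are disjoint as the technique
requires.  The naive variant and the timing harness are my own
illustration, assuming CPython's backtracking re module:

    import re
    import timeit

    # Naive form: the nested quantifiers let the engine split a run of
    # ordinary characters in exponentially many ways, so a missing
    # closing quote forces catastrophic backtracking.
    naive = re.compile(r'"(?:[^"\\]+|\\.)*"')

    # Unrolled form: normal* (?: special normal* )*.  At each character
    # there is only one way forward, so failure is detected in linear
    # time.
    unrolled = re.compile(r'"[^"\\]*(?:\\.[^"\\]*)*"')

    # When a match exists, both succeed quickly.
    subject = r'"he said \"hi\" to me"'
    assert naive.match(subject) and unrolled.match(subject)

    # Pathological subject: a long run of "normal" characters with no
    # closing quote.
    bad = '"' + 'a' * 22

    print(timeit.timeit(lambda: unrolled.match(bad), number=1000),
          "s for 1000 failing unrolled matches")
    print(timeit.timeit(lambda: naive.match(bad), number=1),
          "s for 1 failing naive match (roughly doubles per extra char)")

Note how the unrolled pattern reads straight off the DFA: each
[^"\\]* is the self-loop on the "inside the string" state, and each
\\. is the detour through the "just saw a backslash" state.  That is
the hand-resolution of DFA states described above.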