On Jul 5, 1:54 pm, Carl Banks <[EMAIL PROTECTED]> wrote:
> I don't think you've illustrated that at all. What you've illustrated
> is that one implementation of regexp optimizes something that another
> doesn't. It might be due to differences in complexity; it might not.
> (Maybe there's something about PCREs that precludes the optimization
> that the default grep uses, but I'd be inclined to think not.)
It seems like an appropriate moment to point out *this* paper:

http://swtch.com/~rsc/regexp/regexp1.html

Apparently, grep and Tcl convert a regex to a finite state machine.
Matching is then *very* fast: essentially linear time in the length of
the string being matched, even in the worst case. (Though it is
possible for the size of the finite state machine to grow exponentially
with the size of the regex.)

But not all PCREs can be converted to a finite state machine, so Perl,
Python, etc. use a backtracking approach, which has exponential running
time in the worst case. In particular, it's not possible to use a
finite state machine to represent a regular expression that contains
backreferences.

Part of the problem is a lack of agreement on what 'regular expression'
means. Strictly speaking, PCREs aren't regular expressions at all, for
some values of the term 'regular expression'. See

http://en.wikipedia.org/wiki/Regular_expression
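To make the difference concrete, here's a rough timing sketch of my own
(not taken from the article) using the pathological pattern family it
discusses: n optional 'a's followed by n required 'a's, matched against
a string of n 'a's.

import re
import time

def timed_match(n):
    # Pattern: n optional 'a's followed by n required 'a's.  The text
    # supplies only n 'a's, so Python's backtracking engine has to try
    # a huge number of ways of splitting them between the two halves
    # before it finds the one arrangement that matches.
    pattern = re.compile('a?' * n + 'a' * n)
    text = 'a' * n
    start = time.time()
    assert pattern.match(text)  # it does match, eventually
    return time.time() - start

# Each step of n should roughly double the running time -- that's the
# exponential blow-up in practice.  A DFA-based engine like grep's
# handles the same pattern and text in time linear in n.
for n in range(16, 25, 2):
    print("n = %2d: %6.3f seconds" % (n, timed_match(n)))

Mark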