Mark Wooding <[EMAIL PROTECTED]>:

> Sebastian "lunar" Wiesner <[EMAIL PROTECTED]> wrote:
>
>> I just wanted to illustrate that the speed of the given search is
>> somehow related to the complexity of the engine.
>>
>> Btw, other pcre implementations are as slow as Python or "grep -P".  I
>> tried a sample C++ program using pcre++ (a wrapper around libpcre) and
>> saw it running equally long.
>
> So some other implementations are equally poor.  I note that Perl itself
> handles this case very quickly, as does Edi Weitz's CL-PPCRE library.
I don't know about the latter, but Perl doesn't use any smart algorithm;
it just relies heavily on memoization to gain speed.  This makes Perl
perform poorly in other situations, as mentioned in the paper cited by
Mark Dickinson:

    # perl -e '("a" x 100000) =~ /^(ab?)*$/;'
    zsh: segmentation fault  perl -e '("a" x 100000) =~ /^(ab?)*$/;'

In Python, on the other hand, this pattern works fine, at the cost of
performance losses on other patterns.  It would be interesting to know
how CL-PPCRE performs here (I don't know that library).

> Yes, Perl-compatible `regular expressions' are more complicated than
> POSIX extended regular expressions; but that doesn't mean that you have
> to implement them in a dumb way.  Indeed, it stands to reason that
> expressions describing truly regular languages can be matched using the
> same old algorithms that grep uses.

I completely agree.  I just suspect that combining a finite state
machine for "classic" expressions with backtracking code for the rest is
terribly hard to implement.  But I'm not an expert in this; probably
some libraries out there already do this.  In that case, it'd be time to
give Python's re engine a more sophisticated implementation ;)

-- 
Freedom is always the freedom of dissenters.
                                      (Rosa Luxemburg)
--
http://mail.python.org/mailman/listinfo/python-list
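[A small demo of the trade-off discussed above, added for illustration: Python's re copes with the `(ab?)*` pattern that crashes the perl one-liner, but a backtracking engine like Python's goes exponential on the classic `a?a?...aaa` family of patterns.  The timings are only printed, not asserted, since they vary by machine; `n` is kept small on purpose.]

```python
import re
import time

# The pattern from the perl one-liner above: Python's re matches it
# without trouble, even on a 100000-character input.
start = time.perf_counter()
m = re.match(r'^(ab?)*$', 'a' * 100000)
print(m is not None, '%.3fs' % (time.perf_counter() - start))

# The flip side: a classic pathological case for backtracking engines,
# n optional 'a's followed by n mandatory 'a's.  The match succeeds,
# but the time grows exponentially with n -- try raising n to ~28 to
# feel the blowup.
n = 20
start = time.perf_counter()
m = re.match(r'(a?){%d}a{%d}$' % (n, n), 'a' * n)
print(m is not None, '%.3fs' % (time.perf_counter() - start))
```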
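[To illustrate the "same old algorithms that grep uses" point quoted above: a minimal Thompson-style simulation tracks the *set* of reachable NFA states in a single left-to-right pass, so there is no backtracking and no deep recursion.  This sketch hand-builds the two-state NFA for `^(ab?)*$` rather than compiling arbitrary patterns; the function name and state numbering are my own.]

```python
def match_ab_opt_star(text):
    """Match ^(ab?)*$ by NFA set-simulation: O(len(text)) time."""
    # State 0: start/accept (between full "ab?" repetitions).
    # State 1: just consumed 'a'; may consume an optional 'b', or
    #          take an epsilon move back to 0 to start a new repetition.
    current = {0}
    for ch in text:
        nxt = set()
        if 0 in current and ch == 'a':
            nxt.add(1)
        if 1 in current:
            if ch == 'b':
                nxt.add(0)
            if ch == 'a':      # epsilon 1 -> 0, then consume 'a'
                nxt.add(1)
        current = nxt
        if not current:        # no live states: fail early
            return False
    # Accept in state 0, or in state 1 (which reaches 0 by epsilon).
    return bool(current)

print(match_ab_opt_star('a' * 100000))  # True, in linear time
print(match_ab_opt_star('abab'))        # True
print(match_ab_opt_star('abb'))         # False
```

This is exactly the construction that handles the 100000-character input instantly where the backtracking perl one-liner above blows the stack; the hard part the email alludes to is falling back to backtracking only when non-regular features (backreferences, etc.) actually appear in the pattern.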