On Tuesday, June 18, 2013 4:05:25 PM UTC-4, Antoine Pitrou wrote:
> One invokes a fast special-purpose substring searching routine (the
> str.__contains__ operator), the other a generic matching engine able to
> process complex patterns. It's hardly a surprise for the specialized routine
> to be faster.

Except that the complexity in regexes is in compiling the pattern down to an FSM. Once you've got the FSM built, the inner loop should be pretty quick. In C, the inner loop for executing an FSM should be something like:

    for (const char *p = input; *p; ++p) {      /* stop at the terminating NUL */
        next_state = current_state[(unsigned char)*p];
        if (next_state == MATCH) {
            break;
        }
        current_state = next_state;             /* advance to the next state */
    }

which should compile down to a couple of machine instructions which run entirely out of the instruction cache. But I'm probably simplifying it more than I should :-)

> (to be fair, on CPython there's also the fact that operators are faster
> than method calls, so some overhead is added by that too)

I've been doing some experimenting, and I'm inclined to believe this is indeed a significant part of it. I also took some ideas from André Malo and factored out some name lookups from the inner loop. That bummed me another 10% in speed.
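
A rough sketch of how one might time the variants discussed above: the `in` operator, a string method call, a precompiled regex, and the same regex with the name lookups hoisted out of the call. This is not the code used in this thread. The haystack, the needle, and every function name here are made up for illustration, and the relative timings will vary by machine and Python version.

```python
import re
import timeit

# Hypothetical test data, chosen only for illustration.
HAYSTACK = "x" * 10_000 + "needle" + "y" * 10_000
NEEDLE = "needle"

# Compile once, so pattern compilation is not part of the timed work.
PATTERN = re.compile(re.escape(NEEDLE))


def with_operator():
    # Dispatches to str.__contains__ via the `in` operator.
    return NEEDLE in HAYSTACK


def with_method():
    # Same search, but through an attribute lookup plus a method call.
    return HAYSTACK.find(NEEDLE) != -1


def with_regex():
    # Global lookup of PATTERN and attribute lookup of .search on each call.
    return PATTERN.search(HAYSTACK) is not None


def with_regex_hoisted(search=PATTERN.search, text=HAYSTACK):
    # The bound method and the text are bound as defaults, so inside the
    # function they are fast local-variable lookups.
    return search(text) is not None


def bench(number=2000):
    # Returns total seconds per variant for `number` calls each.
    candidates = [with_operator, with_method, with_regex, with_regex_hoisted]
    return {f.__name__: timeit.timeit(f, number=number) for f in candidates}


if __name__ == "__main__":
    for name, seconds in bench().items():
        print(f"{name:20s} {seconds:.4f}s")
```

Binding `PATTERN.search` as a default argument is just one common way to factor a name lookup out of a hot path; assigning it to a local variable before a loop has the same effect.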