Re: Why is regex so slow?

Roy Smith Tue, 18 Jun 2013 13:28:39 -0700

On Tuesday, June 18, 2013 4:05:25 PM UTC-4, Antoine Pitrou wrote:

> One invokes a fast special-purpose substring searching routine (the
> str.__contains__ operator), the other a generic matching engine able to
> process complex patterns. It's hardly a surprise for the specialized routine
> to be faster.


Except that the complexity in regexes is in compiling the pattern down to an FSM.
Once you've got the FSM built, the inner loop should be pretty quick. In C, the
inner loop for executing an FSM should be something like:

for (const unsigned char *p = input; *p; ++p) {
    /* one table lookup per input byte; stop at the terminating NUL */
    state = table[state][*p];
    if (state == MATCH) {
        break;
    }
}

which should compile down to a couple of machine instructions which run
entirely out of the instruction cache.  But I'm probably simplifying it
more than I should :-)
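To make the idea concrete, here is a minimal sketch of the same table-driven loop in Python, for the simple case of a literal substring. The names build_dfa and dfa_search are made up for illustration, and the table is built KMP-style; this is an assumption about how such a table could be built, not how the re module actually does it. It assumes a non-empty bytes pattern.

```python
def build_dfa(pattern):
    """Build a transition table for finding `pattern` (non-empty bytes).

    table[state][byte] gives the next state; state len(pattern) means MATCH.
    """
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    table = [[0] * 256 for _ in range(m + 1)]
    table[0][pattern[0]] = 1
    fallback = 0  # state to resume from after a mismatch
    for state in range(1, m):
        # On a mismatch, behave as the fallback state would.
        table[state][:] = table[fallback]
        # On a match, advance to the next state.
        table[state][pattern[state]] = state + 1
        fallback = table[fallback][pattern[state]]
    return table


def dfa_search(table, data):
    """Return the index of the first match in `data` (bytes), or -1."""
    match = len(table) - 1
    state = 0
    for i, byte in enumerate(data):
        state = table[state][byte]  # the whole inner loop: one lookup
        if state == match:
            return i - match + 1
    return -1
```

Calling dfa_search(build_dfa(b"abab"), b"aabababc") should agree with b"aabababc".find(b"abab"). In pure Python the per-byte loop overhead dominates, so this only illustrates the structure, not the speed.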

> (to be fair, on CPython there's also the fact that operators are faster
> than method calls, so some overhead is added by that too)

I've been doing some experimenting, and I'm inclined to believe this is indeed 
a significant part of it.  I also took some ideas from André Malo and factored 
out some name lookups from the inner loop.  That bummed me another 10% in speed.
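
For anyone who wants to try this themselves, the comparison could be sketched roughly as follows. The test data, the needle, and the function names are all invented for illustration, and the timings will vary by machine and Python version, so no numbers are claimed here.

```python
import re
import timeit

# Hypothetical test data: 10000 short lines, exactly one of which matches.
lines = ["some log line number %d with filler text" % i for i in range(10000)]
needle = "number 9999 "
pattern = re.compile(re.escape(needle))


def count_in():
    # Substring test via the `in` operator (str.__contains__).
    return sum(1 for line in lines if needle in line)


def count_regex():
    # Looks up the global `pattern` and its `.search` attribute every time.
    return sum(1 for line in lines if pattern.search(line))


def count_regex_hoisted():
    # Same search, but the bound method is looked up once, outside the loop.
    search = pattern.search
    return sum(1 for line in lines if search(line))


if __name__ == "__main__":
    for fn in (count_in, count_regex, count_regex_hoisted):
        seconds = timeit.timeit(fn, number=20)
        print("%-20s %.4f s" % (fn.__name__, seconds))
```

All three functions return the same count; only the amount of per-iteration lookup work differs.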
-- 
http://mail.python.org/mailman/listinfo/python-list
