On Fri, Jul 4, 2008 at 8:36 AM, Peter Otten <[EMAIL PROTECTED]> wrote: > Henning_Thornblad wrote: > >> What can be the cause of the large difference between re.search and >> grep? > > grep uses a smarter algorithm ;) > >> This script takes about 5 min to run on my computer: >> #!/usr/bin/env python >> import re >> >> row="" >> for a in range(156000): >> row+="a" >> print re.search('[^ "=]*/',row) >> >> >> While doing a simple grep: >> grep '[^ "=]*/' input (input contains 156.000 a in >> one row) >> doesn't even take a second. >> >> Is this a bug in python? > > You could call this a performance bug, but it's not common enough in real > code to get the necessary brain cycles from the core developers. > So you can either write a patch yourself or use a workaround. > > re.search('[^ "=]*/', row) if "/" in row else None > > might be good enough. >
Wow... I'm rather surprised at how slow this is... using re.match yields much quicker results, but of course it's not quite the same as re.search Incidentally, if you add the '/' to "row" at the end of the string, re.search returns instantly with a match object. @ Peter I'm not versed enough in regex to tell if this is a bug or not (although I suspect it is), but why would you say this particular regex isn't common enough in real code? filipe -- http://mail.python.org/mailman/listinfo/python-list