On Jul 5, 6:44 am, "Sebastian \"lunar\" Wiesner" <[EMAIL PROTECTED]> wrote: > Carl Banks <[EMAIL PROTECTED]>: > > > > > On Jul 5, 4:12 am, "Sebastian \"lunar\" Wiesner" > > <[EMAIL PROTECTED]> wrote: > >> Paddy <[EMAIL PROTECTED]>: > > >> > On Jul 4, 1:36 pm, Peter Otten <[EMAIL PROTECTED]> wrote: > >> >> Henning_Thornblad wrote: > >> >> > What can be the cause of the large difference between re.search and > >> >> > grep? > > >> >> grep uses a smarter algorithm ;) > > >> >> > This script takes about 5 min to run on my computer: > >> >> > #!/usr/bin/env python > >> >> > import re > > >> >> > row="" > >> >> > for a in range(156000): > >> >> > row+="a" > >> >> > print re.search('[^ "=]*/',row) > > >> >> > While doing a simple grep: > >> >> > grep '[^ "=]*/' input (input contains 156.000 a in > >> >> > one row) > >> >> > doesn't even take a second. > > >> >> > Is this a bug in python? > > >> >> You could call this a performance bug, but it's not common enough in > >> >> real code to get the necessary brain cycles from the core developers. > >> >> So you can either write a patch yourself or use a workaround. > > >> >> re.search('[^ "=]*/', row) if "/" in row else None > > >> >> might be good enough. > > >> >> Peter > > >> > It is not a smarter algorithm that is used in grep. Python RE's have > >> > more capabilities than grep RE's which need a slower, more complex > >> > algorithm. > > >> FWIW, grep itself can confirm this statement. The following command > >> roughly takes as long as Python's re.search: > > >> # grep -P '[^ "=]*/' input > > >> -P tells grep to use real perl-compatible regular expressions. > > > This confirms that a particular engine might not be optimized for it, > > but it's not necessarily a reflection that the engine is more complex. > > My posting wasn't intended to reflect the differences in complexity between > normal GNU grep expressions (which are basically extended POSIX > expressions) and perl-compatible expressions. The latter just _are_ more > complex, having additional features like look aheads or non-greedy > qualifiers. > > I just wanted to illustrate, that the speed of the given search is somehow > related to the complexity of the engine.
I don't think you've illustrated that at all. What you've illustrated is that one implementation of regexp optimizes something that another doesn't. It might be due to differences in complexity; it might not. (Maybe there's something about PCREs that precludes the optimization that the default grep uses, but I'd be inclined to think not.) Carl Banks -- http://mail.python.org/mailman/listinfo/python-list