".*?" is a "not greedy" match, which is significantly more difficult to handle than a normal ".*". so the performance will depend on quite complex details of how the regular expression engine is implemented. it wouldn't surprise me if perl was better here, because it comes from a background with a much stronger emphasis on regular expressions.
i know that not an exact answer, but unless you find the author of the re library i am not sure you will get a much better explanation. it comes down to whether the regular expression can be implemented using a deterministic finite automaton (rather than an indeterministic one). if you look at the table at the bottom of http://en.wikipedia.org/wiki/Finite_state_machine then i believe (i am not 100% sure) that the ".*?" match requires at least a pushdown automota, while ".*" can be handled with a simple finite automaton. disclaimer: this is all fairly new to me as i just recently implemented a regular expression matcher myself, and i may be wrong on some of the details. andrew akshat agarwal wrote: > Hi, > > I am trying to use the following snippet of code to print a regex match. > > s = '01234567890123456789x012' > > pat = r'(.*?x|[^a]+)*y' > > mo = re.search(pat, s) > if mo is not None: > print mo.group(0) > > By adding one character before the 'x' in the input string, the time taken > to print the match doubles. This behaviour is not observed in perl. I am > curious to know about the difference the in regex implementations of perl > and python which causes this. > > Thanks > -- > http://mail.python.org/mailman/listinfo/python-list > -- http://mail.python.org/mailman/listinfo/python-list