more exactly, my guess is perl has a special case for this that avoids doing a search over all possible matchers via the pushdown stack.
andrew cooke wrote: > > ".*?" is a "not greedy" match, which is significantly more difficult to > handle than a normal ".*". so the performance will depend on quite > complex details of how the regular expression engine is implemented. it > wouldn't surprise me if perl was better here, because it comes from a > background with a much stronger emphasis on regular expressions. > > i know that not an exact answer, but unless you find the author of the re > library i am not sure you will get a much better explanation. it comes > down to whether the regular expression can be implemented using a > deterministic finite automaton (rather than an indeterministic one). if > you look at the table at the bottom of > http://en.wikipedia.org/wiki/Finite_state_machine then i believe (i am not > 100% sure) that the ".*?" match requires at least a pushdown automota, > while ".*" can be handled with a simple finite automaton. > > disclaimer: this is all fairly new to me as i just recently implemented a > regular expression matcher myself, and i may be wrong on some of the > details. > > andrew > > > akshat agarwal wrote: >> Hi, >> >> I am trying to use the following snippet of code to print a regex match. >> >> s = '01234567890123456789x012' >> >> pat = r'(.*?x|[^a]+)*y' >> >> mo = re.search(pat, s) >> if mo is not None: >> print mo.group(0) >> >> By adding one character before the 'x' in the input string, the time >> taken >> to print the match doubles. This behaviour is not observed in perl. I am >> curious to know about the difference the in regex implementations of >> perl >> and python which causes this. >> >> Thanks >> -- >> http://mail.python.org/mailman/listinfo/python-list >> > > -- http://mail.python.org/mailman/listinfo/python-list