John Machin wrote:
> The point was made in a context where the OP appeared to be reading a
> line at a time and parsing it, and re.compile(r'something').match()
> would do the job; re.compile(r'^something').search() will do the job too
> -- BECAUSE ^ means start of line anchor -- but somewhat redundantly, and
> very inefficiently in the failing case with dopey implementations of
> search() (which apply match() at offsets 0, 1, 2, .....).
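To make the alternatives in this exchange concrete, here is a minimal Python sketch. The pattern `something`, the sample text and the helper names (`agree_on_line`, `line_starts`, `matching_lines`, `seconds_per_call`) are hypothetical and only for illustration. It assumes the `re.MULTILINE` flag for the multiline case, because without it `^` matches only at the start of the string. The timing loop prints numbers rather than asserting any, since results depend on the Python version and the machine.

```python
import re
import timeit

# Hypothetical pattern, chosen only for illustration.
PATTERN = "something"
plain_re = re.compile(PATTERN)                          # for match() and plain search()
anchored_re = re.compile("^" + PATTERN)                 # ^-anchored, for search()
multiline_re = re.compile("^" + PATTERN, re.MULTILINE)  # ^ matches after every newline


def agree_on_line(line):
    """True if match() and a ^-anchored search() give the same verdict.

    Without re.MULTILINE, ^ matches only at the start of the string, so the
    anchored search() can succeed only where match() would.
    """
    return (plain_re.match(line) is None) == (anchored_re.search(line) is None)


def line_starts(text):
    """Offsets of every line in `text` that begins with PATTERN."""
    return [m.start() for m in multiline_re.finditer(text)]


def matching_lines(text):
    """The split-and-match alternative: the lines that begin with PATTERN."""
    return [ln for ln in text.splitlines() if plain_re.match(ln)]


def seconds_per_call(func, arg, number=1_000):
    """Average wall-clock seconds per call of func(arg)."""
    return timeit.timeit(lambda: func(arg), number=number) / number


if __name__ == "__main__":
    text = "first line\nsomething at the start\nmore text\nsomething else\n"
    print(line_starts(text))     # [11, 44]
    print(matching_lines(text))  # ['something at the start', 'something else']

    # Failing case: PATTERN occurs nowhere in the line.
    for length in (80, 10_000):
        line = "x" * length
        print(length,
              seconds_per_call(plain_re.search, line),
              seconds_per_call(plain_re.match, line))
```

Both multiline approaches find the same two lines in the sample text; which one is more appropriate depends on whether the caller needs offsets into the original string or the lines themselves.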
Answering the question you think should have been asked, rather than the question that was actually asked, is a great Usenet tradition, and is often more helpful to the poster than a straight answer would have been. However, you do have to be careful to make it clear that that is what you are doing. The OP did not use the word 'line' once in his post. He simply said he was searching a string. You didn't use the word 'line' either. If you are going to read more into the question than was actually asked, please try to say which question you are actually answering.

If he is using individual lines and re.match, then the presence or absence of a leading ^ makes virtually no difference. If he is looking for all occurrences in a multiline string, then re.search with an anchored match (using re.MULTILINE, so that ^ matches at the start of every line) is a correct way to do it; splitting the string into lines and using re.match is an alternative which may or may not be appropriate. Either way, putting the focus on the ^ was inappropriate: the issue is whether to use re.search or re.match.

If you assume that the search fails on an 80-character line, then I get timings of 6.48 µs (re.search), 4.68 µs (re.match with ^) and 4.66 µs (re.match without ^). A failing search on a 10,000-character line shows how performance will degrade (225 µs for search, no change for match), but notice that searching one 10,000-character string is more than twice as fast as matching 125 80-character lines (125 × 4.68 µs ≈ 585 µs).

I don't understand what else you think an implementation of search() can do in this case, apart from trying for a match at offsets 0, 1, 2, and so on. A match could start at any offset within the string, so it must scan the string in some form. A clever regex implementation will use Boyer-Moore where it can to avoid checking every index in the string, but for the pattern I suggested it would surprise me if any implementation actually managed much of an optimisation.

-- 
http://mail.python.org/mailman/listinfo/python-list