On 22/07/2006 2:18 AM, Simon Forman wrote:
> John Salerno wrote:
>> Simon Forman wrote:
>>
>>> Python's re.match() matches from the start of the string, so if you

(1) Every regex library's match() starts matching from the beginning of the string (unless of course there's an arg for an explicit starting position) -- where else would it start?

(2) This has absolutely zero relevance to the "match whole string or not" question.

>>> want to ensure that the whole string matches completely you'll probably
>>> want to end your re pattern with the "$" character (depending on what
>>> the rest of your pattern matches.)

*NO* ... if you want to ensure that the whole string matches completely, you need to end your pattern with "\Z", *not* "$". Without the MULTILINE flag, "$" matches at the end of the string *or* just before a newline at the end of the string, so re.match('ab$', 'ab\n') succeeds; "\Z" matches only at the very end.

Perusal of the manual would seem to be indicated :-)

>> Is that necessary? I was thinking that match() was used to match the
>> full RE and string, and if they weren't the same, they wouldn't match
>> (meaning a begin/end of string character wasn't necessary). That's wrong?

Yes, it's wrong. If the default were to match the whole string, then a metacharacter would be required to signal "*don't* match the whole string" ... functionality which is quite useful.

>
> My understanding, from the docs and from dim memories of using
> re.match() long ago, is that it will match on less than the full input
> string if the re pattern allows it (for instance, if the pattern
> *doesn't* end in '.*' or something similar.)

Ending a pattern with '.*' or something similar is typically a mistake and does nothing but waste CPU cycles:

C:\junk>python -mtimeit -s"import re;s='a'+80*'z';m=re.compile('a').match" "m(s)"
1000000 loops, best of 3: 1.12 usec per loop

C:\junk>python -mtimeit -s"import re;s='a'+8000*'z';m=re.compile('a').match" "m(s)"
100000 loops, best of 3: 1.15 usec per loop

C:\junk>python -mtimeit -s"import re;s='a'+80*'z';m=re.compile('a.*').match" "m(s)"
100000 loops, best of 3: 1.39 usec per loop

C:\junk>python -mtimeit -s"import re;s='a'+8000*'z';m=re.compile('a.*').match" "m(s)"
10000 loops, best of 3: 24.2 usec per loop

The regex engine can't optimise the '.*' away because '.' means by default "any character except a newline", so it has to trundle all the way to the end just in case there's a newline lurking somewhere. With 8000 trailing characters that costs 24.2 usec instead of 1.15.

Oh, and just in case you were wondering:

C:\junk>python -mtimeit -s"import re;s='a'+8000*'z';m=re.compile('a.*',re.DOTALL).match" "m(s)"
1000000 loops, best of 3: 1.18 usec per loop

In this case, logic says the '.*' will match anything, so the engine can jump straight to the end of the string without examining each character.

>
> I'd test this, though, before trusting it.
>
> What the heck, I'll do that now:
>
>>>> import re
>>>> re.match('ab', 'abcde')
> <_sre.SRE_Match object at 0xb6ff8790>
>>>> m = _

??? What's wrong with _.group() ???

>>>> m.group()
> 'ab'
>>>> print re.match('ab$', 'abcde')
> None
>

HTH,
John

--
http://mail.python.org/mailman/listinfo/python-list
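
For readers on current Python 3 (the thread above used Python 2), here is a minimal sketch that re-checks the thread's claims: that match() happily accepts a prefix, that "$" accepts a trailing newline while "\Z" does not, and that a non-DOTALL '.*' stops at a newline and costs time proportional to the string length. The helper names whole_string_checks, end_of_dot_star and time_match are hypothetical, chosen for illustration; re.fullmatch, also shown, exists only in Python 3.4 and later.

```python
import re
import timeit


def whole_string_checks(s):
    """Four ways of asking whether s is exactly "ab" (hypothetical helper)."""
    return {
        "match": re.match(r"ab", s) is not None,          # anchored at the start only
        "dollar": re.match(r"ab$", s) is not None,        # without MULTILINE, '$' also matches before a final '\n'
        "Z": re.match(r"ab\Z", s) is not None,            # '\Z' matches only at the very end
        "fullmatch": re.fullmatch(r"ab", s) is not None,  # Python 3.4+ only
    }


def end_of_dot_star(s, flags=0):
    """Index where a greedy 'a.*' match stops (hypothetical helper)."""
    return re.match(r"a.*", s, flags).end()


def time_match(pattern, n, flags=0, number=10_000):
    """Seconds per call of pattern.match on 'a' + n*'z' (hypothetical helper)."""
    s = "a" + "z" * n
    m = re.compile(pattern, flags).match
    return timeit.timeit(lambda: m(s), number=number) / number


if __name__ == "__main__":
    for s in ("ab", "abcde", "ab\n"):
        print(repr(s), whole_string_checks(s))

    s = "a" + "z" * 10 + "\n" + "z" * 5
    print(end_of_dot_star(s))             # 11: '.' stops at the newline
    print(end_of_dot_star(s, re.DOTALL))  # 17: with DOTALL it runs to the end

    # Absolute timings depend on the machine and Python version.
    for label, pattern, flags in (("'a'", "a", 0),
                                  ("'a.*'", "a.*", 0),
                                  ("'a.*' DOTALL", "a.*", re.DOTALL)):
        for n in (80, 8000):
            usec = time_match(pattern, n, flags) * 1e6
            print(f"{label:<14} n={n:<5} {usec:.2f} usec per call")
```

Only "\Z" and fullmatch reject 'ab\n'; "$" lets it through. In the timing loop, the 8000-character 'a.*' case without DOTALL would be expected to stand out, mirroring the figures above, but treat that as an expectation rather than a guarantee on any given interpreter.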