Re: regular expressions and dollar sign ($) in source string

Peter Otten Thu, 23 Apr 2009 00:15:39 -0700

Jax wrote:

> I encountered a problem with the dollar sign in a source string. It seems
> that $ requires special treatment. Below is a copy of a session with
> Python's interactive shell:
> 
> Python 2.5.2 (r252:60911, Jan  8 2009, 12:17:37)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> a = unicode(r"(instead of $399.99)", "utf8")
> >>> print re.search(unicode(r"^\(instead of.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*\$(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 399.99
> 
> My question is: Why is only the third regular expression correct?


They are all correct; they just don't give the result you expect. This has
nothing to do with the $. The ".*" expression is "greedy": it matches as
many characters as possible while still letting the rest of the pattern
match, so it swallows " $39" and leaves only "9.99" for the group. You can
see that by adding another group:

>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of(.*)(\d+[.]\d+)\)$", a).groups()
(u' $39', u'9.99')
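
To make the greedy behaviour concrete, here is a rough sketch that imitates
what ".*" does by hand. It assumes Python 3 (plain str literals, no u/ur
prefixes), and the names prefix, tail, rest and greedy_split are made up for
this illustration. It is a simplification of what the regex engine really
does, not a description of its internals: try the longest candidate for ".*"
first, then give back one character at a time until the rest of the pattern
fits.

```python
import re

a = "(instead of $399.99)"
prefix = "(instead of"                   # the literal start of the pattern
tail = re.compile(r"(\d+[.]\d+)\)$")     # what has to follow ".*"

rest = a[len(prefix):]                   # the text ".*" may consume

greedy_split = None
# Longest candidate first, then shrink by one character per attempt.
for n in range(len(rest), -1, -1):
    m = tail.match(rest, n)              # does the tail match starting at n?
    if m:
        greedy_split = (rest[:n], m.group(1))
        break

print(greedy_split)
```

The first split that works is the one where ".*" has already taken " $39",
which is why the number group only gets the last digit before the dot.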

Fortunately there is also a non-greedy variant ".*?" which matches as few
characters as possible:

>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of.*?(\d+[.]\d+)\)$", a).group(1)
u'399.99'
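
For comparison, a small sketch of the non-greedy pattern with an extra group
around ".*?", so you can see how little it consumes. This assumes Python 3,
where string literals are already Unicode and the u/ur prefixes are dropped;
the variable name lazy is only for illustration.

```python
import re

a = "(instead of $399.99)"

# ".*?" starts with nothing and grows one character at a time, stopping as
# soon as the rest of the pattern can match.
lazy = re.search(r"^\(instead of(.*?)(\d+[.]\d+)\)$", a)

print(lazy.groups())
```

Here ".*?" stops after " $", so the second group receives all of the digits.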

Peter
