Jax wrote:

> I encountered a problem with the dollar sign in a source string. It seems
> that $ requires special treatment. Below is a copy of a session with
> Python's interactive shell:
>
> Python 2.5.2 (r252:60911, Jan 8 2009, 12:17:37)
> [GCC 4.3.2] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import re
> >>> a = unicode(r"(instead of $399.99)", "utf8")
> >>> print re.search(unicode(r"^\(instead of.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 9.99
> >>> print re.search(unicode(r"^\(.*\$(\d+[.]\d+)\)$", "utf8"), a).group(1)
> 399.99
>
> My question is: Why is only the third regular expression correct?

They are all correct; they just don't give the result you expect. This has
nothing to do with the $. The ".*" expression is "greedy": it tries to match
as many characters as possible and gives back only as much as the rest of
the pattern needs in order to match. Here it swallows " $39" and leaves just
"9.99" for the group. You can see that by adding another group:

>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of(.*)(\d+[.]\d+)\)$", a).groups()
(u' $39', u'9.99')

Your third pattern gives the full number only because the literal \$ forces
the group to start right after the dollar sign.

Fortunately there is also a non-greedy variant ".*?" which matches as few
characters as possible:

>>> a = u"(instead of $399.99)"
>>> re.search(ur"^\(instead of.*?(\d+[.]\d+)\)$", a).group(1)
u'399.99'

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
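
For anyone following along on Python 3, the greedy versus non-greedy
comparison from this thread can be reproduced with the minimal sketch below.
It assumes Python 3, where plain string literals are already Unicode, so the
unicode() calls and the ur prefix from the Python 2.5 session are not needed.
The names text, greedy, lazy and anchored are only illustrative.

```python
import re

# Same sample string as in the thread.
text = "(instead of $399.99)"

# Greedy ".*": grabs as much as possible, then backtracks only far enough
# for the rest of the pattern to match, so the number group gets "9.99".
greedy = re.search(r"^\(instead of(.*)(\d+[.]\d+)\)$", text)

# Non-greedy ".*?": grabs as little as possible, so the number group
# starts at the first digit and gets "399.99".
lazy = re.search(r"^\(instead of(.*?)(\d+[.]\d+)\)$", text)

# Literal "\$" before the group: the group must start right after the
# dollar sign, so even a greedy ".*" cannot eat into the number.
anchored = re.search(r"^\(.*\$(\d+[.]\d+)\)$", text)

print(greedy.groups())    # (' $39', '9.99')
print(lazy.groups())      # (' $', '399.99')
print(anchored.group(1))  # 399.99
```

Capturing what ".*" matched in its own group, as in the first two patterns,
is a quick way to see where a quantifier stopped when a match looks wrong.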