phasma wrote:
> string = u"Привет"
> (u'\u041f\u0440\u0438\u0432\u0435\u0442',)
> string = u"Hi.Привет"
> (u'Hi',)
the [\w\s] pattern you used matches letters, digits, underscores, and
whitespace. "." doesn't fall into any of those categories, so the "match"
method stops when it gets to that character.
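
to make that concrete, here is a minimal sketch. it assumes Python 3 (where str patterns are Unicode-aware by default, so no "(?u)" is needed) and it assumes the original pattern looked roughly like ([\w\s]+); the exact pattern isn't shown in the quote, so treat that part as a guess:

```python
import re

text = "Hi.Привет"

# Hypothetical reconstruction of the pattern; the real one is not quoted above.
pattern = re.compile(r"([\w\s]+)")

# match() is anchored at position 0 and consumes as many [\w\s]
# characters as it can, so it ends at the first character outside the class.
m = pattern.match(text)
groups = m.groups()
stopped_at = m.end()

print(groups)
print(repr(text[stopped_at]))  # the character that ended the match
```

the match ends at the "." because nothing in the pattern allows that character, not because of the Cyrillic text that follows it.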
maybe you could use re.sub or re.findall?
>>> import re
>>> # replace all non-alphanumerics with the empty string
>>> re.sub(r"(?u)\W+", "", string)
u'Hi\u041f\u0440\u0438\u0432\u0435\u0442'
>>> # find runs of alphanumeric characters
>>> re.findall(r"(?u)\w+", string)
[u'Hi', u'\u041f\u0440\u0438\u0432\u0435\u0442']
>>> "".join(re.findall(r"(?u)\w+", string))
u'Hi\u041f\u0440\u0438\u0432\u0435\u0442'
(the "sub" example expects you to specify what characters you want to
skip, while "findall" expects you to specify what you want to keep.)
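
a small sketch of that equivalence, again assuming Python 3 and using a made-up sample string (not from the original post), shows that for complementary classes like \W and \w the two approaches give the same result:

```python
import re

# Hypothetical sample input, chosen to include punctuation, whitespace,
# Cyrillic letters, an underscore, and digits.
text = "Hi. Привет, world_42!"

# "sub": name what to throw away (runs of non-word characters).
dropped = re.sub(r"\W+", "", text)

# "findall": name what to keep (runs of word characters), then rejoin.
kept = "".join(re.findall(r"\w+", text))

print(dropped)
print(dropped == kept)
```

which one reads better usually depends on whether the "keep" set or the "skip" set is easier to describe; note that \w also keeps underscores and digits.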
</F>