phasma wrote:

string = u"Привет"
(u'\u041f\u0440\u0438\u0432\u0435\u0442',)

string = u"Hi.Привет"
(u'Hi',)

the [\w\s] character class you used matches letters, digits, underscores, and whitespace. "." isn't in that class, so the "match" method, which always starts at the beginning of the string, stops when it reaches that character.
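
to make that behaviour concrete, here is a small sketch. the exact pattern from the original post isn't shown in this thread, so `([\w\s]+)` is an assumption, and `leading_run` is just a hypothetical helper name. it is written for Python 3, where str patterns are Unicode-aware by default, so the (?u) flag isn't needed:

```python
import re

# assumed reconstruction of the pattern; the original post's exact
# pattern is not quoted in this thread
pattern = re.compile(r"([\w\s]+)")

def leading_run(text):
    """Return the groups of a match anchored at the start of text, or None."""
    m = pattern.match(text)
    return m.groups() if m else None

print(leading_run("Привет"))     # ('Привет',)
print(leading_run("Hi.Привет"))  # ('Hi',)  -- the match stops at "."
print(leading_run(".Привет"))    # None     -- nothing matches at position 0
```

the last call shows why "match" is the wrong tool here: it never looks past the first character that falls outside the class.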

maybe you could use re.sub or re.findall?

>>> import re
>>> # replace runs of non-word characters with the empty string
>>> re.sub(r"(?u)\W+", "", string)
u'Hi\u041f\u0440\u0438\u0432\u0435\u0442'

>>> # find runs of word characters (letters, digits, underscore)
>>> re.findall(r"(?u)\w+", string)
[u'Hi', u'\u041f\u0440\u0438\u0432\u0435\u0442']
>>> "".join(re.findall(r"(?u)\w+", string))
u'Hi\u041f\u0440\u0438\u0432\u0435\u0442'

(the "sub" example expects you to specify what characters you want to skip, while "findall" expects you to specify what you want to keep.)
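
the same two approaches side by side, as a rough Python 3 sketch (the function names are made up for illustration; in Python 3 the (?u) flag can be dropped for str patterns):

```python
import re

def strip_by_skipping(text):
    # "sub" style: describe what to throw away (runs of non-word characters)
    return re.sub(r"\W+", "", text)

def strip_by_keeping(text):
    # "findall" style: describe what to keep (runs of word characters)
    return "".join(re.findall(r"\w+", text))

text = "Hi.Привет"
print(strip_by_skipping(text))  # HiПривет
print(strip_by_keeping(text))   # HiПривет
```

for these complementary patterns the two give the same result; which one reads better depends on whether the "keep" set or the "skip" set is easier to describe. note that \w also counts the underscore as a word character, so "a_b c" comes out as "a_bc" either way.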

</F>

