On Sep 4, 3:42 pm, phasma <[EMAIL PROTECTED]> wrote:
> Hi, I'm trying extract all alphabetic characters from string.
>
> reg = re.compile('(?u)([\w\s]+)', re.UNICODE)

You don't need both (?u) and re.UNICODE: they mean the same thing.
Note that this pattern matches more than alphabetic characters: \w
covers letters, digits and the underscore, and \s adds whitespace.

> buf = re.match(string)
>
> But it's doesn't work. If string starts from Cyrillic character, all
> works fine. But if string starts from Latin character, match returns
> only Latin characters.
>

I'm encoding the Unicode results as UTF-8 in order to print them, but
otherwise I can't reproduce the problem:

Program
=======
# -*- coding: utf-8 -*-
import re

reg = re.compile(r'(?u)([\w\s]+)')

found = reg.match(u"ya я")
print found.group(1).encode("utf-8")

found = reg.match(u"я ya")
print found.group(1).encode("utf-8")

Output
======
ya я
я ya
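
If the goal really is alphabetic characters only, here is a minimal sketch of one way to do it. It assumes you want every run of letters anywhere in the string, not just a prefix. The helper name extract_letters is made up for illustration. The class [^\W\d_] means "word characters minus digits and underscore", and findall scans the whole string, whereas match only looks at the start.

```python
# -*- coding: utf-8 -*-
import re

# [^\W\d_] = anything that is a word character but not a digit or "_",
# which leaves letters only. re.UNICODE makes \w and \d Unicode-aware
# (it is already the default for str patterns on Python 3).
LETTERS = re.compile(r"[^\W\d_]+", re.UNICODE)


def extract_letters(text):
    """Return every run of alphabetic characters in text, in order.

    Hypothetical helper, not part of the re module.
    """
    return LETTERS.findall(text)


words = extract_letters(u"ya я, abc123_где")
# words == [u"ya", u"я", u"abc", u"где"]
```

Joining the result with u"".join(words) gives a single string of letters if that is what you need.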