Re: Python and Cyrillic characters in regular expression

2008-09-05 Thread Fredrik Lundh
phasma wrote: string = u"Привет" (u'\u041f\u0440\u0438\u0432\u0435\u0442',) string = u"Hi.Привет" (u'Hi',) the [\w\s] pattern you used matches letters, numbers, underscore, and whitespace. "." doesn't fall into that category, so the "match" method stops when it gets to that character. ma
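A minimal sketch of the behaviour described above, written for Python 3 (an assumption; the thread uses Python 2 `u"..."` literals). `match` anchors at the start of the string and stops at the first character outside `[\w\s]`, here the ".". Using `findall` to collect every run is one possible workaround, not something the preview confirms was suggested in the thread.

```python
import re

# In Python 3, str patterns are Unicode-aware by default,
# so \w already covers Cyrillic letters without extra flags.
pattern = re.compile(r"[\w\s]+")

text = "Hi.Привет"

# match() only looks at the start of the string and stops at ".",
# which is neither a word character nor whitespace.
first = pattern.match(text)
print(first.group())

# findall() scans the whole string and returns every run
# of word/whitespace characters, on both sides of the ".".
runs = pattern.findall(text)
print(runs)
```

Because the pattern has no capturing group here, `findall` returns the full matched runs as a list of strings.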

Re: Python and Cyrillic characters in regular expression

2008-09-05 Thread MRAB
On Sep 5, 12:28 pm, phasma <[EMAIL PROTECTED]> wrote: > string = u"Привет" All the characters are letters. > (u'\u041f\u0440\u0438\u0432\u0435\u0442',) > > string = u"Hi.Привет" The third character isn't a letter and isn't whitespace. > (u'Hi',) > > On Sep 4, 9:53 pm, Fredrik Lundh <[EMAIL PRO

Re: Python and Cyrillic characters in regular expression

2008-09-05 Thread phasma
string = u"Привет" (u'\u041f\u0440\u0438\u0432\u0435\u0442',) string = u"Hi.Привет" (u'Hi',) On Sep 4, 9:53 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote: > phasma wrote: > > Hi, I'm trying to extract all alphabetic characters from a string. > > > reg = re.compile('(?u)([\w\s]+)', re.UNICODE) > > buf =

Re: Python and Cyrillic characters in regular expression

2008-09-04 Thread Fredrik Lundh
phasma wrote: Hi, I'm trying to extract all alphabetic characters from a string. reg = re.compile('(?u)([\w\s]+)', re.UNICODE) buf = re.match(string) But it doesn't work. If the string starts with a Cyrillic character, everything works fine. But if the string starts with a Latin character, match returns only Latin

Re: Python and Cyrillic characters in regular expression

2008-09-04 Thread MRAB
On Sep 4, 3:42 pm, phasma <[EMAIL PROTECTED]> wrote: > Hi, I'm trying to extract all alphabetic characters from a string. > > reg = re.compile('(?u)([\w\s]+)', re.UNICODE) You don't need both (?u) and re.UNICODE: they mean the same thing. This will actually match letters and whitespace. > buf = re.ma
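A short sketch of the point about `(?u)` and `re.UNICODE`, assuming Python 3, where `str` patterns are Unicode-aware even without either flag. `re.ASCII` is used only to show what `\w` does without Unicode matching. The `[^\W\d_]` pattern is a common idiom for roughly "letters only" and is an addition for illustration, not something taken from the thread.

```python
import re

# The inline flag and the compile-time flag request the same behaviour.
inline = re.compile(r"(?u)[\w\s]+")
flagged = re.compile(r"[\w\s]+", re.UNICODE)

sample = "Привет мир 42_"
same = inline.match(sample).group() == flagged.match(sample).group()
print(same)

# With ASCII-only matching, \w no longer covers Cyrillic letters,
# so nothing matches at the start of the string.
ascii_only = re.compile(r"[\w\s]+", re.ASCII)
no_match = ascii_only.match("Привет")
print(no_match)

# \w also accepts digits and underscore. Excluding them leaves
# (roughly) alphabetic characters only.
letters_only = re.compile(r"[^\W\d_]+")
letters = letters_only.findall("Hi.Привет 42_")
print(letters)
```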

Python and Cyrillic characters in regular expression

2008-09-04 Thread phasma
Hi, I'm trying to extract all alphabetic characters from a string. reg = re.compile('(?u)([\w\s]+)', re.UNICODE) buf = re.match(string) But it doesn't work. If the string starts with a Cyrillic character, everything works fine. But if the string starts with a Latin character, match returns only Latin characters. Plea