Re: Python and Cyrillic characters in regular expression

MRAB Fri, 05 Sep 2008 07:31:33 -0700

On Sep 5, 12:28 pm, phasma <[EMAIL PROTECTED]> wrote:
> string = u"Привет"


All the characters are letters, so the whole string matches.

> (u'\u041f\u0440\u0438\u0432\u0435\u0442',)
>
> string = u"Hi.Привет"

The third character ('.') is neither a letter nor whitespace, so the
match stops there.

> (u'Hi',)
>
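
A minimal sketch of the two cases above, assuming Python 3, where str
patterns are Unicode-aware by default. The helper names leading_run and
all_runs are made up for illustration, and findall is only one possible
way to pick up the runs on both sides of the '.'; it is not something
proposed earlier in this thread.

```python
import re

# Same character class as in the original post. re.UNICODE is passed only
# to mirror that post; for str patterns in Python 3 it is the default.
WORD_OR_SPACE = re.compile(r'[\w\s]+', re.UNICODE)


def leading_run(text):
    """Return what match() sees: the run of word/space characters at the start."""
    m = WORD_OR_SPACE.match(text)
    return m.group(0) if m else ''


def all_runs(text):
    """Return every run of word/space characters, wherever it occurs."""
    return WORD_OR_SPACE.findall(text)


print(leading_run('Привет'))     # Привет
print(leading_run('Hi.Привет'))  # Hi  (match() stops at the '.')
print(all_runs('Hi.Привет'))     # ['Hi', 'Привет']
```

On Python 2, as in the original post, the same pattern would need u""
literals and the re.UNICODE flag for \w to match Cyrillic letters.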

> On Sep 4, 9:53 pm, Fredrik Lundh <[EMAIL PROTECTED]> wrote:
>
> > phasma wrote:
> > > Hi, I'm trying extract all alphabetic characters from string.
>
> > > reg = re.compile('(?u)([\w\s]+)', re.UNICODE)
> > > buf = reg.match(string)
>
> > > But it's doesn't work. If string starts from Cyrillic character, all
> > > works fine. But if string starts from Latin character, match returns
> > > only Latin characters.
>
> > can you provide a few sample strings that show this behaviour?
>
--
http://mail.python.org/mailman/listinfo/python-list
