Re: Python and Cyrillic characters in regular expression

MRAB Thu, 04 Sep 2008 10:51:26 -0700

On Sep 4, 3:42 pm, phasma <[EMAIL PROTECTED]> wrote:
> Hi, I'm trying to extract all alphabetic characters from a string.
>
> reg = re.compile('(?u)([\w\s]+)', re.UNICODE)


You don't need both (?u) and re.UNICODE: they mean the same thing.
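
A quick way to check this is to compile the pattern both ways and compare the results. This is only a sketch: the names inline and flagged are made up for illustration, and the u"" literals are there so that it runs the same on Python 2 and on Python 3.3 or later.

```python
# -*- coding: utf-8 -*-
import re

# Same pattern, Unicode matching requested in two different ways.
inline = re.compile(r"(?u)[\w\s]+")            # inline flag inside the pattern
flagged = re.compile(r"[\w\s]+", re.UNICODE)   # flag passed as an argument

text = u"ya я"

# Both compiled patterns carry the UNICODE flag ...
both_unicode = bool(inline.flags & re.UNICODE) and bool(flagged.flags & re.UNICODE)

# ... and both match the whole mixed Latin/Cyrillic string.
same_match = inline.match(text).group(0) == flagged.match(text).group(0)
```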

Note that this pattern actually matches word characters (letters,
digits and the underscore) plus whitespace, not just alphabetic
characters.
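
If only letters are wanted, one common idiom is the character class [^\W\d_], which is "word characters minus digits and the underscore". The following is a minimal sketch rather than anything from the original post; the helper name extract_letters and the sample string are made up for illustration.

```python
# -*- coding: utf-8 -*-
import re

# [^\W\d_] means: not a non-word character, not a digit, not an underscore,
# which leaves letters only. re.UNICODE makes this cover Cyrillic as well
# (it is already the default for str patterns on Python 3).
letters = re.compile(r"[^\W\d_]+", re.UNICODE)

def extract_letters(text):
    """Return every run of alphabetic characters found in text (hypothetical helper)."""
    return letters.findall(text)

# Digits, underscores, punctuation and whitespace all act as separators.
words = extract_letters(u"ya я, abc_123 привет!")
```

Here words holds the four letter runs "ya", "я", "abc" and "привет".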

> buf = re.match(string)
>
> But it doesn't work. If the string starts with a Cyrillic character,
> everything works fine. But if the string starts with a Latin character,
> match returns only the Latin characters.
>

I'm encoding the Unicode results as UTF-8 in order to print them, but
I'm not having a problem with it otherwise:

Program
=======
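# -*- coding: utf-8 -*-
import re

# (?u) makes \w and \s Unicode-aware, so no re.UNICODE argument is needed.
reg = re.compile(r'(?u)([\w\s]+)')

# Latin text first, then Cyrillic.
found = reg.match(u"ya я")
print found.group(1).encode("utf-8")

# Cyrillic text first, then Latin.
found = reg.match(u"я ya")
print found.group(1).encode("utf-8")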

Output
======
ya я
я ya
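
For readers on Python 3, a rough equivalent might look like the sketch below. It assumes Python 3, where str is already Unicode and str patterns are Unicode-aware by default, so neither (?u) nor the explicit UTF-8 encoding is needed. The helper name first_run is made up for illustration.

```python
import re

# On Python 3, \w and \s are Unicode-aware by default for str patterns.
reg = re.compile(r"([\w\s]+)")

def first_run(text):
    """Return the leading run of word/whitespace characters, or None (hypothetical helper)."""
    found = reg.match(text)
    return found.group(1) if found else None

# Each call returns the whole input, whichever script comes first.
latin_first = first_run("ya я")
cyrillic_first = first_run("я ya")
```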
--
http://mail.python.org/mailman/listinfo/python-list
