Baz Walter wrote:
On 03/08/10 21:24, MRAB wrote:
And, BTW, none of your examples pass a UTF-8 bytestring to re.findall:
all those string literals starting with the 'u' prefix are Unicode
strings!
not sure what you mean by this: if the string was encoded as utf8,
'\w' still wouldn't match any of the non-ascii cha
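To make the distinction in this exchange concrete, here is a minimal sketch of the difference between a Unicode string and a UTF-8 bytestring when matching '\w'. It assumes Python 3, where str is the Unicode type and bytes is the bytestring type; the thread itself dates from Python 2, where a u'...' literal only gets Unicode-aware '\w' matching if re.UNICODE is passed. The variable names are illustrative only.

```python
import re

# Assumption: Python 3 semantics (str = Unicode string, bytes = bytestring).
text = 'a b c \xe5 \xe6 \xe7'      # Unicode string, like the u'...' literals in the thread
data = text.encode('utf-8')        # the same text as a UTF-8 bytestring

# A str pattern on a Unicode string: \w is Unicode-aware by default in Python 3.
unicode_matches = re.findall(r'\w', text)

# A bytes pattern on UTF-8 bytes: without re.LOCALE, \w only matches ASCII
# [a-zA-Z0-9_], so the two-byte sequences for the accented letters are skipped.
byte_matches = re.findall(rb'\w', data)

print(unicode_matches)
print(byte_matches)
```

Only the Unicode string yields the non-ASCII letters; the UTF-8 bytestring yields just the ASCII ones, which is the behaviour the quoted reply describes.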
Baz Walter wrote:
On 03/08/10 19:40, MRAB wrote:
Baz Walter wrote:
the python docs say that re.LOCALE makes certain character classes
"dependent on the current locale".
re.LOCALE just passes the character to the underlying C library. It
really only works on bytestrings which have 1 byte per character.
the re
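The "1 byte per character" point can be illustrated with a short sketch. It assumes Python 3 and uses Latin-1 as an example of a single-byte encoding (the thread does not name one); it only counts bytes and does not call re.LOCALE, since what that flag matches depends on the locale of the machine running it.

```python
# Assumption: Python 3; Latin-1 is chosen here only as an example of a
# single-byte encoding, and UTF-8 as an example of a multi-byte one.
text = '\xe5\xe6\xe7'                 # three non-ASCII characters

latin1 = text.encode('latin-1')       # one byte per character
utf8 = text.encode('utf-8')           # two bytes per character for these letters

# Number of bytes each character occupies in each encoding.
latin1_widths = [len(ch.encode('latin-1')) for ch in text]
utf8_widths = [len(ch.encode('utf-8')) for ch in text]

print(len(text), len(latin1), len(utf8))
print(latin1_widths, utf8_widths)
```

A routine that classifies one byte at a time can handle the Latin-1 form, where each byte is a whole character, but in the UTF-8 form it only ever sees fragments of each accented letter.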
Baz Walter wrote:
the python docs say that re.LOCALE makes certain character classes
"dependent on the current locale".
here's what i currently see on my system:
>>> import re, locale
>>> locale.getdefaultlocale()
('en_GB', 'UTF8')
>>> locale.getlocale()
(None, None)
>>> re.findall(r'\w', u'a b c \xe5 \xe6 \xe7'