Denis S. Otkidach wrote:
You are right. But isalpha behavior looks strange to me anyway: why is
the Cyrillic character '\u0430' recognized as alphabetic in the de_DE
locale, but not in C?
In glibc, all "real" locales are based on
/usr/share/locale/i18n/locales/i18n, e.g. for de_DE through
LC_CTYPE
co
Serge Orlov wrote:
Emphasis is mine. So how many libc implementations with
non-unicode wide-character codes do we have in 2005?
Solaris has supported 2-byte wchar_t implementations for many
years, and so, I believe, have HP-UX and AIX.
ISO C99 defines a constant __STDC_ISO_10646__ which an
implementat
On Fri, 11 Feb 2005 18:49:53 +0100
"Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> >> >>> re.compile(ur'\w+', re.U).findall(u'\xb5\xba\xe4\u0430')
> >> [u'\xb5\xba\xe4\u0430']
> >
> > I can't find the strict definition of isalpha, but I believe the average
> > C program shouldn't care about the current l
Serge Orlov wrote:
> The wide-character value for each member of the Portable
> Character Set will equal its value when used as the lone character
> in an integer character constant. Wide-character codes for other
> characters are locale- and *implementation-dependent*
>
> Emphasis is mine.
the r
Fredrik Lundh wrote:
> Serge Orlov wrote:
>
>> re.compile(ur'\w+', re.U).findall(u'\xb5\xba\xe4\u0430')
>> [u'\xb5\xba\xe4\u0430']
>>
>> I can't find the strict definition of isalpha, but I believe the average
>> C program shouldn't care about the current locale alphabet, so
>> isalpha is a uni
"Martin v. Löwis" wrote:
> Serge Orlov wrote:
>> To summarize the discussion: either it's a bug in glibc or there is an
>> option to specify a modern POSIX locale. The POSIX locale consists of
>> characters from the portable character set, Unicode is certainly
>> portable.
>
> Yes, but U+00E4 is not i
Serge Orlov wrote:
> To summarize the discussion: either it's a bug in glibc or there is an
> option to specify a modern POSIX locale. The POSIX locale consists of
> characters from the portable character set, Unicode is certainly
> portable.
Yes, but U+00E4 is not in the portable character set. The portable
On Sat, 12 Feb 2005 09:42:41 +0100
"Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> the relevant part for this thread is *locale-*. if wctype depends on
> the locale, it cannot be used for generic build. (custom interpreters
> are another thing, but they shouldn't be shipped as "python").
You are
Serge Orlov wrote:
>> >>> re.compile(ur'\w+', re.U).findall(u'\xb5\xba\xe4\u0430')
>> [u'\xb5\xba\xe4\u0430']
>
> I can't find the strict definition of isalpha, but I believe the average
> C program shouldn't care about the current locale alphabet, so isalpha
> is a union of all supported characters i
Denis S. Otkidach wrote:
> On 10 Feb 2005 11:49:33 -0800
> "Serge Orlov" <[EMAIL PROTECTED]> wrote:
>
> > This thread is about problems only with LANG=C or LANG=POSIX, it's not
> > about other locales. Other locales are working as expected.
>
> You are not right. I have LANG=de_DE.UTF-8, and the P
On 10 Feb 2005 11:49:33 -0800
"Serge Orlov" <[EMAIL PROTECTED]> wrote:
> This thread is about problems only with LANG=C or LANG=POSIX, it's not
> about other locales. Other locales are working as expected.
You are not right. I have LANG=de_DE.UTF-8, and the Python test_re.py
doesn't pass. $LANG
Peter Maas wrote:
> Serge Orlov wrote:
> > Denis S. Otkidach wrote:
> > To summarize the discussion: either it's a bug in glibc or there is an
> > option to specify a modern POSIX locale. The POSIX locale consists of
> > characters from the portable character set, Unicode is certainly
> > portable.
>
>
Peter Maas wrote:
>> To summarize the discussion: either it's a bug in glibc or there is an
>> option to specify a modern POSIX locale. The POSIX locale consists of
>> characters from the portable character set, Unicode is certainly
>> portable.
>
> What about the environment variable LANG? I have SuSE 9
Serge Orlov wrote:
> Denis S. Otkidach wrote:
> To summarize the discussion: either it's a bug in glibc or there is an
> option to specify a modern POSIX locale. The POSIX locale consists of
> characters from the portable character set, Unicode is certainly
> portable.
What about the environment variable LANG? I
Denis S. Otkidach wrote:
> On all platforms \w matches all unicode letters when used with flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
>
> Python 2.3.4 (#1, Dec 17 2004, 19:56:48)
> [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
> Type "help", "copyright", "credits" or "license" for more infor
On Thu, 10 Feb 2005 17:46:06 +0100
"Fredrik Lundh" <[EMAIL PROTECTED]> wrote:
> > Can --with-wctype-functions configure option be the
> > source of problem?
>
> yes.
>
> that option disables Python's own Unicode database, and relies on the C
> library's wctype.h (iswalpha, etc) to behave prop
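
For comparison with the locale-dependent C functions mentioned above, here is a minimal sketch of what Python's own Unicode database reports for the four characters used in the thread. It assumes a Python 3 interpreter (the posts use Python 2), and the names SAMPLE and classify are illustrative only, not anything from the thread. It shows only the Python-database side and says nothing about how a given C library's iswalpha behaves.

```python
import unicodedata

# The four characters from the thread's test string.
SAMPLE = "\xb5\xba\xe4\u0430"


def classify(text):
    """Return (name, general category, isalpha) for each character.

    Both unicodedata and str.isalpha consult Python's bundled Unicode
    database, so the answers do not depend on LANG or LC_CTYPE.
    """
    return [
        (unicodedata.name(ch), unicodedata.category(ch), ch.isalpha())
        for ch in text
    ]


if __name__ == "__main__":
    for name, category, is_alpha in classify(SAMPLE):
        print(name, category, is_alpha)
```

If a build reported any of these characters as non-alphabetic, that would point at the character classification source rather than at the regular expression itself.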
Denis S. Otkidach wrote:
> On 10 Feb 2005 03:59:51 -0800
> "Serge Orlov" <[EMAIL PROTECTED]> wrote:
>
> > > On all platforms \w matches all unicode letters when used with flag
> > > re.UNICODE, but this doesn't work on SuSE 9.2:
> [...]
> > I can get the same results on RedHat's python 2.2.3 if I p
Denis S. Otkidach wrote:
>> > On all platforms \w matches all unicode letters when used with flag
>> > re.UNICODE, but this doesn't work on SuSE 9.2:
>>
>> I think Python on SuSE 9.2 uses UCS4 for unicode strings (as does
>> RedHat), check sys.maxunicode.
>>
>> This is not an explanation, but perh
On Thu, 10 Feb 2005 16:23:09 +0100
Daniel Dittmar <[EMAIL PROTECTED]> wrote:
> Denis S. Otkidach wrote:
>
> > On all platforms \w matches all unicode letters when used with flag
> > re.UNICODE, but this doesn't work on SuSE 9.2:
>
> I think Python on SuSE 9.2 uses UCS4 for unicode strings (as do
Denis S. Otkidach wrote:
> On all platforms \w matches all unicode letters when used with flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
I think Python on SuSE 9.2 uses UCS4 for unicode strings (as does
RedHat), check sys.maxunicode.
This is not an explanation, but perhaps a hint where to look
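
The sys.maxunicode check suggested above can be sketched as follows. This is only an illustration: the helper name unicode_build is hypothetical, and on Python 3.3 and later the value is always 0x10FFFF, so the narrow/wide distinction only applies to older interpreters such as the ones discussed here.

```python
import sys


def unicode_build(maxunicode=sys.maxunicode):
    """Classify an interpreter as a 'UCS2' (narrow) or 'UCS4' (wide) build.

    Narrow builds report sys.maxunicode == 0xFFFF; wide builds report
    0x10FFFF.
    """
    return "UCS4" if maxunicode > 0xFFFF else "UCS2"


if __name__ == "__main__":
    print(hex(sys.maxunicode), unicode_build())
```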
On 10 Feb 2005 03:59:51 -0800
"Serge Orlov" <[EMAIL PROTECTED]> wrote:
> > On all platforms \w matches all unicode letters when used with flag
> > re.UNICODE, but this doesn't work on SuSE 9.2:
[...]
> I can get the same results on RedHat's Python 2.2.3 if I pass the re.L
> option; it looks like this
On Thu, 10 Feb 2005 13:00:42 +0300
"Denis S. Otkidach" <[EMAIL PROTECTED]> wrote:
> On all platforms \w matches all unicode letters when used with flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
>
> Python 2.3.4 (#1, Dec 17 2004, 19:56:48)
> [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
> Typ
Denis S. Otkidach wrote:
> On all platforms \w matches all unicode letters when used with flag
> re.UNICODE, but this doesn't work on SuSE 9.2:
>
> Python 2.3.4 (#1, Dec 17 2004, 19:56:48)
> [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
> Type "help", "copyright", "credits" or "license" for more infor
On all platforms \w matches all unicode letters when used with flag
re.UNICODE, but this doesn't work on SuSE 9.2:
Python 2.3.4 (#1, Dec 17 2004, 19:56:48)
[GCC 3.3.4 (pre 3.3.5 20040809)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
>>> re.compil
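
The pattern quoted throughout the thread can be checked in isolation. What follows is a minimal sketch, assuming a Python 3 interpreter (the posts use Python 2 syntax such as ur'...'); the names SAMPLE and unicode_words are illustrative and do not come from the thread.

```python
import re

# MICRO SIGN, MASCULINE ORDINAL INDICATOR, LATIN SMALL LETTER A WITH
# DIAERESIS and CYRILLIC SMALL LETTER A, as in the thread's test string.
SAMPLE = "\xb5\xba\xe4\u0430"


def unicode_words(text):
    """Return the runs of word characters matched with re.UNICODE."""
    return re.compile(r"\w+", re.UNICODE).findall(text)


if __name__ == "__main__":
    # On a build that uses Python's own Unicode database, the whole
    # sample is returned as a single match.
    print(unicode_words(SAMPLE) == [SAMPLE])
```

A result split into several shorter matches, or an empty list, would reproduce the symptom reported for SuSE 9.2.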