Re: sre is broken in SuSE 9.2

2005-02-13 "Martin v. Löwis"
Denis S. Otkidach wrote: You are right. But isalpha behavior looks strange to me anyway: why is the Cyrillic character '\u0430' recognized as alphabetic in the de_DE locale, but not in the C locale? In glibc, all "real" locales are based on /usr/share/locale/i18n/locales/i18n, e.g. for de_DE through LC_CTYPE co
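Martin contrasts the C library's per-locale LC_CTYPE tables with a classification that does not depend on the locale. The following is a minimal Python 3 sketch that is not from the thread; `SAMPLE` and `classify` are illustrative names. It uses the standard `unicodedata` module to look up the four test characters in Python's bundled Unicode database, which does not consult LANG or LC_CTYPE:

```python
import unicodedata

# The four characters from the test string used throughout this thread.
SAMPLE = "\xb5\xba\xe4\u0430"


def classify(text):
    """Return (char, Unicode name, general category) for each character.

    unicodedata reads Python's bundled Unicode database, so the result
    is the same whatever the process locale (LANG / LC_CTYPE) is.
    """
    return [(ch, unicodedata.name(ch), unicodedata.category(ch)) for ch in text]


for ch, name, category in classify(SAMPLE):
    print(f"U+{ord(ch):04X} {name}: {category}")
```

Each of the four characters reports a letter category (one starting with "L"). This is why a Unicode-aware \w is expected to match all of them, whatever the locale.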

Re: sre is broken in SuSE 9.2

2005-02-13 "Martin v. Löwis"
Serge Orlov wrote: Emphasis is mine. So how many libc implementations with non-Unicode wide-character codes do we have in 2005? Solaris has supported 2-byte wchar_t implementations for many years, and so, I believe, have HP-UX and AIX. ISO C99 defines a constant __STDC_ISO_10646__ which an implementat

Re: sre is broken in SuSE 9.2

2005-02-12 Denis S. Otkidach
On Fri, 11 Feb 2005 18:49:53 +0100 "Fredrik Lundh" <[EMAIL PROTECTED]> wrote: > >> >>> re.compile(ur'\w+', re.U).findall(u'\xb5\xba\xe4\u0430') > >> [u'\xb5\xba\xe4\u0430'] > > > > I can't find the strict definition of isalpha, but I believe an average > > C program shouldn't care about the current l

Re: sre is broken in SuSE 9.2

2005-02-12 Fredrik Lundh
Serge Orlov wrote: > The wide-character value for each member of the Portable > Character Set will equal its value when used as the lone character > in an integer character constant. Wide-character codes for other > characters are locale- and *implementation-dependent* > > Emphasis is mine. the r

Re: sre is broken in SuSE 9.2

2005-02-12 Serge Orlov
Fredrik Lundh wrote: > Serge Orlov wrote: > >> re.compile(ur'\w+', re.U).findall(u'\xb5\xba\xe4\u0430') >> [u'\xb5\xba\xe4\u0430'] >> >> I can't find the strict definition of isalpha, but I believe an average >> C program shouldn't care about the current locale alphabet, so >> isalpha is a uni

Re: sre is broken in SuSE 9.2

2005-02-12 Serge Orlov
"Martin v. Löwis" wrote: > Serge Orlov wrote: >> To summarize the discussion: either it's a bug in glibc or there is an >> option to specify a modern POSIX locale. The POSIX locale consists of >> characters from the portable character set; Unicode is certainly >> portable. > > Yes, but U+00E4 is not i

Re: sre is broken in SuSE 9.2

2005-02-12 "Martin v. Löwis"
Serge Orlov wrote: > To summarize the discussion: either it's a bug in glibc or there is an option to specify a modern POSIX locale. The POSIX locale consists of characters from the portable character set; Unicode is certainly portable. Yes, but U+00E4 is not in the portable character set. The portable
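Martin's point can be shown with a minimal Python sketch. It rests on my reading of the POSIX definition: the portable character set is NUL, the control characters alert through carriage return (0x07 to 0x0D), and the printable ASCII characters from space to tilde (0x20 to 0x7E). `PORTABLE` and `in_portable_set` are hypothetical names used only for illustration:

```python
# Assumption: POSIX portable character set = NUL, alert..carriage-return
# (0x07-0x0D) and the printable ASCII range space..tilde (0x20-0x7E).
PORTABLE = frozenset([0x00, *range(0x07, 0x0E), *range(0x20, 0x7F)])


def in_portable_set(ch):
    """Return True if the single character ch is in the portable character set."""
    return ord(ch) in PORTABLE


print(in_portable_set("a"))       # True
print(in_portable_set("\xe4"))    # False: U+00E4 is outside the set
print(in_portable_set("\u0430"))  # False: so is Cyrillic small a
```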

Re: sre is broken in SuSE 9.2

2005-02-12 Denis S. Otkidach
On Sat, 12 Feb 2005 09:42:41 +0100 "Fredrik Lundh" <[EMAIL PROTECTED]> wrote: > the relevant part for this thread is *locale-*. if wctype depends on > the locale, it cannot be used for generic build. (custom interpreters > are another thing, but they shouldn't be shipped as "python"). You are
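The following Python 3 sketch shows the practical side of Fredrik's point. Python 3 classifies str characters with its own Unicode database rather than with the C library's wctype functions, so switching LC_CTYPE does not change `str.isalpha()`. `isalpha_under` is a hypothetical helper, and a locale that is not installed is reported as None:

```python
import locale


def isalpha_under(loc, ch):
    """Temporarily set LC_CTYPE to loc and test ch with str.isalpha().

    Returns None if the locale is not available on this system.
    """
    saved = locale.setlocale(locale.LC_CTYPE)  # query the current setting
    try:
        locale.setlocale(locale.LC_CTYPE, loc)
    except locale.Error:
        return None
    try:
        return ch.isalpha()
    finally:
        locale.setlocale(locale.LC_CTYPE, saved)  # restore the previous setting


for loc in ("C", "de_DE.UTF-8"):
    print(loc, isalpha_under(loc, "\u0430"))
```

A locale-dependent wctype-based build could give different answers for the two lines. With Python's own tables, every installed locale gives the same result.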

Re: sre is broken in SuSE 9.2

2005-02-11 Fredrik Lundh
Serge Orlov wrote: >> >>> re.compile(ur'\w+', re.U).findall(u'\xb5\xba\xe4\u0430') >> [u'\xb5\xba\xe4\u0430'] > > I can't find the strict definition of isalpha, but I believe an average > C program shouldn't care about the current locale alphabet, so isalpha > is a union of all supported characters i

Re: sre is broken in SuSE 9.2

2005-02-11 Serge Orlov
Denis S. Otkidach wrote: > On 10 Feb 2005 11:49:33 -0800 > "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > > This thread is about problems only with LANG=C or LANG=POSIX; it's not > > about other locales. Other locales are working as expected. > > You are not right. I have LANG=de_DE.UTF-8, and the P

Re: sre is broken in SuSE 9.2

2005-02-11 Denis S. Otkidach
On 10 Feb 2005 11:49:33 -0800 "Serge Orlov" <[EMAIL PROTECTED]> wrote: > This thread is about problems only with LANG=C or LANG=POSIX; it's not > about other locales. Other locales are working as expected. You are not right. I have LANG=de_DE.UTF-8, and the Python test_re.py doesn't pass. $LANG

Re: sre is broken in SuSE 9.2

2005-02-10 Serge Orlov
Peter Maas wrote: > Serge Orlov wrote: > > Denis S. Otkidach wrote: > > To summarize the discussion: either it's a bug in glibc or there is an > > option to specify a modern POSIX locale. The POSIX locale consists of > > characters from the portable character set; Unicode is certainly > > portable. > >

Re: sre is broken in SuSE 9.2

2005-02-10 Fredrik Lundh
Peter Maas wrote: >> To summarize the discussion: either it's a bug in glibc or there is an >> option to specify a modern POSIX locale. The POSIX locale consists of >> characters from the portable character set; Unicode is certainly >> portable. > > What about the environment variable LANG? I have SuSE 9

Re: sre is broken in SuSE 9.2

2005-02-10 Peter Maas
Serge Orlov wrote: Denis S. Otkidach wrote: To summarize the discussion: either it's a bug in glibc or there is an option to specify a modern POSIX locale. The POSIX locale consists of characters from the portable character set; Unicode is certainly portable. What about the environment variable LANG? I
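One way to check whether LANG matters is sketched below in Python 3; the thread did not run this. The sketch starts a child interpreter with LANG and LC_ALL forced to a given value and reports whether \w+ matches the whole test string. `CHILD` and `word_matches_under` are hypothetical names:

```python
import os
import subprocess
import sys

# Child program: exit status 0 if \w+ matches the whole test string.
# Escapes keep the source ASCII-only, so the child's locale encoding
# cannot affect how the string literal is read.
CHILD = (
    "import re, sys; "
    "sys.exit(0 if re.fullmatch(r'\\w+', '\\xb5\\xba\\xe4\\u0430') else 1)"
)


def word_matches_under(lang):
    """Run CHILD with LANG and LC_ALL set to lang; True if it exits with 0."""
    env = dict(os.environ, LANG=lang, LC_ALL=lang)
    result = subprocess.run([sys.executable, "-c", CHILD], env=env)
    return result.returncode == 0


for lang in ("C", "POSIX"):
    print(lang, word_matches_under(lang))
```

On an interpreter that uses its own Unicode tables, the answer should not change with LANG. A build configured to defer to the C library's wctype functions is where one would expect it to differ.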

Re: sre is broken in SuSE 9.2

2005-02-10 Serge Orlov
Denis S. Otkidach wrote: > On all platforms \w matches all Unicode letters when used with the flag > re.UNICODE, but this doesn't work on SuSE 9.2: > > Python 2.3.4 (#1, Dec 17 2004, 19:56:48) > [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2 > Type "help", "copyright", "credits" or "license" for more infor

Re: sre is broken in SuSE 9.2

2005-02-10 Denis S. Otkidach
On Thu, 10 Feb 2005 17:46:06 +0100 "Fredrik Lundh" <[EMAIL PROTECTED]> wrote: > > Can the --with-wctype-functions configure option be the > > source of the problem? > > yes. > > that option disables Python's own Unicode database, and relies on the C > library's > wctype.h (iswalpha, etc.) to behave prop
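Without --with-wctype-functions, Python classifies characters from its own Unicode database. The Python 3 sketch below assumes a standard CPython 3 build. For every code point below an arbitrary limit, it checks that \w agrees with `str.isalnum()` plus the underscore. Agreement means the regex engine and the string methods classify characters the same way, independent of the C locale. `w_mismatches` and the 0x3000 limit are illustrative choices:

```python
import re

WORD = re.compile(r"\w")  # str pattern: Unicode matching is the default


def w_mismatches(limit=0x3000):
    """Return code points below limit where \\w and isalnum()/'_' disagree."""
    mismatches = []
    for cp in range(limit):
        ch = chr(cp)
        by_regex = WORD.fullmatch(ch) is not None
        by_str = ch.isalnum() or ch == "_"
        if by_regex != by_str:
            mismatches.append(cp)
    return mismatches


print(w_mismatches())  # an empty list means the two agree
```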

Re: sre is broken in SuSE 9.2

2005-02-10 Serge Orlov
Denis S. Otkidach wrote: > On 10 Feb 2005 03:59:51 -0800 > "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > > > On all platforms \w matches all Unicode letters when used with the flag > > > re.UNICODE, but this doesn't work on SuSE 9.2: > [...] > > I can get the same results on RedHat's python 2.2.3 if I p

Re: sre is broken in SuSE 9.2

2005-02-10 Fredrik Lundh
Denis S. Otkidach wrote: >> > On all platforms \w matches all Unicode letters when used with the flag >> > re.UNICODE, but this doesn't work on SuSE 9.2: >> >> I think Python on SuSE 9.2 uses UCS4 for unicode strings (as does >> RedHat); check sys.maxunicode. >> >> This is not an explanation, but perh

Re: sre is broken in SuSE 9.2

2005-02-10 Denis S. Otkidach
On Thu, 10 Feb 2005 16:23:09 +0100 Daniel Dittmar <[EMAIL PROTECTED]> wrote: > Denis S. Otkidach wrote: > > > On all platforms \w matches all Unicode letters when used with the flag > > re.UNICODE, but this doesn't work on SuSE 9.2: > > I think Python on SuSE 9.2 uses UCS4 for unicode strings (as do

Re: sre is broken in SuSE 9.2

2005-02-10 Daniel Dittmar
Denis S. Otkidach wrote: On all platforms \w matches all Unicode letters when used with the flag re.UNICODE, but this doesn't work on SuSE 9.2: I think Python on SuSE 9.2 uses UCS4 for unicode strings (as does RedHat); check sys.maxunicode. This is not an explanation, but perhaps a hint where to look
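Daniel's suggested check can be written as a small Python sketch. sys.maxunicode is 0xFFFF on a narrow (UCS-2) build and 0x10FFFF on a wide (UCS-4) build. Since Python 3.3 (PEP 393) it is always 0x10FFFF, so the distinction only matters for older interpreters like the ones in this thread. `unicode_build` is a hypothetical helper name:

```python
import sys


def unicode_build():
    """Describe the build by the largest code point it supports."""
    # 0xFFFF: narrow (UCS-2) build; 0x10FFFF: wide (UCS-4) build.
    return "UCS-4 (wide)" if sys.maxunicode > 0xFFFF else "UCS-2 (narrow)"


print(hex(sys.maxunicode), unicode_build())
```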

Re: sre is broken in SuSE 9.2

2005-02-10 Denis S. Otkidach
On 10 Feb 2005 03:59:51 -0800 "Serge Orlov" <[EMAIL PROTECTED]> wrote: > > On all platforms \w matches all Unicode letters when used with the flag > > re.UNICODE, but this doesn't work on SuSE 9.2: [...] > I can get the same results on RedHat's python 2.2.3 if I pass re.L > option; it looks like this
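For comparison with the re.L experiment, here is a Python 3 sketch that assumes Python 3.6 or later, not the 2.2/2.3 interpreters in this thread. In those versions re.LOCALE is accepted only for bytes patterns, where \w follows the current locale. Compiling a str pattern with the flag raises ValueError. The helper names are hypothetical:

```python
import re


def locale_flag_rejected_for_str():
    """True if compiling a str pattern with re.LOCALE raises ValueError."""
    try:
        re.compile(r"\w+", re.LOCALE)
    except ValueError:
        return True
    return False


def bytes_words(data):
    """Find \\w+ runs in bytes, classified according to the current locale."""
    return re.compile(rb"\w+", re.LOCALE).findall(data)


print(locale_flag_rejected_for_str())  # True on Python 3.6+
print(bytes_words(b"abc def"))         # [b'abc', b'def']
```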

Re: sre is broken in SuSE 9.2

2005-02-10 Denis S. Otkidach
On Thu, 10 Feb 2005 13:00:42 +0300 "Denis S. Otkidach" <[EMAIL PROTECTED]> wrote: > On all platforms \w matches all Unicode letters when used with the flag > re.UNICODE, but this doesn't work on SuSE 9.2: > > Python 2.3.4 (#1, Dec 17 2004, 19:56:48) > [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2 > Typ

Re: sre is broken in SuSE 9.2

2005-02-10 Serge Orlov
Denis S. Otkidach wrote: > On all platforms \w matches all Unicode letters when used with the flag > re.UNICODE, but this doesn't work on SuSE 9.2: > > Python 2.3.4 (#1, Dec 17 2004, 19:56:48) > [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2 > Type "help", "copyright", "credits" or "license" for more infor

sre is broken in SuSE 9.2

2005-02-10 Denis S. Otkidach
On all platforms \w matches all Unicode letters when used with the flag re.UNICODE, but this doesn't work on SuSE 9.2: Python 2.3.4 (#1, Dec 17 2004, 19:56:48) [GCC 3.3.4 (pre 3.3.5 20040809)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import re >>> re.compil
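Below is the report's test transcribed to Python 3 as a sketch. The original session is Python 2.3 with a ur'' pattern literal, a prefix that Python 3 rejects. On a build that uses Python's own Unicode database, \w+ should return the whole four-character string as a single match. `SAMPLE` and `word_runs` are illustrative names:

```python
import re

SAMPLE = "\xb5\xba\xe4\u0430"  # micro sign, masculine ordinal, a-umlaut, Cyrillic a


def word_runs(text):
    # re.UNICODE is already the default for str patterns in Python 3;
    # it is passed here only to mirror the original report.
    return re.compile(r"\w+", re.UNICODE).findall(text)


result = word_runs(SAMPLE)
print(result)
print("OK" if result == [SAMPLE] else "BROKEN: \\w misses some letters")
```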