John Machin wrote:
> On Nov 30, 4:33 am, Terry Reedy <[EMAIL PROTECTED]> wrote:
>> Martin v. Löwis wrote:
>>> To be fair to Python (and SRE),
>
> I was being unfair?
No - sorry if I gave that impression.
Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list
John Machin wrote:
John, nothing I wrote was directed at you. If you feel insulted, you
have my apology. My intention was and is to get future movement on an
issue that was reported 20 months ago but which has lain dead since,
until re-reported (a bit more clearly) a week ago, because of a
On Nov 30, 4:33 am, Terry Reedy <[EMAIL PROTECTED]> wrote:
> Martin v. Löwis wrote:
> > To be fair to Python (and SRE),
I was being unfair? In the context, "bug" == "needs to be changed";
see below.
> SRE predates TR#18 (IIRC) - at least
> > annex C was added somewhere between revision 6 and 9, i.
MRAB wrote:
Terry Reedy wrote:
I notice from the manual "All identifiers are converted into the
normal form NFC while parsing; comparison of identifiers is based on
NFC." If NFC used accented letters, then the issue is finessed away
for European words simply because Unicode includes
Terry Reedy wrote:
Martin v. Löwis wrote:
To be fair to Python (and SRE), SRE predates TR#18 (IIRC) - at least
annex C was added somewhere between revision 6 and 9, i.e. in early
2004. Python's current definition of \w is a straightforward extension
of the historical \w definition (of Perl, I b
Martin v. Löwis wrote:
To be fair to Python (and SRE), SRE predates TR#18 (IIRC) - at least
annex C was added somewhere between revision 6 and 9, i.e. in early
2004. Python's current definition of \w is a straightforward extension
of the historical \w definition (of Perl, I believe), which,
unfo
> Huh? I thought it was settled. Read Terry Reedy's latest message. Read
> the bug report it points to (http://bugs.python.org/issue1693050),
> especially the contribution from MvL. To paraphrase a remark by the
> timbot, Martin reads Unicode tech reports so that we don't have to.
> However if you
On Nov 29, 10:51 am, MRAB <[EMAIL PROTECTED]> wrote:
> John Machin wrote:
> > On Nov 29, 2:47 am, Shiao <[EMAIL PROTECTED]> wrote:
> >> The regex below identifies words in all languages I tested, but not in
> >> Hindi:
>
> >> pat = re.compile('^(\w+)$', re.U)
> >> ...
> >> m = pat.search(l.decod
John Machin wrote:
On Nov 29, 2:47 am, Shiao <[EMAIL PROTECTED]> wrote:
The regex below identifies words in all languages I tested, but not in
Hindi:
pat = re.compile('^(\w+)$', re.U)
...
m = pat.search(l.decode('utf-8'))
[example snipped]
From this is assumed that the Hindi text contain
On Nov 29, 2:47 am, Shiao <[EMAIL PROTECTED]> wrote:
> The regex below identifies words in all languages I tested, but not in
> Hindi:
> pat = re.compile('^(\w+)$', re.U)
> ...
> m = pat.search(l.decode('utf-8'))
[example snipped]
>
> From this is assumed that the Hindi text contains punctuatio
MRAB wrote:
Should the Mc and Mn codepoints match \w in the re module even though
u'हिन्दी'.isalpha() returns False (in Python 2.x, haven't tried Python
3.x)?
Same. And to me, that is wrong. The condensation of vowel characters
(which Hindi, etc, also have for words that begin with vowels)
Terry Reedy wrote:
Jerry Hill wrote:
On Fri, Nov 28, 2008 at 10:47 AM, Shiao <[EMAIL PROTECTED]> wrote:
The regex below identifies words in all languages I tested, but not in
Hindi:
# -*- coding: utf-8 -*-
import re
pat = re.compile('^(\w+)$', re.U)
langs = ('English', '中文', 'हिन्दी')
I thi
Jerry Hill wrote:
On Fri, Nov 28, 2008 at 10:47 AM, Shiao <[EMAIL PROTECTED]> wrote:
The regex below identifies words in all languages I tested, but not in
Hindi:
# -*- coding: utf-8 -*-
import re
pat = re.compile('^(\w+)$', re.U)
langs = ('English', '中文', 'हिन्दी')
I think the problem is th
On Fri, Nov 28, 2008 at 10:47 AM, Shiao <[EMAIL PROTECTED]> wrote:
> The regex below identifies words in all languages I tested, but not in
> Hindi:
>
> # -*- coding: utf-8 -*-
>
> import re
> pat = re.compile('^(\w+)$', re.U)
> langs = ('English', '中文', 'हिन्दी')
I think the problem is that the H
Shiao wrote:
> The regex below identifies words in all languages I tested, but not in
> Hindi:
>
> # -*- coding: utf-8 -*-
>
> import re
> pat = re.compile('^(\w+)$', re.U)
> langs = ('English', '中文', 'हिन्दी')
>
> for l in langs:
>     m = pat.search(l.decode('utf-8'))
>     print l, m and m.g
The regex below identifies words in all languages I tested, but not in
Hindi:
# -*- coding: utf-8 -*-
import re
pat = re.compile('^(\w+)$', re.U)
langs = ('English', '中文', 'हिन्दी')
for l in langs:
    m = pat.search(l.decode('utf-8'))
    print l, m and m.group(1)
Output:
English English
中文 中
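
For anyone wanting to see why the Hindi word behaves differently, here is a minimal sketch, assuming Python 3 (the post above is Python 2). It lists the Unicode general category of each code point using the standard unicodedata module. The helper is_word is hypothetical and not part of the re module; it simply treats letters, combining marks (Mn, Mc, Me), decimal digits and connector punctuation as word characters, which is one possible reading of what the thread asks \w to do.

```python
import unicodedata

# The Hindi word from the original post, written with escapes so that
# the individual code points are explicit.
HINDI = "\u0939\u093f\u0928\u094d\u0926\u0940"


def categories(text):
    """Return the Unicode general category of each code point in text."""
    return [unicodedata.category(ch) for ch in text]


def is_word(text):
    """Hypothetical helper (an assumption, not a library API).

    Accepts letters (L*), combining marks (M*), decimal digits (Nd)
    and connector punctuation (Pc) as word characters.
    """
    def ok(ch):
        cat = unicodedata.category(ch)
        return cat[0] in "LM" or cat in ("Nd", "Pc")

    return bool(text) and all(ok(ch) for ch in text)


if __name__ == "__main__":
    for word in ("English", "\u4e2d\u6587", HINDI):
        print(word, categories(word), word.isalpha(), is_word(word))
```

The vowel signs and the virama in the Hindi word fall in categories Mc and Mn rather than in a letter category, so str.isalpha() is False for the word as a whole, while the mark-aware helper accepts it.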
16 matches