Re: Unicode regex and Hindi language

Terry Reedy Sat, 29 Nov 2008 15:14:58 -0800

John Machin wrote:

John, nothing I wrote was directed at you. If you feel insulted, you have my apology. My intention was and is to get future movement on an issue that was reported 20 months ago but which has lain dead since, until re-reported (a bit more clearly) a week ago, because of a misunderstanding by the person who (I believe) rewrote re for unicode several years ago.

Like this:

| >>> w1 = u"L\N{LATIN SMALL LETTER O WITH DIAERESIS}wis"
| >>> w2 = u"Lo\N{COMBINING DIAERESIS}wis"
| >>> w1
| u'L\xf6wis'
| >>> w2
| u'Lo\u0308wis'
| >>> import re
| >>> import unicodedata as ucd
| >>> ucd.category(u'\u0308')
| 'Mn'
| >>> u'\u0308'.isalpha()
| False
| >>> regex = re.compile(ur'\w+', re.UNICODE)
| >>> regex.match(w1).group(0)
| u'L\xf6wis'
| >>> regex.match(w2).group(0)
| u'Lo'


Yes, thank you.  FWIW, that confirms my suspicion.
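
For anyone who wants to reproduce the session quoted above on Python 3, here is a minimal sketch. It assumes the standard-library re module, where str patterns are Unicode-aware without the re.UNICODE flag. The helper name first_word and its normalize parameter are made up for illustration; they are not part of any library. The NFC step is only a partial workaround: it helps where a precomposed character exists (as for o + diaeresis), and is not expected to help for Devanagari vowel signs, which have no precomposed forms.

```python
import re
import unicodedata

# In Python 3, str patterns are Unicode-aware by default.
WORD = re.compile(r"\w+")


def first_word(text, normalize=False):
    """Return the leading run of \\w characters in text ('' if none).

    Hypothetical helper for illustration. With normalize=True the text is
    first composed to NFC, so a base letter plus combining mark becomes a
    single precomposed letter where Unicode defines one.
    """
    if normalize:
        text = unicodedata.normalize("NFC", text)
    match = WORD.match(text)
    return match.group(0) if match else ""


decomposed = "Lo\u0308wis"  # 'o' followed by COMBINING DIAERESIS (category Mn)

# The combining mark is not alphabetic, so \w+ stops in front of it.
print(unicodedata.category("\u0308"), "\u0308".isalpha())
print(ascii(first_word(decomposed)))

# After NFC composition the whole word is matched.
print(ascii(first_word(decomposed, normalize=True)))
```

The design choice here is to normalize the input rather than change the pattern; a pattern-side fix would need \w itself to accept combining marks, which is the issue under discussion.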

Terry

--
http://mail.python.org/mailman/listinfo/python-list
