Re: Unicode: matching a word and unaccenting characters

Gabriel Genellina Wed, 14 Nov 2007 17:29:05 -0800

On Wed, 14 Nov 2007 21:21:55 -0300, Jeremie Le Hen <[EMAIL PROTECTED]>
wrote:


> (Please Cc: me when replying, as I'm not subscribed to this list.)

That's not a good idea. *I* can CC you now, but later replies and comments
from other people may leave the CC out. You can always read this
newsgroup through Google Groups at http://groups.google.com/group/comp.lang.python
or through Gmane at http://dir.gmane.org/gmane.comp.python.general

> The first one is with regular expression.  If I want to match a word
> composed of characters only.  One can easily use '[a-zA-Z]+' when
> working in ascii, but unfortunately there is no equivalent when working
> with unicode strings: the latter doesn't match accented characters.  The
> only means the re package provides is '\w' along with the re.UNICODE
> flag, but unfortunately it also matches digits and underscore.  It
> appears there is no suitable solution for this currently.  Am I right?

I think you're right, unfortunately.
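
That said, there is a workaround that may be good enough in practice: a
negated character class that starts from '\w' and then excludes digits and
the underscore. The sketch below assumes Python 3, where str patterns are
Unicode-aware by default (the re.UNICODE flag is kept only for clarity).
The name find_words is made up for this example. It may still accept a few
numeric characters that '\d' does not cover, such as superscript digits, so
treat it as an approximation rather than a strict "letters only" test.

```python
import re

# A word character (\w) that is neither a decimal digit (\d) nor "_".
# Double negation: [^\W\d_] == "not (non-word, or digit, or underscore)".
LETTERS = re.compile(r"[^\W\d_]+", re.UNICODE)


def find_words(text):
    """Return the runs of letter-like characters found in text.

    Hypothetical helper, written only to illustrate the pattern.
    """
    return LETTERS.findall(text)


if __name__ == "__main__":
    print(find_words("café 123 naïve_word"))
```

Digits and underscores act as separators here, so "naïve_word" is split
into two matches.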

> Secondly, I need to translate accented characters to their unaccented
> form.  I've written this function (sorry if the code isn't as efficient
> as possible, I'm not a long-time Python programmer, feel free to correct
> me, I'd be glad to learn anything):

It's hard to do it right - this is another version:  
http://www.effbot.org/zone/unicode-convert.htm
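
For comparison, a common approach (not the one on the page above, and not
your function) is to decompose the string with the standard unicodedata
module and drop the combining marks. This is only a minimal sketch, assuming
Python 3. The name strip_accents is made up for the example. It leaves alone
any letter that has no canonical decomposition, such as 'ø' or 'ß', which is
one reason the problem is hard to get completely right.

```python
import unicodedata


def strip_accents(text):
    """Remove combining marks (accents) from text.

    Hypothetical helper: NFD splits "é" into "e" + U+0301, and
    unicodedata.combining() returns non-zero for the mark, so it is dropped.
    """
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Recompose whatever is left so unrelated characters keep their usual form.
    return unicodedata.normalize("NFC", stripped)


if __name__ == "__main__":
    print(strip_accents("café"))
    print(strip_accents("Ångström"))
```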

-- 
Gabriel Genellina

-- 
http://mail.python.org/mailman/listinfo/python-list
