On Nov 15, 1:21 am, Jeremie Le Hen <[EMAIL PROTECTED]> wrote:
> (Mail resent with the proper subject.)
>
> Hi list,
>
> (Please Cc: me when replying, as I'm not subscribed to this list.)

I don't know your e-mail address; I hope you will come back and look at the list archive...
> I'm working with Unicode strings to handle accented characters, but I'm
> experiencing a few problems.

[skipped first question]

> Secondly, I need to translate accented characters to their unaccented
> form. I've written this function (sorry if the code isn't as efficient
> as possible; I'm not a long-time Python programmer, so feel free to
> correct me, I'd be glad to learn anything):
>
> % def unaccent(s):
> %     """
> %     """
> %
> %     if not isinstance(s, types.UnicodeType):
> %         return s
> %     singleletter_re = re.compile(r'(?:^|\s)([A-Z])(?:$|\s)')
> %     result = ''
> %     for l in s:
> %         desc = unicodedata.name(l)
> %         m = singleletter_re.search(desc)
> %         if m is None:
> %             result += str(l)
> %             continue
> %         result += m.group(1).lower()
> %     return result
> %
>
> But I don't feel comfortable with it. It strongly depends on the UCD
> file format, and names that don't contain a single letter obviously
> cannot all be converted to ASCII. How would you implement this
> function?

my 2 cents:

<unaccent.py>
# -*- coding: utf-8 -*-
import unicodedata

def unaccent(s):
    u"""
    >>> unaccent(u"Ça crée déjà l'évènement")
    "Ca cree deja l'evenement"
    """
    s = unicodedata.normalize('NFD', unicode(s.encode("utf-8"), encoding="utf-8"))
    return "".join(b for b in s.encode("utf-8") if ord(b) < 128)

def _test():
    import doctest
    doctest.testmod()

if __name__ == "__main__":
    import sys
    sys.exit(_test())
</unaccent.py>

> Thank you for your help.

You are welcome.

(Left to the reader:
 - why does it work?
 - why does doctest work?)

renaud

> Regards,
> --
> Jeremie Le Hen
> < jlehen at clesys dot fr >
>
> ----- End forwarded message -----

--
http://mail.python.org/mailman/listinfo/python-list
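For Python 3, where `unicode` no longer exists and iterating over `bytes` yields integers (so `ord(b)` fails), the same NFD idea can be sketched as below. This is a minimal illustration under those assumptions, not the code from the message; `unaccent_ascii` is a hypothetical name. It also hints at the first exercise: NFD turns "é" (U+00E9) into "e" followed by U+0301 COMBINING ACUTE ACCENT, and every byte of a multi-byte UTF-8 sequence is >= 0x80, so keeping only bytes below 128 keeps exactly the ASCII characters.

```python
import unicodedata


def unaccent(s: str) -> str:
    """Remove accents by decomposing to NFD and dropping combining marks."""
    # NFD splits a precomposed character such as "é" (U+00E9) into the base
    # letter "e" followed by U+0301 COMBINING ACUTE ACCENT.
    decomposed = unicodedata.normalize("NFD", s)
    # Keep only code points whose canonical combining class is 0; this drops
    # the usual Latin diacritics. Characters with no decomposition ("ß") stay.
    return "".join(c for c in decomposed if not unicodedata.combining(c))


def unaccent_ascii(s: str) -> str:
    """Hypothetical ASCII-only variant, closer to unaccent.py above."""
    # Discarding every non-ASCII code point after NFD is equivalent to
    # keeping only the UTF-8 bytes whose value is below 128.
    return unicodedata.normalize("NFD", s).encode("ascii", "ignore").decode("ascii")


if __name__ == "__main__":
    print(unaccent("Ça crée déjà l'évènement"))          # Ca cree deja l'evenement
    print(unaccent("straße"), unaccent_ascii("straße"))  # straße strae
```

The first function keeps characters that have no canonical decomposition, such as "ß" or "œ", instead of silently deleting them; the ASCII variant drops them, which matches the behavior of unaccent.py.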