Rami Chowdhury wrote:
> On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald <wal...@livinglogic.de>
> wrote:
>
>> On 01.10.09 16:09, Hyuga wrote:
>>> On Sep 30, 3:34 am, gentlestone <tibor.b...@hotmail.com> wrote:
>>>> Why don't work this code on Python 2.6? Or how can I do this job?
>>>>
>>>> [snip _MAP]
>>>>
>>>> def downcode(name):
>>>>     """
>>>>     >>> downcode(u"Žabovitá zmiešaná kaša")
>>>>     u'Zabovita zmiesana kasa'
>>>>     """
>>>>     for key, value in _MAP.iteritems():
>>>>         name = name.replace(key, value)
>>>>     return name
>>>
>>> Though C Python is pretty optimized under the hood for this sort of
>>> single-character replacement, this still seems pretty inefficient
>>> since you're calling replace for every character you want to map. I
>>> think that a better approach might be something like:
>>>
>>> def downcode(name):
>>>     return ''.join(_MAP.get(c, c) for c in name)
>>>
>>> Or using string.translate:
>>>
>>> import string
>>> def downcode(name):
>>>     table = string.maketrans(
>>>         'ÀÁÂÃÄÅ...',
>>>         'AAAAAA...')
>>>     return name.translate(table)
>>
>> Or even simpler:
>>
>> import unicodedata
>>
>> def downcode(name):
>>     return unicodedata.normalize("NFD", name)\
>>         .encode("ascii", "ignore")\
>>         .decode("ascii")
>>
>> Servus,
>> Walter
>
> As I understand it, the "ignore" argument to str.encode *removes* the
> undecodable characters, rather than replacing them with an ASCII
> approximation. Is that correct? If so, wouldn't that rather defeat the
> purpose?

You didn't take the normalization step into account. NFD decomposes
each accented character into its base letter followed by one or more
combining marks, so "ignore" drops only the non-ASCII combining marks
and keeps the ASCII base letter. Example:

>>> import unicodedata
>>> s = u"Ä"
>>> unicodedata.normalize("NFD", s)
u'A\u0308'
>>> _.encode("ascii", "ignore")
'A'
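
For anyone who wants to try the whole round trip, here is a minimal
sketch of Walter's approach. It assumes Python 3 (the thread is about
2.6, where the literals would need a u prefix). The function name
downcode comes from the thread, and the sample strings other than the
Slovak one are my own illustration. The last call shows a limitation:
a character with no canonical decomposition, such as "ß", is simply
dropped rather than approximated, so a hand-written _MAP may still be
needed for those.

```python
import unicodedata


def downcode(name):
    """Strip accents: decompose with NFD, then drop every non-ASCII code point."""
    # NFD splits e.g. "Ä" into "A" followed by U+0308 (combining diaeresis).
    decomposed = unicodedata.normalize("NFD", name)
    # "ignore" silently discards anything ASCII cannot encode, which after
    # NFD is mostly the combining marks; the base letters survive.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print(downcode("Žabovitá zmiešaná kaša"))  # Zabovita zmiesana kasa
print(downcode("Ä"))                       # A
# "ß" has no decomposition, so it is removed entirely (assumed sample input).
print(downcode("Straße"))                  # Strae
```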