Re: unicode issue

Rami Chowdhury Thu, 01 Oct 2009 08:51:44 -0700

On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald <wal...@livinglogic.de> wrote:

On 01.10.09 16:09, Hyuga wrote:

On Sep 30, 3:34 am, gentlestone <tibor.b...@hotmail.com> wrote:

Why doesn't this code work on Python 2.6? Or how else can I do this job?


[snip _MAP]

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name


Though CPython is pretty well optimized under the hood for this sort of
single-character replacement, this still seems pretty inefficient,
since you're calling replace once for every character you want to map.  I
think that a better approach might be something like:

def downcode(name):
    return ''.join(_MAP.get(c, c) for c in name)

Or using string.translate:

import string
def downcode(name):
    table = string.maketrans(
        'ÀÁÂÃÄÅ...',
        'AAAAAA...')
    return name.translate(table)


Or even simpler:

import unicodedata

def downcode(name):
   return unicodedata.normalize("NFD", name)\
          .encode("ascii", "ignore")\
          .decode("ascii")

Servus,
   Walter

As I understand it, the "ignore" argument to str.encode *removes* the undecodable characters, rather than replacing them with an ASCII approximation. Is that correct? If so, wouldn't that rather defeat the purpose?
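
One way to check this is to run the normalize-then-encode version on a few inputs. The sketch below is in Python 3 syntax (an assumption; the thread targets 2.6), and the Polish and German test strings are my own examples, not from the thread. It suggests that "ignore" does drop characters, but after NFD the dropped characters are mostly the combining accents, so the base letters survive. Letters with no canonical decomposition, such as "Ł" or "ß", are lost entirely.

```python
import unicodedata

def downcode(name):
    # NFD splits each accented letter into a base letter followed by
    # one or more combining marks, e.g. "š" becomes "s" + U+030C.
    decomposed = unicodedata.normalize("NFD", name)
    # "ignore" drops everything that is still non-ASCII: the combining
    # marks, plus any letter that has no decomposition at all.
    return decomposed.encode("ascii", "ignore").decode("ascii")

print(downcode("Žabovitá zmiešaná kaša"))  # Zabovita zmiesana kasa
print(downcode("Łódź"))                    # odz  ("Ł" has no decomposition)
print(downcode("Straße"))                  # Strae ("ß" has no decomposition)
```

So the one-liner works for the original Slovak example, while a lookup table like _MAP is still needed for characters that normalization cannot break apart.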


--
Rami Chowdhury

"Never attribute to malice that which can be attributed to stupidity" -- Hanlon's Razor

408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
--
http://mail.python.org/mailman/listinfo/python-list