On Thu, 01 Oct 2009 08:10:58 -0700, Walter Dörwald <wal...@livinglogic.de> wrote:

On 01.10.09 16:09, Hyuga wrote:
On Sep 30, 3:34 am, gentlestone <tibor.b...@hotmail.com> wrote:
Why doesn't this code work on Python 2.6? Or how can I do this job?

[snip _MAP]

def downcode(name):
    """
    >>> downcode(u"Žabovitá zmiešaná kaša")
    u'Zabovita zmiesana kasa'
    """
    for key, value in _MAP.iteritems():
        name = name.replace(key, value)
    return name

Though CPython is well optimized under the hood for this sort of
single-character replacement, this still seems inefficient, since
you're calling replace once for every character you want to map, and
each call scans the whole string. I think a better approach might be
something like:

def downcode(name):
    return ''.join(_MAP.get(c, c) for c in name)
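Below is a minimal runnable sketch that puts the original replace() loop next to the single-pass join version. It is written for Python 3, where str is Unicode; on 2.6 the same logic works with u'' literals and iteritems(). The `_MAP` here is a hypothetical stand-in for the snipped table that covers only the characters in the doctest string.

```python
# Hypothetical stand-in for the snipped _MAP: only the accented
# characters needed for the doctest string are listed.
_MAP = {
    "Ž": "Z",
    "á": "a",
    "š": "s",
}


def downcode_replace(name):
    # Original approach: one full pass over the string per table entry.
    for key, value in _MAP.items():
        name = name.replace(key, value)
    return name


def downcode_join(name):
    # Single pass: look each character up, falling back to the character itself.
    return "".join(_MAP.get(c, c) for c in name)


text = "Žabovitá zmiešaná kaša"
print(downcode_join(text))  # → Zabovita zmiesana kasa
```

The two functions agree as long as every key in the table is a single character. replace() would also handle multi-character keys, which the per-character lookup cannot.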

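Or using unicode.translate:

def downcode(name):
    # unicode.translate() wants a dict mapping code points to replacement
    # characters; string.maketrans() only builds tables for byte strings.
    table = dict(zip(map(ord, u'ÀÁÂÃÄÅ...'), u'AAAAAA...'))
    return name.translate(table)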

Or even simpler:

import unicodedata

def downcode(name):
    return (unicodedata.normalize("NFD", name)
            .encode("ascii", "ignore")
            .decode("ascii"))
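A Python 3 sketch of the NFD approach with each step made visible. NFD decomposition splits an accented letter into its base letter plus combining marks, so the "ignore" error handler discards only the marks and keeps the ASCII base letter. A character with no canonical decomposition has no ASCII base to fall back on and disappears entirely; "ß" below is just an illustrative example of that case.

```python
import unicodedata


def downcode(name):
    # NFD splits each accented letter into base letter + combining mark(s),
    # e.g. "Ž" becomes "Z" followed by U+030C COMBINING CARON.
    decomposed = unicodedata.normalize("NFD", name)
    # Encoding to ASCII with "ignore" drops every non-ASCII code point,
    # which after NFD is mostly the combining marks.
    return decomposed.encode("ascii", "ignore").decode("ascii")


print([unicodedata.name(c) for c in unicodedata.normalize("NFD", "Ž")])
# → ['LATIN CAPITAL LETTER Z', 'COMBINING CARON']
print(downcode("Žabovitá zmiešaná kaša"))  # → Zabovita zmiesana kasa
print(downcode("Straße"))  # → Strae  (ß has no decomposition, so it is dropped)
```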

Servus,
   Walter

As I understand it, the "ignore" argument to str.encode *removes* the unencodable characters, rather than replacing them with an ASCII approximation. Is that correct? If so, wouldn't that rather defeat the purpose?

--
Rami Chowdhury
"Never attribute to malice that which can be attributed to stupidity" -- Hanlon's Razor
408-597-7068 (US) / 07875-841-046 (UK) / 0189-245544 (BD)
--
http://mail.python.org/mailman/listinfo/python-list
