utf - string translation

hg Wed, 22 Nov 2006 10:46:05 -0800

Hi,

I'm bringing over a thread that is currently running on f.c.l.python.


The goal was to strip French accents from words.

We noticed that len('à') != len('a'), and I found the hack below to work
around the "problem". Yet I do not understand why it happens, especially
since 'à' is included in the extended ASCII table and thus can be stored
in one byte.
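
For reference, the length difference can be reproduced by encoding both
characters explicitly. This is only a minimal sketch in Python 3 syntax
(an assumption on my part; the script further down is Python 2), and
encoded_lengths is a name made up for illustration:

```python
def encoded_lengths(ch):
    """Return the byte length of ch in UTF-8 and in Latin-1 (hypothetical helper)."""
    return len(ch.encode('utf-8')), len(ch.encode('latin-1'))

A_GRAVE = '\u00e0'  # the character 'à', written as an escape

# Print how many bytes each character needs in the two encodings.
for ch in ('a', A_GRAVE):
    utf8_len, latin1_len = encoded_lengths(ch)
    print(repr(ch), 'utf-8:', utf8_len, 'latin-1:', latin1_len)
```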

Any clue?

hg





# -*- coding: utf-8 -*-
import string

def convert(mot):
    print len(mot)
    print mot[0]
    print '%x' % ord(mot[1])
    table = string.maketrans(
        'àâäéèêëîïôöùüû',
        '\x00a\x00a\x00a\x00e\x00e\x00e\x00e\x00i\x00i\x00o\x00o\x00u\x00u\x00u')

    return mot.translate(table).replace('\x00','')


c = 'àbôö a '
print convert(c)
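
A possible alternative to the translate table, offered only as a sketch:
it is written in Python 3 syntax (an assumption, the script above is
Python 2), it is not the code from the f.c.l.python thread, and
strip_accents is a name made up for illustration. It works on text
strings rather than bytes, decomposing each accented letter into a base
letter plus combining marks and then dropping the marks:

```python
import unicodedata

def strip_accents(text):
    """Remove accents from text (hypothetical helper, not from the thread)."""
    # NFD splits an accented letter into its base letter followed by
    # one or more combining marks.
    decomposed = unicodedata.normalize('NFD', text)
    # unicodedata.combining() returns 0 for characters that are not
    # combining marks, so only base characters are kept.
    return ''.join(ch for ch in decomposed if not unicodedata.combining(ch))

c = 'àbôö a '
print(strip_accents(c))
```

This covers every accented letter that has a canonical decomposition,
so no table of French characters has to be maintained by hand.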
-- 
http://mail.python.org/mailman/listinfo/python-list
