Hi, I'm bringing over a thread that is under way on f.c.l.python.
The point was to get rid of French accents in words. We noticed that len('à') != len('a'), and I found the hack below to fix the "problem". Yet I do not understand, especially since 'à' is included in the extended ASCII table and thus can be stored in one byte. Any clue?

hg

# -*- coding: utf-8 -*-
import string

def convert(mot):
    print len(mot)
    print mot[0]
    print '%x' % ord(mot[1])
    table = string.maketrans('àâäéèêëîïôöùüû','\x00a\x00a\x00a\x00e\x00e\x00e\x00e\x00i\x00i\x00o\x00o\x00u\x00u\x00u')
    return mot.translate(table).replace('\x00','')

c = 'àbôö a '
print convert(c)
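
For reference, the length difference can be reproduced in isolation. This is only a minimal sketch, written in Python 3 syntax (an assumption; the code above is Python 2, where a plain string literal in a UTF-8 source file is a byte string). The names `char_count`, `utf8_bytes`, `latin1_bytes`, `accented` and `lead_bytes` are made up for the illustration. It shows that 'à' is one character but two bytes once encoded as UTF-8, that it is one byte in Latin-1, and that all the accented letters used in the table share the same first UTF-8 byte, which is probably why the '\x00' padding trick happens to work.

```python
# -*- coding: utf-8 -*-
# Illustration only: names below are hypothetical, not from the original post.

word = '\xe0'  # the letter 'à' (U+00E0), written as an escape to avoid ambiguity

# As text, the accented letter is a single character.
char_count = len(word)

# Encoded as UTF-8 it takes two bytes; encoded as Latin-1 it takes one.
utf8_bytes = word.encode('utf-8')
latin1_bytes = word.encode('latin-1')

print(char_count, len(utf8_bytes), len(latin1_bytes))
print(utf8_bytes.hex())

# The accented letters from the translation table: àâäéèêëîïôöùüû
accented = '\xe0\xe2\xe4\xe9\xe8\xea\xeb\xee\xef\xf4\xf6\xf9\xfc\xfb'

# Collect the first UTF-8 byte of each letter; they all turn out to be 0xC3.
lead_bytes = {ch.encode('utf-8')[0] for ch in accented}
print([hex(b) for b in lead_bytes])
```

So the "extended ASCII" single-byte form exists only in an encoding such as Latin-1; with the utf-8 coding line, each accented letter is stored as two bytes.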