On 13.09.2012 23:26, Tim Chase wrote:
> I've got a bunch of text in Portuguese and to transmit them, need to
> have them in us-ascii (7-bit). I'd like to keep as much information
> as possible, just stripping accents, cedillas, tildes, etc. So
> "serviço móvil" becomes "servico movil". Is there anything stock
> that I've missed? I can do mystring.encode('us-ascii', 'replace')
> but that doesn't keep as much information as I'd hope.

The unidecode [1] package contains a large mapping of Unicode characters to ASCII. It even supports cool stuff like Chinese to ASCII:

>>> import unidecode
>>> print u"\u5317\u4EB0"
北亰
>>> print unidecode.unidecode(u"\u5317\u4EB0")
Bei Jing

icu4c and PyICU [2] may offer more conversion methods, but they require binary extensions. By the way, ICU can do a lot of other cool things, too:

>>> import icu
>>> rbf = icu.RuleBasedNumberFormat(icu.URBNFRuleSetTag.SPELLOUT, icu.Locale.getUS())
>>> rbf.format(23)
u'twenty-three'
>>> rbf.format(100000)
u'one hundred thousand'

Regards,
Christian

[1] http://pypi.python.org/pypi/Unidecode/0.04.9
[2] http://pypi.python.org/pypi/PyICU/1.4
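
On the question of whether anything stock exists: for the accents-only case (cedillas, tildes, acute accents), a standard-library sketch is possible with the unicodedata module. This is not taken from the packages above; the helper name strip_accents is made up for illustration. The idea is to decompose each character into a base letter plus combining marks (NFKD), drop the marks, and then discard whatever is still not ASCII. The sketch is written to run on Python 3.

```python
# -*- coding: utf-8 -*-
import unicodedata


def strip_accents(text):
    """Return an ASCII-only approximation of text (hypothetical helper).

    Assumption: losing characters that have no ASCII base letter is acceptable.
    """
    # NFKD splits e.g. u"\u00e7" (c with cedilla) into "c" + a combining cedilla.
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks have a non-zero canonical combining class; drop them.
    without_marks = "".join(
        ch for ch in decomposed if not unicodedata.combining(ch)
    )
    # Anything still outside ASCII has no decomposition and is silently dropped.
    return without_marks.encode("ascii", "ignore").decode("ascii")


print(strip_accents(u"serviço móvil"))  # prints: servico movil
```

The limitation is the last step: letters with no decomposition, such as "ø" or "ß", simply disappear, which is where a mapping table like the one in unidecode keeps more information.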