coldpizza wrote:
> Hello,
>
> I need to convert accented unicode chars in some audio files to
> similarly-looking ascii chars. Looks like the following code seems to
> work on windows:
>
> import os
> import sys
> import glob
>
> EXT = '*.*'
>
> lst_uni = glob.glob(unicode(EXT))
>
> os.system('chcp 437')
> lst_asci = glob.glob(EXT)
> print sys.stdout.encoding
>
> for i in range(len(lst_asci)):
>     try:
>         os.rename(lst_uni[i], lst_asci[i])
>     except Exception as e:
>         print e
>
> On windows it converts most of the accented chars from the latin1
> encoding. This does not work in Linux since it uses 'chcp'.
>
> The questions are (1) *why* does it work on windows, and (2) what is
> the proper and portable way to convert unicode characters to similarly
> looking plain ascii chars?
>
> That is how to properly do this kind of conversion?
> ü > u
> é > e
> â > a
> ä > a
> à > a
> á > a
> ç > c
> ê > e
> ë > e
> è > e
>
> Is there any other way apart from creating my own char replacement
> table?

>>> from unicodedata import normalize
>>> s = u"""ü > u
... é > e
... â > a
... ä > a
... à > a
... á > a
... ç > c
... ê > e
... ë > e
... è > e
... """
>>> print normalize("NFD", s).encode("ascii", "ignore")
u > u
e > e
a > a
a > a
a > a
a > a
c > c
e > e
e > e
e > e
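
For the file-renaming case in the question, the same NFD trick can be wrapped in a small function. What follows is only a rough sketch, written for Python 3 (an assumption; the session above is Python 2), and the names strip_accents and plan_renames are made up for illustration. plan_renames only computes old/new name pairs and does not touch the disk.

```python
import unicodedata


def strip_accents(text):
    # NFD splits "ü" into "u" plus a combining diaeresis (U+0308).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, then anything else that is still not ASCII.
    base = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return base.encode("ascii", "ignore").decode("ascii")


def plan_renames(names):
    # Hypothetical helper: pair each name with its ASCII form, skipping
    # names that would not change. Nothing on disk is touched here.
    pairs = []
    for name in names:
        ascii_name = strip_accents(name)
        if ascii_name != name:
            pairs.append((name, ascii_name))
    return pairs
```

The pairs could then be fed to os.rename, after checking that two different names do not collapse to the same ASCII name. Letters that have no decomposition, such as ø or ß, are simply dropped by this approach, so a small replacement table is still needed for those.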