I have a program that reads file names using glob and puts them into an XML file in UTF-8, using unicode(file, sys.getfilesystemencoding()).encode("UTF-8"). This all works fine, including odd characters such as accents.
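For reference, the conversion described above can be sketched roughly as follows. This is a minimal sketch, not the actual program: the helper name filename_to_utf8 and the "ascii" fallback are assumptions made for illustration. It assumes the file name arrives as a byte string, as glob returns it in Python 2, where raw_name.decode(enc) is equivalent to unicode(raw_name, enc).

```python
import sys


def filename_to_utf8(raw_name, fs_encoding=None):
    """Decode a byte-string file name and re-encode it as UTF-8.

    filename_to_utf8 is a hypothetical helper, not part of any library.
    """
    if fs_encoding is None:
        # Assumption: fall back to ASCII if the platform reports no encoding.
        fs_encoding = sys.getfilesystemencoding() or "ascii"
    # Same effect as unicode(raw_name, fs_encoding).encode("UTF-8") in Python 2.
    return raw_name.decode(fs_encoding).encode("UTF-8")


# Example: a name stored in Windows-1252 (an assumed file system encoding).
raw = u"caf\u00e9.txt".encode("cp1252")
utf8_name = filename_to_utf8(raw, "cp1252")
```

Here utf8_name holds the UTF-8 bytes of the name, ready to be written into the XML file.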
However, I also print what the program is doing, and someone pointed out that many characters do not print correctly in the Windows command window. I have tried to figure this out but simply get lost in the encoding translation. If I just use print filename, the output has characters that don't match the ones in the file name (I sort of expected that). So I tried print unicode(file, sys.getfilesystemencoding()), expecting the correct result, but instead got:

UnicodeEncodeError: 'charmap' codec can't encode character u'\u2013'

I did notice that when a Windows command window does a directory listing of these files, the characters seem to be translated into close approximations (long dash to minus, special double quotes to simple double quotes), while many of the accented characters are retained. I looked at translate to do this but did not know how to determine which characters to map. Can anyone tell me what I should be doing here?
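To make the problem concrete, here is a minimal sketch of one possible way to get printable output. It rests on assumptions: the console is taken to use code page cp437 in the examples (the real code page may differ), and both the APPROXIMATIONS table and the console_safe helper are hypothetical names invented for illustration. The idea is to map a few known punctuation characters to plain equivalents with translate, then encode with the "replace" error handler so that anything the console still cannot show becomes "?" instead of raising UnicodeEncodeError.

```python
import sys

# Hypothetical table of "close approximations"; extend it as needed.
APPROXIMATIONS = {
    0x2013: u"-",   # en dash -> hyphen-minus
    0x2014: u"-",   # em dash -> hyphen-minus
    0x2018: u"'",   # left single quote
    0x2019: u"'",   # right single quote
    0x201C: u'"',   # left double quote
    0x201D: u'"',   # right double quote
}


def console_safe(text, encoding=None):
    """Return text reduced to characters the given encoding can represent."""
    if encoding is None:
        # Assumption: use the console's reported encoding, else plain ASCII.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    text = text.translate(APPROXIMATIONS)
    # "replace" substitutes "?" for anything that still cannot be encoded.
    return text.encode(encoding, "replace").decode(encoding)


name = u"caf\u00e9 \u2013 \u201cdraft\u201d.txt"
print(console_safe(name))
```

Accented characters that exist in the console's code page pass through unchanged, so console_safe(u"caf\u00e9", "cp437") keeps the accent, while the en dash in the error message above would come out as a plain "-".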