> I thought the correct way to do this in Python would be to scan the
> dir:
>
>     files = os.listdir(os.path.dirname(os.path.realpath(__file__)))
>
> then print the filenames:
>
>     for filename in files:
>         print filename
>
> but as expected the filename is not correct, so correct it using the
> file system's encoding:
>
>     print filename.decode(sys.getfilesystemencoding())
>
> But I get
>
>     UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014'
>     in position 6: character maps to <undefined>

As a starting point, you shouldn't be using byte-oriented APIs to
access files on Windows; the specific byte-oriented API here is
os.listdir, when passed a directory represented as a byte string.

So try:

    dirname = os.path.dirname(os.path.realpath(__file__))
    dirname = dirname.decode(sys.getfilesystemencoding())
    files = os.listdir(dirname)

This should give you the files as Unicode strings.

> I need to be able to write (a representation) to the screen (and I
> don't see why I should not get something as good as DOS shows).

The command window (it's not really DOS anymore) uses the CP_OEMCP
encoding, which is not available in Python. That encoding also does
all the transliteration, so you would have to write an extension
module if you want to get the same transliteration (or try to get to
the OEMCP encoding through ctypes).

If you can live with a simpler transliteration, try

    print filename.encode(sys.stdout.encoding, "replace")

> Write it to an XML file in UTF-8
>
> and write it to a text file and be able to read it back in.
> Again I was surprised that this was also difficult - it appears that
> the file also wanted ascii. Should I have to open the file in binary
> for write (I expect so) but then what encoding should I write in?

You need to tell us how precisely you tried to do this. My guess is:
if you now try again, with the filenames being Unicode strings, it
will work fairly well.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
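
For the text-file question, here is a minimal sketch of the round trip, assuming the filenames are already Unicode strings (as os.listdir should return them when given a Unicode directory name). The helper names save_names and load_names are made up for illustration and do not come from the thread, and UTF-8 is only one reasonable choice of file encoding. io.open is used so the same code should run on Python 2.6+ as well as Python 3.

```python
import io


def save_names(names, path):
    # names is a list of Unicode strings; io.open encodes them as
    # UTF-8 when writing, so no manual .encode() call is needed.
    with io.open(path, "w", encoding="utf-8") as f:
        for name in names:
            f.write(name + u"\n")


def load_names(path):
    # Reading with the same encoding gives Unicode strings back.
    with io.open(path, "r", encoding="utf-8") as f:
        return [line.rstrip(u"\n") for line in f]
```

With this, something like save_names(os.listdir(dirname), u"names.txt") followed by load_names(u"names.txt") should return the same list, including names containing characters such as u'\u2014'.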