> thanks for that. I guess the problem is that when a path is obtained
> from such an object the code that gets the path usually has no way of
> knowing what the intended use is. That makes storage as simple bytes
> hard. I guess the correct way is to always convert to a standard (say
> utf8) and then always know the required encoding when the thing is to be
> used.

Inside the program itself, the best thing is to represent path names as Unicode strings as early as possible; later, information about the original encoding may be lost.

If you obtain path names from the os module, pass Unicode strings to listdir in order to get back Unicode strings. If they come from environment variables or command line arguments, use locale.getpreferredencoding() to find out what the encoding should be. If they come from a zip file, Tijs already explained what the encoding is.

Always expect encoding errors; if they occur, choose to either skip the file name or report an error to the user. Notice that listdir may return a byte string if decoding fails (this may only happen on Unix).

Regards,
Martin
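
P.S. The "decode early, expect errors" advice might look roughly like the sketch below. It is only an illustration, written for Python 3 syntax (where str is the Unicode type and bytes is the byte string type). The helper name decode_names is made up, and falling back to locale.getpreferredencoding() is an assumption that suits names from the command line or the environment, not names read from a zip file.

```python
import locale


def decode_names(raw_names, encoding=None):
    """Split raw_names into (decoded, undecodable).

    Hypothetical helper: names that are already text are kept as-is;
    byte strings are decoded with `encoding`, and the ones that fail
    are returned separately so the caller can skip or report them.
    """
    if encoding is None:
        # Assumption: the locale's preferred encoding is the right guess.
        encoding = locale.getpreferredencoding()
    decoded, undecodable = [], []
    for name in raw_names:
        if isinstance(name, bytes):
            try:
                name = name.decode(encoding)
            except UnicodeDecodeError:
                # Expected case: keep the raw bytes for skipping/reporting.
                undecodable.append(name)
                continue
        decoded.append(name)
    return decoded, undecodable
```

The caller then works only with the decoded list and decides what to do with the rest, e.g. print a warning for each entry in undecodable or silently ignore them.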