I'm confused about the relationship between the locale, os.listdir(), and Unicode pathnames. I'm running Python 2.3.5 on a Debian system. If it matters, all of the files I'm dealing with are on an ext3 filesystem.
The real code this problem comes from takes a configured set of directories and walks through each of them using os.listdir(). Today I accidentally ran across a directory containing four "normal" files (with ASCII filenames) and one file with a two-character Unicode filename. My code, which was doing something like this:

    for entry in os.listdir(path):   # path is <type 'unicode'>
        entrypath = os.path.join(path, entry)

suddenly started blowing up with the dreaded Unicode error:

    UnicodeDecodeError: 'ascii' codec can't decode byte 0xe2 in position 1: ordinal not in range(128)

To add insult to injury, it only happened for one of my test users, not the others. I ultimately traced the difference in behavior to the LC_ALL setting in the environment: one user had LC_ALL set to en_US, and the other didn't have it set at all.

For the user with LC_ALL set, the os.listdir() call returned this, and the os.path.join() call succeeded:

    [u'README.strange-name', u'\xe2\x99\xaa\xe2\x99\xac', u'utflist.long.gz', u'utflist.cp437.gz', u'utflist.short.gz']

For the user without LC_ALL set, the os.listdir() call returned this, and the os.path.join() call failed with the UnicodeDecodeError exception:

    [u'README.strange-name', '\xe2\x99\xaa\xe2\x99\xac', u'utflist.long.gz', u'utflist.cp437.gz', u'utflist.short.gz']

Note that in this second result, element [1] is not a unicode string, while the other four elements are.

Can anyone explain:

1) Why LC_ALL has any effect on the os.listdir() result?
2) Why only 4 of the 5 files come back as unicode strings?
3) The proper "general" way to deal with this situation?

My goal is to build generalized code that consistently works with all kinds of filenames. Ultimately, all I'm trying to do is copy some files around. I'd really prefer to find a programmatic way to make this work that is independent of the user's configured locale, if possible.

Thanks for the help,

KEN

-- 
Kenneth J. Pronovici <[EMAIL PROTECTED]>
-- 
http://mail.python.org/mailman/listinfo/python-list
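Questions 1 and 2 in the post above can be illustrated without touching the filesystem. The sketch below is written for modern Python 3 (an assumption: it will not run on 2.3.5) and simply decodes the raw bytes of the odd filename three ways. It further assumes, as is typical for glibc, that LC_ALL=en_US selects ISO-8859-1 and that an unset locale presumably falls back to ASCII. When os.listdir() is given a unicode path, Python 2.3 decodes each name with that locale-derived encoding and leaves any name it cannot decode as a plain byte string.

```python
# Raw bytes of the odd filename, copied from the two listings in the post.
raw = b"\xe2\x99\xaa\xe2\x99\xac"

# ISO-8859-1 maps every byte to a code point, so decoding never fails; the
# result is the same mojibake as element [1] of the first listing.
as_latin1 = raw.decode("iso-8859-1")

# UTF-8 yields the two characters the name really contains (U+266A, U+266C).
as_utf8 = raw.decode("utf-8")

# ASCII cannot decode bytes >= 0x80, which is why the second listing kept
# this one name as a byte string.
try:
    raw.decode("ascii")
    ascii_ok = True
except UnicodeDecodeError:
    ascii_ok = False

print(ascii(as_latin1))  # '\xe2\x99\xaa\xe2\x99\xac'
print(ascii(as_utf8))    # '\u266a\u266c'
print(ascii_ok)          # False
```

So under en_US the join "works" only because ISO-8859-1 accepts any byte: the resulting unicode name is mojibake rather than the two music-note characters, although it still round-trips to the same bytes under that locale. Under the C locale the decode fails, the byte string is returned unchanged, and it then clashes with the unicode path in os.path.join().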
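For comparison with question 2, Python 3 (not 2.3) avoids the mixed list: under PEP 383, os.listdir() with a str path always returns str, passing undecodable bytes through as lone surrogates via the "surrogateescape" error handler, and os.fsencode() turns them back into the original bytes. The sketch below simulates the unset-locale case by decoding with ASCII explicitly; the directory "/some/dir" is a hypothetical placeholder.

```python
import os.path

raw = b"\xe2\x99\xaa\xe2\x99\xac"

# Decode as ASCII with surrogateescape, as Python 3 would under a locale whose
# filesystem encoding is ASCII: each byte >= 0x80 becomes U+DC00 + byte.
name = raw.decode("ascii", "surrogateescape")

# The name is a str, so joining it with a str directory works.
entrypath = os.path.join("/some/dir", name)   # hypothetical directory

# Encoding with the same handler restores the exact on-disk bytes.
restored = name.encode("ascii", "surrogateescape")

print(ascii(name))       # '\udce2\udc99\udcaa\udce2\udc99\udcac'
print(restored == raw)   # True
```

Printing such a name directly can raise UnicodeEncodeError, which is why the sketch goes through ascii().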
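For question 3, one locale-independent approach (a suggestion, not something confirmed in this thread) is to keep pathnames as bytes end to end, so that no decoding ever happens. In Python 2.3 that roughly means passing a plain str path rather than a unicode one to os.listdir(), which then returns str names. The sketch below shows the same idea in Python 3, where os.fsencode() converts the directory to bytes. The helper name copy_flat() is hypothetical, and it copies file contents only, since shutil.copyfile() does not preserve metadata.

```python
import os
import shutil
import tempfile


def copy_flat(src_dir, dst_dir):
    """Copy each regular file directly inside src_dir into dst_dir.

    Hypothetical helper. Paths are converted to bytes with os.fsencode(), so
    os.listdir() returns bytes names and os.path.join() never mixes bytes
    with str, whatever the user's locale.
    """
    src = os.fsencode(src_dir)
    dst = os.fsencode(dst_dir)
    copied = []
    for entry in os.listdir(src):            # entry is bytes
        entrypath = os.path.join(src, entry)
        if os.path.isfile(entrypath):
            # copyfile() copies contents only, not permissions or timestamps.
            shutil.copyfile(entrypath, os.path.join(dst, entry))
            copied.append(entry)
    return sorted(copied)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        # One ASCII name plus the raw UTF-8 name from the listings in the post.
        for name in (b"README.strange-name", b"\xe2\x99\xaa\xe2\x99\xac"):
            with open(os.path.join(os.fsencode(src), name), "wb") as f:
                f.write(b"example data\n")
        print(copy_flat(src, dst))
        # [b'README.strange-name', b'\xe2\x99\xaa\xe2\x99\xac']
```

Working in bytes sidesteps the locale entirely; the trade-off is that names are not human-readable text until you choose an encoding to decode them for display.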