On Wed, Feb 23, 2005 at 10:07:19PM +0100, "Martin v. Löwis" wrote:
> So we have three options:
> 1. skip this string, only return the ones that can be
>    converted to Unicode. Give the user the impression
>    the file does not exist.
> 2. return the string as a byte string
> 3. refuse to listdir altogether, raising an exception
>    (i.e. return nothing)
>
> Python has chosen alternative 2, allowing the application
> to implement 1 or 3 on top of that if it wants to (or
> come up with other strategies, such as user feedback).

Understood. This appears to be the most flexible solution among the three.

> >3) The proper "general" way to deal with this situation?
>
> You can choose option 1 or 3; you could tell the user
> about it, and then ignore the file, you could try to
> guess the encoding (UTF-8 would be a reasonable guess).

Ok.

> >My goal is to build generalized code that consistently works with all
> >kinds of filenames.
>
> Then it is best to drop the notion that file names are
> character strings (because some file names aren't). You
> do so by converting your path variable into a byte
> string. To do that, you could try
[snip]
> So your code would read
>
> try:
>     path = path.encode(sys.getfilesystemencoding() or
>                        sys.getdefaultencoding())
> except UnicodeError:
>     print >>sys.stderr, "Invalid path name", repr(path)
>     sys.exit(1)

This makes sense to me. I'll work on implementing it that way.

Thanks for the in-depth explanation!

KEN

-- 
Kenneth J. Pronovici <[EMAIL PROTECTED]>
Personal Homepage: http://www.skyjammer.com/~pronovic/
"They that can give up essential liberty to obtain a little
 temporary safety deserve neither liberty nor safety."
      - Benjamin Franklin, Historical Review of Pennsylvania, 1759
-- 
http://mail.python.org/mailman/listinfo/python-list
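
For anyone reading this thread later, here is a minimal sketch of the byte-string approach quoted above, with options 1 and 3 layered on top of it. It assumes Python 3 (the thread itself concerns Python 2), where os.listdir() returns byte strings when it is given a byte-string path. The helper names to_bytes_path, split_names and list_names are hypothetical, and UTF-8 is only the "reasonable guess" Martin mentions, not something the file system guarantees.

```python
import os
import sys


def to_bytes_path(path):
    """Return path as a byte string (hypothetical helper).

    Mirrors the encode step quoted above; raises UnicodeError if the
    path cannot be encoded in the file system encoding.
    """
    if isinstance(path, bytes):
        return path
    encoding = sys.getfilesystemencoding() or sys.getdefaultencoding()
    return path.encode(encoding)


def split_names(names, encoding="utf-8"):
    """Split byte-string names into (decoded, undecodable).

    The encoding is a guess: names that do not decode are kept as
    bytes rather than being silently dropped.
    """
    decoded, undecodable = [], []
    for name in names:
        try:
            decoded.append(name.decode(encoding))
        except UnicodeDecodeError:
            undecodable.append(name)
    return decoded, undecodable


def list_names(path, encoding="utf-8"):
    """List a directory as bytes, then classify the entries."""
    return split_names(os.listdir(to_bytes_path(path)), encoding)


if __name__ == "__main__":
    good, bad = list_names(".")
    for name in bad:
        # Option 1 would skip these quietly; option 3 would raise here.
        print("Undecodable file name:", repr(name), file=sys.stderr)
    print(len(good), "decodable names,", len(bad), "left as bytes")
```

Ignoring the second list gives option 1, and raising when it is non-empty gives option 3. Python 3 also provides os.fsencode() and os.fsdecode() for converting between the two path representations, which may be preferable to a hand-written encode step.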
