STINNER Victor <victor.stin...@haypocalc.com> added the comment:

This reminds me of the discussion on issue #3187. About unencodable filenames, Guido proposed to ignore them or to use errors="replace", and wrote "Failing the entire os.listdir() call is not acceptable". (... long discussion ...) In the end, os.listdir() ignored undecodable filenames on UNIX/BSD.
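
For readers who did not follow #3187, the two options mentioned there (skip the name, or decode it with errors="replace") can be sketched roughly as below. This is only an illustration, not the os.listdir() implementation: the helper name decode_names and the sample byte strings are made up, and UTF-8 is assumed as a stand-in for the filesystem encoding.

```python
def decode_names(raw_names, encoding="utf-8", on_error="skip"):
    """Decode raw directory entries (bytes) to str.

    on_error="skip"    -> drop names that cannot be decoded
    on_error="replace" -> keep them, substituting U+FFFD for bad bytes
    """
    result = []
    for raw in raw_names:
        try:
            result.append(raw.decode(encoding))
        except UnicodeDecodeError:
            if on_error == "replace":
                result.append(raw.decode(encoding, "replace"))
            # on_error == "skip": silently leave the entry out
    return result


# Hypothetical directory content: the second name is not valid UTF-8.
raw = [b"song.mp3", b"caf\xe9.txt"]

skipped = decode_names(raw, on_error="skip")
replaced = decode_names(raw, on_error="replace")
```

Neither variant fails the whole listing, but the name produced by "replace" no longer maps back to the real file on disk.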

Then you introduced the ingenious PEP 383 (utf8b, later renamed surrogateescape), and os.listdir() now raises an error if PyUnicode_FromEncodedObject(v, Py_FileSystemDefaultEncoding, "surrogateescape") fails... which doesn't happen because of an undecodable byte sequence, but only for other reasons such as a memory error.

About Windows, os.listdir(str) never fails, but my question is about os.listdir(bytes). Should os.listdir(bytes) return invalid filenames (encoded with "mbcs+replace", filenames not usable to open, rename or delete the file) or just ignore them?

> Ok. Then I'm -1 on the patch: you can't know whether the application
> actually wants to open the file. Perhaps it only wants to display the
> file names, or perhaps it only wants to open some of the files, or
> only traverse into subdirectories.
>
> For backwards compatibility, I recommend to leave things as they are.
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).

"it only wants to open some of the files" is the typical reason why I hate Python2 and its implicit conversion between bytes and characters: it works in most cases, but it fails "sometimes". The problem is to define (and explain) "sometimes".

The typical use case of listing a directory is a file chooser. On Windows using the bytes API, it works in most cases, but it fails if the user picks the "wrong" file (a name containing "?"). That's the problem I would like to address.

--

The "ignore unencodable filenames" solution is compatible with the "traverse into subdirectories" case. It also keeps backward compatibility (except that unencodable files are hidden, which is a lesser problem, I think).

--

I proposed to raise an error on unencodable filenames. I changed my mind after reading your answer and the discussion on #3187. My patch breaks compatibility, and users don't care about unencodable filenames. E.g.
glob("*.mp3") should not fail if the directory contains a temporary unencodable filename ("xxx.tmp").

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue9820>
_______________________________________
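
To make the difference between "mbcs+replace" and the "ignore" proposal concrete, here is a rough sketch. It does not call the Windows API: cp1252 is assumed as a stand-in for the "mbcs" code page so the snippet runs anywhere, and the helper encodable() and the file names are hypothetical.

```python
import fnmatch

ANSI = "cp1252"  # assumption: stands in for the Windows "mbcs" code page


def encodable(name, encoding=ANSI):
    """Return True if name survives a strict encode to the ANSI code page."""
    try:
        name.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical directory content as seen through the Unicode API;
# the second name is Cyrillic and has no cp1252 representation.
names = ["track.mp3", "\u0444\u0430\u0439\u043b.tmp"]

# "replace" behaviour: unencodable characters become "?", so the
# resulting bytes name does not refer to any existing file.
lossy = [n.encode(ANSI, "replace") for n in names]

# "Ignore" proposal: hide names that cannot be encoded at all.
visible = [n.encode(ANSI) for n in names if encodable(n)]

# A glob-like filter over the visible names is unaffected by the
# temporary unencodable file.
mp3s = fnmatch.filter(visible, b"*.mp3")
```

With the "ignore" variant, every name handed back to the application can actually be opened, renamed or deleted, at the price of hiding the unencodable entry.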