STINNER Victor <> added the comment:

It reminds me of the discussion in issue #3187. About unencodable filenames,
Guido proposed ignoring them or using errors="replace", and wrote "Failing
the entire os.listdir() call is not acceptable". (... long discussion ...) In
the end, os.listdir() ignored undecodable filenames on UNIX/BSD.

Then you introduced the ingenious PEP 383 (utf8b, later renamed
surrogateescape), and os.listdir() now raises an error if
PyUnicode_FromEncodedObject(v, Py_FileSystemDefaultEncoding, "surrogateescape")
fails... which never happens because of an undecodable byte sequence, only for
other reasons, such as a memory error.
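
A small illustration of why surrogateescape decoding cannot fail on undecodable bytes. This is a sketch that assumes UTF-8 as the filesystem encoding; the byte string is a made-up example, not a real directory entry:

```python
# Bytes as they might come from a POSIX directory entry. 0xE9 is Latin-1 "é"
# and is not valid UTF-8 here (UTF-8 is assumed as the filesystem encoding).
raw = b"caf\xe9.txt"

# Strict decoding fails on the undecodable byte...
try:
    raw.decode("utf-8")
except UnicodeDecodeError as exc:
    print("strict decode fails:", exc.reason)

# ...while surrogateescape maps it to a lone surrogate (U+DCE9) instead.
name = raw.decode("utf-8", "surrogateescape")
print(ascii(name))  # 'caf\udce9.txt'

# Encoding with the same handler restores the original bytes exactly, so the
# str name can still be used to open, rename or delete the file.
assert name.encode("utf-8", "surrogateescape") == raw
```
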

On Windows, os.listdir(str) never fails, but my question is about
os.listdir(bytes). Should os.listdir(bytes) return invalid filenames (encoded
with "mbcs+replace", i.e. names that cannot be used to open, rename or delete
the file), or just ignore them?
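
To make the "mbcs+replace" problem concrete, here is a sketch of the lossy round trip. It assumes the cp1252 codec as a stand-in for the Windows ANSI code page, because the real "mbcs" codec only exists on Windows and may also apply best-fit mappings:

```python
# "cp1252" stands in for the Windows ANSI code page (assumption).
name = "\u65e5\u672c.txt"  # "日本.txt": not representable in cp1252

# errors="replace" substitutes "?" for each unencodable character.
lossy = name.encode("cp1252", "replace")
print(lossy)  # b'??.txt'

# Decoding the bytes does not give back the original name. "?" is not even a
# legal character in a Windows filename, so this name cannot open the file.
assert lossy.decode("cp1252") == "??.txt"
assert lossy.decode("cp1252") != name

# A strict encode exposes the problem instead of hiding it.
try:
    name.encode("cp1252")
except UnicodeEncodeError:
    print("not encodable in cp1252")
```
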

> Ok. Then I'm -1 on the patch: you can't know whether the application
> actually wants to open the file. Perhaps it only wants to display the
> file names, or perhaps it only wants to open some of the files, or
> only traverse into subdirectories.
> For backwards compatibility, I recommend to leave things as they are.
> FindFirst/NextFileA will also do some other interesting conversions,
> such as the best-fit conversion (which the "mbcs" code doesn't do
> (anymore?)).

"it only wants to open some of the files" is exactly the reason why I hate
Python 2 and its implicit conversion between bytes and characters: it works in
most cases, but fails "sometimes". The problem is defining (and explaining)
"sometimes".

The typical use case of listing a directory is a file chooser. On Windows, using
the bytes API works in most cases, but fails if the user picks the "wrong" file
(a name containing "?"). That's the problem I would like to address.


Ignoring unencodable filenames is compatible with the "traverse into
subdirectories" case. It also keeps backward compatibility (except that
unencodable files are hidden, which I think is a minor problem).


I proposed raising an error on unencodable filenames, but I changed my mind
after reading your answer and the discussion on #3187. My patch breaks
compatibility, and users don't care about unencodable filenames. For example,
glob("*.mp3") should not fail if the directory contains a temporary file with
an unencodable name ("xxx.tmp").
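
A rough sketch contrasting the two policies for a glob-like filter. The helpers listdir_bytes_strict() and listdir_bytes_skip() are hypothetical and work on a fixed list of names instead of calling os.listdir(); cp1252 again stands in for the ANSI code page:

```python
import fnmatch

def listdir_bytes_strict(names, encoding="cp1252"):
    """Hypothetical 'raise' policy: one unencodable name fails the whole call."""
    return [name.encode(encoding) for name in names]

def listdir_bytes_skip(names, encoding="cp1252"):
    """Hypothetical 'ignore' policy: unencodable names are silently dropped."""
    out = []
    for name in names:
        try:
            out.append(name.encode(encoding))  # strict encode, no "?" substitution
        except UnicodeEncodeError:
            pass  # hide the name rather than return an unusable one
    return out

# Simulated directory: two MP3 files plus a temporary file with a Japanese name.
directory = ["track01.mp3", "\u65e5\u672c.tmp", "track02.mp3"]

try:
    fnmatch.filter(listdir_bytes_strict(directory), b"*.mp3")
except UnicodeEncodeError:
    print("strict policy: glob fails because of an unrelated .tmp file")

print(fnmatch.filter(listdir_bytes_skip(directory), b"*.mp3"))
# [b'track01.mp3', b'track02.mp3']
```

With the skip policy the unrelated temporary file only disappears from the listing, and the pattern match over the remaining names is unaffected.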


Python tracker <>
Python-bugs-list mailing list