Duncan Booth wrote:
Windows (when using NTFS) stores all the filenames in unicode, and Python
uses the unicode api to implement listdir (when given a unicode path). This
means that the filename never gets encoded to a byte string either by the
OS or Python. If you use a byte string path than the filename gets encoded
by Windows and Python just returns what it is given.
Serge's answer is good: you might only want to apply this algorithm to
posixpath. OTOH, in the specific case, it would not have caused problems
if it were applied to ntpath as well: the path was a Unicode string, so
listdir would have returned only Unicode strings (on Windows), and the
code in path.join dealing with mixed string types would not have been
triggered.
Again, I think the algorithm should be this:
- if both are the same kind of string, just concatenate them
- if not, try to coerce the byte string to a Unicode string, using
sys.getfileencoding()
- if that fails, try the other way 'round
- if that fails, let join fail.
The only drawback I can see with that approach is that it would "break"
environments where the system encoding is "undefined", i.e. implicit
string/unicode coercions are turned off. In such an environment, it
is probably desirable that os.path.join performs no coercion as well,
so this might need to get special-cased.
Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list