Re: LC_ALL and os.listdir()

"Martin v. Löwis" Thu, 24 Feb 2005 07:15:04 -0800

Duncan Booth wrote:

Windows (when using NTFS) stores all the filenames in Unicode, and Python uses the Unicode API to implement listdir (when given a Unicode path). This means that the filename never gets encoded to a byte string either by the OS or Python. If you use a byte string path then the filename gets encoded by Windows and Python just returns what it is given.


Serge's answer is good: you might only want to apply this algorithm to
posixpath. OTOH, in the specific case, it would not have caused problems
if it were applied to ntpath as well: the path was a Unicode string, so
listdir would have returned only Unicode strings (on Windows), and the
code in path.join dealing with mixed string types would not have been
triggered.
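
To illustrate the point about return types, here is a small sketch. It assumes Python 3, where str and bytes play the roles that unicode and byte strings play in this thread, and where listdir returns names of the same kind as its argument on every platform, not only on Windows. The helper name listdir_kinds is made up for the example.

```python
import os
import tempfile


def listdir_kinds(directory):
    """Return the sets of name types os.listdir yields for a text path
    and for the same path given as bytes (hypothetical helper)."""
    text_names = os.listdir(directory)
    byte_names = os.listdir(os.fsencode(directory))
    return {type(n) for n in text_names}, {type(n) for n in byte_names}


with tempfile.TemporaryDirectory() as tmp:
    # Create one file so that the listing is not empty.
    open(os.path.join(tmp, "example.txt"), "w").close()
    text_kinds, byte_kinds = listdir_kinds(tmp)
    print(text_kinds, byte_kinds)
```

As long as the directory argument and the names it yields are the same kind of string, joining them never reaches the mixed-type code path.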

Again, I think the algorithm should be this:
- if both are the same kind of string, just concatenate them
- if not, try to coerce the byte string to a Unicode string, using
  sys.getfilesystemencoding()
- if that fails, try the other way 'round
- if that fails, let join fail.
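
The steps above could be sketched roughly as follows. This is only an illustration, not the actual os.path.join implementation: it is written for Python 3 (str for Unicode strings, bytes for byte strings), the function name join_mixed and its encoding parameter are invented for the example, and it handles just two components.

```python
import os
import sys


def join_mixed(a, b, encoding=None):
    """Join two path components that may be str (text) or bytes.

    Hypothetical helper; 'encoding' defaults to the file system encoding.
    """
    if encoding is None:
        encoding = sys.getfilesystemencoding()

    # Same kind of string: plain join, no coercion needed.
    if isinstance(a, str) == isinstance(b, str):
        return os.path.join(a, b)

    # Mixed kinds: first try to coerce the byte string to text.
    try:
        text_a = a if isinstance(a, str) else a.decode(encoding)
        text_b = b if isinstance(b, str) else b.decode(encoding)
        return os.path.join(text_a, text_b)
    except UnicodeDecodeError:
        pass

    # Decoding failed: try the other way round, text to bytes.
    # A UnicodeEncodeError raised here propagates, so the join fails.
    raw_a = a.encode(encoding) if isinstance(a, str) else a
    raw_b = b.encode(encoding) if isinstance(b, str) else b
    return os.path.join(raw_a, raw_b)
```

For example, join_mixed("dir", b"file", encoding="utf-8") gives a text result, while a byte string that cannot be decoded makes the result fall back to bytes if the text component can be encoded.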

The only drawback I can see with that approach is that it would "break"
environments where the system encoding is "undefined", i.e. implicit
string/unicode coercions are turned off. In such an environment, it
is probably desirable that os.path.join performs no coercion as well,
so this might need to get special-cased.

Regards,
Martin
--
http://mail.python.org/mailman/listinfo/python-list