But wouldn't that be correct in my case?
This is what I get inside Eclipse using pydev when I run:
<code>
import os
dirname = "c:/test"
print dirname
for fname in os.listdir(dirname):
    print fname
    if os.path.isfile(fname):
        print fname
</code>
c:/test
straßenschild.png
test.py
Übersetzung.rtf
This is what I get passing a unicode argument to os.listdir:
<code>
import os
dirname = u"c:/test"
print dirname  # will print fine, all ascii subset compatible
for fname in os.listdir(dirname):
    print fname
    if os.path.isfile(fname):
        print fname
</code>
c:/test
Traceback (most recent call last):
  File "C:\Programme\eclipse\workspace\myFirstProject\pythonFile.py", line 5, in ?
    print fname
UnicodeEncodeError: 'ascii' codec can't encode character u'\xdf' in position 4: ordinal not in range(128)
which is probably what you are getting, right?
You are trying to write *Unicode* objects containing characters outside the ASCII range (0-127) to a byte-oriented output without telling Python which encoding to use. Inside Eclipse, Python will always use ascii and never guess.
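The same failure can be reproduced without Eclipse or a directory listing. Here is a minimal sketch, written in Python 3 syntax (where every str is a Unicode string, like the u"..." objects in this thread); the helper name try_encode is made up for illustration and is not part of any library:

```python
# Hypothetical helper: try to encode a Unicode string with the given codec
# and report either the resulting bytes or the reason for the failure.
def try_encode(text, codec):
    try:
        return text.encode(codec)
    except UnicodeEncodeError as exc:
        return "failed: %s" % exc.reason

name = u"stra\xdfenschild.png"  # u'\xdf' is the sharp s named in the traceback

ascii_result = try_encode(name, "ascii")    # what an ascii-only output attempts
cp1252_result = try_encode(name, "cp1252")  # a codec that can represent the sharp s

print(ascii_result)
print(cp1252_result)
```

The ascii attempt fails for the same reason as the print statement above, while an explicit codec that covers the character succeeds.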
Given a unicode argument, os.listdir returns unicode objects:
<code>
import os
dirname = u"c:/test"
print dirname
for fname in os.listdir(dirname):
    print type(fname)
</code>
c:/test
<type 'unicode'>
<type 'unicode'>
<type 'unicode'>
so finally:
<code>
import os
dirname = u"c:/test"
print dirname
for fname in os.listdir(dirname):
    print fname.encode("mbcs")
</code>
gives:
c:/test
straßenschild.png
test.py
Übersetzung.rtf
Instead of "mbcs", which should be available on all Windows systems, you could have used "cp1252" when working on a German locale; inside Eclipse even "utf-16-le" would work, underscoring that the way the 'output device' handles encodings is decisive. I know this all seems awkward at first, but Python's drive towards uncompromising explicitness pays off big time when you're dealing with multilingual data.
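To see why the choice of codec depends on the output device, it may help to look at the bytes each codec produces for the same name. This is only a sketch in Python 3 syntax; "mbcs" is left out because it exists only on Windows, and the variable names are illustrative:

```python
name = u"\xdcbersetzung.rtf"  # "Übersetzung.rtf", 15 characters

# Encode the same Unicode string with three different codecs.
encoded = {}
for codec in ("cp1252", "utf-8", "utf-16-le"):
    encoded[codec] = name.encode(codec)
    print(codec, len(encoded[codec]), repr(encoded[codec]))

# Decoding with the codec that was used for encoding restores the text.
round_trip_ok = all(data.decode(codec) == name for codec, data in encoded.items())

# Decoding with a different codec is what a mismatched output device does:
# the bytes are accepted, but the text comes out garbled.
garbled = encoded["utf-8"].decode("cp1252")
print(round_trip_ok, garbled == name)
```

Each codec yields a different byte sequence, so the output only looks right when the receiving side interprets the bytes with the codec they were written in.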
-- Vincent Wehren
-- http://mail.python.org/mailman/listinfo/python-list