Marian Aldenhövel wrote:
Hi,

I am very new to Python and have run into the following problem. If I do
something like

dir = os.listdir(somepath)
for d in dir:
    print d
The program fails for filenames that contain non-ASCII characters:


'ascii' codec can't encode characters in position 33-34:

If you read this carefully, you'll notice that Python has tried and failed to *encode* a decoded (i.e. Unicode) string using the 'ascii' codec. IOW, d seems to be bound to a Unicode string, which is unexpected unless the argument passed to os.listdir (somepath) is a Unicode string, too. (If given a Unicode string as its argument, os.listdir returns a list of Unicode names.)
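A minimal sketch of that rule, written for Python 3 rather than the Python 2.3 discussed here: in Python 3 the text type is str and the byte type is bytes (in Python 2 they are unicode and str). The helper names and the scratch directory are hypothetical; the only point is that the type of the names os.listdir returns follows the type of its argument.

```python
import os
import shutil
import tempfile


def listdir_name_types(path):
    """Return the set of types of the names os.listdir() yields for path."""
    return {type(name) for name in os.listdir(path)}


def demo():
    """List a scratch directory once as text and once as bytes."""
    # Hypothetical scratch directory holding one ASCII-named file, so the
    # listing is non-empty and does not depend on the file system encoding.
    text_dir = tempfile.mkdtemp()
    try:
        with open(os.path.join(text_dir, "example.txt"), "w"):
            pass
        byte_dir = os.fsencode(text_dir)  # the same path, as bytes
        return listdir_name_types(text_dir), listdir_name_types(byte_dir)
    finally:
        shutil.rmtree(text_dir)


if __name__ == "__main__":
    text_types, byte_types = demo()
    print(text_types)  # {<class 'str'>}
    print(byte_types)  # {<class 'bytes'>}
```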


If you're printing to the console, modern Pythons will try to guess the console's encoding (e.g. cp850). If the print failed because the characters do not map to the console's encoding, I would expect the UnicodeEncodeError to come from that codec, not from the 'ascii' codec as in the error you're seeing.
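The codec named in the message is the clue. This Python 3 sketch (failing_codec is a hypothetical helper) reads the codec name off the UnicodeEncodeError: "ü" fails under 'ascii' but encodes fine with cp850, so an 'ascii' error suggests the console's encoding was not used at all.

```python
def failing_codec(text, encoding):
    """Return the codec name reported by UnicodeEncodeError, or None if
    text encodes cleanly with the given encoding."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as exc:
        return exc.encoding
    return None


if __name__ == "__main__":
    sample = "M\u00fcller"  # "Müller": u-umlaut is non-ASCII but exists in cp850
    print(failing_codec(sample, "ascii"))  # ascii
    print(failing_codec(sample, "cp850"))  # None
```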

How *are* you running the program? In the console (cmd.exe)? Or from some IDE?


I have noticed that this seems to be a very common problem. I have read a lot
of postings regarding it but not really found a solution. Is there a simple
one?


What I specifically do not understand is why Python wants to interpret the
string as ASCII at all. Where is this setting hidden?

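As for where the setting is hidden: it is Python's default encoding, which you can inspect with sys.getdefaultencoding() and which is 'ascii' unless your installation changes it. Don't be tempted to ever change it (via sys.setdefaultencoding in site.py): the setting is site-specific, so if you ever distribute your programs, any that rely on it may fail on other people's Python installations.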
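Rather than changing the interpreter-wide default, a common alternative is to encode each Unicode string explicitly for its destination. The sketch below is written for Python 3 (in Python 2 the same encode call works on unicode objects); encode_for_output is a hypothetical helper, and falling back to ASCII when the stream reports no encoding is an assumption.

```python
import sys


def encode_for_output(text, encoding=None):
    """Encode text explicitly, replacing characters the target encoding
    cannot represent with '?' instead of raising UnicodeEncodeError."""
    if encoding is None:
        # Assumption: use the stream's declared encoding if it has one and
        # fall back to ASCII if it reports none (as Python 2 does when
        # output is redirected).
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    return text.encode(encoding, "replace")


if __name__ == "__main__":
    name = "caf\u00e9 \u20ac"  # "café €"
    print(encode_for_output(name, "ascii"))    # b'caf? ?'
    print(encode_for_output(name, "latin-1"))  # b'caf\xe9 ?'
```

Using "replace" trades fidelity for robustness: unmappable characters show up as '?' rather than stopping the program.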


--
Vincent Wehren


I am running Python 2.3.4 on Windows XP and I want to run the program on Debian sarge later.

Ciao, MM
--
http://mail.python.org/mailman/listinfo/python-list
