Hi,
I am very new to Python and have run into the following problem. If I do something like
    dir = os.listdir(somepath)
    for d in dir:
        print d

The program fails for filenames that contain non-ASCII characters.
'ascii' codec can't encode characters in position 33-34:
I have noticed that this seems to be a very common problem. I have read a lot
of postings regarding it but not really found a solution. Is there a simple
one?
The English Windows command prompt uses the cp437 character set. To print it, use
print d.encode('cp437')
The issue is that a terminal only understands a certain character set. If you have a unicode string, like d in your case, you have to encode it before it can be printed. (We really need a native unicode terminal!) If you don't encode it, Python will do it for you, and the default encoding is ASCII, so any string that contains a non-ASCII character will give you trouble. In my opinion Python is too conservative in using 'strict' error handling here, which causes a lot of woe for users who are unaware of unicode.
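Here is a minimal sketch of that encode-before-print step. The helper name to_console_bytes is made up for this example (it is not part of Python), and the sample filename is invented. It encodes with an explicit codec and passes 'replace' as the error handler, so a character the terminal cannot show becomes '?' instead of raising UnicodeEncodeError. It is written so that it should behave the same on Python 2 and Python 3.

```python
import sys

def to_console_bytes(text, encoding=None):
    """Encode unicode text for a terminal (hypothetical helper, not stdlib)."""
    if encoding is None:
        # sys.stdout.encoding may be None, e.g. when output is piped,
        # so fall back to plain ASCII in that case.
        encoding = getattr(sys.stdout, "encoding", None) or "ascii"
    # 'replace' substitutes '?' for anything the codec cannot represent.
    return text.encode(encoding, "replace")

name = u"caf\xe9.txt"  # invented filename containing e-acute

# A strict ASCII encode is the kind of step that fails in the original post.
try:
    name.encode("ascii")
    strict_ok = True
except UnicodeEncodeError:
    strict_ok = False

cp437_bytes = to_console_bytes(name, "cp437")  # cp437 can represent e-acute
ascii_bytes = to_console_bytes(name, "ascii")  # ASCII cannot, so '?' is used
```

On Python 2 the result can be handed straight to print. Whether 'replace' is acceptable depends on whether a lossy display is fine for your purpose; it only changes what is shown, not the filename itself.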
So how did you get a unicode d to start with? If 'somepath' is unicode, os.listdir returns a list of unicode strings. So why is somepath unicode? Either you entered a unicode literal or it comes from some other source. One possible source is an XML parser, which returns strings as unicode.
Windows NT supports unicode filenames. I'm not sure about Linux; the result may differ slightly.
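To see how the argument type drives the result type, here is a small sketch under a few assumptions: it uses a throwaway temporary directory with one file named example.txt (both invented for the example), and it builds a text form and a byte-string form of the same path. The names text_type and bytes_type are just local aliases so the same code should run on Python 2 and Python 3.

```python
import os
import shutil
import sys
import tempfile

text_type = type(u"")    # unicode on Python 2, str on Python 3
bytes_type = type(b"")   # str on Python 2, bytes on Python 3

# Scratch directory with one file so the listing is not empty.
tmp = tempfile.mkdtemp()
open(os.path.join(tmp, "example.txt"), "w").close()

# Build both a text form and a byte-string form of the same path.
fs_enc = sys.getfilesystemencoding() or "utf-8"
if isinstance(tmp, bytes_type):
    tmp_bytes, tmp_text = tmp, tmp.decode(fs_enc)
else:
    tmp_bytes, tmp_text = tmp.encode(fs_enc), tmp

names_text = os.listdir(tmp_text)    # text path in, text names out
names_bytes = os.listdir(tmp_bytes)  # byte-string path in, byte-string names out

# Remove the scratch directory again.
shutil.rmtree(tmp)
```

So if you want byte strings back, pass a byte-string path; if you pass unicode, expect unicode names and encode them yourself before printing.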
What I specifically do not understand is why Python wants to interpret the
string as ASCII at all. Where is this setting hidden?
I am running Python 2.3.4 on Windows XP and I want to run the program on Debian sarge later.
Ciao, MM
-- http://mail.python.org/mailman/listinfo/python-list