Hi,
Thank you very much, you have collectively cleared up some of the confusion.
> The English Windows command prompt uses the cp437 charset.
To be exact, my Windows is German, but I am not outputting to the command prompt window. I am using Eclipse with the PyDev plugin as my development platform, and the output is redirected to the console view in the IDE. I am not sure how this affects the problem and have since tried a vanilla console too. The problem stays the same, though.
I wonder what surprises are waiting for me when I first move this to my Linux box :-). I believe it uses UTF-8 throughout.
> print d.encode('cp437')
So I would have to specify the encoding on every call to print? I am sure to forget, and I don't like the program dying; in my case garbled output would be much more acceptable.
Is there some global way of forcing an encoding instead of the default 'ascii'? I have found references to setencoding() but this seems to have gone away.
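One way to approximate a "global" setting is to wrap the output stream once at program start, so that everything written afterwards is encoded with a forgiving error handler. The following is only a sketch under assumptions: the console is taken to expect cp437, `make_console_writer` is a hypothetical helper name, and an in-memory `io.BytesIO` stands in for the real console stream.

```python
import codecs
import io


def make_console_writer(byte_stream, encoding="cp437"):
    """Wrap a byte stream so that text written to it is encoded automatically.

    errors="replace" substitutes "?" for characters the target encoding
    cannot represent, instead of raising UnicodeEncodeError.
    """
    return codecs.getwriter(encoding)(byte_stream, errors="replace")


# Stand-in for the real console (an assumption made for illustration).
fake_console = io.BytesIO()
out = make_console_writer(fake_console)

# u-umlaut exists in cp437; the euro sign does not and gets replaced.
out.write(u"M\xfcller \u20ac\n")
raw = fake_console.getvalue()
```

In a Python 2 program the same wrapper is commonly applied to `sys.stdout` itself, along the lines of `sys.stdout = codecs.getwriter('cp437')(sys.stdout, 'replace')`. Whether that suits the Eclipse console depends on the encoding that console actually uses.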
> The issue is that a terminal only understands a certain character set.
I have experimented a bit now and I can make it work using encode(). The Eclipse console uses a different encoding than my Windows command prompt, by the way. I am sure this can be configured somewhere but I do not really care at the moment.
> If you have a unicode string, like d in your case, you have to encode it before
> it can be printed.
I got that now.
So encode() is a method of a unicode string, right? I come from a background of statically typed languages, so I am a bit queasy when I am not allowed to explicitly specify a type.
How can I, maybe by print()-ing something, find out what type d actually is? Just to make sure and get a better feeling for the system?
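For inspecting a value at runtime, the built-ins `type()` and `isinstance()` are the usual tools. A small sketch, where `d` and `e` are made-up sample values rather than the ones from the program in question:

```python
d = u"M\xfcller"         # a unicode (text) string
e = d.encode("utf-8")    # the same text as an encoded byte string

print(type(d))           # prints the type object of the text string
print(type(e))           # prints the type object of the byte string

# type(u"") is `unicode` on Python 2 and `str` on Python 3, so this
# check works on both.
d_is_text = isinstance(d, type(u""))
e_is_text = isinstance(e, type(u""))
```

The exact names printed differ between Python versions, but the `isinstance()` checks distinguish text from encoded bytes either way.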
Should d at any time not be a unicode string but some other flavour of string, will encode() still work? Or do I need to write a function myPrint() that distinguishes them by type and calls encode() only for unicode strings?
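On Python 2, calling encode() on a plain byte string first decodes it implicitly as ASCII, which can raise UnicodeDecodeError as soon as non-ASCII bytes are involved, so a small type-checking helper is a reasonable precaution. A minimal sketch, assuming a cp437 console; `to_console_bytes` is a hypothetical name:

```python
TEXT_TYPE = type(u"")   # `unicode` on Python 2, `str` on Python 3


def to_console_bytes(value, encoding="cp437"):
    """Return bytes suitable for a console that expects `encoding`.

    Text strings are encoded, with unencodable characters becoming "?".
    Byte strings are assumed to be encoded already and pass through as-is.
    """
    if isinstance(value, TEXT_TYPE):
        return value.encode(encoding, "replace")
    return value


encoded_text = to_console_bytes(u"M\xfcller")
passed_through = to_console_bytes(b"plain bytes")
```

The pass-through branch simply trusts that byte strings already match the console encoding; if they come from somewhere else, they would have to be decoded with their real encoding first.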
> So how did you get a unicoded d to start with?
I have asked myself this question before after reading the docs for os.listdir(). But I have no way of finding out what type d really is (see question above :-)). So I was dead-reckoning.
Can I force a string to be of a certain type? Like
nonunicode=unicode.encode("specialencoding")
How would I do it the other way round? From encoded representation to full unicode?
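The two directions are encode() (text to bytes) and decode() (bytes to text). A short round-trip sketch with made-up sample data; the encoding passed to decode() has to be the one the bytes were actually written in:

```python
text = u"\xe4\xf6\xfc"                  # a-umlaut, o-umlaut, u-umlaut

as_cp437 = text.encode("cp437")         # text -> bytes in cp437
as_utf8 = text.encode("utf-8")          # same text, different bytes

back_from_cp437 = as_cp437.decode("cp437")   # bytes -> text
back_from_utf8 = as_utf8.decode("utf-8")
```

On Python 2 the constructor form `unicode(as_cp437, "cp437")` does the same as the decode() call.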
> If 'somepath' is unicode, os.listdir returns a list of unicode strings.
> So why is somepath unicode?
> One possible source is an XML parser, which returns strings in unicode.
I get a root directory from XML and I walk the filesystem from there. That explains it.
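The rule that the result type follows the argument type can be checked directly. A small sketch that lists the current directory twice, chosen purely for illustration:

```python
import os

text_names = os.listdir(u".")   # text path in, text names out
byte_names = os.listdir(b".")   # byte-string path in, byte-string names out

all_text = all(isinstance(name, type(u"")) for name in text_names)
all_bytes = all(isinstance(name, bytes) for name in byte_names)
```

So a root directory that arrives from an XML parser as unicode would be expected to produce unicode names all the way down the walk.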
> Windows NT supports unicode filenames. I'm not sure about Linux. The result may differ slightly.
I think I will worry about that later. I can create files using German umlauts on the Linux box. I am sure I will find a way to move those names into my Python program.
I will not move data between the systems so there will not be much of a problem.
Ciao, MM
--
Marian Aldenhövel, Rosenhain 23, 53123 Bonn. +49 228 624013.
http://www.marian-aldenhoevel.de
"There is a procedure to follow in these cases, and if followed it can pretty well guarantee a generous measure of success, success here defined as survival with major extremities remaining attached."
--
http://mail.python.org/mailman/listinfo/python-list