Hi,

Thank you very much, you have collectively cleared up some of the confusion.

> English Windows command prompt uses cp437 charset.

To be exact, my Windows is German, but I am not outputting to the command prompt window. I am using Eclipse with the PyDev plugin as my development platform, and the output is redirected to the console view in the IDE. I am not sure how this affects the problem, and I have since tried a vanilla console too. The problem stays the same, though.

I wonder what surprises are waiting for me when I first move this to my
Linux box :-). I believe it uses UTF-8 throughout.

> print d.encode('cp437')

So I would have to specify the encoding on every call to print? I am sure to
forget, and I don't like the program dying. In my case garbled output would be
much more acceptable.
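
A minimal sketch of the "garbled rather than dying" behaviour, assuming the second argument of encode() is used to pick an error handler. The sample string is made up for illustration. It uses escapes so the source file encoding does not matter.

```python
# Encode with an error handler so unmappable characters degrade to '?'
# instead of raising UnicodeEncodeError.
d = u"Stra\u00dfe \u20ac"   # sharp s exists in cp437, the euro sign does not

strict_failed = False
try:
    d.encode("cp437")                    # default handler is 'strict'
except UnicodeEncodeError:
    strict_failed = True                 # the euro sign cannot be mapped

garbled = d.encode("cp437", "replace")   # unmappable characters become '?'
```

The 'ignore' handler would drop such characters silently instead, which is probably harder to spot in the output.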

Is there some global way of forcing an encoding instead of the default
'ascii'? I have found references to setencoding() but this seems to have gone
away.
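
One possible approach, sketched here under assumptions, is to wrap the output stream once so that everything written to it gets encoded automatically. The helper name encoding_writer is hypothetical. codecs.getwriter() is the standard library call doing the work. The demo writes to an in-memory buffer so it does not depend on any particular console.

```python
import codecs
import io

def encoding_writer(byte_stream, encoding, errors="replace"):
    """Wrap a byte stream so that text written to it is encoded automatically."""
    return codecs.getwriter(encoding)(byte_stream, errors)

# Demonstrated on an in-memory buffer. For real output the byte stream would
# be sys.stdout.buffer on Python 3 (plain sys.stdout on Python 2).
buf = io.BytesIO()
out = encoding_writer(buf, "cp437")
out.write(u"Aldenh\u00f6vel\n")
written = buf.getvalue()                 # the bytes that reached the stream
```

With this, the encoding is chosen in one place and a forgotten encode() on an individual call no longer matters for text written through the wrapper.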

> The issue is that a terminal only understands a certain character set.

I have experimented a bit now and I can make it work using encode(). The Eclipse console uses a different encoding than my Windows command prompt, by the way. I am sure this can be configured somewhere, but I do not really care at the moment.

> If you have a unicode string, like d in your case, you have to encode it before
> it can be printed.

I got that now.

So encode() is a method of a unicode string, right? I come from a background
of statically typed languages, so I am a bit queasy when I am not allowed to
explicitly specify a type.

How can I, maybe by print()-ing something, find out what type d actually
is? Just to make sure and get a better feeling for the system?
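
A small sketch of how the type could be inspected, using the built-ins type(), isinstance() and repr(). The sample value of d is made up for illustration.

```python
d = u"M\u00fcller"              # a text (unicode) string
e = d.encode("utf-8")           # the same name as an encoded byte string

print(type(d))                  # the class of d
print(type(e))                  # the class of the encoded form
print(isinstance(d, type(u""))) # is d a text string?
print(repr(d))                  # unambiguous representation, escapes included
print(repr(e))
```

repr() is often the most useful of these when debugging, because it does not go through the console encoding in the same way plain printing of the text does.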

Should d at any time not be a unicode string but some other flavour of string,
will encode() still work? Or do I need to write a function myPrint() that
distinguishes them by type and calls encode() only for unicode strings?
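
A sketch of such a helper, under assumptions: the name to_console_bytes and its defaults are hypothetical. On Python 2, calling encode() on a byte string first decodes it implicitly as ASCII, which fails for non-ASCII bytes. On Python 3, byte strings have no encode() at all. Checking the type first avoids both problems.

```python
def to_console_bytes(value, encoding="cp437", errors="replace"):
    """Return value as bytes in the given encoding; encoded data passes through."""
    text_type = type(u"")            # unicode on Python 2, str on Python 3
    if isinstance(value, text_type):
        return value.encode(encoding, errors)
    if isinstance(value, bytes):     # bytes is an alias of str on Python 2
        return value                 # assumed to be encoded already
    # Numbers and other objects: convert to text first, then encode.
    return text_type(value).encode(encoding, errors)
```

For example, to_console_bytes(u"\u00e4") gives the single cp437 byte for a-umlaut, while a byte string is returned unchanged.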

> So how did you get a unicoded d to start with?

I have asked myself this question before after reading the docs for os.listdir(). But I have no way of finding out what type d really is (see question above :-)). So I was dead-reckoning.

Can I force a string to be of a certain type? Like

    nonunicode = d.encode("specialencoding")

How would I do it the other way round? From encoded representation to full
unicode?
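
The reverse direction is decode(), called on the encoded byte string. A minimal sketch with made-up sample text follows. The codec passed to decode() has to match the one the bytes were encoded with.

```python
text = u"Gr\u00fc\u00dfe"             # full unicode text
encoded = text.encode("utf-8")        # text -> bytes
decoded = encoded.decode("utf-8")     # bytes -> text, the reverse direction

# Decoding with a different codec does not necessarily fail. Here it
# silently produces different (wrong) characters.
wrong = encoded.decode("cp437")
```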

> If 'somepath' is unicode, os.listdir returns a list of unicode.
> So why is somepath unicode?

> One possible source is an XML parser, which returns strings in unicode.

I get a root directory from XML and I walk the filesystem from there. That
explains it.
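
The "type of the path decides the type of the names" rule can be checked on a throwaway directory. This sketch is written for Python 3, where the two flavours are str and bytes and os.fsencode() converts a path to bytes. On Python 2 the same rule applies to unicode and str. The file name a.txt is made up.

```python
import os
import shutil
import tempfile

root = tempfile.mkdtemp()                    # throwaway directory for the demo
open(os.path.join(root, "a.txt"), "w").close()

text_names = os.listdir(root)                # text path in -> text names out
byte_names = os.listdir(os.fsencode(root))   # bytes path in -> bytes names out

shutil.rmtree(root)                          # clean up the demo directory
```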

> Windows NT supports unicode filenames. I'm not sure about Linux. The result may differ slightly.

I think I will worry about that later. I can create files using German umlauts on the Linux box. I am sure I will find a way to move those names into my Python program.

I will not move data between the systems so there will not be much of
a problem.

Ciao, MM
--
Marian Aldenhövel, Rosenhain 23, 53123 Bonn. +49 228 624013.
http://www.marian-aldenhoevel.de
"There is a procedure to follow in these cases, and if followed it can
 pretty well guarantee a generous measure of success, success here
 defined as survival with major extremities remaining attached."
--
http://mail.python.org/mailman/listinfo/python-list
