Re: Printing Filenames with non-Ascii-Characters

Marian Aldenhövel Wed, 02 Feb 2005 01:15:05 -0800

Hi,

Thank you very much, you have collectively cleared up some of the confusion.

> The English Windows command prompt uses the cp437 charset.


To be exact, my Windows is German, but I am not outputting to the command
prompt window. I am using Eclipse with the PyDev plugin as development
platform and the output is redirected to the console view in the IDE. I am
not sure how this affects the problem and have since tried a vanilla console
too. The problem stays the same, though.

I wonder what surprises are waiting for me when I first move this to my
Linux box :-). I believe it uses UTF-8 throughout.

> print d.encode('cp437')

So I would have to specify the encoding on every call to print? I am sure to
forget, and I don't like the program dying. In my case garbled output would
be much more acceptable.
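
A minimal sketch of the "garbled output rather than dying" idea, assuming the
console really is cp437 as in the quoted suggestion. encode() accepts an
error-handling argument, and "replace" substitutes "?" for any character the
target encoding cannot represent instead of raising an exception. The helper
name encode_for_console and the sample string are hypothetical.

```python
def encode_for_console(text, encoding="cp437"):
    # "replace" swaps unencodable characters for "?" instead of raising
    # UnicodeEncodeError, so the program keeps running.
    return text.encode(encoding, "replace")

# o-umlaut exists in cp437; the euro sign does not and becomes "?".
name = u"Aldenh\u00f6vel \u20ac"
encoded = encode_for_console(name)
```

The encoding still has to be chosen somewhere, but with a helper like this it
is chosen in one place rather than on every call to print.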

Is there some global way of forcing an encoding instead of the default
'ascii'? I have found references to setencoding() but this seems to have gone
away.

> The issue is that a terminal only understands a certain character set.


I have experimented a bit now and I can make it work using encode(). The
Eclipse console uses a different encoding than my Windows command prompt, by
the way. I am sure this can be configured somewhere but I do not really care
at the moment.

> If you have a unicode string, like d in your case, you have to encode it
> before it can be printed.


I got that now.

So encode() is a method of a unicode string, right? I come from a background
of statically typed languages so I am a bit queasy when I am not allowed to
explicitly specify type.

How can I, maybe by print()-ing something, find out what type d actually
is? Just to make sure and get a better feeling for the system?
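
One way to inspect this, sketched under the assumption that d is any string
value: the built-in type() reports the runtime type of an object, and
printing it shows the type's name. The helper describe() and the sample
values are hypothetical; the names printed depend on the Python version.

```python
def describe(value):
    # type() returns the object's class; __name__ is its short name.
    return type(value).__name__

d = u"Aldenh\u00f6vel"
print(type(d))       # <type 'unicode'> on Python 2, <class 'str'> on Python 3
print(describe(d))   # 'unicode' on Python 2, 'str' on Python 3
```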

Should d at any time not be a unicode string but some other flavour of string,
will encode() still work? Or do I need to write a function myPrint() that
distinguishes them by type and calls encode() only for unicode strings?
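
A rough sketch of such a type-aware helper, not a definitive answer. It
assumes cp437 as the console encoding and uses isinstance() to encode only
unicode text, passing already-encoded byte strings through untouched. The
names text_type and to_console_bytes are hypothetical.

```python
try:
    text_type = unicode   # Python 2: the separate unicode string type
except NameError:
    text_type = str       # Python 3: str is already unicode text

def to_console_bytes(value, encoding="cp437"):
    # Encode only unicode text; leave byte strings as they are.
    if isinstance(value, text_type):
        return value.encode(encoding, "replace")
    return value
```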

> So how did you get a unicoded d to start with?


I have asked myself this question before after reading the docs for
os.listdir(). But I have no way of finding out what type d really is (see
question above :-)). So I was dead-reckoning.

Can I force a string to be of a certain type? Like

    nonunicode = d.encode("specialencoding")

How would I do it the other way round? From encoded representation to full
unicode?
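
For the reverse direction, a minimal sketch assuming the bytes are known to
be cp437: decode() is the counterpart of encode() and turns an encoded byte
string back into unicode text. The sample bytes are made up for illustration.

```python
raw = b"Aldenh\x94vel"            # encoded bytes; 0x94 is o-umlaut in cp437
text = raw.decode("cp437")        # encoded bytes -> unicode text
round_trip = text.encode("cp437") # unicode text -> encoded bytes again
```

Decoding only gives the right characters if the encoding named matches the
one the bytes were actually written in.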

> If 'somepath' is unicode, os.listdir returns a list of unicode.

> So why is somepath unicode?

> One possible source is XML parser, which returns string in unicode.

I get a root-directory from XML and I walk the filesystem from there. That
explains it.

> Windows NT supports unicode filenames. I'm not sure about Linux. The
> result may differ slightly.


I think I will worry about that later. I can create files using German
umlauts on the Linux box. I am sure I will find a way to move those names
into my Python program.

I will not move data between the systems so there will not be much of
a problem.

Ciao, MM
--
Marian Aldenhövel, Rosenhain 23, 53123 Bonn. +49 228 624013.
http://www.marian-aldenhoevel.de
"There is a procedure to follow in these cases, and if followed it can
 pretty well guarantee a generous measure of success, success here
 defined as survival with major extremities remaining attached."