On Aug 24, 10:09 pm, Ned Deily <n...@acm.org> wrote:
> In article
> <e5e2ec2e-2b4a-4ca8-8c0f-109e5f4eb...@v23g2000pro.googlegroups.com>,
> 7stud <bbxx789_0...@yahoo.com> wrote:
> > On Aug 24, 2:41 pm, "Martin v. Löwis" <mar...@v.loewis.de> wrote:
> > > > I can't figure out a way to programmatically set the encoding for
> > > > sys.stdout. So where does that leave me?
> >
> > > You should be setting the terminal encoding administratively, not
> > > programmatically.
> >
> > The terminal encoding has always been utf-8. It was not set
> > programmatically.
> >
> > It seems to me that python 3.1's string handling is broken.
> > Apparently, in python 3.1 I am unable to explicitly set the encoding
> > of a string and print() it out with the result being human readable
> > text. On the other hand, if I let python do the encoding implicitly,
> > python uses a codec I don't want it to.
>
> If you are running on a Unix-y system, check your locale settings (LANG,
> LC.*, et al). I think you'll likely find that your locale is really not
> UTF-8. The following was on Python 3.1 on OS X 10.5, similar results
> on Debian Linux:
>
> $ cat t3.py
> import sys
> print(sys.stdout.encoding)
> s = "€"
> print(s.encode("utf-8"))
> print(s)
>
> $ export LANG=en_US.UTF-8
> $ python3.1 t3.py
> UTF-8
> b'\xe2\x82\xac'
> €
>
> $ export LANG=C
> $ python3.1 t3.py
> US-ASCII
> b'\xe2\x82\xac'
> Traceback (most recent call last):
>   File "t3.py", line 7, in <module>
>     print(s)
> UnicodeEncodeError: 'ascii' codec can't encode character '\u20ac' in
> position 0: ordinal not in range(128)
>
> --
> Ned Deily,
> n...@acm.org

Hi,

Thanks for the response. My OS is Mac OS X 10.4.11. I'm not really sure how to check my locale settings. Here is some stuff I tried:

$ echo $LANG

$ echo $LC_ALL

$ echo $LC_CTYPE

$ locale
LANG=
LC_COLLATE="C"
LC_CTYPE="C"
LC_MESSAGES="C"
LC_MONETARY="C"
LC_NUMERIC="C"
LC_TIME="C"
LC_ALL="C"

$ man locale
...
...
...
ENVIRONMENT:
LANG   Used as a substitute for any unset LC_* variable. If LANG is unset
       it will act as if set to "C". If any of LANG or LC_* are set to
       invalide values locale acts as if they are all unset.

===========

As in your last example, my 'C' settings mean that an ascii codec is used somewhere to encode() the unicode string.

--
The locale C or POSIX is a portable locale; its LC_CTYPE part corresponds to the 7-bit ASCII character set.
http://linux.about.com/library/cmd/blcmdl3_setlocale.htm
--

Is this the way it works:

1) python sets the codec for sys.stdout according to the LANG environment variable.

2) It doesn't matter that my terminal's encoding is set to utf-8, because output has to pass through sys.stdout first.

So:

a) My terminal's environment is telling python (and all other programs running in the terminal) that output sent to sys.stdout must be encoded in ascii.

b) The solution is to set a LANG environment variable.

Why does echoing $LC_ALL or $LC_CTYPE just give me a blank string?

Previously, I've set environment variables that I want to be permanent, e.g. PATH, in ~/.bash_profile, so I did this:

~/.bash_profile:
--------------
...
...
LANG="en_US.UTF-8"
export LANG

and now python 3.1 acts like I expect it to:

-------
import locale
import sys

print(locale.getlocale(locale.LC_CTYPE))
print(sys.stdout.encoding)

s = "€"
print(s)
print(s.encode("utf-8"))

--output:--
('en_US', 'UTF8')
UTF-8
€
b'\xe2\x82\xac'
----------

In conclusion, as far as I can tell, if your python 3.1 program tries to output a unicode string, and the unicode string cannot be encoded by the codec specified in the user's LANG environment variable**, then the user will get an encode error. Just because the programmer's system can handle the output doesn't mean that another user's system can. I guess that's the way it goes: if a user's environment is telling all programs that it only wants ascii output to go to the screen (sys.stdout), you can't (or shouldn't) do anything about it.

**Or if the LANG environment variable is not present, then the codec corresponding to the locale settings ('C' corresponds to ascii).

some good locale info:
http://www.chemie.fu-berlin.de/chemnet/use/info/libc/libc_19.html
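
For anyone who wants to poke at this without editing their shell profile, here is a rough sketch of the lookup as I understand it from this thread. The helper names (effective_ctype, codec_for_locale, encode_for_output) are made up for illustration and are not part of Python; the precedence LC_ALL, then LC_CTYPE, then LANG, then "C" is the usual POSIX rule, and guessing the codec from the part after the dot is a simplification of what the C library really does. Newer Python versions may pick the stdout encoding differently, so treat this as a model of the 3.1 behaviour described above, not as the actual implementation.

```python
import codecs


def effective_ctype(env):
    """Return the locale name that governs character handling.

    Assumed precedence (POSIX): LC_ALL, then LC_CTYPE, then LANG.
    Unset or empty variables are skipped; the fallback is "C".
    """
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name)
        if value:
            return value
    return "C"


def codec_for_locale(locale_name):
    """Hypothetical helper: guess a Python codec from a locale name.

    "en_US.UTF-8" -> "utf-8"; a name without a charset part, such as
    "C", is treated as 7-bit ASCII.
    """
    if "." in locale_name:
        charset = locale_name.split(".", 1)[1].split("@", 1)[0]
        return codecs.lookup(charset).name
    return "ascii"


def encode_for_output(text, codec):
    """Encode text for output, escaping what the codec cannot represent.

    This is a fallback added for illustration; plain print() in the
    thread raises UnicodeEncodeError instead.
    """
    try:
        return text.encode(codec)
    except UnicodeEncodeError:
        return text.encode(codec, errors="backslashreplace")


if __name__ == "__main__":
    euro = "\u20ac"  # the euro sign used in the examples above
    for env in ({}, {"LANG": "en_US.UTF-8"}):
        codec = codec_for_locale(effective_ctype(env))
        print(env, codec, encode_for_output(euro, codec))
```

With an empty environment the sketch lands on ascii and has to escape the euro sign; with LANG set to en_US.UTF-8 it produces the same three bytes as s.encode("utf-8") above.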