On 12/17/2012 03:00 PM, Anatoli Hristov wrote:
>> I fixed the print, I changed the setting of the terminal and also on
>> the sshconfig, so now when I print I'm able to print out without
>> problems, but when I tried to run the script I've made it gives me
>> again the same error :
>> ""Unexpected error: exceptions.UnicodeEncodeError
>> """

That's not the whole error message. What encoding does it report in the error?
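
If the script is catching the exception and printing only its type, the details are still available on the exception object. Here is a rough sketch, not your actual script: the helper name describe_encode_error and the sample string are made up for illustration, and it assumes the failure comes from an .encode() call somewhere. It pulls out the encoding, position and offending character, which is the information I'm asking about.

```python
# -*- coding: utf-8 -*-
# Hypothetical helper (not from the original script): report the details
# carried by a UnicodeEncodeError instead of only its type.


def describe_encode_error(text, encoding):
    """Try to encode `text`; return None on success, else a dict of details."""
    try:
        text.encode(encoding)
    except UnicodeEncodeError as err:
        return {
            "encoding": err.encoding,   # the codec that refused the character
            "position": err.start,      # index of the first offending character
            "character": err.object[err.start:err.end],
            "reason": err.reason,
        }
    return None


# Made-up sample containing a trademark sign, which latin-1 cannot represent.
sample = u"Intel\xae Core\u2122"
details = describe_encode_error(sample, "latin-1")
print(details)
```

The same attributes (encoding, start, end, reason, object) can be read inside an existing except clause, so printing them there should be enough to answer the question.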

>> Maybe I will try to update to 2.7
> Upgraded to python 27 and still it gives Unexpected error:
> exceptions.UnicodeEncodeError. Damn encoders I don't know what to
> do...

I doubted that 2.7 would make any difference.

1. What does your "terminal" expect? (For all I know you're using
TeraTermPro as a terminal, which doesn't support utf-8.) Have you looked
at the terminal encoding to see what your copy of Terminal is expecting?
On my Ubuntu Linux, I open the terminal with Ctrl-Alt-t, then in the
menu bar, I select Terminal->SetCharacterEncoding->utf-8

2. What does your environment tell Linux to support? At a bash prompt, try

    echo $LANG

(there are two other environment variables I've seen reference to, so
this aspect is nuts)
Mine says en_US.UTF-8

3. What does Python think it was told?

    import sys
    print sys.stdout.encoding

Mine says UTF-8

I can force a similar error as follows:

    import urllib
    opener = urllib.FancyURLopener({})
    ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (14688538))
    src = ffr.read()
    out = src.decode("utf-8").encode("latin-1")

    Traceback (most recent call last):
      File "anatoli3.py", line 9, in <module>
        src.decode("utf-8").encode("latin-1")
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122' in position 17167: ordinal not in range(256)

And from that it's quite clear that for that particular data, I cannot
use a latin-1 encoder. So I did a bit of hunting, and I find the
offending character is the one after the word "Core" in the following
quote:

    processeurs Intel® Core™ de 3ème génération

The symbol is a trademark symbol and is not part of latin-1.

If you're really stuck with a latin-1 terminal, then you could do
something like:

    print src.decode("utf-8").encode("latin-1", "ignore")

That says to decode it using utf-8 (because the html declared a utf-8
encoding), and encode it back to latin-1 (because your terminal is stuck
there), silently dropping any character latin-1 can't represent, then
print.
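
For comparison, 'ignore' is only one of the standard error handlers. Here's a small sketch of what 'ignore', 'replace' and 'xmlcharrefreplace' each do with the trademark sign. The sample text is a shortened copy of the quoted phrase (my own choice, not the full page), and the variable names are just for illustration.

```python
# -*- coding: utf-8 -*-
# Shortened sample of the quoted phrase; \u2122 is the trademark sign.
text = u"Intel\xae Core\u2122 de 3\xe8me g\xe9n\xe9ration"

# Each call encodes to latin-1 but treats the unencodable character differently.
ignored = text.encode("latin-1", "ignore")             # drops the character
replaced = text.encode("latin-1", "replace")           # substitutes '?'
escaped = text.encode("latin-1", "xmlcharrefreplace")  # substitutes '&#8482;'

# repr() shows the raw bytes regardless of what the terminal can display.
for label, data in (("ignore", ignored),
                    ("replace", replaced),
                    ("xmlcharrefreplace", escaped)):
    print("%-18s %r" % (label, data))
```

Since the data here is HTML, 'xmlcharrefreplace' might be a better fit than 'ignore', but that depends on what ends up consuming the output.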

Just realize that once you start using 'ignore' you're going to also
ignore discrepancies that are real. For example, maybe your terminal is
actually something other than either latin-1 or utf-8.

For others that just want to play with a minimal subset:

    test = u'processeurs Intel\xae Core\u2122 de 3\xe8me g\xe9n\xe9ration av'
    print test
    print test.encode("latin-1", "ignore")
    print test.encode("latin-1")

produces:

    processeurs Intel® Core™ de 3ème génération av
    processeurs Intel� Core de 3�me g�n�ration av
    Traceback (most recent call last):
      File "anatoli3.py", line 22, in <module>
        print test.encode("latin-1")
    UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122' in position 23: ordinal not in range(256)

(The second output line looks garbled because my terminal is utf-8 and
is being handed latin-1 bytes.)

--
DaveA