On 12/17/2012 12:43 PM, Anatoli Hristov wrote:
>> Hi,
>> I don't know, what the product ID would look like, for this page, but
>> assuming, the catalog pages are also utf-8 encoded as well as the
>> error page I get, it should work ok; cf.:
> You are right, I get it work on Windows too, but not in Linux. I
> changed the codec of linux, but still I don't get it
>
> Here is what I get from Linux:
>
> >>> import urllib
> >>> opener = urllib.FancyURLopener({})
> >>> ffr = opener.open("http://prf.icecat.biz/index.cgi?product_id=%s;mi=start;smi=product;shopname=openICEcat-url;lang=fr" % (14688538))
> >>> src = ffr.read()
> >>> print src.decode("utf-8")
> Traceback (most recent call last):
>   File "<stdin>", line 1, in ?
> UnicodeEncodeError: 'latin-1' codec can't encode character u'\u2122'
> in position 17167: ordinal not in range(256)

I can tell you what's happening, but maybe not how to fix it.

src.decode() is creating a unicode string. The error is not happening
there. But when print is used with a unicode string, it has to encode
the data. And for whatever reason, yours is using latin-1, and you have
a character in there which is not in the latin-1 encoding.

My Python 2.7 uses utf-8 everywhere (on Linux Ubuntu 11.04).

--
DaveA
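
To make the decode/encode distinction concrete, here is a minimal sketch. It assumes a short made-up string containing the same character, u'\u2122', in place of the real page contents; the name `text` and the sample value are illustrative only, not taken from the original session. It shows that the unicode string itself is fine, that encoding it to latin-1 is what fails, and that encoding explicitly (to utf-8, or to latin-1 with a replacement policy) avoids the error. Whether either workaround suits your terminal is something you would have to check.

```python
import sys

# Hypothetical stand-in for src.decode("utf-8"): a unicode string
# containing U+2122 (trade mark sign), the character from the traceback.
text = u"ICEcat\u2122"

# This is roughly what print has to do when the output encoding is latin-1.
try:
    text.encode("latin-1")
    latin1_ok = True
except UnicodeEncodeError:
    latin1_ok = False

# Encoding explicitly sidesteps the implicit choice made by print.
utf8_bytes = text.encode("utf-8")            # always possible
fallback = text.encode("latin-1", "replace") # unencodable chars become '?'

print("stdout encoding: %s" % sys.stdout.encoding)
print("latin-1 can encode it: %s" % latin1_ok)
print("utf-8 bytes: %r" % utf8_bytes)
print("latin-1 with replace: %r" % fallback)
```

The first line printed shows which encoding your interpreter has picked for stdout. If it reports latin-1 (ISO-8859-1), that would be consistent with the traceback above.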