Barry wrote:
> On 25 May, 21:39, Philip Semanchuk <phi...@semanchuk.com> wrote:
>> On May 25, 2010, at 3:13 PM, Barry wrote:
>>
>> > Hi,
>>
>> > The code below is giving me the error:
>>
>> > Traceback (most recent call last):
>> >   File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
>> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
>> > unexpected code byte
>>
>> > What am i doing wrong?
>>
>> > Thanks,
>>
>> > Barry
>>
>> > request = urllib.request.Request(url='http://en.wiktionary.org/wiki/baby',headers={'User-Agent':'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'} )
>>
>> > response = urllib.request.urlopen(request)
>> > html = response.read().decode('utf-8')
>>
>> Well, for starters you're assuming that the response content is in
>> UTF-8. You need to examine the Content-Type header to see what the
>> encoding is. If it's not UTF-8, there's your problem.
>>
>> HTH
>> P
>
> The content type is utf-8:
>
> Date: Wed, 19 May 2010 19:17:39 GMT
> Server: Apache
> Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
> Content-Language: en
> Vary: Accept-Encoding,Cookie
> Last-Modified: Wed, 19 May 2010 10:10:34 GMT
> Content-Encoding: gzip
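But the data is gzipped (see the Content-Encoding: gzip header). You have
to decompress it before decoding it. The error message gives it away:
every gzip stream starts with the magic bytes 0x1f 0x8b, and 0x8b is not a
valid start byte in UTF-8, so the decoder fails at exactly position 1.

Peter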
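For reference, here is a minimal sketch of "decompress, then decode" in Python 3, assuming Python 3.2 or later (for gzip.decompress). The names decode_body and fetch_text are hypothetical helpers, not part of urllib. The sketch handles only the gzip coding, and its fallback to UTF-8 when Content-Type carries no charset is an assumption, not something the server guarantees. It relies on response.headers being an http.client.HTTPMessage, which subclasses email.message.Message and therefore provides get_content_charset().

```python
import gzip
import urllib.request
from email.message import Message


def decode_body(raw: bytes, headers: Message) -> str:
    """Undo the Content-Encoding of an HTTP body, then decode it to str."""
    content_encoding = (headers.get("Content-Encoding") or "identity").strip().lower()
    if content_encoding == "gzip":
        # A gzip stream begins with the magic bytes 0x1f 0x8b, which is why
        # decoding the compressed bytes as UTF-8 fails on 0x8b at position 1.
        raw = gzip.decompress(raw)
    elif content_encoding != "identity":
        # This sketch only handles gzip; other codings are rejected explicitly.
        raise ValueError("unsupported Content-Encoding: " + content_encoding)
    # Take the charset from Content-Type; UTF-8 is an assumed fallback.
    charset = headers.get_content_charset() or "utf-8"
    return raw.decode(charset)


def fetch_text(url: str) -> str:
    """Hypothetical helper: fetch a page with urllib and return its text."""
    request = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(request) as response:
        # response.headers is an HTTPMessage (an email.message.Message subclass).
        return decode_body(response.read(), response.headers)
```

With this in place, the last line of the original snippet would become html = decode_body(response.read(), response.headers).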