On 25 May, 21:39, Philip Semanchuk <phi...@semanchuk.com> wrote:
> On May 25, 2010, at 3:13 PM, Barry wrote:
>
> > Hi,
> >
> > The code below is giving me the error:
> >
> > Traceback (most recent call last):
> >   File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
> > unexpected code byte
> >
> > What am I doing wrong?
> >
> > Thanks,
> >
> > Barry
> >
> > import urllib.request
> >
> > request = urllib.request.Request(
> >     url='http://en.wiktionary.org/wiki/baby',
> >     headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11'})
> >
> > response = urllib.request.urlopen(request)
> > html = response.read().decode('utf-8')
>
> Well, for starters you're assuming that the response content is in
> UTF-8. You need to examine the Content-Type header to see what the
> encoding is. If it's not UTF-8, there's your problem.
>
> HTH
> P

The content type is utf-8:

Date: Wed, 19 May 2010 19:17:39 GMT
Server: Apache
Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
Content-Language: en
Vary: Accept-Encoding,Cookie
Last-Modified: Wed, 19 May 2010 10:10:34 GMT
Content-Encoding: gzip
Content-Length: 25247
Content-Type: text/html; charset=utf-8
X-Cache: HIT from sq61.wikimedia.org
X-Cache-Lookup: HIT from sq61.wikimedia.org:3128
Age: 520549
X-Cache: HIT from amssq32.esams.wikimedia.org
X-Cache-Lookup: HIT from amssq32.esams.wikimedia.org:3128
X-Cache: MISS from amssq37.esams.wikimedia.org
X-Cache-Lookup: MISS from amssq37.esams.wikimedia.org:80
Connection: close

Can it be that the page is corrupt? If so, how can I make the best of the situation? Many other pages from this server work without problem.

Thanks!

Barry
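
A note on the headers above: they include Content-Encoding: gzip, and the byte 0x8b at position 1 in the traceback matches the second byte of the gzip magic number (0x1f 0x8b). So the body is probably compressed rather than corrupt, and it would need to be decompressed before decoding as UTF-8. A minimal sketch, assuming that is the cause; decode_body and fetch_text are hypothetical helper names, not part of urllib, and gzip.decompress assumes Python 3.2 or later:

```python
import gzip
import urllib.request


def decode_body(raw, content_encoding=None, charset="utf-8"):
    """Return the response body as text, gunzipping it first if needed.

    Hypothetical helper for illustration; not part of urllib.
    """
    # Treat the body as gzip if the header says so or the magic bytes match.
    if (content_encoding or "").lower() == "gzip" or raw[:2] == b"\x1f\x8b":
        raw = gzip.decompress(raw)
    return raw.decode(charset)


def fetch_text(url, user_agent="Mozilla/5.0"):
    """Fetch url and return its decoded text (performs a network request)."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request) as response:
        raw = response.read()
        # Read the encoding and charset from the response headers
        # instead of assuming them.
        encoding = response.headers.get("Content-Encoding")
        charset = response.headers.get_content_charset() or "utf-8"
    return decode_body(raw, encoding, charset)
```

Calling fetch_text('http://en.wiktionary.org/wiki/baby') would then replace the last two lines of the original script. Checking the magic bytes as well as the header is a defensive choice for servers that send gzip without being asked to.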