Re: UnicodeDecodeError having fetch web page

Peter Otten Tue, 25 May 2010 13:31:25 -0700

Barry wrote:

> On 25 Maj, 21:39, Philip Semanchuk <[email protected]> wrote:
>> On May 25, 2010, at 3:13 PM, Barry wrote:
>>
>> > Hi,
>>
>> > The code below is giving me the error:
>>
>> > Traceback (most recent call last):
>> > File "C:\Users\Administratör\Desktop\test.py", line 4, in <module>
>> > UnicodeDecodeError: 'utf8' codec can't decode byte 0x8b in position 1:
>> > unexpected code byte
>>
>> > What am I doing wrong?
>>
>> > Thanks,
>>
>> > Barry
>>
>> > import urllib.request
>> >
>> > request = urllib.request.Request(
>> >     url='http://en.wiktionary.org/wiki/baby',
>> >     headers={'User-Agent': 'Mozilla/5.0 (X11; U; Linux i686) '
>> >                            'Gecko/20071127 Firefox/2.0.0.11'})
>>
>> > response = urllib.request.urlopen(request)
>> > html = response.read().decode('utf-8')
>>
>> Well, for starters you're assuming that the response content is in
>> UTF-8. You need to examine the Content-Type header to see what the
>> encoding is. If it's not UTF-8, there's your problem.
>>
>> HTH
>> P
> 
> The content type is utf-8:
> 
> Date: Wed, 19 May 2010 19:17:39 GMT
> Server: Apache
> Cache-Control: private, s-maxage=0, max-age=0, must-revalidate
> Content-Language: en
> Vary: Accept-Encoding,Cookie
> Last-Modified: Wed, 19 May 2010 10:10:34 GMT
> Content-Encoding: gzip


But the data is gzipped. You have to uncompress it before decoding.
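A minimal sketch of that order of operations (decompress first, then decode), assuming Python 3.2 or later for gzip.decompress. The byte 0x8b at position 1 in the traceback fits this diagnosis: every gzip stream begins with the magic bytes 0x1f 0x8b, and 0x1f is plain ASCII while 0x8b cannot start a UTF-8 sequence. The helper names decode_body and fetch_text are hypothetical, only gzip is handled, and the charset is taken from Content-Type when present, with UTF-8 as an assumed fallback:

```python
import gzip
import urllib.request
from email.message import Message


def decode_body(raw, headers, default_charset='utf-8'):
    """Turn raw HTTP body bytes into text (hypothetical helper).

    `headers` is any email.message.Message-like object; the
    `response.headers` returned by urllib.request.urlopen() is one.
    """
    # 1. Undo the compression first: urllib.request does not
    #    decompress bodies itself, so gzip bytes arrive as-is.
    coding = (headers.get('Content-Encoding') or '').strip().lower()
    if coding == 'gzip':
        raw = gzip.decompress(raw)
    elif coding not in ('', 'identity'):
        # This sketch only handles gzip; deflate, br, etc. are left out.
        raise ValueError('unsupported Content-Encoding: %r' % coding)

    # 2. Only now decode, using the charset from Content-Type if present,
    #    otherwise the assumed default.
    charset = headers.get_content_charset(default_charset)
    return raw.decode(charset)


def fetch_text(url):
    """Fetch a page and return its text (hypothetical helper)."""
    request = urllib.request.Request(
        url, headers={'User-Agent': 'Mozilla/5.0'})
    with urllib.request.urlopen(request) as response:
        return decode_body(response.read(), response.headers)


if __name__ == '__main__':
    # Offline demo: simulate a gzipped UTF-8 response without the network.
    headers = Message()
    headers['Content-Type'] = 'text/html; charset=utf-8'
    headers['Content-Encoding'] = 'gzip'
    body = gzip.compress('Administratör'.encode('utf-8'))
    print(body[:2])                    # b'\x1f\x8b', the gzip magic number
    print(decode_body(body, headers))  # Administratör
```

With this in place, the original script would become something like html = fetch_text('http://en.wiktionary.org/wiki/baby'). Checking Content-Encoding explicitly, rather than always calling gzip.decompress, keeps the helper working for servers that send the body uncompressed.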

Peter
-- 
http://mail.python.org/mailman/listinfo/python-list
