Inconsistent result from urllib.urlopen

2007-04-12 Thread junkdump2861

Here's the problem: using Netscape 7.1, I use the View Page Source
command (URL is http://en.wikipedia.org/wiki/Cain) and save the raw
HTML file. It's 67 KB and has the addresses of all the images in it.
I want the exact same thing from my Python script, but I'm not
getting it. Instead, I get a file of only 21 KB that has no image
addresses. Here's the code I use:

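import urllib

# Fetch the page (Python 2 urllib API).
f = urllib.urlopen('http://en.wikipedia.org/wiki/Cain')
# read() with no argument returns the whole body; read(999) would
# stop after at most 999 bytes.
data = f.read()
f.close()

# Save the raw response for comparison with the browser's copy.
f1 = open('junk.txt', 'w')
f1.write(data)
f1.close()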

Any ideas why I don't get the same result from the python script as I
do from a web browser?  This problem seems to be a recent
development.  The scripts I wrote like this worked fine for a while
and then stopped working within the past couple of weeks.

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Inconsistent result from urllib.urlopen

2007-04-13 Thread junkdump2861


Laszlo Nagy wrote:
> > Any ideas why I don't get the same result from the python script as I
> > do from a web browser?  This problem seems to be a recent
> > development.  The scripts I wrote like this worked fine for a while
> > and then stopped working within the past couple of weeks.
> >
> Maybe it has something to do with your user agent string. The server
> side can decide to return different content when your user agent is
> not 'mozilla', 'internet explorer' or 'opera' etc.
>
> Do you want to know how to change your user agent string? Google for
> it :-)
>
>Laszlo

Thanks.  That is the fix I needed.  I added

urllib.URLopener.version = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; '
    'en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)')

as the second line of code and now it is actually getting content, not
just an error message.  It's not the exact same format as you get from
saving the page from the web browser, but all the links and image
addresses are in place.
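
For anyone reading this later on Python 3: the fix above uses the
Python 2 urllib module, and the legacy URLopener class was deprecated
in Python 3. A rough equivalent, as far as I can tell, is to pass the
header through urllib.request.Request. This is only a sketch; the
helper names build_request and fetch are made up for illustration,
and the user agent string is simply the one from the post above.

```python
import urllib.request

# Assumption: any browser-like string may work; this one is copied
# from the post above.
USER_AGENT = ("Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) "
              "Gecko/20030624 Netscape/7.1 (ax)")


def build_request(url, user_agent=USER_AGENT):
    """Return a Request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url, user_agent=USER_AGENT):
    """Download the full response body as bytes (hypothetical helper)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        return response.read()


# Usage (performs a network request, so it is left commented out):
# data = fetch("http://en.wikipedia.org/wiki/Cain")
# with open("junk.txt", "wb") as out:
#     out.write(data)
```

Note that the body comes back as bytes in Python 3, so the output file
is opened in binary mode.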

