Inconsistent result from urllib.urlopen
Here's the problem: using Netscape 7.1, I use the View Page Source command (the URL is http://en.wikipedia.org/wiki/Cain) and save the raw HTML file. It's 67 kb and has the addresses of all the images in it. I want the exact same thing from my Python script, but I'm not getting it. Instead, I get a file of only 21 kb that has no image addresses. Here's the code I use:

    import urllib
    f = urllib.urlopen('http://en.wikipedia.org/wiki/Cain')
    data = f.read(999)
    f.close()
    f1 = open('junk.txt', 'w')
    f1.write(data)
    f1.close()

Any ideas why I don't get the same result from the Python script as I do from a web browser? This problem seems to be a recent development. The scripts I wrote like this worked fine for a while and then stopped working within the past couple of weeks.

-- 
http://mail.python.org/mailman/listinfo/python-list
Re: Inconsistent result from urllib.urlopen
Laszlo Nagy wrote:
> > Any ideas why I don't get the same result from the python script as I
> > do from a web browser? This problem seems to be a recent
> > development. The scripts I wrote like this worked fine for a while
> > and then stopped working within the past couple of weeks.
>
> Maybe it has something to do with your user agent string. The server
> side can decide to return different content when your user agent is
> not 'mozilla', 'internet explorer' or 'opera' etc.
>
> Do you want to know how to change your user agent string? Google for
> it :-)
>
> Laszlo

Thanks. That is the fix I needed. I added

    urllib.URLopener.version = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) Gecko/20030624 Netscape/7.1 (ax)'

as the second line of code and now it is actually getting content, not just an error message. It's not the exact same format as you get from saving the page from the web browser, but all the links and image addresses are in place.
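For anyone reading this thread on Python 3, where urllib.urlopen and urllib.URLopener.version are no longer the usual route, the same idea can be sketched with urllib.request by putting the User-Agent on the request itself. This is only a rough equivalent, not the code used above: the names BROWSER_UA, build_request and fetch are made up for illustration, and whether a given server responds differently to this header is an assumption about that server.

```python
import urllib.request

# User-agent string quoted in the reply above; any browser-like string
# could be substituted here.
BROWSER_UA = ('Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.4) '
              'Gecko/20030624 Netscape/7.1 (ax)')


def build_request(url, user_agent=BROWSER_UA):
    """Return a Request object that sends the given User-Agent header.

    Hypothetical helper: it only builds the request, it does not send it.
    """
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


def fetch(url, user_agent=BROWSER_UA):
    """Download the whole response body as bytes (makes a network call)."""
    with urllib.request.urlopen(build_request(url, user_agent)) as response:
        # read() with no size argument returns the complete body.
        return response.read()
```

Calling fetch('http://en.wikipedia.org/wiki/Cain') and writing the returned bytes to a file opened in 'wb' mode would then mirror the original script. Setting the header per request, rather than changing a class attribute, keeps the change local to the calls that need it.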