Gabriel Zachmann wrote:
> Here is a very simple Python script utilizing urllib:
>
> import urllib
> url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> print url
> print
> file = urllib.urlopen( url )
> mime = file.info()
> print mime
> print file.read()
> print file.geturl()
>
> However, when I execute it, I get an HTML error ("access denied").
>
> On the one hand, the funny thing is that I can view the page
> fine in my browser, and I can download it fine using curl.
>
> On the other hand, it must have something to do with the URL, because
> urllib works fine with any other URL I have tried ...

It looks like Wikipedia checks the User-Agent header and refuses to send pages to browsers it doesn't like. Try:
headers = {}
headers['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'
# Pass headers by keyword: the second positional argument of
# urllib2.Request is the POST data, not the headers dict.
request = urllib2.Request(url, headers=headers)
file = urllib2.urlopen(request)
...

That (or code very like it) worked when I tried it.

--
http://mail.python.org/mailman/listinfo/python-list
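For anyone reading this on Python 3, where urllib and urllib2 were merged into urllib.request, here is a minimal sketch of the same trick; the URL and User-Agent string are just the ones from this thread, and the actual fetch is left commented out:

```python
import urllib.request

url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"

# Present a browser-like User-Agent so the server does not reject the request.
headers = {'User-Agent': 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; '
                         'rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'}
request = urllib.request.Request(url, headers=headers)

# urllib.request stores header names capitalized, e.g. 'User-agent',
# so that is the key to use when reading the header back.
print(request.get_header('User-agent'))

# The actual fetch would then be:
# with urllib.request.urlopen(request) as f:
#     print(f.read())
```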