Re: urllib behaves strangely

2006-06-13 Thread Gabriel Zachmann
> headers = {}
> headers['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB;
> rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'
>
> request = urllib2.Request(url, headers)
> file = urllib2.urlopen(request)

ah, thanks a lot, that works!

Best regards,
Gabriel.

-- /
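
For anyone reading this on Python 3, where urllib2 was folded into urllib.request, a rough equivalent of the quoted snippet might look like the sketch below. It is not what anyone in the thread actually ran. The helper names build_request and fetch and the USER_AGENT string are made up for illustration, and the browser string from the quote is swapped for a descriptive placeholder. Note that the headers dict is passed by keyword, since the second positional parameter of Request is data.

```python
import urllib.request

# URL taken from the original post in this thread.
URL = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"

# Placeholder identifier (an assumption for this sketch); replace it with
# something that honestly describes your own script.
USER_AGENT = "example-fetcher/0.1 (hypothetical contact: you@example.org)"


def build_request(url, user_agent=USER_AGENT):
    """Return a GET request carrying an explicit User-Agent header."""
    # headers must be passed by keyword: the second positional parameter
    # of Request is `data`, and supplying data turns the request into a POST.
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url):
    """Fetch url and return (headers, body bytes). Needs network access."""
    with urllib.request.urlopen(build_request(url)) as response:
        return response.info(), response.read()
```

Calling fetch(URL) performs the actual request; build_request on its own only constructs the request object, so it can be inspected without touching the network.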

Re: urllib behaves strangely

2006-06-13 Thread Gabriel Zachmann
> On the other hand something which is simply retrieving one or two fixed
> pages doesn't fit that definition of a bot so is probably alright.

I think so, too.

> They even provide a link to some frameworks for writing bots e.g.
>
> http://sourceforge.net/projects/pywikipediabot/

ah, that looks

Re: urllib behaves strangely

2006-06-13 Thread Duncan Booth
John J. Lee wrote:

>> It looks like wikipedia checks the User-Agent header and refuses to
>> send pages to browsers it doesn't like. Try:
> [...]
>
> If wikipedia is trying to discourage this kind of scraping, it's
> probably not polite to do it. (I don't know what wikipedia's policies
> are, th

Re: urllib behaves strangely

2006-06-12 Thread John J. Lee
Duncan Booth <[EMAIL PROTECTED]> writes:

> Gabriel Zachmann wrote:
>
> > Here is a very simple Python script utilizing urllib:
[...]
> > "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> > print url
> > print
> > file = urllib.urlopen( url )
[...]

Re: urllib behaves strangely

2006-06-12 Thread Duncan Booth
Gabriel Zachmann wrote:

> Here is a very simple Python script utilizing urllib:
>
> import urllib
> url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> print url
> print
> file = urllib.urlopen( url )
> mime = file.info()
>

Re: urllib behaves strangely

2006-06-12 Thread John Hicken
Gabriel Zachmann wrote:

> Here is a very simple Python script utilizing urllib:
>
> import urllib
> url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> print url
> print
> file = urllib.urlopen( url )
> mime = file.info()
> pri

Re: urllib behaves strangely

2006-06-12 Thread Benjamin Niemann
Benjamin Niemann wrote:

> Gabriel Zachmann wrote:
>
>> Here is a very simple Python script utilizing urllib:
>>
>> import urllib
>> url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
>> print url
>> print
>> file = urllib.urlopen( url )

Re: urllib behaves strangely

2006-06-12 Thread Benjamin Niemann
Gabriel Zachmann wrote:

> Here is a very simple Python script utilizing urllib:
>
> import urllib
> url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> print url
> print
> file = urllib.urlopen( url )
> mime = file.info()
> pri

urllib behaves strangely

2006-06-12 Thread Gabriel Zachmann
Here is a very simple Python script utilizing urllib:

import urllib
url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
print url
print
file = urllib.urlopen( url )
mime = file.info()
print mime
print file.read()
print fi