> import urllib2
>
> headers = {}
> headers['User-Agent'] = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.8.0.4) Gecko/20060508 Firefox/1.5.0.4'
>
> # Pass headers as a keyword argument: the second positional argument
> # of urllib2.Request() is the POST data, not the headers.
> request = urllib2.Request(url, headers=headers)
> file = urllib2.urlopen(request)
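For anyone on a newer Python, urllib2 was later merged into urllib.request; a minimal sketch of the same User-Agent trick (the header value is just an example string, not anything wikipedia specifically requires):

```python
# Python 3 equivalent of the urllib2 snippet above (urllib2 became urllib.request).
import urllib.request

url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
headers = {'User-Agent': 'Mozilla/5.0 (X11; Linux x86_64; rv:115.0) Gecko/20100101 Firefox/115.0'}

# As above, headers must go in as a keyword argument: the second
# positional argument of Request() is the POST data, not the headers.
request = urllib.request.Request(url, headers=headers)

# The header is stored on the Request object before any network I/O happens
# (note that Request capitalizes the stored header name as 'User-agent').
print(request.get_header('User-agent'))
```

Only the final urlopen(request) call actually touches the network; everything up to that point can be inspected locally.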
ah, thanks a lot, that works !
Best regards,
Gabriel.
> On the other hand something which is simply retrieving one or two fixed
> pages doesn't fit that definition of a bot so is probably alright. They
> [...]
I think so, too. They even provide a link to some frameworks for writing bots, e.g.
>
> http://sourceforge.net/projects/pywikipediabot/
Ah, that looks [...]
John J. Lee wrote:
>> It looks like wikipedia checks the User-Agent header and refuses to
>> send pages to browsers it doesn't like. Try:
> [...]
>
> If wikipedia is trying to discourage this kind of scraping, it's
> probably not polite to do it. (I don't know what wikipedia's policies
> are, though.)
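One concrete way to stay on the polite side is to check the site's robots.txt before fetching. A sketch using the standard-library robotparser; the rules below are invented for illustration and are not wikipedia's actual robots.txt:

```python
# Check a bot's fetch permission against robots.txt rules before scraping.
# The Disallow/Allow rules here are made up for the example.
from urllib.robotparser import RobotFileParser

robots_txt = """\
User-agent: *
Disallow: /w/
Allow: /wiki/
"""

rp = RobotFileParser()
# parse() takes the robots.txt content as a list of lines, so this runs
# offline; in real use you would call rp.set_url(...) and rp.read().
rp.parse(robots_txt.splitlines())

# can_fetch(useragent, url) applies the first matching rule for the path.
print(rp.can_fetch("MyScraper/0.1",
                   "http://commons.wikimedia.org/wiki/Commons:Featured_pictures"))  # True
print(rp.can_fetch("MyScraper/0.1",
                   "http://commons.wikimedia.org/w/index.php"))  # False
```

Checking this once per site before a scraping run costs one extra request and avoids the sort of blocking discussed above.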
Duncan Booth <[EMAIL PROTECTED]> writes:
> Gabriel Zachmann wrote:
>
> > Here is a very simple Python script utilizing urllib:
[...]
> > "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> > print url
> > print
> > file = urllib.urlopen( url )
[...]
Gabriel Zachmann wrote:
> Here is a very simple Python script utilizing urllib:
>
> import urllib
> url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
> print url
> print
> file = urllib.urlopen( url )
> mime = file.info()
>
Benjamin Niemann wrote:
> Gabriel Zachmann wrote:
>
>> Here is a very simple Python script utilizing urllib:
>>
>> import urllib
>> url =
>> "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
>> print url
>> print
>> file = urllib.urlopen( url )
Here is a very simple Python script utilizing urllib:
import urllib
url = "http://commons.wikimedia.org/wiki/Commons:Featured_pictures/chronological"
print url
print
file = urllib.urlopen( url )
mime = file.info()
print mime
print file.read()
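file.info() returns the response headers as a MIME message object (mimetools-based in Python 2's urllib, an email.message-style object in Python 3). A small offline sketch of what you can pull out of such a header block; the header values here are invented examples, not a real response:

```python
# file.info() hands back the response headers as a message object.
# This builds one from a literal string so it runs without any network I/O;
# the header values are invented for illustration.
from email.parser import Parser

raw_headers = (
    "Content-Type: text/html; charset=UTF-8\n"
    "Content-Length: 12345\n"
    "\n"
)
mime = Parser().parsestr(raw_headers)

print(mime.get_content_type())     # text/html
print(mime.get_content_charset())  # utf-8
print(mime["Content-Length"])      # 12345
```

This is handy for deciding, e.g., whether the body is HTML worth parsing at all before calling file.read().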