How do I correctly download Wikipedia pages?

Steven D'Aprano Wed, 25 Nov 2009 19:52:32 -0800

I'm trying to scrape a Wikipedia page from Python. Following instructions 
here:


http://en.wikipedia.org/wiki/Wikipedia:Database_download
http://en.wikipedia.org/wiki/Special:Export

I use the URL "http://en.wikipedia.org/wiki/Special:Export/Train" instead 
of just "http://en.wikipedia.org/wiki/Train". But instead of getting the 
page I expect, and can see in my browser, I get an error page:


>>> import urllib
>>> url = "http://en.wikipedia.org/wiki/Special:Export/Train"
>>> print urllib.urlopen(url).read()
...
Our servers are currently experiencing a technical problem. This is 
probably temporary and should be fixed soon
...


(Output is obviously truncated for your sanity and mine.)


Is there a trick to downloading from Wikipedia with urllib?
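
One thing that might be worth trying, on the unconfirmed assumption that 
the server treats urllib's default User-Agent differently from a 
browser's, is to send an explicit User-Agent header. The sketch below 
only builds the request and does not fetch anything. The helper name 
build_export_request and the User-Agent string are made up for 
illustration, and the import fallback is there so the same lines run on 
both Python 2 (urllib2) and Python 3 (urllib.request).

```python
try:
    import urllib2  # Python 2, matching the session above
except ImportError:
    import urllib.request as urllib2  # Python 3 equivalent module

# Hypothetical identifying string; substitute something describing your own script.
EXAMPLE_USER_AGENT = "ExampleScraper/0.1 (contact: you@example.com)"


def build_export_request(title, user_agent=EXAMPLE_USER_AGENT):
    """Return a Request for Special:Export/<title> carrying an explicit User-Agent."""
    url = "http://en.wikipedia.org/wiki/Special:Export/" + title
    return urllib2.Request(url, headers={"User-Agent": user_agent})


request = build_export_request("Train")

# The actual fetch is left commented out so the sketch has no network side effects:
# xml = urllib2.urlopen(request).read()
```

If the assumption is wrong, the response will be the same error page as 
before, so this is a cheap experiment rather than a known fix.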



-- 
Steven
-- 
http://mail.python.org/mailman/listinfo/python-list
