I know of one more Python app that does the same thing: http://www.icir.org/christian/downloads/scholar.py
and a few other apps (e.g. Mendeley Desktop), for which I found an explanation (from http://academia.stackexchange.com/questions/2567/api-eula-and-scraping-for-google-scholar): "I know how Mendeley uses it: they require you to click a button for each individual search of Google Scholar. If they automatically did the Google Scholar meta-data search for each paper when you import a folder-full then they would violate the old Scholar EULA. That is why they make you click for each query: if each query is accompanied by a click and not part of some script or loop then it is in compliance with the old EULA."

So, if I manage to set the User-Agent as you showed, will I still be violating the Google EULA? This is my first attempt at scraping HTML, so please help.

On Mon, 2012-10-01 at 16:51 +0000, Nick Cash wrote:
> > urllib2.urlopen('http://scholar.google.co.uk/scholar?q=albert
> > ...
> > urllib2.HTTPError: HTTP Error 403: Forbidden
> >
> > Will you kindly explain to me how to get rid of this?
>
> Looks like Google blocks non-browser user agents from retrieving this query.
> You *could* work around it by setting the User-Agent header to something fake
> that looks browser-ish, but you're almost certainly breaking Google's TOS if
> you do so.
>
> Should you really really want to, urllib2 makes it easy:
> urllib2.urlopen(urllib2.Request("http://scholar.google.co.uk/scholar?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=",
>     headers={"User-Agent": "Mozilla/5.0 Cheater/1.0"}))
>
> -Nick Cash

-- 
http://mail.python.org/mailman/listinfo/python-list
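P.S. For anyone reading along: Nick's one-liner uses Python 2's urllib2. Here is a minimal sketch of the same header-setting technique using Python 3's urllib.request (the successor module); the URL and fake User-Agent string are the ones from the thread. It only builds the Request object and inspects the header, without actually hitting Google.

```python
# Sketch of the header-override technique from Nick's reply, using
# Python 3's urllib.request instead of Python 2's urllib2.
import urllib.request

url = ("http://scholar.google.co.uk/scholar"
       "?q=albert+einstein%2B1905&btnG=&hl=en&as_sdt=0%2C5&as_sdtp=")

# Attach a browser-ish User-Agent header to the request.
req = urllib.request.Request(
    url,
    headers={"User-Agent": "Mozilla/5.0 Cheater/1.0"},
)

# Inspect the header that would be sent (no network call is made here).
# Actually fetching it would be: urllib.request.urlopen(req).read()
print(req.get_header("User-agent"))
```

Note that Request normalizes header names, so get_header() expects the capitalized form "User-agent". Whether sending a fake User-Agent complies with Google's terms is a separate question, as discussed above.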