On Friday 27 June 2008 10:43:06, Alexnb wrote:
> I have never used the urllib or the urllib2. I really have looked online
> for help on this issue, and mailing lists, but I can't figure out my
> problem because people haven't been helping me, which is why I am here! :].
> Okay, so basically I want to be able to submit a word to dictionary.com and
> then get the definitions. However, to start off learning urllib2, I just
> want to do a simple google search. Before you get mad, what I have found on
> urllib2 hasn't helped me. Anyway, how would you go about doing this? No, I
> did not post the html, but I mean if you want, right click on your browser
> and hit view source of the google homepage. Basically what I want to know
> is how to submit the values (the search term) and then search for that
> value. Here's what I know:
>
> import urllib2
> response = urllib2.urlopen("http://www.google.com/")
> html = response.read()
> print html
>
> Now I know that all this does is print the source, but that's about all I
> know. I know it may be a lot to ask to have someone show/help me, but I
> really would appreciate it.

This example is for Google. Using pygoogle would of course be easier in this
case, but the approach below is valid for the general case:

>>> import urllib, urllib2

You need to trick the server with an imaginary User-Agent.

>>> def google_search(terms):
...     # Encode the query string, then send it with a browser-like User-Agent.
...     return urllib2.urlopen(urllib2.Request(
...         "http://www.google.com/search?" + urllib.urlencode({'hl': 'fr', 'q': terms}),
...         headers={'User-Agent': 'MyNav 1.0 (compatible; MSIE 6.0; Linux)'})
...     ).read()
...
>>> res = google_search("python & co")

Now that you have the whole HTML response, you'll have to parse it to recover
the data. A quick and dirty try on the Google results page:

>>> import re
>>> [re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res)]
['Python Gallery',
 'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
 'Re: os x, panther, python & co: msg#00041',
 'Re: os x, panther, python & co: msg#00040',
 'Cardiff Web Site Design, Professional web site design services ...',
 'Python Properties',
 'Frees < Programs < Python < Bin-Co',
 'Torb: an interface between Tcl and CORBA',
 'Royal Python Morphs',
 'Python & Co']

-- 
_____________

Maric Michaud
--
http://mail.python.org/mailman/listinfo/python-list
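
A note for anyone trying the reply above on Python 3: urllib2 no longer exists
there. Its pieces live in urllib.request, and urlencode moved to urllib.parse.
What follows is only a rough sketch of the same two steps (build the request
with a made-up User-Agent, then strip tags from the result headings). The
helper names build_search_request and extract_titles are mine, not from the
post, and the '<h2 class=r>' markup is whatever Google served when the post
was written, so the pattern may well need adjusting today. Nothing is sent
over the network here; the sample HTML string is invented for illustration.

```python
import re
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint and browser string follow the post; the User-Agent is imaginary.
SEARCH_URL = "http://www.google.com/search"
USER_AGENT = "MyNav 1.0 (compatible; MSIE 6.0; Linux)"


def build_search_request(terms, lang="fr"):
    """Build (but do not send) a search request. Hypothetical helper name."""
    # urlencode escapes spaces and '&' so the terms survive in the query string.
    query = urlencode({"hl": lang, "q": terms})
    return Request(SEARCH_URL + "?" + query, headers={"User-Agent": USER_AGENT})


def extract_titles(html):
    """Return the text of each <h2 class=r> element with nested tags removed."""
    blocks = re.findall(r"<h2 class=r>.*?</h2>", html, flags=re.DOTALL)
    return [re.sub(r"<.+?>", "", block) for block in blocks]


request = build_search_request("python & co")
print(request.full_url)

# Invented sample, standing in for a real response body.
sample = '<h2 class=r><a href="#">Python <b>Gallery</b></a></h2>'
print(extract_titles(sample))
```

To actually fetch the page you would pass the request to
urllib.request.urlopen(request) and decode the bytes returned by .read()
before handing them to extract_titles. For anything beyond a quick and dirty
try, a real HTML parser is probably a safer choice than regular expressions.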