On Jun 29, 12:50 pm, Alexnb <[EMAIL PROTECTED]> wrote:
> No I figured it out. I guess I never knew that you aren't supposed to split a
> url like "http://www.goo\
> gle.com" But I did and it gave me all those errors. Anyway, I had a
> question. On the original code you had this for loop:
>
> for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>     yield tabs.findAll('td')[-1].contents[-1].string
>
> I hate to be a pain, but I was looking at the BeautifulSoup docs, and found
> the findAll thing. But I want to know why you put "for tabs," also why you
> need the "'table', {'class': 'luna-Ent'}):" Like why the curly braces and
> whatnot?
>
> Jeff McNeil-2 wrote:
>
> > On Jun 27, 10:26 pm, Alexnb <[EMAIL PROTECTED]> wrote:
> >> Okay, so I copied your code(and just so you know I am on a mac right now
> >> and i am using pydev in eclipse), and I got these errors, any idea what is
> >> up?
>
> >> Traceback (most recent call last):
> >>   File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 14, in <module>
> >>     print list(get_defs("cheese"))
> >>   File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 9, in get_defs
> >>     dictionary.reference.com/search?q=%s' % term))
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 82, in urlopen
> >>     return opener.open(url)
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 190, in open
> >>     return getattr(self, name)(url)
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 325, in open_http
> >>     h.endheaders()
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 856, in endheaders
> >>     self._send_output()
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 728, in _send_output
> >>     self.send(msg)
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 695, in send
> >>     self.connect()
> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 663, in connect
> >>     socket.SOCK_STREAM):
> >> IOError: [Errno socket error] (8, 'nodename nor servname provided, or not known')
>
> >> Sorry if it is hard to read.
>
> >> Jeff McNeil-2 wrote:
>
> >> > Well, what about pulling that data out using Beautiful soup? If you
> >> > know the table name and whatnot, try something like this:
>
> >> > #!/usr/bin/python
>
> >> > import urllib
> >> > from BeautifulSoup import BeautifulSoup
>
> >> > def get_defs(term):
> >> >     soup = BeautifulSoup(urllib.urlopen('http://dictionary.reference.com/search?q=%s' % term))
>
> >> >     for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
> >> >         yield tabs.findAll('td')[-1].contents[-1].string
>
> >> > print list(get_defs("frog"))
>
> >> > [EMAIL PROTECTED]:~$ python test.py
> >> > [u'any tailless, stout-bodied amphibian of the order Anura, including
> >> > the smooth, moist-skinned frog species that live in a damp or
> >> > semiaquatic habitat and the warty, drier-skinned toad species that are
> >> > mostly terrestrial as adults. ', u' ', u' ', u'a French person or a
> >> > person of French descent. ', u'a small holder made of heavy material,
> >> > placed in a bowl or vase to hold flower stems in position. ', u'a
> >> > recessed panel on one of the larger faces of a brick or the like. ',
> >> > u' ', u'to hunt and catch frogs. ', u'French or Frenchlike. ', u'an
> >> > ornamental fastening for the front of a coat, consisting of a button
> >> > and a loop through which it passes. ', u'a sheath suspended from a
> >> > belt and supporting a scabbard. ', u'a device at the intersection of
> >> > two tracks to permit the wheels and flanges on one track to cross or
> >> > branch from the other. ', u'a triangular mass of elastic, horny
> >> > substance in the middle of the sole of the foot of a horse or related
> >> > animal. ']
>
> >> > HTH,
>
> >> > Jeff
>
> >> > On Jun 27, 7:28 pm, Alexnb <[EMAIL PROTECTED]> wrote:
> >> >> I have read that multiple times. It is hard to understand but it did help a
> >> >> little. But I found a bit of a work-around for now which is not what I
> >> >> ultimately want. However, even when I can get to the page I want lets say,
> >> >> "Http://dictionary.reference.com/browse/cheese", I look on firebug, and
> >> >> extension and see the definition in javascript,
>
> >> >> <table class="luna-Ent">
> >> >> <tbody>
> >> >> <tr>
> >> >> <td class="dn" valign="top">1.</td>
> >> >> <td valign="top">the curd of milk separated from the whey and prepared in
> >> >> many ways as a food. </td>
>
> >> >> Jeff McNeil-2 wrote:
>
> >> >> > the problem being that if I use code like this to get the html of that
> >> >> > page in python:
>
> >> >> > response = urllib2.urlopen("the webiste....")
> >> >> > html = response.read()
> >> >> > print html
>
> >> >> > then, I get a bunch of stuff, but it doesn't show me the code with the
> >> >> > table that the definition is in. So I am asking how do I access this
> >> >> > javascript. Also, if someone could point me to a better reference than the
> >> >> > last one, because that really doesn't tell me much, whether it be a book
> >> >> > or anything.
>
> >> >> > I stumbled across this a while back:
> >> >> > http://www.voidspace.org.uk/python/articles/urllib2.shtml.
> >> >> > It covers quite a bit. The urllib2 module is pretty straightforward
> >> >> > once you've used it a few times. Some of the class naming and whatnot
> >> >> > takes a bit of getting used to (I found that to be the most confusing
> >> >> > bit).
>
> >> >> > On Jun 27, 1:41 pm, Alexnb <[EMAIL PROTECTED]> wrote:
> >> >> >> Okay, I tried to follow that, and it is kinda hard. But since you obviously
> >> >> >> know what you are doing, where did you learn this? Or where can I learn
> >> >> >> this?
>
> >> >> >> Maric Michaud wrote:
>
> >> >> >> > On Friday 27 June 2008 10:43:06 Alexnb, you wrote:
> >> >> >> >> I have never used the urllib or the urllib2. I really have looked online
> >> >> >> >> for help on this issue, and mailing lists, but I can't figure out my
> >> >> >> >> problem because people haven't been helping me, which is why I am here!
> >> >> >> >> :].
> >> >> >> >> Okay, so basically I want to be able to submit a word to dictionary.com and
> >> >> >> >> then get the definitions. However, to start off learning urllib2, I just
> >> >> >> >> want to do a simple google search. Before you get mad, what I have found on
> >> >> >> >> urllib2 hasn't helped me. Anyway, How would you go about doing this. No, I
> >> >> >> >> did not post the html, but I mean if you want, right click on your browser
> >> >> >> >> and hit view source of the google homepage. Basically what I want to know
> >> >> >> >> is how to submit the values(the search term) and then search for that
> >> >> >> >> value. Heres what I know:
>
> >> >> >> >> import urllib2
> >> >> >> >> response = urllib2.urlopen("http://www.google.com/")
> >> >> >> >> html = response.read()
> >> >> >> >> print html
>
> >> >> >> >> Now I know that all this does is print the source, but thats about all I
> >> >> >> >> know. I know it may be a lot to ask to have someone show/help me, but I
> >> >> >> >> really would appreciate it.
>
> >> >> >> > This example is for google, of course using pygoogle is easier in this case,
> >> >> >> > but this is a valid example for the general case :
>
> >> >> >> > >>>[207]: import urllib, urllib2
>
> >> >> >> > You need to trick the server with an imaginary User-Agent.
>
> >> >> >> > >>>[208]: def google_search(terms) :
> >> >> >> >     return urllib2.urlopen(urllib2.Request("http://www.google.com/search?" +
> >> >> >> >         urllib.urlencode({'hl':'fr', 'q':terms}),
> >> >> >> >         headers={'User-Agent':'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
> >> >> >> >     ).read()
> >> >> >> >    .....:
>
> >> >> >> > >>>[212]: res = google_search("python & co")
>
> >> >> >> > Now you got the whole html response, you'll have to parse it to recover datas,
> >> >> >> > a quick & dirty try on google response page :
>
> >> >> >> > >>>[213]: import re
>
> >> >> >> > >>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
> >> >> >> > ...[229]:
> >> >> >> > ['Python Gallery',
> >> >> >> >  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
> >> >> >> >  'Re: os x, panther, python & co: msg#00041',
> >> >> >> >  'Re: os x, panther, python & co: msg#00040',
> >> >> >> >  'Cardiff Web Site Design, Professional web site design services ...',
> >> >> >> >  'Python Properties',
> >> >> >> >  'Frees < Programs < Python < Bin-Co',
> >> >> >> >  'Torb: an interface between Tcl and CORBA',
> >> >> >> >  'Royal Python Morphs',
> >> >> >> >  'Python & Co']
>
> >> >> >> > --
> >> >> >> > _____________
>
> >> >> >> > Maric Michaud
> >> >> >> > --
> >> >> >> > http://mail.python.org/mailman/listinfo/python-list
>
> >> >> >> --
> >> >> >> View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18160312.html
> >> >> >> Sent from the Python - python-list mailing list archive at Nabble.com.
>
> >> >> --
> >> >> View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18165634.html
> >> >> Sent from the Python - python-list mailing list archive at Nabble.com.

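The definitions are embedded in tables with a class of 'luna-Ent'. findAll takes the tag name as its first argument and an optional dictionary of attributes as its second, thus the {}: {'class': 'luna-Ent'} restricts the match to <table> tags whose class attribute is 'luna-Ent'. The for loop walks over every table that matched ('tabs' is just the loop variable holding the current table), and for each one I yield the string at the end of its last td, which is the cell containing the actual definition.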
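If it helps to see the filtering step on its own, here is a rough sketch that mimics it with only the standard library (Python 3's html.parser), so it runs without BeautifulSoup or a network connection. It is not how BeautifulSoup implements findAll; LastCellFinder, SAMPLE_HTML, and the second definition's text are made up for illustration, and the first table is modeled on the snippet quoted earlier in the thread. Unlike the original, it also strips surrounding whitespace.

```python
from html.parser import HTMLParser

# Made-up sample page: two tables we want, one we do not.
SAMPLE_HTML = """
<table class="luna-Ent"><tr>
<td class="dn" valign="top">1.</td>
<td valign="top">the curd of milk separated from the whey and prepared in many ways as a food. </td>
</tr></table>
<table class="other"><tr><td>not a definition</td></tr></table>
<table class="luna-Ent"><tr>
<td class="dn" valign="top">2.</td>
<td valign="top">placeholder second definition. </td>
</tr></table>
"""


class LastCellFinder(HTMLParser):
    """Hypothetical helper: for every tag named `tag` whose attributes
    match the `wanted` dictionary, remember the text of its last <td>.
    Nested tables are not handled."""

    def __init__(self, tag, wanted):
        super().__init__()
        self.tag = tag
        self.wanted = wanted      # e.g. {'class': 'luna-Ent'}
        self.inside = False       # currently inside a matching tag?
        self.in_td = False        # currently inside a <td> of that tag?
        self.cell = []            # text pieces of the current <td>
        self.last_cell = None     # text of the most recently closed <td>
        self.results = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # The dictionary is the filter: every key/value pair must match.
        if tag == self.tag and all(attrs.get(k) == v for k, v in self.wanted.items()):
            self.inside = True
            self.last_cell = None
        elif self.inside and tag == "td":
            self.in_td = True
            self.cell = []

    def handle_data(self, data):
        if self.in_td:
            self.cell.append(data)

    def handle_endtag(self, tag):
        if self.in_td and tag == "td":
            self.in_td = False
            self.last_cell = "".join(self.cell)
        elif self.inside and tag == self.tag:
            self.inside = False
            if self.last_cell is not None:
                self.results.append(self.last_cell)


def get_defs(html):
    """Yield the text of the last cell of each matching table."""
    finder = LastCellFinder("table", {"class": "luna-Ent"})
    finder.feed(html)
    for text in finder.results:   # one pass per matching table, like 'for tabs in ...'
        yield text.strip()


if __name__ == "__main__":
    print(list(get_defs(SAMPLE_HTML)))
```

The table with class "other" is skipped because its attributes do not match the dictionary, which is all the {'class': 'luna-Ent'} argument is for.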