Okay, now I ran it in the shell, and this is what happened:

>>> for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
...     tabs.findAll('td')[-1].contents[-1].string
...
u' '
u' '
u' '
u' '
u' '
u'not complex or compound; single. '
u' '
u' '
u' '
u' '
u' '
u'inconsequential or rudimentary. '
u'unlearned; ignorant. '
u' '
u'unsophisticated; naive; credulous. '
u' '
u'not mixed. '
u' '
u'not mixed. '
u' '
u' '
u' '
u' '
u'). '
u' '
u'(of a lens) having two optical surfaces only. '
u'an ignorant, foolish, or gullible person. '
u'something simple, unmixed, or uncompounded. '
u'cords for controlling the warp threads in forming the shed on draw-looms. '
u'a person of humble origins; commoner. '
u' '
>>>

However, the definitions are there. I printed the actual soup and they were there in the same format as always. So what is the deal!?!

>>> soup.findAll('table', {'class': 'luna-Ent'})
[<table class="luna-Ent"><tr><td valign="top" class="dn">1.</td><td valign="top">easy to understand, deal with, use, etc.: a simple matter; simple tools. </td></tr></table>

See, there is the first one in the shell. It is there, but the for loop can't find it. I am wondering: the soup.findAll('table', ...) call above returns a list. Do you think that has anything to do with the problem?

Alexnb wrote:
>
> Actually after looking at this, the code is practically the same, except
> the definitions. So what COULD be going wrong here?
>
> Also, I ran the program and decided to print the whole list of definitions
> straight off BeautifulSoup, and I got an interesting result:
>
> What word would you like to define: simple
> [u' ', u' ', u' ', u' ', u' ', u'not complex or compound; single.
>
> Those are the first 5 definitions, and later on it does the same thing.
> It only sees a space. Any ideas?
>
> Alexnb wrote:
>>
>> Okay, so I've hit a new snag and can't seem to figure out what is wrong.
>> What is happening is the first 4 definitions of the word "simple" don't
>> show up. The html is basically the same, with the exception of noun turning
>> into adj. I'll paste the html of the word cheese, and then the one for
>> simple, and the code I am using to do the work.
>>
>> line of html for the 2nd def of cheese:
>>
>> <table class="luna-Ent"><tr><td valign="top" class="dn">2.</td><td
>> valign="top">a definite mass of this substance, often in the shape of a
>> wheel or cylinder. </td></tr></table>
>>
>> line of html for the 2nd def of simple:
>>
>> <table class="luna-Ent"><tr><td valign="top" class="dn">2.</td><td
>> valign="top">not elaborate or artificial; plain: a simple style.
>> </td></tr></table>
>>
>> code:
>>
>> import urllib
>> from BeautifulSoup import BeautifulSoup
>>
>>
>> def get_defs(term):
>>     soup = BeautifulSoup(urllib.urlopen('http://dictionary.reference.com/search?q=%s' % term))
>>
>>     for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>>         yield tabs.findAll('td')[-1].contents[-1].string
>>
>> word = raw_input("What word would you like to define: ")
>>
>> mainList = list(get_defs(word))
>>
>> n=0
>> q = 1
>>
>> for x in mainList:
>>     print str(q)+". "+str(mainList[n])
>>     q=q+1
>>     n=n+1
>>
>> Now, I don't think it is the italics because one of the definitions that
>> worked had them in it in the same format. Any Ideas??!
>>
>>
>> Jeff McNeil-2 wrote:
>>>
>>> On Jun 29, 12:50 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>>> No I figured it out. I guess I never knew that you aren't supposed to
>>>> split a url like "http://www.goo\
>>>> gle.com" But I did and it gave me all those errors. Anyway, I had a
>>>> question. On the original code you had this for loop:
>>>>
>>>> for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>>>>     yield tabs.findAll('td')[-1].contents[-1].string
>>>>
>>>> I hate to be a pain, but I was looking at the BeautifulSoup docs, and
>>>> found the findAll thing. But I want to know why you put "for tabs," also
>>>> why you need the "'table', {'class': 'luna-Ent'}):" Like why the curly
>>>> braces and whatnot?
>>>>
>>>> Jeff McNeil-2 wrote:
>>>>
>>>> > On Jun 27, 10:26 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>>> >> Okay, so I copied your code(and just so you know I am on a mac right now
>>>> >> and i am using pydev in eclipse), and I got these errors, any idea what
>>>> >> is up?
>>>> >>
>>>> >> Traceback (most recent call last):
>>>> >>   File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 14, in <module>
>>>> >>     print list(get_defs("cheese"))
>>>> >>   File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 9, in get_defs
>>>> >>     dictionary.reference.com/search?q=%s' % term))
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 82, in urlopen
>>>> >>     return opener.open(url)
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 190, in open
>>>> >>     return getattr(self, name)(url)
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 325, in open_http
>>>> >>     h.endheaders()
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 856, in endheaders
>>>> >>     self._send_output()
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 728, in _send_output
>>>> >>     self.send(msg)
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 695, in send
>>>> >>     self.connect()
>>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 663, in connect
>>>> >>     socket.SOCK_STREAM):
>>>> >> IOError: [Errno socket error] (8, 'nodename nor servname provided, or not known')
>>>> >>
>>>> >> Sorry if it is hard to read.
>>>> >>
>>>> >> Jeff McNeil-2 wrote:
>>>> >>
>>>> >> > Well, what about pulling that data out using Beautiful soup?
>>>> >> > If you know the table name and whatnot, try something like this:
>>>> >> >
>>>> >> > #!/usr/bin/python
>>>> >> >
>>>> >> > import urllib
>>>> >> > from BeautifulSoup import BeautifulSoup
>>>> >> >
>>>> >> > def get_defs(term):
>>>> >> >     soup = BeautifulSoup(urllib.urlopen('http://
>>>> >> > dictionary.reference.com/search?q=%s' % term))
>>>> >> >
>>>> >> >     for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>>>> >> >         yield tabs.findAll('td')[-1].contents[-1].string
>>>> >> >
>>>> >> > print list(get_defs("frog"))
>>>> >> >
>>>> >> > [EMAIL PROTECTED]:~$ python test.py
>>>> >> > [u'any tailless, stout-bodied amphibian of the order Anura, including
>>>> >> > the smooth, moist-skinned frog species that live in a damp or
>>>> >> > semiaquatic habitat and the warty, drier-skinned toad species that are
>>>> >> > mostly terrestrial as adults. ', u' ', u' ', u'a French person or a
>>>> >> > person of French descent. ', u'a small holder made of heavy material,
>>>> >> > placed in a bowl or vase to hold flower stems in position. ', u'a
>>>> >> > recessed panel on one of the larger faces of a brick or the like. ',
>>>> >> > u' ', u'to hunt and catch frogs. ', u'French or Frenchlike. ', u'an
>>>> >> > ornamental fastening for the front of a coat, consisting of a button
>>>> >> > and a loop through which it passes. ', u'a sheath suspended from a
>>>> >> > belt and supporting a scabbard. ', u'a device at the intersection of
>>>> >> > two tracks to permit the wheels and flanges on one track to cross or
>>>> >> > branch from the other. ', u'a triangular mass of elastic, horny
>>>> >> > substance in the middle of the sole of the foot of a horse or related
>>>> >> > animal. ']
>>>> >> >
>>>> >> > HTH,
>>>> >> >
>>>> >> > Jeff
>>>> >> >
>>>> >> > On Jun 27, 7:28 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>>> >> >> I have read that multiple times. It is hard to understand but it did
>>>> >> >> help a little.
>>>> >> >> But I found a bit of a work-around for now which is not what I
>>>> >> >> ultimately want. However, even when I can get to the page I want lets
>>>> >> >> say, "Http://dictionary.reference.com/browse/cheese", I look on firebug,
>>>> >> >> and extension and see the definition in javascript,
>>>> >> >>
>>>> >> >> <table class="luna-Ent">
>>>> >> >> <tbody>
>>>> >> >> <tr>
>>>> >> >> <td class="dn" valign="top">1.</td>
>>>> >> >> <td valign="top">the curd of milk separated from the whey and prepared in
>>>> >> >> many ways as a food. </td>
>>>> >> >>
>>>> >> >> Jeff McNeil-2 wrote:
>>>> >> >>
>>>> >> >> > the problem being that if I use code like this to get the html of that
>>>> >> >> > page in python:
>>>> >> >> >
>>>> >> >> > response = urllib2.urlopen("the webiste....")
>>>> >> >> > html = response.read()
>>>> >> >> > print html
>>>> >> >> >
>>>> >> >> > then, I get a bunch of stuff, but it doesn't show me the code with the
>>>> >> >> > table that the definition is in. So I am asking how do I access this
>>>> >> >> > javascript. Also, if someone could point me to a better reference than
>>>> >> >> > the last one, because that really doesn't tell me much, whether it be a
>>>> >> >> > book or anything.
>>>> >> >> >
>>>> >> >> > I stumbled across this a while back:
>>>> >> >> > http://www.voidspace.org.uk/python/articles/urllib2.shtml.
>>>> >> >> > It covers quite a bit. The urllib2 module is pretty straightforward
>>>> >> >> > once you've used it a few times. Some of the class naming and whatnot
>>>> >> >> > takes a bit of getting used to (I found that to be the most confusing
>>>> >> >> > bit).
>>>> >> >> >
>>>> >> >> > On Jun 27, 1:41 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>>> >> >> >> Okay, I tried to follow that, and it is kinda hard. But since you
>>>> >> >> >> obviously know what you are doing, where did you learn this?
>>>> >> >> >> Or where can I learn this?
>>>> >> >> >>
>>>> >> >> >> Maric Michaud wrote:
>>>> >> >> >>
>>>> >> >> >> > On Friday 27 June 2008 10:43:06 Alexnb, you wrote:
>>>> >> >> >> >> I have never used the urllib or the urllib2. I really have looked
>>>> >> >> >> >> online for help on this issue, and mailing lists, but I can't figure
>>>> >> >> >> >> out my problem because people haven't been helping me, which is why I
>>>> >> >> >> >> am here! :].
>>>> >> >> >> >> Okay, so basically I want to be able to submit a word to
>>>> >> >> >> >> dictionary.com and then get the definitions. However, to start off
>>>> >> >> >> >> learning urllib2, I just want to do a simple google search. Before you
>>>> >> >> >> >> get mad, what I have found on urllib2 hasn't helped me. Anyway, how
>>>> >> >> >> >> would you go about doing this? No, I did not post the html, but I mean
>>>> >> >> >> >> if you want, right click on your browser and hit view source of the
>>>> >> >> >> >> google homepage. Basically what I want to know is how to submit the
>>>> >> >> >> >> values (the search term) and then search for that value. Here's what I
>>>> >> >> >> >> know:
>>>> >> >> >> >>
>>>> >> >> >> >> import urllib2
>>>> >> >> >> >> response = urllib2.urlopen("http://www.google.com/")
>>>> >> >> >> >> html = response.read()
>>>> >> >> >> >> print html
>>>> >> >> >> >>
>>>> >> >> >> >> Now I know that all this does is print the source, but that's about
>>>> >> >> >> >> all I know. I know it may be a lot to ask to have someone show/help
>>>> >> >> >> >> me, but I really would appreciate it.
>>>> >> >> >> >
>>>> >> >> >> > This example is for google, of course using pygoogle is easier in
>>>> >> >> >> > this case, but this is a valid example for the general case :
>>>> >> >> >> >
>>>> >> >> >> > >>>[207]: import urllib, urllib2
>>>> >> >> >> >
>>>> >> >> >> > You need to trick the server with an imaginary User-Agent.
>>>> >> >> >> >
>>>> >> >> >> > >>>[208]: def google_search(terms) :
>>>> >> >> >> >     return urllib2.urlopen(urllib2.Request("http://www.google.com/search?" +
>>>> >> >> >> >         urllib.urlencode({'hl':'fr', 'q':terms}),
>>>> >> >> >> >         headers={'User-Agent':'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
>>>> >> >> >> >     ).read()
>>>> >> >> >> >    .....:
>>>> >> >> >> >
>>>> >> >> >> > >>>[212]: res = google_search("python & co")
>>>> >> >> >> >
>>>> >> >> >> > Now you got the whole html response, you'll have to parse it to
>>>> >> >> >> > recover datas, a quick & dirty try on google response page :
>>>> >> >> >> >
>>>> >> >> >> > >>>[213]: import re
>>>> >> >> >> >
>>>> >> >> >> > >>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
>>>> >> >> >> > ...[229]:
>>>> >> >> >> > ['Python Gallery',
>>>> >> >> >> >  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
>>>> >> >> >> >  'Re: os x, panther, python & co: msg#00041',
>>>> >> >> >> >  'Re: os x, panther, python & co: msg#00040',
>>>> >> >> >> >  'Cardiff Web Site Design, Professional web site design services ...',
>>>> >> >> >> >  'Python Properties',
>>>> >> >> >> >  'Frees < Programs < Python < Bin-Co',
>>>> >> >> >> >  'Torb: an interface between Tcl and CORBA',
>>>> >> >> >> >  'Royal Python Morphs',
>>>> >> >> >> >  'Python & Co']
>>>> >> >> >> >
>>>> >> >> >> > --
>>>> >> >> >> > _____________
>>>> >> >> >> >
>>>> >> >> >> > Maric Michaud
>>>> >> >> >> > --
>>>> >> >> >> > http://mail.python.org/mailman/listinfo/python-list
>>>> >> >> >>
>>>> >> >> >> --
>>>> >> >> >> View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18160312.html
>>>> >> >> >> Sent from the Python - python-list mailing list archive at Nabble.com.
>>>> >>
>>>> >> >> --
>>>> >> >> View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18165634.html
>>>> >> >> Sent from the Python - python-list mailing list archive at Nabble.com.
>>>> >>
>>>> >> --
>>>> >> View this message in...
>>>
>>> The definitions were embedded in tables with a 'luna-Ent' class. I
>>> pulled all of the tables with that class out, and then returned the
>>> string value of td containing the actual definition. The findAll
>>> method takes an optional dictionary, thus the {}.
>>
>

--
View this message in context: http://www.nabble.com/using-urllib2-tp18150669p18184788.html
Sent from the Python - python-list mailing list archive at Nabble.com.

--
http://mail.python.org/mailman/listinfo/python-list
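
A note on the u' ' results reported at the top of this message, offered as a hedged sketch rather than a confirmed diagnosis. contents[-1] is only the last child node of the td, not the whole cell. If a definition ends with an example phrase wrapped in an inline tag and followed by a space, that last child is just the trailing whitespace, which would give u' '. This is an assumption: the HTML quoted in the thread does not show such a tag, and the <i> in the sample below is hypothetical. Collecting all of the text inside the last td sidesteps the issue. The sketch uses only the Python 3 standard library (html.parser) rather than the BeautifulSoup 3 calls from the thread; LastCellText, extract_defs, and SAMPLE are illustrative names, not part of any library.

```python
from html.parser import HTMLParser


class LastCellText(HTMLParser):
    """Collect the full text of the last <td> in each <table class="luna-Ent">."""

    def __init__(self):
        super().__init__()
        self.definitions = []    # one cleaned string per matching table
        self._in_table = False   # inside a table with the wanted class?
        self._in_td = False      # inside a <td> of such a table?
        self._cells = []         # text fragments per <td> of the current table

    def handle_starttag(self, tag, attrs):
        # Same filter as findAll('table', {'class': 'luna-Ent'}) in the thread.
        if tag == "table" and dict(attrs).get("class") == "luna-Ent":
            self._in_table = True
            self._cells = []
        elif tag == "td" and self._in_table:
            self._in_td = True
            self._cells.append([])

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_td = False
        elif tag == "table" and self._in_table:
            self._in_table = False
            if self._cells:
                # Join every text fragment of the last cell, not just the last one.
                text = "".join(self._cells[-1])
                self.definitions.append(" ".join(text.split()))

    def handle_data(self, data):
        # Text inside nested inline tags also arrives here, so nothing is lost.
        if self._in_table and self._in_td:
            self._cells[-1].append(data)


def extract_defs(html):
    parser = LastCellText()
    parser.feed(html)
    parser.close()
    return parser.definitions


# The first table uses a hypothetical <i> tag around the example phrase;
# the second is the "cheese" markup quoted in the thread.
SAMPLE = (
    '<table class="luna-Ent"><tr><td valign="top" class="dn">1.</td>'
    '<td valign="top">easy to understand, deal with, use, etc.: '
    '<i>a simple matter; simple tools.</i> </td></tr></table>'
    '<table class="luna-Ent"><tr><td valign="top" class="dn">2.</td>'
    '<td valign="top">a definite mass of this substance, often in the '
    'shape of a wheel or cylinder. </td></tr></table>'
    '<table class="other"><tr><td>ignored</td></tr></table>'
)

defs = extract_defs(SAMPLE)
for number, definition in enumerate(defs, start=1):
    print("%d. %s" % (number, definition))
```

If this guess about the markup is right, the same idea should carry over to the BeautifulSoup code in the thread: join all of the text nodes under the last td instead of taking contents[-1].string. Whether findAll returns a list is unrelated to the missing text; the for loop visits every table either way.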