Actually, after looking at this, the code is practically the same except for the definitions. So what COULD be going wrong here?
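For what it's worth, here is a minimal, untested sketch of the same script (BeautifulSoup 3 and the URL and 'luna-Ent' class are taken from the code quoted below; the rest is guesswork). It joins every text node inside the definition cell instead of relying on contents[-1].string, which comes back as None (or as only a fragment) whenever the last child of the <td> is a nested tag, and that is one plausible reason the first few "simple" entries print as nothing:

import urllib
from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x, as in the thread


def get_defs(term):
    # Same page and the same 'luna-Ent' tables as the quoted code.
    soup = BeautifulSoup(urllib.urlopen(
        'http://dictionary.reference.com/search?q=%s' % term))
    for table in soup.findAll('table', {'class': 'luna-Ent'}):
        cell = table.findAll('td')[-1]
        # Join every text node in the cell rather than taking
        # contents[-1].string, which misses text when the last child
        # is a tag such as <i>...</i> with its own children.
        yield ''.join(cell.findAll(text=True)).strip()


word = raw_input("What word would you like to define: ")
for number, definition in enumerate(get_defs(word)):
    print "%d. %s" % (number + 1, definition)

The enumerate() call just replaces the manual n/q counters; the output format is the same.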
Alexnb wrote:
> Okay, so i've hit a new snag and can't seem to figure out what is wrong.
> What is happening is the first 4 definitions of the word "simple" don't
> show up. The html is basicly the same, with the exception of noun turning
> into adj. Ill paste the html of the word cheese, and then the one for
> simple, and the code I am using to do the work.
>
> line of html for the 2nd def of cheese:
>
> <table class="luna-Ent"><tr><td valign="top" class="dn">2.</td><td
> valign="top">a definite mass of this substance, often in the shape of a
> wheel or cylinder. </td></tr></table>
>
> line of html for the 2nd def of simple:
>
> <table class="luna-Ent"><tr><td valign="top" class="dn">2.</td><td
> valign="top">not elaborate or artificial; plain: a simple style.
> </td></tr></table>
>
> code:
>
> import urllib
> from BeautifulSoup import BeautifulSoup
>
> def get_defs(term):
>     soup = BeautifulSoup(urllib.urlopen('http://dictionary.reference.com/search?q=%s' % term))
>     for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>         yield tabs.findAll('td')[-1].contents[-1].string
>
> word = raw_input("What word would you like to define: ")
>
> mainList = list(get_defs(word))
>
> n=0
> q = 1
>
> for x in mainList:
>     print str(q)+". "+str(mainList[n])
>     q=q+1
>     n=n+1
>
> Now, I don't think it is the italics because one of the definitions that
> worked had them in it in the same format. Any Ideas??!
>
> Jeff McNeil-2 wrote:
>> On Jun 29, 12:50 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>> No I figured it out. I guess I never knew that you aren't supposed to
>>> split a url like "http://www.goo\
>>> gle.com" But I did and it gave me all those errors. Anyway, I had a
>>> question. On the original code you had this for loop:
>>>
>>> for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>>>     yield tabs.findAll('td')[-1].contents[-1].string
>>>
>>> I hate to be a pain, but I was looking at the BeautifulSoup docs, and
>>> found the findAll thing. But I want to know why you put "for tabs," also
>>> why you need the "'table', {'class': 'luna-Ent'}):" Like why the curly
>>> braces and whatnot?
>>>
>>> Jeff McNeil-2 wrote:
>>> > On Jun 27, 10:26 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>> >> Okay, so I copied your code(and just so you know I am on a mac right
>>> >> now and i am using pydev in eclipse), and I got these errors, any idea
>>> >> what is up?
>>> >>
>>> >> Traceback (most recent call last):
>>> >>   File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 14, in <module>
>>> >>     print list(get_defs("cheese"))
>>> >>   File "/Users/Alex/Documents/workspace/beautifulSoup/src/firstExample.py", line 9, in get_defs
>>> >>     dictionary.reference.com/search?q=%s' % term))
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 82, in urlopen
>>> >>     return opener.open(url)
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 190, in open
>>> >>     return getattr(self, name)(url)
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/urllib.py", line 325, in open_http
>>> >>     h.endheaders()
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 856, in endheaders
>>> >>     self._send_output()
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 728, in _send_output
>>> >>     self.send(msg)
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 695, in send
>>> >>     self.connect()
>>> >>   File "/System/Library/Frameworks/Python.framework/Versions/2.5/lib/python2.5/httplib.py", line 663, in connect
>>> >>     socket.SOCK_STREAM):
>>> >> IOError: [Errno socket error] (8, 'nodename nor servname provided, or not known')
>>> >>
>>> >> Sorry if it is hard to read.
>>> >>
>>> >> Jeff McNeil-2 wrote:
>>> >> > Well, what about pulling that data out using Beautiful soup? If you
>>> >> > know the table name and whatnot, try something like this:
>>> >> >
>>> >> > #!/usr/bin/python
>>> >> >
>>> >> > import urllib
>>> >> > from BeautifulSoup import BeautifulSoup
>>> >> >
>>> >> > def get_defs(term):
>>> >> >     soup = BeautifulSoup(urllib.urlopen('http://dictionary.reference.com/search?q=%s' % term))
>>> >> >
>>> >> >     for tabs in soup.findAll('table', {'class': 'luna-Ent'}):
>>> >> >         yield tabs.findAll('td')[-1].contents[-1].string
>>> >> >
>>> >> > print list(get_defs("frog"))
>>> >> >
>>> >> > [EMAIL PROTECTED]:~$ python test.py
>>> >> > [u'any tailless, stout-bodied amphibian of the order Anura, including
>>> >> > the smooth, moist-skinned frog species that live in a damp or
>>> >> > semiaquatic habitat and the warty, drier-skinned toad species that are
>>> >> > mostly terrestrial as adults. ', u' ', u' ', u'a French person or a
>>> >> > person of French descent. ', u'a small holder made of heavy material,
>>> >> > placed in a bowl or vase to hold flower stems in position. ', u'a
>>> >> > recessed panel on one of the larger faces of a brick or the like. ',
>>> >> > u' ', u'to hunt and catch frogs. ', u'French or Frenchlike. ', u'an
>>> >> > ornamental fastening for the front of a coat, consisting of a button
>>> >> > and a loop through which it passes. ', u'a sheath suspended from a
>>> >> > belt and supporting a scabbard. ', u'a device at the intersection of
>>> >> > two tracks to permit the wheels and flanges on one track to cross or
>>> >> > branch from the other. ', u'a triangular mass of elastic, horny
>>> >> > substance in the middle of the sole of the foot of a horse or related
>>> >> > animal. ']
>>> >> >
>>> >> > HTH,
>>> >> >
>>> >> > Jeff
>>> >> >
>>> >> > On Jun 27, 7:28 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>> >> >> I have read that multiple times. It is hard to understand but it did
>>> >> >> help a little. But I found a bit of a work-around for now which is
>>> >> >> not what I ultimately want. However, even when I can get to the page
>>> >> >> I want lets say, "Http://dictionary.reference.com/browse/cheese", I
>>> >> >> look on firebug, and extension and see the definition in javascript,
>>> >> >>
>>> >> >> <table class="luna-Ent">
>>> >> >> <tbody>
>>> >> >> <tr>
>>> >> >> <td class="dn" valign="top">1.</td>
>>> >> >> <td valign="top">the curd of milk separated from the whey and
>>> >> >> prepared in many ways as a food. </td>
>>> >> >>
>>> >> >> Jeff McNeil-2 wrote:
>>> >> >> > the problem being that if I use code like this to get the html of
>>> >> >> > that page in python:
>>> >> >> >
>>> >> >> > response = urllib2.urlopen("the webiste....")
>>> >> >> > html = response.read()
>>> >> >> > print html
>>> >> >> >
>>> >> >> > then, I get a bunch of stuff, but it doesn't show me the code with
>>> >> >> > the table that the definition is in. So I am asking how do I access
>>> >> >> > this javascript. Also, if someone could point me to a better
>>> >> >> > reference than the last one, because that really doesn't tell me
>>> >> >> > much, whether it be a book or anything.
>>> >> >> >
>>> >> >> > I stumbled across this a while back:
>>> >> >> > http://www.voidspace.org.uk/python/articles/urllib2.shtml.
>>> >> >> > It covers quite a bit. The urllib2 module is pretty straightforward
>>> >> >> > once you've used it a few times. Some of the class naming and
>>> >> >> > whatnot takes a bit of getting used to (I found that to be the most
>>> >> >> > confusing bit).
>>> >> >> >
>>> >> >> > On Jun 27, 1:41 pm, Alexnb <[EMAIL PROTECTED]> wrote:
>>> >> >> >> Okay, I tried to follow that, and it is kinda hard. But since you
>>> >> >> >> obviously know what you are doing, where did you learn this? Or
>>> >> >> >> where can I learn this?
>>> >> >> >>
>>> >> >> >> Maric Michaud wrote:
>>> >> >> >> > Le Friday 27 June 2008 10:43:06 Alexnb, vous avez écrit :
>>> >> >> >> >> I have never used the urllib or the urllib2. I really have
>>> >> >> >> >> looked online for help on this issue, and mailing lists, but I
>>> >> >> >> >> can't figure out my problem because people haven't been helping
>>> >> >> >> >> me, which is why I am here! :].
>>> >> >> >> >> Okay, so basically I want to be able to submit a word to
>>> >> >> >> >> dictionary.com and then get the definitions. However, to start
>>> >> >> >> >> off learning urllib2, I just want to do a simple google search.
>>> >> >> >> >> Before you get mad, what I have found on urllib2 hasn't helped
>>> >> >> >> >> me. Anyway, How would you go about doing this. No, I did not
>>> >> >> >> >> post the html, but I mean if you want, right click on your
>>> >> >> >> >> browser and hit view source of the google homepage. Basically
>>> >> >> >> >> what I want to know is how to submit the values(the search
>>> >> >> >> >> term) and then search for that value. Heres what I know:
>>> >> >> >> >>
>>> >> >> >> >> import urllib2
>>> >> >> >> >> response = urllib2.urlopen("http://www.google.com/")
>>> >> >> >> >> html = response.read()
>>> >> >> >> >> print html
>>> >> >> >> >>
>>> >> >> >> >> Now I know that all this does is print the source, but thats
>>> >> >> >> >> about all I know. I know it may be a lot to ask to have someone
>>> >> >> >> >> show/help me, but I really would appreciate it.
>>> >> >> >> >
>>> >> >> >> > This example is for google, of course using pygoogle is easier
>>> >> >> >> > in this case, but this is a valid example for the general case :
>>> >> >> >> >
>>> >> >> >> > >>>[207]: import urllib, urllib2
>>> >> >> >> >
>>> >> >> >> > You need to trick the server with an imaginary User-Agent.
>>> >> >> >> >
>>> >> >> >> > >>>[208]: def google_search(terms) :
>>> >> >> >> >     return urllib2.urlopen(urllib2.Request("http://www.google.com/search?" +
>>> >> >> >> >                urllib.urlencode({'hl':'fr', 'q':terms}),
>>> >> >> >> >                headers={'User-Agent':'MyNav 1.0 (compatible; MSIE 6.0; Linux'})
>>> >> >> >> >            ).read()
>>> >> >> >> >    .....:
>>> >> >> >> >
>>> >> >> >> > >>>[212]: res = google_search("python & co")
>>> >> >> >> >
>>> >> >> >> > Now you got the whole html response, you'll have to parse it to
>>> >> >> >> > recover datas, a quick & dirty try on google response page :
>>> >> >> >> >
>>> >> >> >> > >>>[213]: import re
>>> >> >> >> >
>>> >> >> >> > >>>[214]: [ re.sub('<.+?>', '', e) for e in re.findall('<h2 class=r>.*?</h2>', res) ]
>>> >> >> >> > ...[229]:
>>> >> >> >> > ['Python Gallery',
>>> >> >> >> >  'Coffret Monty Python And Co 3 DVD : La Premi\xe8re folie des Monty ...',
>>> >> >> >> >  'Re: os x, panther, python & co: msg#00041',
>>> >> >> >> >  'Re: os x, panther, python & co: msg#00040',
>>> >> >> >> >  'Cardiff Web Site Design, Professional web site design services ...',
>>> >> >> >> >  'Python Properties',
>>> >> >> >> >  'Frees < Programs < Python < Bin-Co',
>>> >> >> >> >  'Torb: an interface between Tcl and CORBA',
>>> >> >> >> >  'Royal Python Morphs',
>>> >> >> >> >  'Python & Co']
>>> >> >> >> >
>>> >> >> >> > --
>>> >> >> >> > _____________
>>> >> >> >> >
>>> >> >> >> > Maric Michaud
>>
>> The definitions were embedded in tables with a 'luna-Ent' class. I
>> pulled all of the tables with that class out, and then returned the
>> string value of td containing the actual definition. The findAll
>> method takes an optional dictionary, thus the {}.
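A footnote on that last quoted paragraph: the curly braces are BeautifulSoup's optional attrs dictionary, which restricts findAll to tags whose attributes match the given values. A tiny self-contained illustration (the HTML snippet here is invented, not taken from dictionary.com):

from BeautifulSoup import BeautifulSoup  # BeautifulSoup 3.x

# A made-up document, just to show the attrs-dictionary filter.
html = """
<table class="luna-Ent"><tr><td class="dn">1.</td><td>first sense</td></tr></table>
<table class="something-else"><tr><td>not a definition</td></tr></table>
"""

soup = BeautifulSoup(html)

# findAll(name, attrs): the dictionary limits the match to tags whose
# attributes have these values, so only the 'luna-Ent' table comes back.
for table in soup.findAll('table', {'class': 'luna-Ent'}):
    print table.findAll('td')[-1].string  # prints: first sense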
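One more aside, on the quoted discovery that splitting a URL string across lines leads to those socket errors: when a URL is too long for one source line, adjacent string literals are joined at compile time, so no stray whitespace ends up in the hostname. A small sketch (the constant name is made up):

# Splitting a long URL across source lines safely: adjacent string literals
# are concatenated by the compiler, so nothing extra ends up inside the URL.
SEARCH_URL = ('http://dictionary.reference.com'
              '/search?q=%s')

print SEARCH_URL % 'cheese'
# prints: http://dictionary.reference.com/search?q=cheese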