On Sep 20, 4:12 pm, Tobiah <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I need to extract the number on each <td tags from a html file.
> >
> > i.e 49.950 from the following:
> >
> > <td align=right width=80><font size=2 face="New Times
> > Roman,Times,Serif"> 49.950 </font></td>
> >
> > The actual number between: 49.950 can be any number of
> > digits before decimal and after decimal.
> >
> > <td align=right width=80><font size=2 face="New Times
> > Roman,Times,Serif"> ######.#### </font></td>
> >
> > How can I just extract the real/integer number using regex?
>
> '[0-9]*\.[0-9]*'

I am trying to use BeautifulSoup:

    soup = BeautifulSoup(page)

    td_tags = soup.findAll('td')
    i = 0
    for td in td_tags:
        i = i + 1
        print "td: ", td
        # re.search('[0-9]*\.[0-9]*', td)
        price = re.compile('[0-9]*\.[0-9]*').search(td)

I am getting an error:

    price= re.compile('[0-9]*\.[0-9]*').search(td)
    TypeError: expected string or buffer

Does BeautifulSoup return an array of objects? If so, how do I pass the
"td" instance as a string to re.search? What is the difference between
re.search and re.compile().search?
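
For what it's worth, here is a minimal sketch of one way around the TypeError. It assumes that the items returned by findAll are tag objects rather than plain strings, and that calling str() on one yields its markup. FakeTag is a hypothetical stand-in, not a BeautifulSoup class, so that the example runs without BeautifulSoup installed. The pattern is the one suggested earlier in the thread, which only matches numbers that contain a decimal point.

```python
import re

# Pattern suggested earlier in the thread; it requires a decimal point.
NUMBER = r'[0-9]*\.[0-9]*'


class FakeTag(object):
    """Hypothetical stand-in for a parsed <td> element (not a BeautifulSoup class)."""

    def __init__(self, markup):
        self.markup = markup

    def __str__(self):
        return self.markup


td = FakeTag('<td align=right width=80><font size=2 '
             'face="New Times Roman,Times,Serif"> 49.950 </font></td>')

# Passing the object itself fails, because re only accepts strings.
try:
    re.search(NUMBER, td)
    raised = False
except TypeError:
    raised = True

# Converting the object to a string first lets the search run.
m1 = re.search(NUMBER, str(td))

# The same search with a precompiled pattern. The result is identical;
# compiling once is mainly a convenience when the pattern is reused in a loop.
number_re = re.compile(NUMBER)
m2 = number_re.search(str(td))

price = m1.group(0) if m1 else None
print(price)
```

Both calls return a match object (or None when nothing matches), so the text of the number comes from .group(0), and float() can turn it into a numeric value if needed.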