On 11-12-2010 17:24, Martin Kaspar wrote:
> Hello community,
>
> I am new to Python and to Beautiful Soup too!
> It is said to be a great tool for parsing and extracting content. So here I
> am...
>
> I want to take the content of a <td> tag of a table in an HTML
> document. For example, I have this table:
>
> <table class="bp_ergebnis_tab_info">
> <tr>
> <td>
> This is a sample text
> </td>
>
> <td>
> This is the second sample text
> </td>
> </tr>
> </table>
>
> How can I use BeautifulSoup to take the text "This is a sample text"?
>
> Should I use
> soup.findAll('table', attrs={'class': 'bp_ergebnis_tab_info'}) to get
> the whole table?
>
> See the target:
> http://www.schulministerium.nrw.de/BP/SchuleSuchen?action=799.601437941842&SchulAdresseMapDO=142323
>
> Well, what do we have to do first?
>
> The first thing is to find the table.
>
> I do this with find. Using find rather than findAll returns the first
> match (rather than a list of all matches, in which case we'd have to
> add an extra [0] to take the first element of the list):
>
> table = soup.find('table', attrs={'class': 'bp_ergebnis_tab_info'})
>
> Then use find again, this time on the table, to find its first td:
>
> first_td = table.find('td')
>
> Then we use renderContents() to extract the textual contents:
>
> text = first_td.renderContents()
>
> ... and the job is done (though we may also want to use strip() to
> remove leading and trailing whitespace):
>
> trimmed_text = text.strip()
>
> This should give us:
>
> >>> print trimmed_text
> This is a sample text
>
> as desired.
>
> What do you think about the code? I'd love to hear from you!

I've no opinion. I'm just struggling with BeautifulSoup myself, finding it
one of the toughest libs I've seen ;-)
So the simplest solution I came up with:

from BeautifulSoup import BeautifulSoup   # BeautifulSoup 3, Python 2

text = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>

<td>
This is the second sample text
</td>
</tr>
</table>
"""
content = BeautifulSoup(text)
print content.find('td').contents[0].strip()

which prints:

This is a sample text

And now I wonder how to get the next contents!

cheers,
Stef

> greetings
> matze

--
http://mail.python.org/mailman/listinfo/python-list
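This is one way to answer the closing question of how to get the next contents. In BeautifulSoup 3, content.findAll('td') returns every cell in document order. From a single cell, findNextSibling('td') moves to the cell after it.

Not everyone has BeautifulSoup installed, so below is a minimal sketch of the same extraction using only Python 3's standard-library html.parser. This is a swapped-in technique, not BeautifulSoup's API. The names TableCellText and cell_texts are hypothetical and were chosen for this example. The sketch assumes that <td> elements are never nested inside one another.

```python
from html.parser import HTMLParser


class TableCellText(HTMLParser):
    """Collect the text of every <td> inside <table> elements with a given class.

    Hypothetical helper for illustration; assumes <td> elements are not nested.
    """

    def __init__(self, table_class):
        super().__init__()
        self.table_class = table_class
        self.table_depth = 0   # > 0 while inside a matching <table> (counts nested tables)
        self.in_td = False     # True between <td> and </td> inside a matching table
        self.buffer = []       # text chunks of the current cell
        self.cells = []        # finished cell texts, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            classes = (dict(attrs).get("class") or "").split()
            # Enter a matching table, or go one level deeper inside one.
            if self.table_depth or self.table_class in classes:
                self.table_depth += 1
        elif tag == "td" and self.table_depth:
            self.in_td = True
            self.buffer = []

    def handle_endtag(self, tag):
        if tag == "td" and self.in_td:
            # Collapse runs of whitespace, which also trims both ends.
            self.cells.append(" ".join("".join(self.buffer).split()))
            self.in_td = False
        elif tag == "table" and self.table_depth:
            self.table_depth -= 1

    def handle_data(self, data):
        if self.in_td:
            self.buffer.append(data)


def cell_texts(html, table_class):
    """Return the cleaned text of each <td> in tables having class table_class."""
    parser = TableCellText(table_class)
    parser.feed(html)
    parser.close()
    return parser.cells


TEXT = """
<table class="bp_ergebnis_tab_info">
<tr>
<td>
This is a sample text
</td>

<td>
This is the second sample text
</td>
</tr>
</table>
"""

cells = cell_texts(TEXT, "bp_ergebnis_tab_info")
print(cells[0])  # This is a sample text
print(cells[1])  # This is the second sample text
```

Filtering on the class attribute plays roughly the same role as attrs={'class': ...} in the quoted find() call. Cells belonging to other tables on the page are skipped, and each later cell is simply the next item in the returned list.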