On Jun 18, 5:41 pm, someone <petshm...@googlemail.com> wrote:
> Hello,
>
> Does anyone know how to get the HTML contents of a tag with
> BeautifulSoup? For example, I'd like to get all the HTML inside the
> first <p> tag, i.e. <span id="foo">This is paragraph</span> <b>one</b>.
> as a unicode object.
>
> p.contents gives me a list which I cannot join:
> TypeError: sequence item 0: expected string, Tag found
>
> Thanks!
>
> from BeautifulSoup import BeautifulSoup
> import re
>
> doc = ['<html><head><title>Page title</title></head>',
>        '<body><p id="firstpara" align="center"><span id="foo">This is paragraph</span> <b>one</b>.</p>',
>        '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>',
>        '</body></html>']
> soup = BeautifulSoup(''.join(doc))
> #print soup.prettify()
> r = re.compile(r'<[^<]*?/?>')
> for i, p in enumerate(soup.findAll('p')):
>     #print type(p) #<class 'BeautifulSoup.Tag'>
>     #print type(p.contents) #list
>     content = "".join(p.contents) #fails
>
>     p_without_html = r.sub(' ', content)
>     print p_without_html

p.renderContents() was what I was looking for. It renders the tag's contents as a string, which can then be passed to r.sub().

-- 
http://mail.python.org/mailman/listinfo/python-list
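
For anyone finding this thread later, here is a minimal sketch of the same idea. It rests on assumptions that are not in the original post: it uses Python 3 and BeautifulSoup 4 (imported from bs4) with the built-in 'html.parser', and it calls decode_contents(), which as far as I know is the BeautifulSoup 4 method that returns a tag's inner markup as a str. The names tag_re, inner_html and stripped are made up for this example.

```python
import re

# Assumption: BeautifulSoup 4 (package "bs4"), not the 3.x import used in the post.
from bs4 import BeautifulSoup

doc = [
    '<html><head><title>Page title</title></head>',
    '<body><p id="firstpara" align="center"><span id="foo">This is '
    'paragraph</span> <b>one</b>.</p>',
    '<p id="secondpara" align="blah">This is paragraph <b>two</b>.</p>',
    '</body></html>',
]
soup = BeautifulSoup(''.join(doc), 'html.parser')

# Same tag-matching pattern as in the original post.
tag_re = re.compile(r'<[^<]*?/?>')

inner_html = []  # markup inside each <p>, as str
stripped = []    # the same markup with every tag replaced by a space

for p in soup.find_all('p'):
    # p.contents is a list of Tag/NavigableString objects, so "".join() fails on it;
    # decode_contents() renders those children to a single str instead.
    content = p.decode_contents()
    inner_html.append(content)
    stripped.append(tag_re.sub(' ', content))

for content, text in zip(inner_html, stripped):
    print(content)
    print(text)
```

If only the text is needed and not the markup, p.get_text() is probably simpler than stripping tags with a regex, since the regex leaves extra spaces where the tags were.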