On Wed, Apr 1, 2009 at 8:25 AM, Gabriel Rossetti <gabriel.rosse...@arimaz.com> wrote:
> Hello everyone,
>
> I am using beautiful soup to parse some HTML and I came across something
> strange.
> Here is an illustration:
>
> >>> soup = BeautifulSoup(u'<div class="text">hello ça boume<br /></div')
> >>> soup
> <div class="text">hello ça boume<br /></div>
> >>> soup.find("div", "text")
> <div class="text">hello ça boume<br /></div>
> >>> soup.find("div", "text").string
> >>> soup.find("div", "text").next
> u'hello \xe7a boume'
>
> why does soup.find("div", "text").string not give me the string? Is it
> because there is a <br/>?

IIRC, yes it is: .string is only available when the tag's single child is a
string, and here the <div> also contains a <br />. There's not much you can
do about it other than use .next.string or .contents[0], or strip out the
<br /> tags first.

See http://www.crummy.com/software/BeautifulSoup/documentation.html ,
particularly the "Removing Elements" and "string" sections.
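
For illustration, here is a rough sketch of those workarounds side by side. It assumes the newer bs4 package (Beautiful Soup 4) on Python 3 with the built-in "html.parser" backend, not the Beautiful Soup 3 import used in the session above, so .next_element and find_all stand in for the older .next and findAll. The variable names are made up for the example.

```python
# Sketch only: assumes Beautiful Soup 4 (the bs4 package) on Python 3.
# The thread itself used Beautiful Soup 3 on Python 2.
from bs4 import BeautifulSoup

html = '<div class="text">hello ça boume<br /></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="text")

# The <div> has two children (a string and a <br/> tag), so .string is None.
no_string = div.string

# Workaround 1: take the first child directly.
first_child = div.contents[0]

# Workaround 2: step to the next parsed element (".next" in Beautiful Soup 3).
next_element = div.next_element

# Workaround 3: remove the <br/> tags; the string is then the only child,
# so .string is set again.
for br in div.find_all("br"):
    br.extract()
after_strip = div.string

print(repr(no_string), repr(first_child), repr(next_element), repr(after_strip))
```

Note that the third option modifies the parsed tree, so it only makes sense if the <br /> tags are not needed afterwards.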