On Wed, Apr 1, 2009 at 8:25 AM, Gabriel Rossetti <gabriel.rosse...@arimaz.com> wrote:

> Hello everyone,
>
> I am using Beautiful Soup to parse some HTML and I came across something
> strange. Here is an illustration:
>
> >>> soup = BeautifulSoup(u'<div class="text">hello ça boume<br /></div>')
> >>> soup
> <div class="text">hello ça boume<br /></div>
> >>> soup.find("div", "text")
> <div class="text">hello ça boume<br /></div>
> >>> soup.find("div", "text").string
> >>> soup.find("div", "text").next
> u'hello \xe7a boume'
>
> Why does soup.find("div", "text").string not give me the string? Is it
> because there is a <br/>?


IIRC, yes it is: .string is None when a tag has more than one child, and
here the div contains both the text and the <br />. There's not much you
can do about it other than using .next.string or .contents[0], or stripping
out the <br> tags. See
http://www.crummy.com/software/BeautifulSoup/documentation.html,
particularly the "Removing Elements" and "string" sections.
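A minimal sketch of those workarounds, assuming the current BeautifulSoup 4 package (bs4) with the stdlib "html.parser" backend rather than the BeautifulSoup 3 used in this thread. In bs4 the old .next attribute is spelled .next_element. get_text() is an extra option not mentioned above; it joins every text node under the tag.

```python
# Sketch assuming BeautifulSoup 4 ("bs4"); in BeautifulSoup 3, .next_element was .next.
from bs4 import BeautifulSoup

html = '<div class="text">hello ça boume<br /></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", class_="text")

# The div has two children (a text node and a <br> tag), so .string is None.
print(div.string)  # None

# Workaround 1: take the first child directly.
first = div.contents[0]

# Workaround 2: the element parsed immediately after the <div> start tag.
following = div.next_element

# Workaround 3 (extra, bs4): concatenate all text nodes under the tag.
text = div.get_text()

# Workaround 4: strip out the <br> tags; the single remaining child becomes .string.
for br in div.find_all("br"):
    br.decompose()
only = div.string
```

Removing the <br> tags mutates the parse tree, so prefer the read-only options if you still need the original markup afterwards.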
--
http://mail.python.org/mailman/listinfo/python-list
