Re: Beautiful soup : why does "string" not give me the string?

Gabriel Rossetti Wed, 01 Apr 2009 05:16:52 -0700

Jeremiah Dodds wrote:

On Wed, Apr 1, 2009 at 8:25 AM, Gabriel Rossetti <gabriel.rosse...@arimaz.com> wrote:
    Hello everyone,

    I am using beautiful soup to parse some HTML and I came across
    something strange.
    Here is an illustration:

    >>> soup = BeautifulSoup(u'<div class="text">hello ça boume<br /></div>')
    >>> soup
    <div class="text">hello ça boume<br /></div>
    >>> soup.find("div", "text")
    <div class="text">hello ça boume<br /></div>
    >>> soup.find("div", "text").string
    >>> soup.find("div", "text").next
    u'hello \xe7a boume'

    why does soup.find("div", "text").string not give me the string?
    Is it because there is a <br/>?

IIRC, yes it is, and there's not much you can do about it other than
use .next.string or .contents[0] or stripping out brs. See
http://www.crummy.com/software/BeautifulSoup/documentation.html ,
particularly the "Removing Elements" and "string" sections.

OK, thanks. I also found that I can do this:

   soup.find(text=lambda t: isinstance(t, basestring))

or this:

   soup.find(text=True)

It seems faster than doing this:

   [br.extract() for br in soup.findAll("br")]
   soup.string

but I may be wrong.
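
For anyone reading this later, here is a minimal sketch of the options
discussed above, put side by side. It assumes the newer bs4 package and
Python's built-in "html.parser" rather than the BeautifulSoup 3 used in
this thread, so find_all and string= are the bs4 spellings of findAll
and text=. The variable names are only for illustration.

```python
from bs4 import BeautifulSoup

# Same markup as in the original question (\xe7 is the c-cedilla).
html = '<div class="text">hello \xe7a boume<br /></div>'
soup = BeautifulSoup(html, "html.parser")
div = soup.find("div", "text")

# The div has two children (a text node and a <br/> tag),
# so .string cannot pick a single string and is None.
child_count = len(div.contents)
string_before = div.string
print(child_count, string_before)

# Option 1: take the first child directly.
first_child = div.contents[0]

# Option 2: search for the first text node under the div.
first_text = div.find(string=True)

# Option 3: remove the <br/> tags so the text is the only child left,
# after which .string returns it. Note that this modifies the tree.
for br in div.find_all("br"):
    br.extract()
string_after = div.string
```

Options 1 and 2 leave the document untouched, while option 3 changes
it, which may matter more than speed depending on what you do with the
soup afterwards.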

Thanks again!
Gabriel
--
http://mail.python.org/mailman/listinfo/python-list
