Re: Parsing HTML?

Benjamin Sat, 26 Apr 2008 14:31:41 -0700

On Apr 6, 11:03 pm, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Benjamin wrote:
> > I'm trying to parse an HTML file.  I want to retrieve all of the text
> > inside a certain tag that I find with XPath.  The DOM seems to make
> > this available with the innerHTML element, but I haven't found a way
> > to do it in Python.
>
>     import lxml.html as h
>     tree = h.parse("somefile.html")
>     text = tree.xpath("string( some/[EMAIL PROTECTED] )")
>
> http://codespeak.net/lxml
>
> Stefan


I actually had trouble getting this to work.  I guess only newer versions
of lxml have the html module, and I couldn't get one installed.  lxml
does look pretty cool, though.
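
For anyone else who can't get lxml installed, here is a rough sketch of a
fallback that uses only the standard library's html.parser module (Python 3;
on Python 2 the module is called HTMLParser).  It is not XPath: it just
collects the text inside every occurrence of a tag, optionally filtered by
attribute values, which is roughly what string() returns in the snippet
above.  It gives text only, not the markup that innerHTML would.  The names
TagTextExtractor and text_inside are made up for illustration.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Collect the text found inside every occurrence of one tag."""

    def __init__(self, tag, attrs=None):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.required = attrs or {}   # attribute values a match must have
        self.depth = 0                # > 0 while inside a matching element
        self.chunks = []              # text pieces of the current match
        self.results = []             # one string per matched element

    def _matches(self, attrs):
        return all(attrs.get(k) == v for k, v in self.required.items())

    def handle_starttag(self, tag, attrs):
        if self.depth:
            # Already inside a match: track nested elements of the same tag
            # so the right closing tag ends the match.
            if tag == self.tag:
                self.depth += 1
        elif tag == self.tag and self._matches(dict(attrs)):
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag == self.tag:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_inside(html, tag, attrs=None):
    """Return a list with the text content of each matching element."""
    parser = TagTextExtractor(tag, attrs)
    parser.feed(html)
    parser.close()
    return parser.results
```

Usage would look something like text_inside(open("somefile.html").read(),
"div", {"class": "main"}), where the tag and attribute are whatever the
XPath expression was meant to select.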
--
http://mail.python.org/mailman/listinfo/python-list
