On Apr 6, 11:03 pm, Stefan Behnel <[EMAIL PROTECTED]> wrote:
> Benjamin wrote:
> > I'm trying to parse an HTML file. I want to retrieve all of the text
> > inside a certain tag that I find with XPath. The DOM seems to make
> > this available with the innerHTML element, but I haven't found a way
> > to do it in Python.
>
>     import lxml.html as h
>     tree = h.parse("somefile.html")
>     text = tree.xpath("string( some/[EMAIL PROTECTED] )")
>
> http://codespeak.net/lxml
>
> Stefan

I actually had trouble getting this to work. I guess only newer versions of lxml have the html module, and I couldn't get one installed. lxml does look pretty cool, though.

--
http://mail.python.org/mailman/listinfo/python-list
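
For anyone who hits the same installation problem: one possible fallback is the standard library's html.parser, which needs nothing installed. This is a rough sketch, not what Stefan suggested, and it does not do XPath at all. It assumes the element can be picked out by tag name plus an id attribute; the names TagTextCollector and text_inside and the id-based match are made up for illustration, since the original XPath predicate did not survive in the archive. It also assumes every non-void tag inside the target is properly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagTextCollector(HTMLParser):
    """Collect the text inside the first element matching a tag name and id.

    Hypothetical helper: matching on an id attribute is an assumption,
    standing in for whatever the XPath expression would have selected.
    """

    def __init__(self, tag, element_id):
        super().__init__(convert_charrefs=True)
        self.tag = tag
        self.element_id = element_id
        self.depth = 0       # greater than 0 while inside the matched element
        self.done = False    # set once the matched element has been closed
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.done or tag in VOID_TAGS:
            return
        if self.depth:
            # Nested element inside the target: track it so we know
            # which end tag closes the target itself.
            self.depth += 1
        elif tag == self.tag and dict(attrs).get("id") == self.element_id:
            self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in VOID_TAGS:
            self.depth -= 1
            if self.depth == 0:
                self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def text_inside(html, tag, element_id):
    """Return the concatenated text of the first matching element, or ''."""
    parser = TagTextCollector(tag, element_id)
    parser.feed(html)
    parser.close()
    return "".join(parser.chunks)


if __name__ == "__main__":
    sample = '<div id="main">Hello <b>big</b> world</div>'
    print(text_inside(sample, "div", "main"))
```

This gives roughly what string() gives in XPath (all descendant text joined together), but only for this one kind of selection. Real-world HTML with unclosed tags will confuse the depth counter, which is the sort of thing lxml handles for you.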