Re: beautifulsoup .vs tidy

Paul Boddie Sat, 01 Jul 2006 09:45:45 -0700

Ravi Teja wrote:
>
> 1.) XPath is not a good idea at all with "malformed" HTML or perhaps
> web pages in general.


import libxml2dom
import urllib
f = urllib.urlopen("http://wiki.python.org/moin/")
s = f.read()
f.close()
# s contains HTML not XML text
d = libxml2dom.parseString(s, html=1)
# get the community-related links
for label in d.xpath("//li[.//a/text() = 'Community']//li//a/text()"):
    print label.nodeValue

Of course, lxml should be able to do this kind of thing as well. I'd be
interested to know why this "is not a good idea", though.
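
For comparison, here is a rough sketch of how the same query might look with lxml. It is written in Python 3 syntax and runs against a small inline sample of deliberately sloppy markup instead of the live wiki page, so the sample markup and the helper name community_links are my own assumptions, not anything taken from the wiki itself. The only lxml calls used are lxml.html.document_fromstring and the xpath method on the resulting element.

```python
from lxml import html

# Hypothetical sample: the top-level <li> elements are never closed and one
# attribute is unquoted, roughly the kind of thing found in real pages.
SAMPLE = """
<ul>
  <li><a href=/Docs>Documentation</a>
  <li><a href="/Community">Community</a>
    <ul>
      <li><a href="/lists">Mailing Lists</a></li>
      <li><a href="/irc">IRC</a></li>
    </ul>
  <li><a href="/Other">Other</a>
</ul>
"""

def community_links(text):
    """Return the link labels nested under the 'Community' list item.

    Illustrative helper (not part of lxml): parses the text with lxml's
    forgiving HTML parser and applies the same XPath as the libxml2dom
    example.
    """
    doc = html.document_fromstring(text)
    labels = doc.xpath("//li[.//a/text() = 'Community']//li//a/text()")
    # lxml returns text nodes as string subclasses; convert to plain str.
    return [str(label) for label in labels]

if __name__ == "__main__":
    for label in community_links(SAMPLE):
        print(label)
```

The point is the same as above: the HTML parser repairs the tree first, and the XPath expression then works on the repaired tree, not on the raw text.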

Paul

