bruce wrote:
> hi...
>
> never used perl, but i have an issue trying to resolve some html that
> appears to be "dirty/malformed" regarding the overall structure. in
> researching validators, i came across the beautifulsoup app and wanted to
> know if anybody could give me pros/cons of the app as it relates to any of
> the other validation apps...
>
> the issue i'm facing involves parsing some websites, so i'm trying to
> extract information based on the DOM/XPath functions.. i'm using perl to
> handle the extraction....

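1.) XPath is not a good idea with "malformed" HTML, and arguably not with
real-world web pages in general: it needs a well-formed document tree,
and sloppy markup often won't parse into one.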
2.) BeautifulSoup is not a validator, but it copes well with bad HTML.
Also look at Mechanize and ClientForm.
3.) XMLStarlet (http://xmlstar.sourceforge.net/) is a good XML
validator. It's not Python, but you don't need to care what language
it is written in.
4.) For a simple HTML validator, just use http://validator.w3.org/
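
To illustrate points 1 and 2, here is a minimal sketch using only the
Python standard library. BeautifulSoup is a third-party package and is
deliberately not used here. A strict XML parser (the kind of
well-formed tree that XPath tooling generally expects) rejects the
sample, while the lenient html.parser module still pulls the link out.
The sample markup and the helper names (strict_parse_ok,
LinkCollector, extract_links) are made up for illustration.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical "dirty" HTML: unclosed <p> and <li>, unquoted attribute value.
MALFORMED = """<html><body>
<p>First paragraph
<ul><li>one<li>two</ul>
<a href=page.html>link</a>
</body></html>"""


def strict_parse_ok(markup):
    """Return True if a strict XML parser accepts the markup as well-formed."""
    try:
        ET.fromstring(markup)
        return True
    except ET.ParseError:
        return False


class LinkCollector(HTMLParser):
    """Lenient, event-driven parser that records href values of <a> tags.

    It never builds a tree, so unclosed or mismatched tags don't stop it.
    """

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def extract_links(markup):
    """Feed the markup to LinkCollector and return the hrefs it found."""
    parser = LinkCollector()
    parser.feed(markup)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(strict_parse_ok(MALFORMED))  # False: not well-formed XML
    print(extract_links(MALFORMED))    # ['page.html']
```

BeautifulSoup goes a step further than this sketch: it builds a
navigable tree out of the same kind of tag soup, which is why it suits
scraping messy pages better than XPath does.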

-- 
http://mail.python.org/mailman/listinfo/python-list
