bruce wrote:
> hi...
>
> never used perl, but i have an issue trying to resolve some html that
> appears to be "dirty/malformed" regarding the overall structure. in
> researching validators, i came across the beautifulsoup app and wanted to
> know if anybody could give me pros/cons of the app as it relates to any of
> the other validation apps...
>
> the issue i'm facing involves parsing some websites, so i'm trying to
> extract information based on the DOM/XPath functions.. i'm using perl to
> handle the extraction....
1.) XPath is not a good idea at all with "malformed" HTML, or perhaps with web pages in general. XPath works on a parsed document tree, and a strict XML parser will refuse to build one from malformed markup.
2.) BeautifulSoup is not a validator, but it works well with bad HTML. Also look at Mechanize and ClientForm.
3.) XMLStarlet (http://xmlstar.sourceforge.net/) is a good XML validator. It's not written in Python, but you don't need to care about the language it is written in.
4.) For a simple HTML validator, just use http://validator.w3.org/

--
http://mail.python.org/mailman/listinfo/python-list
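To illustrate points 1 and 2, here is a minimal sketch using only the Python standard library (BeautifulSoup itself is not used, so it runs without extra installs). The HTML snippet and the helper names `is_well_formed_xml`, `LinkCollector` and `extract_links` are hypothetical, made up for this example. A strict XML parser (`xml.etree.ElementTree`, the kind of tree you would run XPath-style queries against) rejects the tag soup outright, while the lenient `html.parser.HTMLParser` still walks it and pulls out the links.

```python
# A minimal sketch, assuming only the Python standard library.
# The HTML below is a made-up example of "tag soup": unclosed <p> and
# <li> tags, an unquoted attribute value, and a stray </b>.
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

MALFORMED = """<html><body>
<p>First paragraph
<p>Second paragraph with <a href=page1.html>a link</a>
<ul><li><a href="page2.html">item</a><li>unclosed item</ul>
</b>
</body></html>"""


def is_well_formed_xml(text):
    """Return True if a strict XML parser accepts the text."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True


class LinkCollector(HTMLParser):
    """Lenient parser that records the href of every <a> tag it sees."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)


def extract_links(text):
    """Feed the markup to the lenient parser and return the hrefs found."""
    parser = LinkCollector()
    parser.feed(text)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print("well-formed XML?", is_well_formed_xml(MALFORMED))
    print("links:", extract_links(MALFORMED))
```

Running it should report that the snippet is not well-formed XML while still listing both links. Current BeautifulSoup releases (bs4) can use this same standard-library parser as a backend via `BeautifulSoup(markup, "html.parser")`, adding a navigable tree on top of it.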