Rene Pijlman wrote:
> Lawrence D'Oliveiro:
>> I've been using HTMLParser to scrape Web sites. The trouble with this
>> is, there's a lot of malformed HTML out there. Real browsers have to be
>> written to cope gracefully with this, but HTMLParser does not.
>
> There are two solutions to this:
>
> 1. Tidy the source before parsing it.
> http://www.egenix.com/files/python/mxTidy.html
>
> 2. Use something more forgiving, like BeautifulSoup.
> http://www.crummy.com/software/BeautifulSoup/

You can also use the HTML parser from libxml2 or any of the available
wrappers for it.

Bye,
   Walter Dörwald

-- 
http://mail.python.org/mailman/listinfo/python-list
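As a rough sketch of the libxml2 route, the snippet that follows uses lxml, one Python wrapper around libxml2, to pull links out of deliberately broken markup. It assumes the third-party lxml package is installed; the helper name `extract_links` and the sample HTML are hypothetical and only for illustration. `etree.HTMLParser` recovers from malformed input by default instead of raising an error.

```python
# Assumption: the third-party lxml package (a wrapper around libxml2)
# is installed, e.g. via `pip install lxml`.
from lxml import etree

# Hypothetical sample input: unclosed <p>, <b> and <a>, no </html>.
BROKEN_HTML = (
    "<html><body><p>Hello <b>world"
    "<p>See <a href='page.html'>this page"
    "</body>"
)


def extract_links(html_text):
    """Hypothetical helper: return the href of every <a> element.

    etree.HTMLParser is backed by libxml2's HTML parser, which repairs
    missing end tags and similar damage rather than failing.
    """
    parser = etree.HTMLParser()
    root = etree.fromstring(html_text, parser)
    # XPath attribute results are str subclasses; convert to plain str.
    return [str(href) for href in root.xpath("//a/@href")]


print(extract_links(BROKEN_HTML))
```

Any other libxml2 wrapper should work along the same lines; lxml is used here only because its parser API is compact.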