Dan Stromberg wrote:
> Has anyone tried to construct an HTML janitor script using BeautifulSoup?
>
> My situation:
>
> I'm trying to convert a series of web pages from .html to palmdoc format,
> using plucker, which is written in python. The plucker project suggests
> passing html through "tidy", to get well-formed html for plucker to work
> with.
>
> However, some of the pages I want to convert are so bad that even tidy
> pukes on them.
>
> I was thinking that BeautifulSoup might be more tolerant of really bad
> html... Which led me to the question this article started out with. :)
>
> Thanks!
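For the janitor script asked about above, here is a minimal sketch of the core idea, using only the standard library's lenient html.parser (so this is not BeautifulSoup's API): re-emit every tag, close anything still open, and drop end tags that match nothing. The names Janitor and tidy are hypothetical. With the current bs4 package installed, BeautifulSoup(html, 'html.parser').prettify() is the more usual route.

```python
from html import escape
from html.parser import HTMLParser

# Elements that never take a closing tag in HTML.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
# html.parser hands <script>/<style> bodies over as raw text.
RAW = {"script", "style"}


class Janitor(HTMLParser):
    """Hypothetical janitor: re-emits HTML with balanced tags.

    Comments and doctypes are dropped, because HTMLParser's default
    handle_comment/handle_decl do nothing.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # pieces of cleaned output
        self.stack = []  # currently open, non-void tags

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # Valueless attributes (e.g. <input disabled>) arrive as None.
            if value is None:
                parts.append(name)
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s>" % " ".join(parts))
        if tag not in VOID:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray end tag with no matching open tag: drop it
        # Close any tags left open inside this one, innermost first.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append("</%s>" % open_tag)
            if open_tag == tag:
                break

    def handle_data(self, data):
        if self.stack and self.stack[-1] in RAW:
            self.out.append(data)  # script/style text must not be escaped
        else:
            # Entities were already decoded (convert_charrefs=True),
            # so re-escape &, < and > in ordinary text.
            self.out.append(escape(data, quote=False))


def tidy(html):
    """Return a well-formed version of html (hypothetical helper)."""
    janitor = Janitor()
    janitor.feed(html)
    janitor.close()
    # Close whatever is still open at end of input.
    while janitor.stack:
        janitor.out.append("</%s>" % janitor.stack.pop())
    return "".join(janitor.out)


print(tidy("<p>one<b>two</p></i>three"))  # prints <p>one<b>two</b></p>three
```

The recovery rule in handle_endtag (an end tag closes everything opened inside it) keeps the output well-formed, but it knows nothing about HTML's implied-close rules: "<li>a<li>b" comes out as nested list items. Handling those cases properly is where a real tag-soup parser such as BeautifulSoup earns its keep.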
I have used BeautifulSoup for screen scraping, pulling HTML into a structured form (XML). Is that similar to a janitor script? I used it because tidy was puking on some HTML. BeautifulSoup has been excellent.

-- 
http://mail.python.org/mailman/listinfo/python-list