Re: I'm looking for html cleaner. Example : convert my title => <h1>my title</h1>

John Nagle Mon, 29 Mar 2010 17:22:53 -0700

Stéphane Klein wrote:

Hi,


I am working on an HTML cleaner.

I export OpenOffice.org documents to HTML.
Next, I would like to clean these HTML export files:

* remove comments
* remove styles
* remove dispensable tags
* ...


   Try parsing with the HTML5 parser html5lib ("http://code.google.com/p/html5lib/"),
which is the closest thing to a good parser available for Python.  It's basically
a reference implementation of the HTML5 parsing algorithm, including all the
handling of bad HTML.

   Once you have a tree, write something to walk the tree and remove
empty elements whose tags are on a list of tags which do nothing when
empty.  Then regenerate HTML from the tree.
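
   A minimal sketch of that tree walk, for illustration only.  It works on
standard-library ElementTree elements rather than calling html5lib directly;
as far as I know, html5lib.parse(text, treebuilder="etree",
namespaceHTMLElements=False) gives you a tree of the same kind.  The tag
lists and the helper names (clean, _is_empty, _remove_keeping_tail) are my
own assumptions, not part of any library, so adjust them to your documents.

```python
import xml.etree.ElementTree as ET

# Assumed list of tags that do nothing when empty; adjust to taste.
REMOVABLE_WHEN_EMPTY = {"span", "font", "b", "i", "u", "em", "strong", "div", "p"}
# Assumed list of tags that are dropped together with their content.
DROP_ALWAYS = {"style"}


def _is_empty(elem):
    """True if the element has no children and no non-whitespace text."""
    return len(elem) == 0 and not (elem.text or "").strip()


def _remove_keeping_tail(parent, child):
    """Remove child from parent without losing the text that follows it."""
    tail = child.tail or ""
    index = list(parent).index(child)
    if index == 0:
        # No earlier sibling: the following text joins the parent's text.
        parent.text = (parent.text or "") + tail
    else:
        # Otherwise it joins the tail of the previous sibling.
        previous = parent[index - 1]
        previous.tail = (previous.tail or "") + tail
    parent.remove(child)


def clean(elem):
    """Prune comments, style elements and empty do-nothing tags, bottom-up."""
    for child in list(elem):
        if child.tag is ET.Comment or child.tag in DROP_ALWAYS:
            _remove_keeping_tail(elem, child)
            continue
        clean(child)  # children first, so a tag emptied by pruning is caught
        if child.tag in REMOVABLE_WHEN_EMPTY and _is_empty(child):
            _remove_keeping_tail(elem, child)
    return elem


if __name__ == "__main__":
    root = ET.fromstring(
        "<body><style>h1 {color: red}</style>"
        "<h1><span></span>my title<font> </font></h1><p></p></body>"
    )
    print(ET.tostring(clean(root), encoding="unicode"))
    # prints: <body><h1>my title</h1></body>
```

   The walk is bottom-up on purpose: a wrapper that only contained other
empty wrappers becomes empty itself and is removed in the same pass.  The
root element is never removed.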

   Or just use HTML Tidy: "http://www.w3.org/People/Raggett/tidy/"

                                        John Nagle
--
http://mail.python.org/mailman/listinfo/python-list