Is there an HTML clean/tidy library or module written in pure Python? I found mxTidy, but it's an interface to a command-line tool.
What I'm looking for is something that accepts a list of allowed tags and/or attributes and strips everything else from an HTML string.
Here's a module I wrote to do something along the lines of what you want: <http://ecritters.biz/limithtml.py>. Unfortunately, it requires the HTML to be relatively well-formed (e.g. it doesn't like things like "<i><b>foo</i></b>"), so I feed the HTML into uTidyLib (another interface to HTML Tidy) first. I'm not sure why you don't want to use Tidy, but if you do change your mind, you should be able to use my module alongside Tidy to limit the HTML elements and attributes which will be accepted.
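To make the idea of limiting tags and attributes concrete, here is a minimal pure-Python sketch built on the standard library's html.parser. It is not the code from limithtml.py; the names ALLOWED, WhitelistFilter and limit_html are made up for illustration, and the whitelist itself is just an example. Treat it as a starting point rather than a security-grade sanitizer: it keeps the text inside dropped tags (including script bodies, as plain text), it does not validate URLs in href, and it does not repair badly nested markup.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist, for illustration only: tag -> allowed attribute names.
ALLOWED = {"a": {"href"}, "b": set(), "i": set(), "p": set()}


class WhitelistFilter(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep all text."""

    def __init__(self, allowed):
        super().__init__(convert_charrefs=True)
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in self.allowed:
            return  # drop the tag itself; its text content is still kept
        kept = "".join(
            ' %s="%s"' % (name, escape(value or "", quote=True))
            for name, value in attrs
            if name in self.allowed[tag]
        )
        self.out.append("<%s%s>" % (tag, kept))

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded by the parser, so re-escape them.
        self.out.append(escape(data, quote=False))


def limit_html(html, allowed=ALLOWED):
    parser = WhitelistFilter(allowed)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(limit_html('<a href="http://example.com" onclick="x()">link</a><span>text</span>'))
```

Because the filter only passes tags through and never rebalances them, mis-nested input such as "<i><b>foo</i></b>" comes out just as mis-nested, which is why running something like Tidy first is still useful.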
--
http://mail.python.org/mailman/listinfo/python-list