On Fri, Jul 13, 2007 at 11:48:50AM +0100, Nic James Ferrier wrote:
>
> Brett Parker <[EMAIL PROTECTED]> writes:
>
> > On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
> >>
> >> Derek Anderson <[EMAIL PROTECTED]> writes:
> >>
> >> > hey all,
> >> >
> >> > could anyone point me to a python html sanitizer implementation (or
> >> > example)? i don't mean to strip all html, just tags and attributes not
> >> > on a whitelist, such as I/B/A href/U/etc.
> >>
> >> I use libxml2/libxslt, something like:
> >>
> >>   doc = libxml2.htmlParseDoc(whatever, "utf8")
> >>   result = libxslt.applyStylesheetFile(doc, "strip.xslt", {})
> >>
> >> There are loads of ways of stripping in xslt depending on what you
> >> want to do.
> >
> > Only works on well formed XHTML documents though... which although they
> > should be the norm, really aren't!
>
> No. In my example I deliberately used libxml2' HTML parser which is an
> HTML parser not an XHTML parser.
>
> It copes with non-well formed documents as well as all the usual
> entity problems.

Ohhh, so you did - sorry - eyes still blurry from sleep deprivation!

Cheers,
-- 
Brett Parker

--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups "Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---