On Fri, Jul 13, 2007 at 11:48:50AM +0100, Nic James Ferrier wrote:
> 
> Brett Parker <[EMAIL PROTECTED]> writes:
> 
> > On Fri, Jul 13, 2007 at 11:18:18AM +0100, Nic James Ferrier wrote:
> >> 
> >> Derek Anderson <[EMAIL PROTECTED]> writes:
> >> 
> >> > hey all,
> >> >
> >> > could anyone point me to a python html sanitizer implementation (or 
> >> > example)?  i don't mean to strip all html, just tags and attributes not 
> >> > on a whitelist, such as I/B/A href/U/etc.
> >> 
> >> I use libxml2/libxslt, something like:
> >> 
> >>   import libxml2, libxslt
> >>   doc = libxml2.htmlParseDoc(whatever, "utf8")
> >>   style = libxslt.parseStylesheetDoc(libxml2.parseFile("strip.xslt"))
> >>   result = style.applyStylesheet(doc, None)
> >> 
> >> There are loads of ways of stripping in xslt depending on what you
> >> want to do.
> >
> > Only works on well formed XHTML documents though... which although they
> > should be the norm, really aren't!
> 
> No. In my example I deliberately used libxml2's HTML parser, which is an
> HTML parser, not an XHTML parser.
> 
> It copes with non-well formed documents as well as all the usual
> entity problems.

Ohhh, so you did - sorry - eyes still blurry from sleep deprivation!
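
For the whitelist case Derek asked about, here is a minimal sketch that
uses only the standard library's html.parser instead of libxml2/libxslt,
so it is a different technique from the one quoted above. The names
ALLOWED, SAFE_SCHEMES, WhitelistSanitizer and sanitize are made up for
illustration, and the whitelist itself is an assumption. It drops
disallowed tags and attributes but keeps their text, and it does not
rebalance unclosed tags, so treat it as a starting point rather than a
vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attribute names allowed on it.
ALLOWED = {"i": set(), "b": set(), "u": set(), "a": {"href"}}
# Assumption: only absolute links with these prefixes are kept;
# relative links and javascript: URLs are dropped.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags/attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is still emitted
        kept = []
        for name, value in attrs:
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives with entities already decoded, so re-escape it.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Calling sanitize() on a string returns the filtered markup as a string.
Comments, doctypes and processing instructions are discarded because the
parser's default handlers for them do nothing.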

Cheers,
-- 
Brett Parker
