wow, this lib is great.  thanks.

heh, love that feeling that i've been wasting my life coding scrapers 
with regexes up until now...  :)


patrick k. wrote:
> it's easy to write a customized sanitizer using beautifulsoup.
> http://www.crummy.com/software/BeautifulSoup/
> 
> 1) place beautifulsoup.py somewhere in your pythonpath
> 2) build your sanitizer and save it somewhere on your pythonpath
> in my case it's called eatMe and looks like this:
> http://dpaste.com/hold/14305/
> 
> your sanitizer will probably be less complicated ...
> 
> 3) in your models.py, do something like this:
> 
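> from django.core import validators
> 
> def isGoodHTML(field_data, all_data):
>      # sanitizer module from step 2
>      from eatMe import doEatMe
>      if field_data:
>          new_field_data = doEatMe(field_data)
>          # if sanitizing changed the text, store the cleaned version
>          # and ask the user to review it
>          if new_field_data != all_data['summary']:
>              all_data['summary'] = new_field_data
>              raise validators.ValidationError("The errors in your document were automatically corrected. Please check again!")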
> 
> isGoodHTML.always_test = True
> 
> ...
> 
> summary = models.TextField(validator_list=[isGoodHTML])
> 
> note: I know this looks complicated, but once you have built your
> sanitizer you can always reconfigure and reuse it.
> we use the above example for text fields in the admin (with TinyMCE).
> 
> patrick
> 
> On 13.07.2007 at 11:23, Derek Anderson wrote:
> 
>> hey all,
>>
>> could anyone point me to a python html sanitizer implementation (or
>> example)?  i don't mean to strip all html, just tags and attributes
>> not on a whitelist, such as I/B/A href/U/etc.
>>
>> thanks,
>> derek
>>
> 
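
for anyone finding this thread later: here is a rough sketch of the
whitelist idea from my original question (keep I/B/U and A href, drop
everything else). note that this is NOT patrick's eatMe code, and it
swaps beautifulsoup for the standard library's html.parser so it runs
without extra installs. the names ALLOWED_TAGS, SAFE_SCHEMES, SKIP_CONTENT
and sanitize() are made up for illustration, and unlike beautifulsoup it
does not try to repair unbalanced markup.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attribute names to keep.
ALLOWED_TAGS = {"i": set(), "b": set(), "u": set(), "a": {"href"}}
# Assumed-safe URL prefixes for href values (adjust to taste).
SAFE_SCHEMES = ("http://", "https://", "mailto:")
# Tags whose text content is dropped along with the tag itself.
SKIP_CONTENT = {"script", "style"}


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; escape all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_CONTENT:
            self._skip += 1
            return
        if self._skip or tag not in ALLOWED_TAGS:
            return  # drop the tag but keep any text inside it
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_TAGS[tag]:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: URLs
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in SKIP_CONTENT:
            self._skip = max(0, self._skip - 1)
        elif not self._skip and tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        if not self._skip:
            self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with non-whitelisted tags and attributes removed."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

something like sanitize(field_data) could then play the role of
doEatMe(field_data) in the validator above. treat it as a starting point
only; a production sanitizer needs more care than this.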


--~--~---------~--~----~------------~-------~--~----~
You received this message because you are subscribed to the Google Groups 
"Django users" group.
To post to this group, send email to django-users@googlegroups.com
To unsubscribe from this group, send email to [EMAIL PROTECTED]
For more options, visit this group at 
http://groups.google.com/group/django-users?hl=en
-~----------~----~----~----~------~----~------~--~---
