wow, this lib is great. thanks. heh, love that feeling that i've been wasting my life coding scrapers with regexes up until now... :)
patrick k. wrote:
> it's easy to write a customized sanitizer using beautifulsoup.
> http://www.crummy.com/software/BeautifulSoup/
>
> 1) place beautifulsoup.py somewhere in your pythonpath
> 2) build your sanitizer and save it somewhere on your pythonpath
>    in my case it's called eatMe and looks like this:
>    http://dpaste.com/hold/14305/
>
>    your sanitizer will probably be less complicated ...
>
> 3) in your models.py, do something like this:
>
> def isGoodHTML(field_data, all_data):
>     from eatMe import doEatMe
>     if field_data:
>         new_field_data = doEatMe(field_data)
>         # if the sanitizer changed anything, store the cleaned text
>         # and ask the user to review it
>         if new_field_data != all_data['summary']:
>             all_data['summary'] = new_field_data
>             raise validators.ValidationError(
>                 "The errors in your document were automatically "
>                 "corrected. Please check again!")
>
> isGoodHTML.always_test = True
>
> ...
>
> summary = models.TextField(validator_list=[isGoodHTML])
>
> note: I know this looks complicated, but once you have built your
> sanitizer you can always reconfigure and reuse it.
> we use the above example for text-fields in the admin (with TinyMCE).
>
> patrick
>
> On 13.07.2007 at 11:23, Derek Anderson wrote:
>
>> hey all,
>>
>> could anyone point me to a python html sanitizer implementation (or
>> example)? i don't mean to strip all html, just tags and attributes
>> not on a whitelist, such as I/B/A href/U/etc.
>>
>> thanks,
>> derek
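
for anyone reading this thread later: here is a minimal sketch of the whitelist idea (keep I/B/U and A with href, drop everything else). it is not patrick's eatMe module, whose source is only linked above. it uses the standard library's html.parser instead of BeautifulSoup so that it runs without extra installs. the names ALLOWED_TAGS, SAFE_SCHEMES, WhitelistSanitizer, sanitize and check_html are made up for illustration, and the whitelist itself is an assumption.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attributes allowed on that tag.
ALLOWED_TAGS = {"a": {"href"}, "b": set(), "i": set(), "u": set()}
# Assumed policy: only these URL prefixes may appear in href.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistSanitizer(HTMLParser):
    """Re-emit only whitelisted tags and attributes; keep all text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still kept
        kept = []
        for name, value in attrs:
            if name not in ALLOWED_TAGS[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # e.g. javascript: links
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is re-escaped so stray < or & cannot form new markup.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    """Return html_text with non-whitelisted tags and attributes removed."""
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


def check_html(field_data):
    """Return (cleaned, changed), mirroring the compare step in isGoodHTML."""
    cleaned = sanitize(field_data)
    return cleaned, cleaned != field_data
```

a validator like the one quoted above could call check_html and raise its error when changed is True. two caveats: the sketch does not repair unbalanced tags, and the string comparison also flags harmless differences such as uppercase tag names, so a real sanitizer would probably want a looser comparison.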