On Thursday 07 February 2008 15:48:46 Jon Ribbens wrote:
> Be aware that if you are doing this for security reasons (e.g. to
> prevent cross-site scripting),

It is for that reason, essentially.

> it is very hard to get right.

Indeed, that's why I thought I'd find out what everyone else actually uses
rather than follow one of the various approaches I could take.

> The code at
> http://www.voidspace.org.uk/python/weblog/arch_d7_2005_04_23.shtml#e35
> is wrong, for example.

That's because it whitelists a collection of tags but doesn't whitelist
specific attributes, I presume. I can certainly adapt that code to work the
way I'd prefer it. Changing allowed_tags to something like:

    allowed_tags = {
        'a'   : ["id", "name", "href"],
        'img' : ["id", "src"],
        ..
        <tag> : [ <list of allowed attributes> ]
    }

would allow that code to be used with only a small modification, if I'm
reading your objection right.

On Thursday 07 February 2008 15:20:17 Michael Foord wrote:
...
> I used htmldata a while ago to do this:
>
> http://www.voidspace.org.uk/python/weblog/arch_d7_2005_04_23.shtml#e35

Much appreciated - I may well start from that approach.

On Thursday 07 February 2008 15:30:57 Alexander Harrowell wrote:
> If you're not bothered about speed, BeautifulSoup can catch, remove and
> replace arbitrary HTML tags in a document.

Initially, speed isn't an issue.

OK, so one vote in favour of BeautifulSoup, one in favour of htmldata & one
pointing out a problem with one specific example...


Michael.
_______________________________________________
python-uk mailing list
python-uk@python.org
http://mail.python.org/mailman/listinfo/python-uk
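
For illustration, here is a minimal sketch of the per-tag attribute whitelist
described above. It is not the voidspace/htmldata code and not a
BeautifulSoup recipe: it assumes Python 3 and uses only the standard
library's html.parser. The names WhitelistSanitizer, sanitize, is_safe_url,
URL_ATTRS and SAFE_SCHEMES are hypothetical, chosen for this example. The
URL scheme check is an addition the thread does not mention, included
because an allowed href can still carry a javascript: URL. Treat it as a
starting point, not a vetted sanitizer.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Same shape as the allowed_tags mapping proposed in the message.
ALLOWED_TAGS = {
    "a": ["id", "name", "href"],
    "img": ["id", "src"],
}
# Assumptions for this sketch, not taken from the thread:
URL_ATTRS = {"href", "src"}           # attributes whose value is a URL
SAFE_SCHEMES = {"", "http", "https"}  # "" means a relative URL
VOID_TAGS = {"img"}                   # tags that take no closing tag


def is_safe_url(value):
    # Browsers ignore whitespace and control characters in a URL scheme,
    # so strip them before inspecting it.
    cleaned = "".join(ch for ch in value if ch > " ")
    try:
        scheme = urlsplit(cleaned).scheme.lower()
    except ValueError:
        return False
    return scheme in SAFE_SCHEMES


class WhitelistSanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # drop the tag itself; its text content is still emitted
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_TAGS[tag]:
                continue
            if name in URL_ATTRS and not is_safe_url(value):
                continue
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag not in VOID_TAGS:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # The parser has already decoded entities, so re-escape the text.
        self.out.append(escape(data, quote=False))


def sanitize(html_text):
    parser = WhitelistSanitizer()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<a href="http://example.com" onclick="evil()">x</a>'))
```

Comments, doctype declarations and processing instructions are dropped
because the sketch defines no handlers for them. Text inside a disallowed
tag is kept and escaped, not removed, which may or may not be what you want
for something like a script element.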