Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Alexander Harrowell
On Thu, Feb 7, 2008 at 7:11 PM, Shaun Laughey <[EMAIL PROTECTED]> wrote:
>
> Hi,
> I have used Beautiful Soup for parsing html.
> It works very nicely and I didn't see much of an issue with speed in
> parsing several hundred html files every hour or so.
> I also rolled my own using various regex's

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Shaun Laughey
On 07/02/2008, Jon Ribbens <[EMAIL PROTECTED]> wrote:
> On Thu, Feb 07, 2008 at 05:50:37PM +, Michael Sparks wrote:
> > > The code at
> > > http://www.voidspace.org.uk/python/weblog/arch_d7_2005_04_23.shtml#e35
> > > is wrong, for example.
> >
> > That's because it whitelists a collection of ta

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Jon Ribbens
On Thu, Feb 07, 2008 at 05:50:37PM +, Michael Sparks wrote:
> > The code at
> > http://www.voidspace.org.uk/python/weblog/arch_d7_2005_04_23.shtml#e35
> > is wrong, for example.
>
> That's because it whitelists a collection of tags but doesn't whitelist
> specific attributes, I presume.

That
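
For illustration only, the tags-versus-attributes point in the message above can be sketched with the Python standard library. This is a minimal sketch, not the code from any post in this thread and not a vetted security filter: `ALLOWED`, `SAFE_SCHEMES`, `WhitelistScrubber` and `scrub` are all hypothetical names, and the whitelist contents are an assumption chosen for the example.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist (an assumption for this example):
# tag name -> set of attribute names permitted on that tag.
ALLOWED = {
    "a": {"href"},
    "b": set(),
    "i": set(),
    "em": set(),
    "strong": set(),
    "p": set(),
}

# Assumed set of acceptable URL schemes; relative URLs are rejected too,
# which is deliberately conservative.
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistScrubber(HTMLParser):
    """Re-emit only whitelisted tags, each with only whitelisted attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED:
            return  # drop the tag; any text inside it is still emitted, escaped
        kept = []
        for name, value in attrs:
            # Whitelisting tags alone is not enough: attributes such as
            # onclick must be dropped unless explicitly allowed.
            if name not in ALLOWED[tag] or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue  # rejects javascript: and other unexpected schemes
            kept.append(' %s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<%s%s>" % (tag, "".join(kept)))

    def handle_endtag(self, tag):
        if tag in ALLOWED:
            self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text is always re-escaped, so it cannot reintroduce markup.
        self.out.append(escape(data, quote=False))


def scrub(html_text):
    """Return html_text with non-whitelisted tags and attributes removed."""
    parser = WhitelistScrubber()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions, `scrub('<a href="http://example.com/" onclick="x()">link</a>')` keeps the link but drops the `onclick` handler. Comments and declarations are discarded because the parser's default handlers ignore them. Anything relied on for XSS protection would need far more scrutiny than this.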

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Menno Smits
Hi Michael,

Michael Sparks wrote:
> Just a quick Q for people: what's your favourite way (preferably a library :)
> of allowing a subset of HTML tags through? I can think of 1/2 dozen different
> ways of doing this, but I'm sure there's a preferred approach for some...
>
> Thanks in advance :-)

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Michael Foord
Michael Sparks wrote:
> On Thursday 07 February 2008 15:48:46 Jon Ribbens wrote:
>
>> The code at
>> http://www.voidspace.org.uk/python/weblog/arch_d7_2005_04_23.shtml#e35
>> is wrong, for example.
>>
>
> That's because it whitelists a collection of tags but doesn't whitelist
> specific at

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Jon Staley
Hi,

Just a quick suggestion, I've used Strip-o-Gram in the past and found it to be
pretty good.

http://zope.org/Members/chrisw/StripOGram/readme

--
Jon

On Feb 7, 2008 2:35 PM, Michael Sparks <[EMAIL PROTECTED]> wrote:
> Hi,
>
>
> Just a quick Q for people: what's your favourite way (preferably

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Michael Sparks
On Thursday 07 February 2008 15:48:46 Jon Ribbens wrote:
> Be aware that if you are doing this for security reasons (e.g. to
> prevent cross-site scripting),

It is for that reason, essentially.

> it is very hard to get right.

Indeed, that's why I thought I'd find out what everyone else actually

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Michael Foord
Jon Ribbens wrote:
> On Thu, Feb 07, 2008 at 02:35:29PM +, Michael Sparks wrote:
>
>> Just a quick Q for people: what's your favourite way (preferably a library :)
>> of allowing a subset of HTML tags through? I can think of 1/2 dozen different
>> ways of doing this, but I'm sure t

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Jon Ribbens
On Thu, Feb 07, 2008 at 02:35:29PM +, Michael Sparks wrote:
> Just a quick Q for people: what's your favourite way (preferably a library :)
> of allowing a subset of HTML tags through? I can think of 1/2 dozen different
> ways of doing this, but I'm sure there's a preferred approach for some.

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Alexander Harrowell
If you're not bothered about speed, BeautifulSoup can catch, remove and replace
arbitrary HTML tags in a document.

On Thu, Feb 7, 2008 at 2:35 PM, Michael Sparks <[EMAIL PROTECTED]> wrote:
> Hi,
>
>
> Just a quick Q for people: what's your favourite way (preferably a library :)
> of allowing a

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Michael Foord
Michael Sparks wrote:
> Hi,
>
>
> Just a quick Q for people: what's your favourite way (preferably a library :)
> of allowing a subset of HTML tags through? I can think of 1/2 dozen different
> ways of doing this, but I'm sure there's a preferred approach for some...
>
> Thanks in advance :-)
>
>

[python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-07 Thread Michael Sparks
Hi,

Just a quick Q for people: what's your favourite way (preferably a library :)
of allowing a subset of HTML tags through? I can think of 1/2 dozen different
ways of doing this, but I'm sure there's a preferred approach for some...

Thanks in advance :-)

Michael.
--
http://yeoldeclue.com/bl