Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-08 Thread Andy Robinson
On 07/02/2008, Alexander Harrowell <[EMAIL PROTECTED]> wrote:
> To clarify, I use BeautifulSoup for a small project that parses frequently
> changing HTML on a number of websites (>1MB each), extracts the content of
> specific tags, filters out certain strings from the content, and serves it
> up i
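The workflow described above (pull the text out of specific tags, then strip unwanted strings) can be sketched with the standard library's `html.parser` module. This is a swapped-in technique, not BeautifulSoup and not the poster's actual code; the class `TagTextExtractor` and the function `extract_and_filter` are hypothetical names chosen for illustration. Unlike BeautifulSoup, this parser does no tree repair, so an element that is never closed is silently dropped.

```python
from html.parser import HTMLParser


class TagTextExtractor(HTMLParser):
    """Hypothetical helper: collect the text inside each occurrence of the wanted tags."""

    def __init__(self, wanted_tags):
        super().__init__(convert_charrefs=True)
        self.wanted_tags = {t.lower() for t in wanted_tags}
        self.depth = 0      # > 0 while we are inside a wanted element
        self.chunks = []    # text fragments for the element being collected
        self.results = []   # one string per completed wanted element

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag names.
        if tag in self.wanted_tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.wanted_tags and self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.results.append("".join(self.chunks))
                self.chunks = []

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def extract_and_filter(html, wanted_tags, unwanted_strings):
    """Hypothetical: return the text of each wanted tag with unwanted strings removed."""
    parser = TagTextExtractor(wanted_tags)
    parser.feed(html)
    parser.close()
    cleaned = []
    for text in parser.results:
        for s in unwanted_strings:
            text = text.replace(s, "")
        # Collapse runs of whitespace left behind by markup and removals.
        cleaned.append(" ".join(text.split()))
    return cleaned
```

For example, `extract_and_filter('<div><h2>Headline one</h2><p>Ignore me</p><h2>Headline two ADVERT</h2></div>', ["h2"], ["ADVERT"])` returns `['Headline one', 'Headline two']`.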

[python-uk] Conference

2008-02-08 Thread John Pinner
As well as PyCon UK (12th-14th September) we have another UK conference which may be of interest, the UKUUG Spring Conference. It's not a pure Python conference, but has a significant Python content (possibly because I'm involved in the organisation ;) The url is http://spring2008.ukuug.org The

Re: [python-uk] Favourite ways of scrubbing HTML/whitelisting specific HTML tags?

2008-02-08 Thread Jon Ribbens
On Fri, Feb 08, 2008 at 09:01:06AM +, Andy Robinson wrote:
> FWIW, we parse tens of thousands of pages every week to build let
> people republish content into nice PDFs. Beautiful Soup was the only
> thing that made this sane, as many pages are not structured to be easy
> to parse. Like you w
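For the question in the thread subject, whitelisting specific HTML tags, a minimal sketch is possible with the standard library alone, assuming a fixed set of allowed tags and attributes. The names `ALLOWED_TAGS`, `ALLOWED_ATTRS`, `WhitelistScrubber` and `scrub` are hypothetical, and this is not what either poster describes using. It keeps allowed tags, drops every other tag while keeping its text, discards `<script>`/`<style>` content, and only allows `href` values with an `http`, `https` or `mailto` scheme. It does not balance unclosed tags, so treat it as an illustration rather than a hardened sanitizer.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: adjust to taste.
ALLOWED_TAGS = {"p", "b", "i", "em", "strong", "a", "ul", "ol", "li", "br"}
ALLOWED_ATTRS = {"a": {"href"}}
DROP_CONTENT = {"script", "style"}   # remove these tags *and* their contents
VOID_TAGS = {"br"}                   # never emit a closing tag for these
SAFE_SCHEMES = ("http://", "https://", "mailto:")


class WhitelistScrubber(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0   # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in DROP_CONTENT:
            self.skip_depth += 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS:
            return
        allowed = ALLOWED_ATTRS.get(tag, set())
        parts = [tag]
        for name, value in attrs:
            if name not in allowed or value is None:
                continue
            if name == "href" and not value.strip().lower().startswith(SAFE_SCHEMES):
                continue   # blocks javascript: and similar schemes
            parts.append(f'{name}="{escape(value, quote=True)}"')
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_CONTENT:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag not in ALLOWED_TAGS or tag in VOID_TAGS:
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            # Re-escape text, since convert_charrefs=True has decoded entities.
            self.out.append(escape(data, quote=False))


def scrub(html):
    parser = WhitelistScrubber()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

For example, `scrub('<p onclick="x()">Hi <b>there</b><script>alert(1)</script><span>!</span></p>')` returns `'<p>Hi <b>there</b>!</p>'`. Self-closing tags such as `<br/>` work because `HTMLParser`'s default `handle_startendtag` calls `handle_starttag` and then `handle_endtag`, and the void-tag check suppresses the closing tag.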