Re: html 2 plain text

Faber Sun, 28 May 2006 11:45:50 -0700

robin wrote:

> i remember seeing this simple python function which would take raw html
> and output the content (body?) of the page as plain text (no <..> tags
> etc)
> i have been looking at htmllib and htmlparser but this all seems too
> complicated for what i'm looking for. i just need the main text in the
> body of some arbitrary webpage to then do some natural-language
> processing with it...
> thanks for pointing me to some helpful resources!


Have a look at the Beautiful Soup library:
http://www.crummy.com/software/BeautifulSoup/
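
For a rough idea of what such a function involves, here is a minimal
sketch that uses only the standard library's html.parser module
(Python 3) rather than Beautiful Soup. The names TextExtractor and
html_to_text are made up for illustration, and it assumes reasonably
well-formed markup; pages with badly broken HTML are where Beautiful
Soup is likely the better choice.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything inside <script> or <style>."""

    # Tags whose contents are code or styling, not readable text.
    SKIP = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when we are not inside a skipped element.
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces with spaces, then collapse runs of whitespace.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string (hypothetical helper)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Calling html_to_text(page_source) on a fetched page returns one
whitespace-normalised string that can be handed to whatever
natural-language processing comes next. Because the pieces are joined
with spaces, text split across inline tags can pick up an extra space
before punctuation.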

Regards

-- 
Faber
http://faberbox.com/
http://smarking.com/

A teacher must always teach to doubt his teaching. -- José Ortega y Gasset
-- 
http://mail.python.org/mailman/listinfo/python-list
