robin wrote:
> i remember seeing this simple python function which would take raw html
> and output the content (body?) of the page as plain text (no <..> tags
> etc)
> i have been looking at htmllib and htmlparser but this all seems too
> complicated for what i'm looking for. i just need the main text in the
> body of some arbitrary webpage to then do some natural-language
> processing with it...
> thanks for pointing me to some helpful resources!

Have a look at the Beautiful Soup library:
http://www.crummy.com/software/BeautifulSoup/

Regards

--
Faber
http://faberbox.com/
http://smarking.com/

A teacher must always teach to doubt his teaching.
  -- José Ortega y Gasset
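
P.S. For the task in the question (raw HTML in, plain text out), here is a minimal sketch that uses only the standard library's html.parser instead of Beautiful Soup. The names TextExtractor and html_to_text, the set of tags to skip, and the set of tags treated as block-level are all assumptions made for illustration, not part of any library. With the current bs4 package, BeautifulSoup(html, "html.parser").get_text() should give a roughly comparable result on well-formed pages.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes from HTML, ignoring markup.

    Hypothetical helper: the tag sets below are illustrative choices.
    """

    # Contents of these elements are not page text.
    SKIP = {"script", "style", "title"}
    # Start of these elements is treated as a word boundary.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode &amp; etc. for us
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self._chunks.append(" ")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self._chunks.append(data)

    def text(self):
        # Join the pieces, then collapse runs of whitespace to single spaces.
        return " ".join("".join(self._chunks).split())


def html_to_text(html):
    """Return the visible text of an HTML string as one line of plain text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.text()
```

Usage would be along the lines of html_to_text(page_source), where page_source is the HTML already fetched as a string. Real-world pages with broken markup are where Beautiful Soup tends to be more forgiving than a hand-rolled parser like this one.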