Hi, I remember seeing a simple Python function that takes raw HTML and outputs the content (body?) of the page as plain text (no <..> tags etc.). I have been looking at htmllib and HTMLParser, but both seem too complicated for what I'm looking for. I just need the main text in the body of some arbitrary webpage so I can do some natural-language processing on it... Thanks for pointing me to some helpful resources!
robin
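
To make the request concrete, here is a minimal sketch of the kind of function described above. It is not the function the post remembers; it assumes Python 3's standard html.parser module (the successor to the HTMLParser module mentioned), and the names TextExtractor and html_to_text are made up for illustration. It keeps only text found inside <body>, skips <script> and <style> contents, and collapses whitespace.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collects text inside <body>, skipping script/style."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters.
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep text only when inside <body> and outside script/style.
        if self.in_body and self.skip_depth == 0:
            self.chunks.append(data)


def html_to_text(html):
    """Return the body text of an HTML string with tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Join the pieces with spaces, then collapse all runs of whitespace.
    return " ".join(" ".join(parser.chunks).split())
```

Calling html_to_text(page_source) on a full page returns one whitespace-normalized string. Two caveats of this sketch: a fragment without a <body> tag yields an empty string, and joining pieces with spaces can split a word that is interrupted by inline markup (for example "<b>H</b>ello" becomes "H ello").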