Ksenia Marasanova wrote:
> I am looking for a library that will give me very simple text
> representation of HTML.
> For example
> <div><h1>Title</h1><p>This is a <br />test</p></div>
>
> will be transformed to:
>
> Title
>
> This is a
> test
>
>
> i want to send plain text alternative of html email, and would prefer
> to do it automatically from HTML source.
> Any hints?

Use htmllib:

>>> import htmllib, formatter, StringIO
>>> def cleanup(s):
...     out = StringIO.StringIO()
...     # DumbWriter emits plain text; AbstractFormatter drives it from the parser
...     p = htmllib.HTMLParser(
...         formatter.AbstractFormatter(formatter.DumbWriter(out)))
...     p.feed(s)
...     p.close()
...     # htmllib collects every href in anchorlist; append them as footnotes
...     if p.anchorlist:
...         print >>out
...     for idx, anchor in enumerate(p.anchorlist):
...         print >>out, "\n[%d]: %s" % (idx+1, anchor)
...     return out.getvalue()
...
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br />test</p></div>''')
Title

This is a
test
>>> print cleanup('''<div><h1>Title</h1><p>This is a <br />test with <a href="http://python.org">a link</a> to the Python homepage</p></div>''')
Title

This is a
test with a link[1] to the Python homepage

[1]: http://python.org
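
For anyone on Python 3, where htmllib is no longer available, here is a rough sketch of the same idea built on html.parser from the standard library. It is not a port of the code above. The names TextExtractor, html_to_text and BLOCK_TAGS are made up for this example, the set of tags treated as paragraph breaks is an assumption, and it does no line wrapping the way AbstractFormatter does.

```python
from html.parser import HTMLParser

# Tags treated as paragraph-level breaks (an assumption for this sketch).
BLOCK_TAGS = {"div", "p", "h1", "h2", "h3", "h4", "h5", "h6",
              "ul", "ol", "li", "table", "tr", "blockquote"}


class TextExtractor(HTMLParser):
    """Collect text, line breaks, and numbered link footnotes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []          # pieces of output text, joined later
        self.anchors = []        # href values, in document order
        self._link_index = None  # footnote number of the currently open <a>

    def handle_starttag(self, tag, attrs):
        # <br /> also arrives here: handle_startendtag calls this by default.
        if tag == "br":
            self.parts.append("\n")
        elif tag in BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.anchors.append(href)
                self._link_index = len(self.anchors)

    def handle_endtag(self, tag):
        if tag in BLOCK_TAGS:
            self.parts.append("\n\n")
        elif tag == "a" and self._link_index is not None:
            # Mark the link text with its footnote number, e.g. "a link[1]".
            self.parts.append("[%d]" % self._link_index)
            self._link_index = None

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return a plain-text rendering of html with link footnotes appended."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    # Squeeze whitespace inside each line, then collapse runs of blank lines.
    lines = [" ".join(line.split()) for line in "".join(parser.parts).split("\n")]
    kept = []
    for line in lines:
        if line or (kept and kept[-1]):
            kept.append(line)
    text = "\n".join(kept).strip()
    if parser.anchors:
        notes = ["[%d]: %s" % (i, url) for i, url in enumerate(parser.anchors, 1)]
        text += "\n\n" + "\n".join(notes)
    return text
```

Calling print(html_to_text(...)) on the two HTML strings from the session above should give the same kind of output: the title, a blank line, the paragraph broken at the <br />, and the link list at the end.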