On Oct 30, 6:44 pm, "一首诗" <[EMAIL PROTECTED]> wrote:
> Oh, I didn't make myself clear.
>
> What I mean is how to convert a piece of HTML to plain text but keep as
> much formatting as possible.
>
> For example, convert "&nbsp;" to a blank space and convert <br> to "\r\n".

Then you can explore the parser,
http://docs.python.org/lib/module-HTMLParser.html, like this:

#!/usr/bin/env python
# Python 2 module name; in Python 3 it is html.parser.
from HTMLParser import HTMLParser

parsedtext = ''

class Parser(HTMLParser):
    def handle_starttag(self, tag, attrs):
        # Turn each <br> into a line break.
        if tag == 'br':
            global parsedtext
            parsedtext += '\r\n'

    def handle_data(self, data):
        # Text between tags is copied through unchanged.
        global parsedtext
        parsedtext += data

    def handle_entityref(self, name):
        # Turn &nbsp; into an ordinary space.
        if name == 'nbsp':
            global parsedtext
            parsedtext += ' '

x = Parser()
x.feed('An&nbsp;text<br>')
x.close()
print parsedtext

> Gary Herron wrote:
> > 一首诗 wrote:
> > > Is there any simple way to solve this problem?
>
> > Yes, strings have a replace method:
>
> > >>> s = "abc&nbsp;def"
> > >>> s.replace('&nbsp;', ' ')
> > 'abc def'
>
> > Also, various modules that are meant to deal with web and XML and such
> > have functions to do such operations.
>
> > Gary Herron
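
For anyone on Python 3, where the module is named html.parser, the same idea can be sketched roughly as below. This is only a minimal sketch, not a complete HTML-to-text converter: the names TextExtractor and html_to_text are made up for illustration, and it assumes the parser's convert_charrefs option is on (the default since Python 3.5), so that &nbsp; reaches handle_data as the character U+00A0 instead of going through handle_entityref.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect text from HTML, keeping <br> line breaks."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode entities such as
        # &nbsp; before handing the text to handle_data.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # Turn each <br> (or <br/>) into a line break.
        if tag == 'br':
            self.parts.append('\r\n')

    def handle_data(self, data):
        # &nbsp; arrives here as U+00A0; map it to an ordinary space.
        self.parts.append(data.replace('\xa0', ' '))

    def text(self):
        return ''.join(self.parts)


def html_to_text(html):
    """Hypothetical wrapper: parse one HTML fragment and return its text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text the parser is still buffering
    return parser.text()


print(repr(html_to_text('An&nbsp;text<br>')))  # prints 'An text\r\n'
```

Collecting the pieces in a list on the parser instance avoids the module-level global, so several fragments can be converted independently.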