In <[EMAIL PROTECTED]>, irstas wrote:

> I'd like to see how this transformation can be done with
> BeautifulSoup. Well, the last two regexps can be replaced with this:
>
> unicode(BeautifulStoneSoup(s,convertEntities=BeautifulStoneSoup.HTML_ENTITIES).contents[0])
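
Completely without regular expressions:

    from BeautifulSoup import BeautifulSoup

    def main():
        # `source` is the HTML string from earlier in the thread.
        soup = BeautifulSoup(source, convertEntities=BeautifulSoup.HTML_ENTITIES)
        # Join all text nodes, then collapse every run of whitespace to one space.
        print ' '.join(''.join(soup(text=True)).split())

Ciao,
        Marc 'BlackJack' Rintsch
-- 
http://mail.python.org/mailman/listinfo/python-list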
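
For readers without BeautifulSoup 3 (it only runs on Python 2), the same idea can be sketched with the Python 3 standard library: collect the text nodes, let the parser decode the entities, and normalize whitespace with split/join instead of a regexp. This is a swapped-in technique, not the code from the post. The names `TextCollector` and `extract_text` and the sample string are made up for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers every text node it is fed."""

    def __init__(self):
        # convert_charrefs=True makes the parser decode &amp;, &lt;, &#233; ...
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def extract_text(source):
    """Return the text of `source` with whitespace runs collapsed."""
    parser = TextCollector()
    parser.feed(source)
    parser.close()
    # split() with no argument splits on any run of whitespace, so
    # re-joining with single spaces normalizes it without a regexp.
    return ' '.join(''.join(parser.chunks).split())


if __name__ == '__main__':
    sample = '<p>Fish &amp;  chips</p>\n<p>&lt;tasty&gt;</p>'
    print(extract_text(sample))  # prints: Fish & chips <tasty>
```

The last line of `extract_text` is the same split/join step as in the `main()` above; only the part that gathers the text nodes differs.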