Robin Haswell wrote:

> I'm currently screenscraping some Swedish site, and I need a method to
> convert XML entities (&amp; etc, plus &#100; etc) to Unicode characters.
> I'm sure one of Python's myriad of XML processors can do this but I can't
> find which one.
>
> Can anyone make any suggestions?

any decent html-aware screen scraper library should be able to do this
for you. if you've already extracted the strings, the strip_html function
on this page might be what you need:

    http://effbot.org/zone/re-sub.htm#strip-html

</F>
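
for strings that have already been extracted, a minimal sketch of the same idea could look like the code below. note that this is not the function from the page linked above: it assumes a current Python 3 and leans on the standard library's html.unescape, and the name strip_tags_and_unescape is made up for this example.

```python
import html
import re


def strip_tags_and_unescape(text):
    """Remove markup tags, then decode character references.

    Hypothetical helper for illustration only; it is not the
    strip_html function from the effbot page.
    """
    # Drop anything that looks like a tag. This is a crude pattern and
    # will not cope with every piece of real-world markup.
    text = re.sub(r"<[^>]*>", "", text)
    # Decode named references (&amp;, &aring;) as well as numeric ones
    # (&#100;, &#x64;). Doing this after tag removal keeps an escaped
    # "&lt;b&gt;" from being mistaken for a tag.
    return html.unescape(text)


print(strip_tags_and_unescape("<b>Sm&ouml;rg&aring;sbord</b> &amp; k&#246;ttbullar"))
```

the order matters here: tags are removed first and references decoded second, so text that was deliberately escaped in the page survives as literal characters.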