On Tue, 24 Jan 2006 14:46:46 +0100, Fredrik Lundh wrote:

> Robin Haswell wrote:
>
>> I'm currently screenscraping a Swedish site, and I need a method to
>> convert XML entities (&amp; etc, plus &#100; etc) to Unicode characters.
>> I'm sure one of Python's myriad of XML processors can do this, but I can't
>> find which one.
>>
>> Can anyone make any suggestions?
>
> any decent html-aware screen scraper library should be able to do
> this for you.
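For readers finding this thread later: in Python 3.4 and newer (well after this 2006 exchange), the standard library's html.unescape does this conversion directly, handling both named entities like &amp; and numeric references like &#100;. A minimal sketch; the wrapper name decode_entities is just for illustration:

```python
import html


def decode_entities(text):
    # html.unescape (Python 3.4+) resolves named entities (&amp;, &aring;)
    # and decimal/hex numeric references (&#100;, &#x64;) in one call.
    return html.unescape(text)


print(decode_entities("R&auml;ksm&ouml;rg&aring;s &amp; &#100;"))
# prints: Räksmörgås & d
```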
I'm using BeautifulSoup, and it appears that it doesn't. I'd also like to
know the answer to this for when I do screenscraping with regular
expressions :-)

Thanks

>
> if you've already extracted the strings, the strip_html function on
> this page might be what you need:
>
>     http://effbot.org/zone/re-sub.htm#strip-html
>
> </F>
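For the regular-expression case, here is a rough Python 3 sketch of the general idea: re.sub with a replacement function that maps each entity to its character via html.entities.name2codepoint. This is not the strip_html code from the page linked above; the names unescape_entities and strip_tags_and_unescape are hypothetical, and the tag-stripping regex is deliberately crude (it assumes no ">" inside attribute values).

```python
import re
from html.entities import name2codepoint

# Hypothetical patterns for illustration: &name;, &#123; or &#x7B;
_ENTITY_RE = re.compile(r"&(#[0-9]+|#[xX][0-9a-fA-F]+|[A-Za-z][A-Za-z0-9]*);")
# Crude tag matcher: anything from "<" to the next ">"
_TAG_RE = re.compile(r"<[^>]*>")


def _replace_entity(match):
    ref = match.group(1)
    try:
        if ref.startswith(("#x", "#X")):
            return chr(int(ref[2:], 16))  # hex numeric reference
        if ref.startswith("#"):
            return chr(int(ref[1:]))  # decimal numeric reference
    except (ValueError, OverflowError):
        return match.group(0)  # out-of-range code point: leave as-is
    codepoint = name2codepoint.get(ref)
    # Unknown named entities are left untouched
    return chr(codepoint) if codepoint is not None else match.group(0)


def unescape_entities(text):
    return _ENTITY_RE.sub(_replace_entity, text)


def strip_tags_and_unescape(text):
    # Remove tags first, so an escaped "&lt;b&gt;" survives as literal "<b>"
    return unescape_entities(_TAG_RE.sub("", text))


print(strip_tags_and_unescape("<p>R&auml;ksm&ouml;rg&aring;s &amp; &#100;&#x64;</p>"))
# prints: Räksmörgås & dd
```

Stripping tags before decoding entities matters: doing it the other way round would turn escaped text such as &lt;b&gt; into a tag and then delete it.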