On Tue, 24 Jan 2006 14:46:46 +0100, Fredrik Lundh wrote:

> Robin Haswell wrote:
> 
>> I'm currently screenscraping some Swedish site, and I need a method to
>> convert XML entities (&amp; etc, plus &#100; etc) to Unicode characters.
>> I'm sure one of python's myriad of XML processors can do this but I can't
>> find which one.
>>
>> Can anyone make any suggestions?
> 
> any decent html-aware screen scraper library should be able to do
> this for you.

I'm using BeautifulSoup and it doesn't appear to convert entities for me.
I'd also like to know how to do this when I'm screenscraping with regular
expressions :-)

Thanks

> 
> if you've already extracted the strings, the strip_html function on
> this page might be what you need:
> 
>     http://effbot.org/zone/re-sub.htm#strip-html
> 
> </F>
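
For strings that have already been extracted, the entity conversion can be
sketched roughly as follows. This is not the strip_html function from the page
linked above (that code is not reproduced here); it is a minimal illustration
that assumes a modern Python 3, where html.unescape and
html.entities.name2codepoint are in the standard library. The function names
decode_entities and decode_entities_re are made up for this example.

```python
import html
import re
from html.entities import name2codepoint

def decode_entities(text):
    """Convert named and numeric character references to Unicode.

    Simply delegates to the standard library (assumes Python 3.4+).
    """
    return html.unescape(text)

# Hypothetical regex-based variant, for when you are already scraping
# with regular expressions. It matches &#xHEX;, &#DEC; and &name; forms.
_ENTITY_RE = re.compile(r"&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);")

def decode_entities_re(text):
    def _replace(match):
        ref = match.group(1)
        try:
            if ref.startswith(("#x", "#X")):
                return chr(int(ref[2:], 16))   # hexadecimal reference
            if ref.startswith("#"):
                return chr(int(ref[1:]))       # decimal reference
        except (ValueError, OverflowError):
            return match.group(0)              # out of range: leave as-is
        codepoint = name2codepoint.get(ref)
        # Unknown names are left untouched rather than guessed at.
        return chr(codepoint) if codepoint is not None else match.group(0)
    return _ENTITY_RE.sub(_replace, text)

print(decode_entities("Sm&ouml;rg&aring;sbord &amp; &#100;"))
print(decode_entities_re("Sm&ouml;rg&aring;sbord &amp; &#100;"))
```

The regex variant only handles references that end in a semicolon and only
the names listed in name2codepoint, so the library call is probably the safer
choice when it is available.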

-- 
http://mail.python.org/mailman/listinfo/python-list