mattia wrote:
Hi all, I'm using py3k and the urllib package to download web pages. Can you suggest a package that can translate HTML character entities, such as those for "è", "ò", "é", into the corresponding characters?

import re
from html.entities import entitydefs

# The downloaded web page will be bytes, so decode it to a string.
webpage = downloaded_page.decode("iso-8859-1")

# Then decode the HTML entities, leaving unknown entity names untouched
# (a plain entitydefs[...] lookup would raise KeyError on them).
webpage = re.sub(r"&(\w+);", lambda m: entitydefs.get(m.group(1), m.group(0)), webpage)
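
The regex approach above only covers named entities such as &egrave;. On Python 3.4 or later, the standard library's html.unescape() also handles numeric references like &#232;, so it may be a simpler option. A minimal sketch, assuming the page really is ISO-8859-1 encoded; the sample bytes are made up for illustration and stand in for whatever urllib returned:

```python
import html

# Hypothetical sample standing in for the bytes downloaded with urllib.
downloaded_page = b"<p>caff&egrave; &ograve; &eacute; &#232; &amp;</p>"

# Decode bytes to str first; the charset here is an assumption and should
# match what the server actually sent.
webpage = downloaded_page.decode("iso-8859-1")

# Replace named and numeric character references with the real characters.
decoded = html.unescape(webpage)
print(decoded)
```

Note that html.unescape() also turns &amp; and &lt; into literal characters, so the result should be treated as text and not fed back into an HTML parser.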