On 24/12/12 01:34:47, iMath wrote:
> how to detect the character encoding in a web page ?

That depends on the site: different sites indicate their encoding
differently.

> such as this page: http://python.org/

If you download that page and look at the HTML code, you'll find a line:

    <meta http-equiv="content-type" content="text/html; charset=utf-8" />

So it's encoded as utf-8.

Other sites declare their charset in the Content-Type HTTP header line.
And then there are sites relying on the default. And sites that get it
wrong, and send data in a different encoding from what they declare.

Welcome to the real world,

-- HansM
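
A minimal sketch of the checks described above, using only the standard library. The function names, the 4096-byte scan window, the "header first, then meta tag" order and the iso-8859-1 default are all assumptions made for illustration; nothing in the reply prescribes them. It only reports what a page declares, so a page that declares the wrong encoding still has to be handled at decode time.

```python
import re
from email.message import Message

# Finds charset=... inside a <meta> tag. This covers both the http-equiv form
# quoted above and the shorter <meta charset="..."> form.
META_CHARSET_RE = re.compile(
    rb"<meta[^>]*?charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE
)


def charset_from_header(content_type):
    """Return the charset parameter of a Content-Type header value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()  # lowercased, None if absent


def charset_from_meta(body, scan_bytes=4096):
    """Return the charset declared in a <meta> tag near the top of the page, or None."""
    match = META_CHARSET_RE.search(body[:scan_bytes])
    return match.group(1).decode("ascii").lower() if match else None


def guess_declared_encoding(content_type, body, default="iso-8859-1"):
    """Header first, then the meta tag, then an assumed default."""
    return charset_from_header(content_type) or charset_from_meta(body) or default


def decode_page(content_type, body):
    """Decode with the declared encoding; fall back if the declaration is wrong."""
    encoding = guess_declared_encoding(content_type, body)
    try:
        return body.decode(encoding)
    except (LookupError, UnicodeDecodeError):
        # Unknown codec name, or the site sent bytes in a different encoding
        # from the one it declared. Keep going, marking undecodable bytes.
        return body.decode("utf-8", errors="replace")
```

With urllib.request, the two inputs would be resp.headers.get("Content-Type") and resp.read() from the object returned by urlopen().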