On 24.12.2012 at 04:03, iMath wrote:
> but how to let python do it for you ?
> such as these 2 pages
> http://python.org/
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> how to detect the character encoding in these 2 pages by python ?
If you have the HTML code, let chardetect.py make an educated guess for you.

http://pypi.python.org/pypi/chardet

Example:

$ wget -q -O - http://python.org/ | chardetect.py
stdin: ISO-8859-2 with confidence 0.803579722043
$
$ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py
stdin: utf-8 with confidence 0.87625
$

Regards
--
kurt.alfred.muel...@gmail.com
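
Since the question was how to do this from Python itself, here is a minimal sketch of the same idea as a function. It calls chardet.detect() from the chardet package linked above, which returns a dict with "encoding" and "confidence" keys. Two things in it are my own additions and not part of the example above: the function name guess_encoding is hypothetical, and before guessing it looks for a charset declared in a <meta> tag, on the assumption that a declaration in the page is worth checking before a statistical guess. If chardet is not installed, only the declared charset is checked.

```python
import re

try:
    import chardet  # third-party package: http://pypi.python.org/pypi/chardet
except ImportError:
    chardet = None  # assumption: fall back to the <meta> check only

# Matches <meta charset="..."> as well as the charset=... part of the
# older <meta http-equiv="Content-Type" content="...; charset=..."> form.
_META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9_.:-]+)', re.IGNORECASE
)


def guess_encoding(html_bytes):
    """Return (encoding, confidence) for the raw bytes of an HTML page.

    confidence is None when the encoding was declared in the page
    rather than guessed. Hypothetical helper, for illustration only.
    """
    # 1. Look for a declared charset near the top of the document.
    match = _META_CHARSET.search(html_bytes[:4096])
    if match:
        return match.group(1).decode("ascii").lower(), None

    # 2. Otherwise let chardet make an educated guess, as chardetect.py does.
    if chardet is not None:
        result = chardet.detect(html_bytes)
        return result["encoding"], result["confidence"]

    # 3. Nothing declared and no detector available.
    return None, None
```

To try it on the two pages, read the raw bytes first, for example with urllib.request.urlopen(url).read() on Python 3, and pass them to guess_encoding() undecoded. The guess is only a guess, so treat a low confidence value with suspicion.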