Re: how to detect the character encoding in a web page ?

Kurt Mueller Mon, 24 Dec 2012 00:40:42 -0800

On 24.12.2012 at 04:03, iMath wrote:
> But how can I let Python do it for me?
> Take these 2 pages, for example:
> http://python.org/
> http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx
> How can I detect the character encoding of these 2 pages with Python?



If you have the HTML source, let chardetect.py (the command-line
script that ships with the chardet package) make an educated guess
for you.

http://pypi.python.org/pypi/chardet

Example:
$ wget -q -O - http://python.org/ | chardetect.py 
stdin: ISO-8859-2 with confidence 0.803579722043
$ 

$ wget -q -O - 'http://msdn.microsoft.com/en-us/library/bb802962(v=office.12).aspx' | chardetect.py
stdin: utf-8 with confidence 0.87625
$ 
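
The same guess can be made from inside Python. What follows is only a
rough sketch, not part of chardet itself: it assumes Python 3 and that
chardet is installed, and the helper names guess_encoding and
guess_encoding_of_url are made up for illustration. Since chardet only
returns a statistical guess with a confidence value, the sketch first
looks for a charset the page declares itself (HTTP header, then a
<meta> tag) and falls back to chardet.detect() only when nothing is
declared. That ordering is my assumption, not something chardet
prescribes.

```python
import re
import urllib.request

try:
    import chardet  # third-party package: pip install chardet
except ImportError:
    chardet = None  # the declared-charset checks still work without it

# Matches <meta charset="..."> as well as the older
# <meta http-equiv="Content-Type" content="text/html; charset=..."> form.
META_CHARSET = re.compile(
    rb'<meta[^>]+charset\s*=\s*["\']?([A-Za-z0-9._:-]+)', re.IGNORECASE
)


def guess_encoding(raw, header_charset=None):
    """Return (encoding, source) for the raw bytes of an HTML page.

    Hypothetical helper: prefers a declared charset, then asks chardet.
    """
    if header_charset:
        return header_charset, "http-header"
    # Only scan the start of the document, where <meta> tags normally live.
    match = META_CHARSET.search(raw[:4096])
    if match:
        return match.group(1).decode("ascii"), "meta-tag"
    if chardet is not None:
        result = chardet.detect(raw)  # dict with 'encoding' and 'confidence'
        if result["encoding"]:
            source = "chardet (confidence %.2f)" % result["confidence"]
            return result["encoding"], source
    return None, "unknown"


def guess_encoding_of_url(url):
    """Fetch url and guess its encoding (needs network access)."""
    with urllib.request.urlopen(url) as response:
        raw = response.read()
        header_charset = response.headers.get_content_charset()
    return guess_encoding(raw, header_charset)
```

Calling guess_encoding_of_url('http://python.org/') would then return a
pair such as (encoding, source); the actual values depend on what the
server sends today, so none are shown here.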


Regards
-- 
kurt.alfred.muel...@gmail.com

-- 
http://mail.python.org/mailman/listinfo/python-list
