Hello, I am parsing a web page with special characters such as &#xe9; (which stands for é). I know I can get the unicode character é from unicode("\xe9", "iso-8859-1"), but I don't know how to handle these character references.
I tried to implement handle_charref within HTMLParser without success. Furthermore, if I have the data ab&#xe9;cd, handle_data gets "ab", handle_charref gets "xe9", and then handle_data never receives the end of the string ("cd").

Thank you for your help,
Fabien
--
http://mail.python.org/mailman/listinfo/python-list
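
For reference, here is a minimal sketch of one way to decode such references inside handle_charref. It assumes Python 3, where the parser lives in html.parser and chr() returns a unicode character (under Python 2 the module is HTMLParser and unichr() would play that role). The names TextCollector and extract_text are made up for this illustration. The text arrives in several callbacks ("ab", then the reference, then "cd"), so the sketch appends every piece to a list and joins them at the end.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: collects text and decodes numeric character references."""

    def __init__(self):
        # convert_charrefs=False makes the parser call handle_charref
        # instead of decoding the references itself.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Called once per run of plain text, e.g. "ab" and later "cd".
        self.parts.append(data)

    def handle_charref(self, name):
        # name is "xe9" for &#xe9; and "233" for &#233;
        if name[0] in "xX":
            codepoint = int(name[1:], 16)
        else:
            codepoint = int(name)
        self.parts.append(chr(codepoint))

    def text(self):
        return "".join(self.parts)


def extract_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()  # flush anything still buffered
    return parser.text()
```

Calling extract_text("ab&#xe9;cd") should give back the five-character string with é in the middle. Named references such as &eacute; go through handle_entityref instead and are not covered by this sketch.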