Re: Special chars with HTMLParser

2009-08-05 Thread Fafounet
Thank you, now I can get the correct character. Now when I have the string ab&#233;cd, I get ab, then é thanks to your function, and then cd. But how can I know that cd is still part of the same word?

Fabien

> The character references indicate Unicode ordinals, not iso-8859-1
> characters. In
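One possible way to keep ab, é and cd together is to stop treating each callback as a separate word: append every piece to a buffer in the order the parser delivers it, and join the buffer once parsing is done. The sketch below is only an illustration, not the function from the earlier reply. It assumes Python 3's html.parser, where convert_charrefs=False has to be passed so that handle_charref is called at all; the names TextCollector and extract_text are made up for this example.

```python
from html.parser import HTMLParser
from html.entities import name2codepoint


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text and decoded references in order."""

    def __init__(self):
        # With convert_charrefs=True (the Python 3 default) the parser decodes
        # references itself and never calls handle_charref/handle_entityref.
        super().__init__(convert_charrefs=False)
        self.parts = []

    def handle_data(self, data):
        # Plain text between tags and references, e.g. "ab" and "cd".
        self.parts.append(data)

    def handle_charref(self, name):
        # name is "233" for &#233; and "xe9" for &#xe9;.
        # The number is a Unicode ordinal, not an iso-8859-1 byte value.
        if name.lower().startswith("x"):
            codepoint = int(name[1:], 16)
        else:
            codepoint = int(name)
        self.parts.append(chr(codepoint))

    def handle_entityref(self, name):
        # Named references such as &eacute;. Unknown names are kept verbatim.
        codepoint = name2codepoint.get(name)
        if codepoint is None:
            self.parts.append("&" + name + ";")
        else:
            self.parts.append(chr(codepoint))

    def text(self):
        # Joining the pieces restores the word as it appeared in the page.
        return "".join(self.parts)


def extract_text(html):
    parser = TextCollector()
    parser.feed(html)
    parser.close()
    return parser.text()
```

With this, extract_text("<p>ab&#233;cd</p>") gives back the single string abécd, so word boundaries can be decided afterwards (for example with split()) rather than per callback.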

Special chars with HTMLParser

2009-08-05 Thread Fafounet
Hello, I am parsing a web page with special chars such as &#233; (which stands for é). I know I can get the unicode character é from unicode("\xe9", "iso-8859-1"), but with those extra characters I don't know how. I tried to implement handle_charref within HTMLParser without success. Furthermore, if I ha