Thank you, now I can get the correct character.
Now, when I have the string ab&#233;cd, I get ab, then é thanks to
your function, and then cd. But how can I know that cd is still part
of the same word?
Fabien
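
One possible way to keep the pieces of a word together, sketched here in Python 3 with the standard html.parser module (the thread itself uses the older Python 2 HTMLParser), is to append every piece to a buffer and only split into words once parsing is done. The class name TextCollector and its text() method are made up for this illustration, and convert_charrefs=False is passed only so that handle_charref is actually called:

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers plain text and decoded references in order."""

    def __init__(self):
        # With the Python 3 default (convert_charrefs=True) the parser decodes
        # references itself and never calls handle_charref.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_data(self, data):
        # Plain text between references, e.g. "ab" and "cd efg".
        self.pieces.append(data)

    def handle_charref(self, name):
        # name is the part between "&#" and ";", e.g. "233" or "xe9".
        if name.lower().startswith("x"):
            code = int(name[1:], 16)
        else:
            code = int(name)
        self.pieces.append(chr(code))

    def text(self):
        # Joining without a separator keeps "ab", "é", "cd" in one word.
        return "".join(self.pieces)


parser = TextCollector()
parser.feed("ab&#233;cd efg")
parser.close()
words = parser.text().split()
```

Since nothing is inserted between the pieces, word boundaries come only from whitespace that was already in the page, so cd stays attached to abé.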
> The character references indicate Unicode ordinals, not iso-8859-1
> characters. In
Hello,
I am parsing a web page with special characters such as &#233; (which
stands for é).
I know I can get the Unicode character é with
unicode("\xe9", "iso-8859-1"),
but I don't know how to handle those extra characters.
I tried to implement handle_charref within HTMLParser without success.
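
As a rough sketch of what a handle_charref implementation has to do (written for Python 3, where chr plays the role of Python 2's unichr), the number in the reference can be treated as a Unicode ordinal and converted directly. The function name charref_to_char is hypothetical, and the input is assumed to be the text between "&#" and ";":

```python
def charref_to_char(name):
    """Convert the body of a numeric character reference to a character.

    Assumes name is what the parser passes to handle_charref:
    decimal digits ("233") or "x" followed by hex digits ("xE9").
    """
    if name[0] in ("x", "X"):
        return chr(int(name[1:], 16))  # hexadecimal form, e.g. &#xE9;
    return chr(int(name))              # decimal form, e.g. &#233;


e_acute = charref_to_char("233")   # same ordinal as in iso-8859-1
euro = charref_to_char("8364")     # an ordinal that iso-8859-1 cannot encode
```

Ordinals below 256 happen to coincide with iso-8859-1, which is why decoding "\xe9" gives the same character, but larger values such as 8364 only work when read as Unicode ordinals.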
Furthermore, if I ha