Fafounet wrote:
> I am parsing a web page with special chars such as &#233; (which
> stands for é).
> I know I can get the unicode character é from
> unicode("\xe9","iso-8859-1")
> but with those extra characters I don't know.
>
> I tried to implement handle_charref within HTMLParser without success
> Fafounet (F) wrote:
>F> Thank you, now I can get the correct character.
>F> Now when I have the string ab&#233;cd I can get ab then é thanks to
>F> your function and then cd. But how is it possible to know that cd is
>F> still the same word?
That depends on your definition of `word'. And that
Thank you, now I can get the correct character.
Now when I have the string ab&#233;cd I can get ab then é thanks to
your function and then cd. But how is it possible to know that cd is
still the same word?
Fabien
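
A minimal sketch of one way to keep ab, é and cd together: instead of treating each callback as a separate word, collect every piece in a buffer and only split into words once parsing is done. This assumes Python 3's html.parser (the thread itself uses Python 2's unicode()), and the names TextCollector and extract_text are hypothetical, chosen only for illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: gathers text and decoded character references."""

    def __init__(self):
        # convert_charrefs=False so that handle_charref() is actually called;
        # with the Python 3 default (True) the parser decodes references itself.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_data(self, data):
        # Plain text between tags and references, e.g. "ab" and "cd".
        self.pieces.append(data)

    def handle_charref(self, name):
        # name is "233" for &#233; or "xe9" for &#xe9;.
        # The number is a Unicode code point, not an iso-8859-1 byte.
        if name.lower().startswith("x"):
            codepoint = int(name[1:], 16)
        else:
            codepoint = int(name)
        self.pieces.append(chr(codepoint))

    def text(self):
        # Joining without a separator keeps "ab", "é", "cd" as one word.
        return "".join(self.pieces)


def extract_text(markup):
    parser = TextCollector()
    parser.feed(markup)
    parser.close()
    return parser.text()
```

Calling extract_text("ab&#233;cd") gives back a single string, and calling .split() on the result then separates words on whitespace only, so the pieces around the reference are no longer split apart.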
> The character references indicate Unicode ordinals, not iso-8859-1
> characters. In
> Fafounet (F) wrote:
>F> Hello,
>F> I am parsing a web page with special chars such as &#233; (which
>F> stands for é).
>F> I know I can get the unicode character é from
>F> unicode("\xe9","iso-8859-1")
>F> but with those extra characters I don't know.
>F> I tried to implement handle_charref
Hello,
I am parsing a web page with special chars such as &#233; (which
stands for é).
I know I can get the unicode character é from
unicode("\xe9","iso-8859-1")
but with those extra characters I don't know.
I tried to implement handle_charref within HTMLParser without success.
Furthermore, if I ha