Re: unicode html

Stefan Behnel Mon, 17 Jul 2006 23:46:39 -0700

[EMAIL PROTECTED] wrote:
> Hi, I've found lots of material on the net about unicode html
> conversions, but I'm still having many problems converting unicode
> characters to html entities. Is there any available function to solve
> this issue?
> As an example I would like to do this kind of conversion:
> \u00f4 => &ocirc;
> for all available html entities.


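I don't know how you generate your HTML, but ElementTree and lxml both have
good HTML parsers. You can parse your document with them and write the result
back out with a "US-ASCII" encoding, and they will generate numeric character
references for every character that isn't ASCII. These are numeric references
rather than named entities like &ocirc;, but browsers render both identically.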

    >>> from lxml import etree
    >>> my_html_data = "<p>cr\u00f4pe</p>"   # sample input
    >>> root = etree.HTML(my_html_data)
    >>> html_7_bit = etree.tostring(root, encoding="us-ascii")
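If you need the named entities from your example (&ocirc; rather than a numeric reference), here is a minimal stdlib-only sketch. It uses a different technique from the lxml approach above: a lookup in Python 3's html.entities.codepoint2name table (htmlentitydefs in Python 2). It assumes the input is already HTML markup, so ASCII characters, including tags and "&", pass through untouched. The helper name to_html_entities is hypothetical.

```python
from html.entities import codepoint2name


def to_html_entities(text: str) -> str:
    """Replace each non-ASCII character with an HTML entity.

    Hypothetical helper: uses a named entity (e.g. &ocirc;) when HTML 4
    defines one, otherwise a decimal numeric reference (e.g. &#263;).
    ASCII characters, including markup, are copied unchanged.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                            # plain ASCII: keep as-is
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])   # named entity
        else:
            out.append("&#%d;" % cp)                  # no name: numeric reference
    return "".join(out)


# For numeric references only, the codec error handler is enough:
numeric = "cr\u00f4pe".encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_html_entities("<p>cr\u00f4pe \u0107</p>"))  # <p>cr&ocirc;pe &#263;</p>
print(numeric)                                       # cr&#244;pe
```

The fallback matters because codepoint2name only covers the HTML 4 entity set. Characters outside it (such as U+0107) still need a numeric reference.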

Stefan
-- 
http://mail.python.org/mailman/listinfo/python-list
