Re: decode Numeric Character References to unicode

7stud Mon, 18 Feb 2008 04:06:02 -0800

On Feb 18, 4:53 am, 7stud <[EMAIL PROTECTED]> wrote:
> On Feb 18, 3:20 am, William Heymann <[EMAIL PROTECTED]> wrote:
>
> > How do I decode a string back to useful unicode that has xml numeric
> > character references in it?
>
> > Things like &#21344;  #which is: &_#21344_; (without the underscores)
>
> BeautifulSoup can handle two of the three formats for html entities.
> For instance, an 'o' with umlaut can be represented in three different
> ways:
>
> &_ouml_;
> ö
> ö
>


lol.  It's hard to even make posts about this stuff because HTML
entities get converted by the forum software. Here are the three
different formats for an 'o with umlaut', with some underscores added
to keep the forum software from rendering the characters:

&_ouml_;
&_#246_;
&_#xf6_;
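
For anyone reading this later, here is a rough sketch of decoding all
three formats with nothing but the standard library. It assumes
Python 3.4 or newer, where html.unescape() exists; it is not the
BeautifulSoup approach mentioned above. The helper decode_ncr() is a
made-up name for illustration, and it only touches the two numeric
formats, leaving named entities alone:

```python
import html
import re

# The three spellings of 'o with umlaut' from the list above,
# written without the protective underscores.
samples = ["&ouml;", "&#246;", "&#xf6;"]

# html.unescape() handles named, decimal, and hexadecimal references.
decoded = [html.unescape(s) for s in samples]

# Hypothetical helper: decode only numeric character references.
_NCR = re.compile(r"&#([xX][0-9a-fA-F]+|[0-9]+);")

def decode_ncr(text):
    def repl(match):
        body = match.group(1)
        if body[0] in "xX":
            # Hexadecimal form: skip the leading 'x' and parse base 16.
            return chr(int(body[1:], 16))
        # Decimal form.
        return chr(int(body, 10))
    return _NCR.sub(repl, text)

print(decoded)
print(decode_ncr("&#21344; and &#xf6; but not &ouml;"))
```

The regex version is only worth it if you specifically want named
entities left untouched; otherwise html.unescape() alone should do.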
-- 
http://mail.python.org/mailman/listinfo/python-list