On Feb 18, 4:53 am, 7stud <[EMAIL PROTECTED]> wrote:
> On Feb 18, 3:20 am, William Heymann <[EMAIL PROTECTED]> wrote:
>
> > How do I decode a string back to useful unicode that has xml numeric
> > character references in it?
>
> > Things like 占 #which is: &_#21344_; (without the underscores)
>
> BeautifulSoup can handle two of the three formats for html entities.
> For instance, an 'o' with umlaut can be represented in three different
> ways:
>
> &_ouml_;
> ö
> ö

lol. It's hard to even make posts about this stuff because html
entities get converted by the forum software. Here are the three
different formats for an 'o with umlaut', with some underscores added to
keep the forum software from rendering the characters:

&_ouml_;
&_#246_;
&_#xf6_;
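
For anyone who wants to check the three formats themselves, here is a minimal sketch. It assumes Python 3.4 or later, where the standard library provides html.unescape. It is not the BeautifulSoup approach discussed above, just one stdlib way to turn the named, decimal, and hex forms back into unicode. The variable names are only for illustration.

```python
import html

# The three spellings of 'o with umlaut' from the post, written here
# without the underscores: named, decimal, and hexadecimal.
samples = ["&ouml;", "&#246;", "&#xf6;"]

# html.unescape (stdlib, Python 3.4+) handles all three forms.
decoded = [html.unescape(s) for s in samples]

# Each spelling decodes to the same single character, U+00F6.
for raw, char in zip(samples, decoded):
    print(raw, "->", char, hex(ord(char)))

# The decimal reference from the original question (21344).
cjk = html.unescape("&#21344;")
print(cjk, ord(cjk))
```

If the input is real XML rather than HTML, an XML parser will resolve the numeric references on its own, so something like this is probably only needed for loose strings.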