7stud <[EMAIL PROTECTED]> writes:
> For instance, an 'o' with umlaut can be represented in three
> different ways:
>
> '&' followed by 'ouml;'
> '&' followed by '#246;'
> '&' followed by '#xf6;'
The fourth way, of course, is to simply have 'ö' appear directly as a
character in the document, and
7stud <[EMAIL PROTECTED]> wrote:
> On Feb 18, 4:53 am, 7stud <[EMAIL PROTECTED]> wrote:
>> On Feb 18, 3:20 am, William Heymann <[EMAIL PROTECTED]> wrote:
>>
>> > How do I decode a string back to useful unicode that has xml
>> > numeric character
>> > references in it?
>>
>> > Things like 占 #w
On Feb 18, 4:53 am, 7stud <[EMAIL PROTECTED]> wrote:
> On Feb 18, 3:20 am, William Heymann <[EMAIL PROTECTED]> wrote:
>
> > How do I decode a string back to useful unicode that has xml numeric
> > character
> > references in it?
>
> > Things like 占 #which is: &_#21344_; (without the underscores)
On Feb 18, 3:20 am, William Heymann <[EMAIL PROTECTED]> wrote:
> How do I decode a string back to useful unicode that has xml numeric character
> references in it?
>
> Things like 占
BeautifulSoup can handle two of the three formats for HTML entities.
For instance, an 'o' with umlaut can be represe
William Heymann <[EMAIL PROTECTED]> wrote:
> How do I decode a string back to useful unicode that has xml numeric
> character references in it?
>
> Things like 占
>
Try something like this:
import re
from htmlentitydefs import name2codepoint
name2codepoint = name2codepoint.copy()
name2codepoint
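
For anyone reading this on Python 3, here is a rough sketch of the same general idea: a regex that matches the three reference forms mentioned in this thread (named, decimal, hex) and substitutes the matching character. It is not the code from the post above. It assumes Python 3, where the htmlentitydefs module is called html.entities, and the names _ENTITY_RE and decode_entities are made up for illustration.

```python
import re
from html.entities import name2codepoint  # Python 3 name for htmlentitydefs

# Hypothetical pattern: decimal (&#246;), hex (&#xf6;) or named (&ouml;) references.
_ENTITY_RE = re.compile(r"&(?:#(\d+)|#[xX]([0-9a-fA-F]+)|([A-Za-z][A-Za-z0-9]*));")


def decode_entities(text):
    """Replace character references in text with the characters they stand for."""
    def replace(match):
        dec, hexa, name = match.groups()
        try:
            if dec is not None:
                return chr(int(dec))
            if hexa is not None:
                return chr(int(hexa, 16))
            return chr(name2codepoint[name])
        except (KeyError, ValueError, OverflowError):
            # Unknown name or out-of-range code point: keep the reference as written.
            return match.group(0)

    return _ENTITY_RE.sub(replace, text)


print(decode_entities("&ouml; &#246; &#xf6; &#21344;"))
```

If all you need is the decoding itself, the Python 3 standard library also provides html.unescape(), which handles the same three forms without a hand-written regex.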