On Oct 13, 12:42 pm, MRAB <[EMAIL PROTECTED]> wrote:
> You can
> decode that into the actual UTF-8 string with decode("string_escape"):
>
> s = raw_input('Enter: ')   #A\xcc\x88
> s = s.decode("string_escape")
>

Ahh.  Thanks for that.
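
For anyone reading along on Python 3, where raw_input and the
"string_escape" codec are not available, a rough equivalent might look
like the sketch below. The helper name decode_escaped is made up for this
example, and it assumes the typed text is plain ASCII containing \xNN
escapes.

```python
# Python 3 sketch; decode_escaped is a hypothetical helper, not a library call.
def decode_escaped(typed):
    """Turn typed text such as A\\xcc\\x88 into the string its UTF-8 bytes spell."""
    # Assumption: the input is pure ASCII, so this encode cannot fail.
    # unicode_escape then maps each \xNN escape to the code point U+00NN.
    as_code_points = typed.encode("ascii").decode("unicode_escape")
    # latin-1 maps each code point U+00NN back to the single byte NN.
    raw_bytes = as_code_points.encode("latin-1")
    # Finally, interpret those bytes as UTF-8.
    return raw_bytes.decode("utf-8")


s = decode_escaped(r"A\xcc\x88")   # the same text as in the raw_input example
print(ascii(s))                    # ascii() shows the escaped form of the result
```

The result here is an 'A' followed by a combining diaeresis, which is two
code points rather than the single precomposed character.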


>On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > And what is it that your keyboard enters to produce an 'a' with an umlaut?
>
> *I* just hit the ä key.  The one right next to the ö key.  ;-)
>

BeautifulSoup can convert an html entity representing an 'A' with
umlaut, e.g.:

&Auml;

into an Ä without ever touching my keyboard.  How does BeautifulSoup
do it?
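
One plausible mechanism is a plain table lookup from entity name to code
point, with no keyboard involved. The sketch below uses the table in
Python 3's standard html.entities module, not BeautifulSoup's own code,
which I have not checked. The helper name entity_to_char is made up for
this example.

```python
# Python 3 sketch of an entity-name lookup; entity_to_char is hypothetical.
from html.entities import name2codepoint


def entity_to_char(entity):
    """Convert a named entity such as '&Auml;' to the character it stands for."""
    name = entity.strip("&;")            # '&Auml;' -> 'Auml'
    return chr(name2codepoint[name])     # look up the code point, build the char


ch = entity_to_char("&Auml;")
print(ascii(ch))   # escaped form, so nothing depends on the terminal encoding
```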


from BeautifulSoup import BeautifulStoneSoup as bss


s1 = "<h1>&Auml;</h1>"  #&_Auml;_
#I added the comment after the line to show the
#format of the html entity.  In case a browser
#might render the comment into the actual character,
#I added underscores to the html entity:

soup = bss(s1)
text = soup.contents[0].string  #gets the 'A' with umlaut out of the html

new_s = bss(text, convertEntities=bss.HTML_ENTITIES)
print repr(new_s)
print new_s

I see the same output for both print statements, and what I see is an
'A' with umlaut.  I expected that the first print statement would show
the utf-8 encoding for the character.
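
One possible explanation, which I have not verified against the
BeautifulSoup source, is that new_s is a soup object rather than a plain
string, and that its repr() simply renders the markup the same way str()
does. The Python 3 sketch below shows that behaviour with a made-up
stand-in class, FakeSoup, and then shows one way to get at the UTF-8
bytes.

```python
# Python 3 sketch; FakeSoup is a hypothetical stand-in, not BeautifulSoup's class.
class FakeSoup(object):
    def __init__(self, text):
        self.text = text

    def __str__(self):
        return self.text

    # Assumption: repr() is defined to give the same rendering as str().
    __repr__ = __str__


doc = FakeSoup("\xc4")                # holds an 'A' with umlaut
same = repr(doc) == str(doc)          # both calls render the character itself
utf8 = str(doc).encode("utf-8")       # encode explicitly to see the UTF-8 bytes
print(same)
print(utf8)
```

If that guess is right, calling repr() on the underlying string, rather
than on the soup object, would be the way to see the escaped form.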

-- 
http://mail.python.org/mailman/listinfo/python-list
