On Oct 13, 12:42 pm, MRAB <[EMAIL PROTECTED]> wrote:
> You can decode that into the actual UTF-8 string with decode("string_escape"):
>
> s = raw_input('Enter: ') #A\xcc\x88
> s = s.decode("string_escape")
>
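MRAB's snippet is Python 2. Python 3 no longer has a "string_escape" codec, so here is a rough Python 3 sketch of the same idea. It assumes the typed text is plain ASCII containing literal \xNN escapes; decode_escaped_utf8 is a hypothetical helper name used only for illustration. Note that A\xcc\x88 is "A" followed by U+0308 COMBINING DIAERESIS, which only becomes the single character U+00C4 after NFC normalization.

```python
import unicodedata

def decode_escaped_utf8(typed: str) -> str:
    """Hypothetical helper: turn text holding literal escapes such as A\\xcc\\x88
    (as typed at a prompt) into the Unicode string those UTF-8 bytes encode.
    Assumes the typed text is pure ASCII."""
    # unicode_escape interprets each \xNN escape as code point U+00NN
    escaped = typed.encode("ascii").decode("unicode_escape")
    # latin-1 maps code points U+0000..U+00FF back to bytes of the same value
    raw_bytes = escaped.encode("latin-1")
    # Interpret those bytes as UTF-8
    return raw_bytes.decode("utf-8")

typed = r"A\xcc\x88"      # what input() returns: 9 characters, not 3 bytes
s = decode_escaped_utf8(typed)
print(ascii(s))           # 'A\u0308' : A plus a combining diaeresis

# NFC composes the pair into the single precomposed character U+00C4
composed = unicodedata.normalize("NFC", s)
print(ascii(composed))    # '\xc4'
```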
Ahh. Thanks for that.

>On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> > > And what is it that your keyboard enters to produce an 'a' with an umlaut?
> >
> > *I* just hit the ä key. The one right next to the ö key. ;-)
>

BeautifulSoup can convert an HTML entity representing an 'A' with umlaut, e.g. &Auml;, into the actual character without my ever touching the keyboard. How does BeautifulSoup do it?

from BeautifulSoup import BeautifulStoneSoup as bss

s1 = "<h1>&Auml;</h1>"  #&_Auml;_
#I added the comment after the line to show the
#format of the HTML entity. In case a browser
#renders the comment as the actual character,
#I added underscores to the HTML entity.

soup = bss(s1)
text = soup.contents[0].string  #gets the 'A' with umlaut out of the html

new_s = bss(text, convertEntities=bss.HTML_ENTITIES)
print repr(new_s)
print new_s

I see the same output for both print statements, and what I see is an 'A' with umlaut. I expected the first print statement to show the UTF-8 encoding of the character.
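The code above is BeautifulSoup 3 under Python 2. As a rough Python 3 analogue, and not BeautifulSoup's own mechanism, the standard library's html.unescape performs the same entity-to-character conversion. The sketch below then shows, under that assumption, how to make the code point and the UTF-8 bytes visible explicitly instead of the rendered glyph. In Python 3, repr() of a str leaves printable non-ASCII characters unescaped, so ascii() and .encode("utf-8") are the forms that expose the representation.

```python
import html

entity_text = "&Auml;"               # the entity from the example above
char = html.unescape(entity_text)    # stdlib conversion, swapped in for BeautifulSoup

# print(char) and print(repr(char)) would both show the glyph itself.
# These forms show what is underneath instead (ASCII-only output):
code_point = ascii(char)             # the escaped code point, U+00C4
utf8_bytes = char.encode("utf-8")    # the UTF-8 encoding of that code point

print(code_point)                    # '\xc4'
print(hex(ord(char)))                # 0xc4
print(utf8_bytes)                    # b'\xc3\x84'
```

The entity yields the single precomposed character U+00C4, whereas the typed A\xcc\x88 from earlier in the thread is the two-code-point sequence "A" + U+0308; both can display as the same glyph.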
-- http://mail.python.org/mailman/listinfo/python-list