On Thu, 01 Nov 2007 19:21:03 -0700, 7stud wrote:
> BeautifulSoup can convert an html entity representing an 'A' with
> umlaut, e.g.:
>
> &Auml;
>
> into an Ä without ever touching my keyboard. How does BeautifulSoup
> do it?
It maps the HTML entity names to Unicode characters. Take a look at the
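To make the name-to-character mapping concrete, here is a minimal sketch using the Python 3 standard library. It assumes the entity in question is `&Auml;`, and it uses `html.entities.name2codepoint` only as one example of such a table; whether BeautifulSoup consults this exact table internally is not confirmed by this thread.

```python
import html
from html.entities import name2codepoint

# Look up the code point registered under the entity name "Auml".
code_point = name2codepoint["Auml"]
char_from_table = chr(code_point)

# html.unescape() applies the same kind of lookup to a whole string.
char_from_unescape = html.unescape("&Auml;")

# ascii() keeps the output printable on terminals that are not UTF-8.
print(code_point, ascii(char_from_table), ascii(char_from_unescape))
```

Note that this yields the single precomposed character U+00C4, not the two-code-point sequence (A followed by a combining diaeresis) that the bytes `A\xcc\x88` encode.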
On Oct 13, 12:42 pm, MRAB <[EMAIL PROTECTED]> wrote:
> You can
> decode that into the actual UTF-8 string with decode("string_escape"):
>
> s = raw_input('Enter: ') #A\xcc\x88
> s = s.decode("string_escape")
>
Ahh. Thanks for that.
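For anyone reading this on Python 3: the `string_escape` codec no longer exists there, so the snippet above will not run as written. A rough equivalent, offered as a sketch and not as the canonical replacement, chains `unicode_escape` with a Latin-1 round trip. The helper name `unescape_utf8` is made up for illustration, and it assumes the typed text is pure ASCII.

```python
def unescape_utf8(typed):
    # Interpret the backslash escapes; each \xNN becomes one code point below 256.
    interpreted = typed.encode("ascii").decode("unicode_escape")
    # Map those code points one-to-one back to bytes, then decode them as UTF-8.
    return interpreted.encode("latin-1").decode("utf-8")


# Stands in for what the user typed at the prompt: nine separate characters.
typed = r"A\xcc\x88"
decoded = unescape_utf8(typed)

# ascii() keeps the output printable on terminals that are not UTF-8.
print(len(typed), len(decoded), ascii(decoded))
```

The result is two code points: the letter A followed by U+0308 (combining diaeresis), which a terminal renders as a capital A with umlaut.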
On Oct 13, 3:09 am, 7stud <[EMAIL PROTECTED]> wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > You mean literally!? Then of course I get A\xcc\x88 because that's what I
> > entered. In string literals in source code the backslash has a special
> > meaning but
On Fri, 12 Oct 2007 19:09:46 -0700, 7stud wrote:
> On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>> You mean literally!? Then of course I get A\xcc\x88 because that's what I
>> entered. In string literals in source code the backslash has a special
>> meaning but `raw_in
On Oct 12, 2:43 pm, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> You mean literally!? Then of course I get A\xcc\x88 because that's what I
> entered. In string literals in source code the backslash has a special
> meaning but `raw_input()` does not "interpret" the input in any way.
>
Th
On Fri, 12 Oct 2007 13:18:35 -0700, 7stud wrote:
> On Oct 12, 1:18 pm, [EMAIL PROTECTED] wrote:
>> On Oct 12, 1:53 pm, 7stud <[EMAIL PROTECTED]> wrote:
>>
>> > s = 'A\xcc\x88' #capital A with umlaut
>> > print s #displays capital A with umlaut
>>
>> > s = raw_input('Enter: ') #A\xcc\
On Oct 12, 1:18 pm, [EMAIL PROTECTED] wrote:
> On Oct 12, 1:53 pm, 7stud <[EMAIL PROTECTED]> wrote:
>
> > s = 'A\xcc\x88' #capital A with umlaut
> > print s #displays capital A with umlaut
>
> > s = raw_input('Enter: ') #A\xcc\x88
> > print s #displays A\xcc\x88
>
>
On Oct 12, 1:53 pm, 7stud <[EMAIL PROTECTED]> wrote:
> s = 'A\xcc\x88' #capital A with umlaut
> print s #displays capital A with umlaut
>
> s = raw_input('Enter: ') #A\xcc\x88
> print s #displays A\xcc\x88
>
> print len(s) #9
>
> It looks like every ch
s = 'A\xcc\x88' #capital A with umlaut
print s #displays capital A with umlaut
s = raw_input('Enter: ') #A\xcc\x88
print s #displays A\xcc\x88
print len(s) #9
It looks like every character of the string I enter in utf-8 is being
interpreted literal
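The difference between the two cases can be checked by counting. The sketch below is written for Python 3 and uses a raw string as a stand-in for keyboard input (an assumption, since a script cannot type at its own prompt). The variable names are illustrative only.

```python
# In source code the parser processes the \x escapes: this is 3 bytes.
literal = b"A\xcc\x88"

# Typed at a prompt, the same keystrokes arrive as 9 ordinary characters.
typed = r"A\xcc\x88"

print(len(literal), len(typed))
# Show each typed character separately, backslashes included.
print(list(typed))
```

So the 9 reported by `len` is simply one `A` plus two groups of four characters (backslash, `x`, and two hex digits); nothing has turned them into bytes yet.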