Re: Beazley 4E P.E.R, Page29: Unicode

Terry Reedy Sun, 14 Jul 2013 00:12:58 -0700

On 7/13/2013 11:09 PM, [email protected] wrote:

http://stackoverflow.com/questions/17632246/beazley-4e-p-e-r-page29-unicode


Is this David Beazley? (You referred to 'DB' later.)

 "directly writing a raw UTF-8 encoded string such as
'Jalape\xc3\xb1o' simply produces a nine-character string U+004A,
U+0061, U+006C, U+0061, U+0070, U+0065, U+00C3, U+00B1, U+006F, which
is probably not what you intended. This is because in UTF-8, the
multi-byte sequence \xc3\xb1 is supposed to represent the single
character U+00F1, not the two characters U+00C3 and U+00B1."
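The nine-character count in the quoted passage can be checked directly. The following is a minimal sketch, assuming Python 3 semantics, where `\xNN` inside a str literal denotes the code point U+00NN rather than a byte; the variable names are illustrative only.

```python
# In a Python 3 str literal, \xNN is the code point U+00NN, not a byte,
# so \xc3 and \xb1 become two separate characters.
s = 'Jalape\xc3\xb1o'
codes = [f'U+{ord(c):04X}' for c in s]
print(len(s))              # 9
print(' '.join(codes))     # the nine code points listed in the quote
# It is not the same text as the intended single character U+00F1:
print(s == 'Jalape\u00f1o')  # False
```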

My original question was: shouldn't this be 8 characters, not 9? He
says \xc3\xb1 is supposed to represent a single character. However,
after some interaction with fellow Pythonistas, I'm even more
confused.

With reference to the above paragraph: 1. What does he mean by "writing a
raw UTF-8 encoded string"?
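In Python 3 terms, one plausible reading (an assumption, not necessarily what the book intends) is a bytes object holding the UTF-8 encoding, as opposed to the text it encodes. A minimal sketch contrasting the two:

```python
text = 'Jalape\u00f1o'        # 8 characters; \u00f1 is the single character ñ
data = text.encode('utf-8')   # its UTF-8 encoding, as bytes
print(len(text), len(data))   # 8 9  (ñ takes two bytes in UTF-8)
print(data)                   # b'Jalape\xc3\xb1o'
# Typing those bytes directly as a bytes literal gives an equal object,
# and decoding it as UTF-8 recovers the 8-character text.
print(data == b'Jalape\xc3\xb1o')    # True
print(data.decode('utf-8') == text)  # True
```

So "nine" is the right count for the encoded bytes, while "eight" is the right count for the characters they encode.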

As much respect as I have for DB, I think this is an impossible-to-parse, confused statement, fueled by the Python 2 confusion between characters and bytes. I suggest forgetting it and the discussion that followed. Bytes as bytes can carry any digital information, just as modulated sine waves can carry any analog information. In both cases, one can regard them either purely as what they are or as encoding information in some other form.
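To illustrate the point that bytes can be regarded either as themselves or as encoded information, here is a minimal Python 3 sketch; the choice of Latin-1 and UTF-8 as the two codecs is purely for illustration.

```python
data = b'Jalape\xc3\xb1o'
# Regarded purely as bytes: nine integers in range(256).
print(list(data))  # [74, 97, 108, 97, 112, 101, 195, 177, 111]
# Regarded as encoded text, the meaning depends on the codec chosen.
as_latin1 = data.decode('latin-1')  # each byte maps to one code point
as_utf8 = data.decode('utf-8')      # \xc3\xb1 maps to one code point, U+00F1
print(as_latin1, len(as_latin1))    # JalapeÃ±o 9
print(as_utf8, len(as_utf8))        # Jalapeño 8
```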


--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
