On 06/17/2013 10:42 PM, Steven D'Aprano wrote:
On Mon, 17 Jun 2013 21:06:57 -0400, Dave Angel wrote:
On 06/17/2013 08:41 PM, Steven D'Aprano wrote:
<SNIP>
In Python 3.2 and older, the data will be either UTF-4 or UTF-8,
selected when the Python compiler itself is compiled.
I think that was a typo. Do you perhaps mean UCS-2 or UCS-4?
Yes, that would be better.
UCS-2 is identical to UTF-16, except it doesn't support non-BMP
characters and therefore doesn't have surrogate pairs.
UCS-4 is functionally equivalent to UTF-16,
Perhaps you mean UTF-32?
as far as I can tell. (I'm
not really sure what the difference is.)
Now you've got me curious, by bringing up surrogate pairs. Do you know
whether a narrow build (say 3.2) really works as UTF-16, so that when you
encode a surrogate pair (4 bytes) to UTF-8, it encodes a single Unicode
character into a single UTF-8 sequence (probably 4 bytes long)?
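
For anyone who wants to check, here is a rough sketch of what I would expect a
correct encoder to produce for a non-BMP character. The character U+1F600 is
just an arbitrary example I picked, and the variable names are mine. This only
shows the expected byte sequences; it would still have to be run on a 3.2
narrow build to see how that build actually behaves.

```python
# U+1F600 lies outside the BMP, so UTF-16 needs a surrogate pair for it.
ch = "\U0001F600"

utf16 = ch.encode("utf-16-be")  # two 16-bit code units, 4 bytes total
utf8 = ch.encode("utf-8")       # one UTF-8 sequence, 4 bytes long

# Derive the surrogate pair by hand to compare with the codec's output.
offset = ord(ch) - 0x10000
high = 0xD800 + (offset >> 10)    # high (lead) surrogate
low = 0xDC00 + (offset & 0x3FF)   # low (trail) surrogate

print("UTF-16:", " ".join("%02X" % b for b in utf16))
print("UTF-8: ", " ".join("%02X" % b for b in utf8))
print("pair:   %04X %04X" % (high, low))
```

If the narrow build treats the pair as one character when encoding, the UTF-8
output there should be the same single 4-byte sequence rather than two 3-byte
sequences, one per surrogate.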
--
DaveA
--
http://mail.python.org/mailman/listinfo/python-list