On Sunday, March 20, 2016 at 10:32:07 AM UTC+5:30, Steven D'Aprano wrote:
> On Sun, 20 Mar 2016 03:12 am, Marko Rauhamaa wrote:
>
> > Steven D'Aprano:
> >
> >> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
> >>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
> >>
> >> Show me.
> >>
> >> Before you answer, if your answer is "surrogate pairs", that is
> >> incorrect. Surrogate pairs is how UTF-16 encodes astral characters.
> >
> > UTF-16 inputs a Unicode stream and produces a stream of 16-bit numbers.
> > Thus, the output of UTF-16 is not Unicode.
>
> I'm not sure what point you think you are making.
>
> Unicode (the character set part of it) is a set of abstract 23-bit numbers,

23? Or 21?
AIUI, if the 'least count' is 1 bit, it's 21.
If it's 8, it's 24.
If it's 16, it's 32.

More pertinently, if the number of bits signifies, whatever is the sense
of the word 'abstract'?

> or code points, representing (among other things) characters, and numbered
> from U+0000 to U+10FFFF. Any UTF is, by definition, a transformation from
> such abstract code points to sequences of machine words or bytes (and vice
> versa). What's your point?

I think it's more useful to think of data transformations between formats,
rather than calling one format more abstract than another.
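
For what it's worth, the bit counts above can be checked in a few lines of Python. This is only a rough sketch: `round_up_to_unit` and `utf16_code_units` are hypothetical helper names made up for illustration, and U+1F600 is just an arbitrary astral character picked as the example. It also shows the "transformation between formats" view, with a code point going to 16-bit code units and back.

```python
# Largest code point, per the range U+0000..U+10FFFF quoted above.
MAX_CODE_POINT = 0x10FFFF


def round_up_to_unit(bits, unit):
    """Round a bit count up to the next multiple of `unit` (hypothetical helper)."""
    return -(-bits // unit) * unit


def utf16_code_units(text):
    """Return the 16-bit code units of `text` in UTF-16 (hypothetical helper)."""
    data = text.encode("utf-16-be")  # big-endian, no BOM
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


bits = MAX_CODE_POINT.bit_length()
print(bits)                        # minimum bits for the largest code point
print(round_up_to_unit(bits, 8))   # rounded up to whole bytes
print(round_up_to_unit(bits, 16))  # rounded up to whole 16-bit words

# One astral code point becomes two 16-bit code units (a surrogate pair),
# and decoding turns them back into the same code point.
astral = "\U0001F600"
units = utf16_code_units(astral)
print([hex(u) for u in units])
round_trip = b"".join(u.to_bytes(2, "big") for u in units).decode("utf-16-be")
print(round_trip == astral)
```

Nothing here settles which side is "abstract"; it only shows that both directions are plain data transformations.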