On Sun, Mar 20, 2016 at 3:12 AM, Marko Rauhamaa <ma...@pacujo.net> wrote:
> Steven D'Aprano <st...@pearwood.info>:
>
>> On Sun, 20 Mar 2016 02:02 am, Marko Rauhamaa wrote:
>>> Yes, but UTF-16 produces 16-bit values that are outside Unicode.
>>
>> Show me.
>>
>> Before you answer, if your answer is "surrogate pairs", that is
>> incorrect. Surrogate pairs is how UTF-16 encodes astral characters.
>
> UTF-16 inputs a Unicode stream and produces a stream of 16-bit numbers.
> Thus, the output of UTF-16 is not Unicode.

Then UTF-16 produces 16-bit values that have nothing whatsoever to do
with Unicode. Is that what you're saying? If so, you're correct;
UTF-16LE produces two bytes to represent every BMP character, and four
bytes to represent every non-BMP character, and those are not
themselves Unicode.

ChrisA
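
For anyone who wants to check the byte counts, here is a rough Python 3 sketch. The helper name utf16le_units is made up for this illustration and is not from anything earlier in the thread; it only relies on the built-in str.encode and int.from_bytes.

```python
# Hypothetical helper (name invented for this example): show how
# UTF-16LE encodes a single character.
def utf16le_units(ch):
    data = ch.encode("utf-16-le")
    # Regroup the bytes into little-endian 16-bit code units.
    units = [int.from_bytes(data[i:i + 2], "little")
             for i in range(0, len(data), 2)]
    return data, units

# Two BMP characters and one non-BMP (astral) character.
for ch in ("A", "\u20ac", "\U0001F600"):
    data, units = utf16le_units(ch)
    print(f"U+{ord(ch):04X}: {len(data)} bytes, "
          f"code units {[hex(u) for u in units]}")
```

The two BMP characters come out as one 16-bit unit (two bytes) each, while U+1F600 comes out as the surrogate pair 0xD83D 0xDE00 (four bytes). Those are code units of the encoding, not code points in their own right.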