Tom Christiansen <tchr...@perl.com> added the comment:

I wrote:
>> Python's narrow builds are, in a sense, 'between' UCS-2 and UTF-16.

> So I'm finding. Perhaps that's why I keep getting confused. I do have a
> pretty firm notion of what UCS-2 and UTF-16 are, and so I sometimes get
> self-contradictory results.

> Can you think of anywhere that Python acts like UCS-2 and not UTF-16? I'm
> not sure I have found one, although the regex thing might count.

I just thought of one. The casemapping functions don't work right on
Deseret, which is a non-BMP case-changing script. That's one I submitted
as a bug, because I figure that if the UTF-8 decoder can decode non-BMP
code points into paired UTF-16 surrogates, then the casing functions had
jolly well better be able to deal with them. If the UTF-8 decoder knows it
is only going to UCS-2, then it should have raised an exception on my
non-BMP source. Since it went to UTF-16, the rest of the language should
have behaved accordingly.

Java does this right, BTW, despite its UTF-16ness.

--tom

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue12729>
_______________________________________
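
For readers who want to try the Deseret point from the message above, here is a minimal sketch. It assumes Python 3.3 or later, where PEP 393 removed the narrow/wide build distinction, so it shows the behavior the message argues for rather than reproducing the narrow-build bug. The helper name utf16_code_units is hypothetical and was chosen only for this illustration.

```python
# Illustrative sketch only; utf16_code_units is a hypothetical helper,
# not something taken from the thread or from the Python standard library.

DESERET_SMALL_LONG_I = "\U00010428"    # DESERET SMALL LETTER LONG I
DESERET_CAPITAL_LONG_I = "\U00010400"  # DESERET CAPITAL LETTER LONG I


def utf16_code_units(text):
    """Return the UTF-16 code units that encode `text`, as integers."""
    data = text.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]


# A non-BMP code point needs two UTF-16 code units: a surrogate pair.
units = utf16_code_units(DESERET_SMALL_LONG_I)
print([hex(u) for u in units])

# Assumption: Python 3.3+, where a str is indexed by code point,
# so the letter has length 1 and case mapping sees the whole code point.
print(len(DESERET_SMALL_LONG_I))
print(DESERET_SMALL_LONG_I.upper() == DESERET_CAPITAL_LONG_I)
print(DESERET_CAPITAL_LONG_I.lower() == DESERET_SMALL_LONG_I)
```

On a narrow build, the same letter would be stored as the two surrogate code units shown by the first print, which is the situation in which the casing functions are reported to fail.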