Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
On Fri, Dec 10, 2010 at 6:09 PM, Daniel Stutzbach <rep...@bugs.python.org> wrote:
..
> The second check for surrogates in Py_UNICODE_PUT_NEXT is necessary, unless
> you can prove that Py_UNICODE_SOME_TRANSFORMATION will never transform
> characters < 0x10000 into characters > 0x10000 or vice versa.
>
> Can we prove that will always be the case, for current and future versions
> of Unicode, for all or almost-all of the transformations we care about?
>

Certainly not for all, but for some important transformations, I believe
the Unicode Standard does promise that the transformation maps BMP
characters to BMP characters and supplementary characters to supplementary
characters. Case folding and normalization are two important examples.

> Answering that question and figuring out what to do about it are probably
> more trouble than it's worth.
> If a particular point proves to be a bottleneck, we can always specialize
> the code there later.

Agree. It is even more likely that applications that have to deal with
lots of supplementary characters will be better off using a wide unicode
build rather than such specialization.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________
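
The question raised in the message, whether a given transformation ever moves a character across the 0x10000 boundary, can be spot-checked against the Unicode data bundled with a particular Python interpreter. The following is only a rough sketch under stated assumptions: the helper names (`plane_crossings`, `nfd`, `BMP_LIMIT`) are hypothetical and not part of CPython, and the outcome depends on the Unicode version that `unicodedata` was built with, so it proves nothing about future versions of the standard. As one data point, U+2F800 has a canonical decomposition to U+4E3D in the Unicode Character Database, so the check reports it as crossing the boundary under NFD.

```python
import unicodedata

BMP_LIMIT = 0x10000  # first code point outside the Basic Multilingual Plane


def is_surrogate(cp):
    """True for code points in the UTF-16 surrogate range."""
    return 0xD800 <= cp <= 0xDFFF


def plane_crossings(transform, code_points):
    """Return the code points whose transformed form contains a character
    on the other side of the BMP/supplementary boundary.

    `transform` is any callable taking and returning a str
    (hypothetical stand-in for Py_UNICODE_SOME_TRANSFORMATION).
    """
    crossing = []
    for cp in code_points:
        if is_surrogate(cp):
            continue  # lone surrogates are not characters; skip them
        source_is_bmp = cp < BMP_LIMIT
        result = transform(chr(cp))
        # A crossing occurs if any output character lies on the other side.
        if any((ord(ch) < BMP_LIMIT) != source_is_bmp for ch in result):
            crossing.append(cp)
    return crossing


def nfd(s):
    """Canonical decomposition, using the interpreter's Unicode database."""
    return unicodedata.normalize("NFD", s)


if __name__ == "__main__":
    everything = range(0x110000)
    for name, transform in [("casefold", str.casefold), ("NFD", nfd)]:
        found = plane_crossings(transform, everything)
        print(name, len(found), [hex(cp) for cp in found[:5]])
```

Running the script prints, for each transformation, how many code points cross the boundary and the first few of them. An empty list for a transformation would be consistent with skipping the second surrogate check for it on that Unicode version only.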