Marc-Andre Lemburg <m...@egenix.com> added the comment:

Adam Olsen wrote:
>
> Adam Olsen <rha...@gmail.com> added the comment:
>
> Surrogates aren't optional features of UTF-16, we really need to get
> this fixed.  That includes .isalpha().

We use UCS2 on narrow Python builds, not UTF-16.

> We might keep the old public API for compatibility, but it should be
> clearly marked as broken for non-BMP scalar values.

That has always been the case. UCS2 doesn't support surrogates.

However, we have been slowly moving in the direction of making
the UCS2 storage appear like UTF-16 to the Python programmer.

This process is not yet complete and will likely never be completed,
since it must still be possible to create things like lone
surrogates for processing purposes, so care has to be taken
when using non-BMP code points on narrow builds.

> I don't see a problem with changing 2.x.  The existing behaviour is
> broken for non-BMP scalar values, so surely nobody can claim dependence
> on it.

No, but changing the APIs from 16-bit integers to 32-bit integers
does require a recompile of all code using them. Otherwise you
end up with segfaults.

Also, the Unicode type database itself uses Py_UNICODE, so
case mapping would fail for non-BMP code points.

So if we want to support accessing non-BMP type information
on narrow builds, we'd need to change the complete Unicode type
database API to work with UCS4 code points and then provide a
backwards compatible C API using Py_UNICODE. Due to the UCS2/UCS4
API renaming done in unicodeobject.h, this would amount to
exposing both the UCS2 and the UCS4 variants of the APIs on
narrow builds.

With such an approach we'd not break the binary API and
still get the full UCS4 range of code points in the type
database. The change would be possible in Python 2.x and
3.x (which now both use the same strategy with respect to change
management).

Would someone be willing to work on this?
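To make the .isalpha() problem concrete, here is a minimal sketch that simulates, on a current Python 3, what a per-code-unit check sees compared with a per-code-point check. It is an illustration only, not the proposed C API change: the helper names (utf16_code_units, combine_surrogates, isalpha_per_unit, isalpha_per_code_point) are hypothetical, and the 16-bit storage of a narrow build is approximated by encoding to UTF-16.

```python
import struct


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of 16-bit integers."""
    # "surrogatepass" lets lone surrogates through instead of raising.
    data = text.encode("utf-16-le", "surrogatepass")
    return list(struct.unpack("<%dH" % (len(data) // 2), data))


def combine_surrogates(high, low):
    """Combine a high/low surrogate pair into one non-BMP code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


def isalpha_per_unit(text):
    """Check each 16-bit unit on its own (the narrow-build view)."""
    units = utf16_code_units(text)
    return bool(units) and all(chr(u).isalpha() for u in units)


def isalpha_per_code_point(text):
    """Pair up surrogates first, then check full code points."""
    units = utf16_code_units(text)
    code_points = []
    i = 0
    while i < len(units):
        unit = units[i]
        is_high = 0xD800 <= unit <= 0xDBFF
        has_low = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low:
            code_points.append(combine_surrogates(unit, units[i + 1]))
            i += 2
        else:
            # BMP character or lone surrogate: keep the unit as-is.
            code_points.append(unit)
            i += 1
    return bool(code_points) and all(chr(cp).isalpha() for cp in code_points)


if __name__ == "__main__":
    letter = "\U00010400"  # DESERET CAPITAL LETTER LONG I, a non-BMP letter
    print([hex(u) for u in utf16_code_units(letter)])
    print(isalpha_per_unit(letter), isalpha_per_code_point(letter))
```

The per-unit check rejects the letter because each surrogate on its own is not alphabetic, while the per-code-point check accepts it. Lone surrogates are passed through unchanged in both functions, which matches the requirement above that they remain representable.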
----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue5127>
_______________________________________