Alexander Belopolsky <belopol...@users.sourceforge.net> added the comment:
On Sat, Nov 27, 2010 at 5:24 PM, Marc-Andre Lemburg <rep...@bugs.python.org> wrote:
..
> Perhaps we should allow ord() to work on surrogates in
> UCS4 builds as well. That would reduce the number of
> surprises.
>

This is an interesting idea; however, having surrogates in UCS4 builds will sooner or later lead to surprises such as

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed

I thought UCS4 (or more properly, UTF-32) did not allow encoding of surrogate code points.

It is somewhat bothersome that a valid string literal such as '\uD800\uDC00' in a narrow build is subtly invalid in a wide build. It would probably be better if '\uD800\uDC00' was either rejected on a wide build, or interpreted as a single character so that True on any build.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue10542>
_______________________________________
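
For readers who want to reproduce the behaviour discussed above, here is a minimal sketch. It assumes Python 3.3 or later, where the narrow/wide build distinction no longer exists and every str behaves like the "wide" case. The helper names utf8_rejects_lone_surrogate and join_surrogate_pair are hypothetical and are not part of the tracker discussion; the UTF-16 round trip is only one possible way to interpret a surrogate pair as a single character, not what the issue proposes to implement.

```python
# Sketch only: assumes Python 3.3+ (no narrow/wide split).
# Both helper names are illustrative, not from the original message.

def utf8_rejects_lone_surrogate(ch="\ud800"):
    """Return True if the strict UTF-8 codec refuses to encode ch."""
    try:
        ch.encode("utf-8")
    except UnicodeEncodeError:
        # This is the "surrogates not allowed" error from the traceback.
        return True
    return False


def join_surrogate_pair(s):
    """Combine surrogate pairs into single code points.

    The 'surrogatepass' handler lets the surrogates through as raw
    UTF-16 code units; decoding then pairs them up again.
    """
    return s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")


pair = "\ud800\udc00"             # two separate surrogate code points
joined = join_surrogate_pair(pair)

print(utf8_rejects_lone_surrogate())             # True
print(len(pair), len(joined), hex(ord(joined)))  # 2 1 0x10000
```

On such a build the literal is kept as two code points, which is the "subtly invalid" case described above; the round trip shows what interpreting the pair as one character would yield.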