Adam Olsen <[EMAIL PROTECTED]> added the comment: Simpler way to reproduce this (on linux):
$ rm unicodetest.pyc $ $ python -c 'import unicodetest' Result: False Len: 2 1 Repr: u'\ud800\udd23' u'\U00010123' $ $ python -c 'import unicodetest' Result: True Len: 1 1 Repr: u'\U00010123' u'\U00010123' Storing surrogates in UTF-32 is ill-formed[1], so the first part definitely shouldn't be failing on linux (with a UTF-32 build). The repr could go either way, as unicode doesn't cover escape sequences. We could allow u'\ud800\udd23' literals to magically become u'\U00010123' on UTF-32 builds. We already allow repr(u'\ud800\udd23') to magically become "u'\U00010123'" on UTF-16 builds (which is why the repr test always passes there, rather than always failing). The bigger problem is how much we prohibit ill-formed character sequences. We already prevent values above U+10FFFF, but not inappropriate surrogates. [1] Search for D90 in http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf ---------- nosy: +Rhamphoryncus Added file: http://bugs.python.org/file10880/unicodetest.py _______________________________________ Python tracker <[EMAIL PROTECTED]> <http://bugs.python.org/issue3297> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com