Marc-Andre Lemburg <[EMAIL PROTECTED]> added the comment:

Just to clarify: Python can be built as a UCS2 or a UCS4 build (not UTF-16 vs. UTF-32).
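
As a rough illustration of the build distinction above, the interpreter's build type can be inferred from sys.maxunicode, which is 0xFFFF on a UCS2 (narrow) build and 0x10FFFF on a UCS4 (wide) build. The helper name unicode_build_kind is hypothetical and only used for this sketch. Note that from Python 3.3 onwards the value is always 0x10FFFF, so the distinction only matters for older interpreters such as the 2.5 builds discussed here.

```python
import sys

def unicode_build_kind(maxunicode=None):
    """Return 'UCS2' or 'UCS4' based on the largest code point a string can hold.

    Hypothetical helper for illustration; it defaults to the running
    interpreter's sys.maxunicode.
    """
    if maxunicode is None:
        maxunicode = sys.maxunicode
    # Narrow (UCS2) builds report 0xFFFF; wide (UCS4) builds report 0x10FFFF.
    if maxunicode > 0xFFFF:
        return "UCS4"
    return "UCS2"

print(unicode_build_kind())
```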
The conversions from the literal escaped representation to the internal format are done using the unicode-escape and raw-unicode-escape codecs. PYC files are written using the marshal module, which uses UTF-8 as the encoding for Unicode objects.

All of these codecs know about surrogates, so there must be a bug somewhere in the Python tokenizer or compiler.

I checked on Linux using a UCS2 and a UCS4 build of Python 2.5: the problem only shows up with the UCS4 build.

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3297>
_______________________________________
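
The claim that the codecs and marshal handle non-BMP characters can be spot-checked with a small round-trip sketch like the one below. It does not reproduce the reported bug (which is suspected to be in the tokenizer or compiler); it only exercises the codecs and the marshal module named in the comment. The function name codec_roundtrips and the choice of U+10000 as the test character are assumptions made for this example.

```python
import marshal

def codec_roundtrips(text):
    """Check that text survives an encode/decode round trip.

    Hypothetical helper: returns a dict mapping each mechanism to True
    if the round-tripped value equals the original string.
    """
    results = {}
    for codec in ("unicode-escape", "raw-unicode-escape", "utf-8"):
        results[codec] = text.encode(codec).decode(codec) == text
    # marshal is what the interpreter uses when writing PYC files.
    results["marshal"] = marshal.loads(marshal.dumps(text)) == text
    return results

# U+10000 is the first code point outside the BMP; on a UCS2 build it is
# stored internally as a surrogate pair.
non_bmp = u"\U00010000"
print(codec_roundtrips(non_bmp))
```

If every entry comes back True on both build types, that is consistent with the view that the codecs themselves are not at fault.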