[issue3297] Python interpreter uses Unicode surrogate pairs only before the pyc is created

Marc-Andre Lemburg Fri, 11 Jul 2008 17:39:57 -0700

Marc-Andre Lemburg <[EMAIL PROTECTED]> added the comment:

Just to clarify: Python can be built as a UCS2 or a UCS4 build (not
UTF-16 vs. UTF-32).
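
As a rough sketch, the two build types can be told apart from inside the interpreter by looking at sys.maxunicode. The helper name build_kind is made up for illustration. Note that on Python 3.3 and later the narrow/wide distinction no longer exists and the value is always 0x10FFFF.

```python
import sys

def build_kind(maxunicode=sys.maxunicode):
    """Classify a build by the largest code point a single string item can hold.

    build_kind is a hypothetical helper, not part of Python itself.
    """
    if maxunicode == 0xFFFF:
        # Narrow build: non-BMP characters are stored as surrogate pairs.
        return "UCS2"
    if maxunicode == 0x10FFFF:
        # Wide build: every code point fits in one string item.
        return "UCS4"
    raise ValueError("unexpected sys.maxunicode: %r" % (maxunicode,))

print(build_kind())
```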


The conversion from the escaped literal representation to the internal
format is done using the unicode-escape and raw-unicode-escape codecs.
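
To illustrate the difference between the two codecs, here is a small sketch that decodes the same escaped bytes with each of them. It is written for a current Python (the b"..." syntax is not available in 2.5), and the variable names are only for illustration.

```python
# The bytes as they would appear in source text:
# backslash, 'u', four hex digits, then backslash, 'n'.
escaped = b"\\u00e9\\n"

# unicode-escape interprets all escape sequences, as in a normal u"..." literal.
cooked = escaped.decode("unicode-escape")

# raw-unicode-escape interprets only \uXXXX and \UXXXXXXXX, as in a ur"..." literal;
# the backslash-n pair stays as two literal characters.
raw = escaped.decode("raw-unicode-escape")

print(repr(cooked))
print(repr(raw))
```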

PYC files are written using the marshal module, which uses UTF-8 as
encoding for Unicode objects.
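
A quick way to see this is to marshal a string holding a non-BMP character and look for its UTF-8 bytes in the result. This is only a sketch, run against a current Python. The exact bytes surrounding the payload (type code, length) vary between versions and are not checked here.

```python
import marshal

# U+10000 is the first code point outside the Basic Multilingual Plane.
text = u"\U00010000"

dumped = marshal.dumps(text)
utf8 = text.encode("utf-8")

# The UTF-8 form of the character appears inside the marshalled data.
print(utf8 in dumped)

# Loading the data again should give back the original string.
restored = marshal.loads(dumped)
print(restored == text)
```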

All of these codecs know about surrogates, so there must be a bug
somewhere in the Python tokenizer or compiler.
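
For reference, the surrogate arithmetic these codecs have to get right can be sketched as follows. The function names to_surrogates and from_surrogates are hypothetical and only show the standard UTF-16 mapping; they are not the code used inside the codecs.

```python
def to_surrogates(cp):
    """Split a non-BMP code point into a (high, low) surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary range: %#x" % cp)
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

def from_surrogates(high, low):
    """Combine a (high, low) surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

high, low = to_surrogates(0x10000)
print(hex(high), hex(low))
```

A UCS2 build stores such a character as the two items (high, low), while a UCS4 build stores the single code point, so any step that handles one form but not the other will behave differently between the two builds.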

I checked on Linux using a UCS2 and a UCS4 build of Python 2.5: the
problem only shows up with the UCS4 build.

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3297>
_______________________________________