Adam Olsen <[EMAIL PROTECTED]> added the comment:

Marc, perhaps Unicode has refined their definitions since you last looked?
Valid UTF-8 *cannot* contain surrogates[1]. If it does, you have CESU-8[2][3], not UTF-8.

So there are two bugs: first, the UTF-8 codec should refuse to load surrogates. Second, since the original bug showed up before the .pyc is created, something in the parse/compilation/whatever stage is producing CESU-8.

[1] 4th bullet point of D92 in http://www.unicode.org/versions/Unicode5.0.0/ch03.pdf
[2] http://unicode.org/reports/tr26/
[3] http://en.wikipedia.org/wiki/CESU-8

_______________________________________
Python tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue3297>
_______________________________________
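
To make the UTF-8 versus CESU-8 distinction concrete, here is a minimal sketch. It assumes a current Python 3 interpreter, whose strict "utf-8" codec already rejects surrogates; that is not the interpreter behaviour being reported in this issue. The helper names `cesu8_encode` and `is_strict_utf8` are hypothetical and exist only for this illustration.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text the CESU-8 way (hypothetical helper, illustration only).

    Code points above U+FFFF are split into a UTF-16 surrogate pair, and
    each surrogate is then written as its own 3-byte sequence.
    """
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp >= 0x10000:
            cp -= 0x10000
            high = 0xD800 + (cp >> 10)    # lead surrogate
            low = 0xDC00 + (cp & 0x3FF)   # trail surrogate
            for unit in (high, low):
                # "surrogatepass" lets the codec emit surrogate bytes
                out += chr(unit).encode("utf-8", "surrogatepass")
        else:
            out += ch.encode("utf-8", "surrogatepass")
    return bytes(out)


def is_strict_utf8(data: bytes) -> bool:
    """Return True if data decodes with the strict UTF-8 codec."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


ch = "\U00010000"                 # first code point outside the BMP
utf8_bytes = ch.encode("utf-8")   # one 4-byte sequence
cesu8_bytes = cesu8_encode(ch)    # two 3-byte surrogate sequences

print(utf8_bytes.hex(), is_strict_utf8(utf8_bytes))
print(cesu8_bytes.hex(), is_strict_utf8(cesu8_bytes))
```

Under those assumptions, the same character comes out as four bytes in UTF-8 and as six bytes in CESU-8, and a strict UTF-8 decoder accepts only the former.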