--------
>>> sys.version
'3.3.2 (v3.3.2:d047928ae3f6, May 16 2013, 00:03:43) [MSC v.1600 32 bit (Intel)]'
>>> '\ud800'.encode('utf-8')
Traceback (most recent call last):
  File "<eta last command>", line 1, in <module>
UnicodeEncodeError: 'utf-8' codec can't encode character '\ud800' in position 0: surrogates not allowed
>>> '\ud800'.encode('utf-32-be')
b'\x00\x00\xd8\x00'
>>> '\ud800'.encode('utf-32-le')
b'\x00\xd8\x00\x00'
>>> '\ud800'.encode('utf-32')
b'\xff\xfe\x00\x00\x00\xd8\x00\x00'
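The session above is from Python 3.3.2, where the UTF-32 codecs still accepted a lone surrogate while UTF-8 rejected it. The following is a minimal sketch, assuming Python 3.4 or later, where the UTF-16 and UTF-32 codecs were changed to reject lone surrogates as well and the standard 'surrogatepass' error handler was extended to them. The helper name try_encode is hypothetical, introduced only for illustration.

```python
# Sketch assuming Python 3.4+: UTF-8 and UTF-32 both reject lone surrogates
# under the default 'strict' handler; 'surrogatepass' lets them through.
lone = '\ud800'  # a high surrogate with no following low surrogate


def try_encode(s, codec, errors='strict'):
    """Hypothetical helper: return the encoded bytes, or None if the codec
    raises UnicodeEncodeError for this input."""
    try:
        return s.encode(codec, errors)
    except UnicodeEncodeError:
        return None


for codec in ('utf-8', 'utf-32-be', 'utf-32-le'):
    strict = try_encode(lone, codec)                    # rejected on 3.4+
    passed = try_encode(lone, codec, 'surrogatepass')   # raw code point 0xD800
    print(codec, strict, passed)
```

On a 3.4+ interpreter the strict column should be None for all three codecs, while 'surrogatepass' yields b'\xed\xa0\x80' for UTF-8 and the same UTF-32 bytes that 3.3.2 produced above.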
jmf

-- 
https://mail.python.org/mailman/listinfo/python-list