Marc-Andre Lemburg added the comment:
I think you have a wrong understanding of round-tripping.
In Unicode it is really irrelevant if you're using a UCS2 surrogate pair
or a UCS4 representation to describe a code point. The length of the
Unicode representation may change, but the meaning won't,
Martin v. Löwis added the comment:
As Guido says: this is by design. The Unicode type doesn't really
support storage of surrogates; so don't use it for that.
--
nosy: +loewis
resolution: -> wont fix
status: open -> closed
__
Tracker <[EMAIL PROTECTED]>
<
Guido van Rossum added the comment:
I think this is unavoidable. Depending on whether you happen to be using
a narrow or wide unicode build of Python, \U may be turned into
a pair of surrogates anyway. It's not just marshal that's not
roundtripping; the utf-8 codec has the same issue (and
New submission from Carl Friedrich Bolz:
Marshal does not round-trip unicode surrogate pairs for wide unicode-builds:
marshal.loads(marshal.dumps(u"\ud800\udc00")) == u'\U0001'
This is very annoying, because the size of unicode constants differs
between when you run a module for the first t