That seems correct to me.

>>> '\\u{:04x}'.format(ord(u'é'))
\u00e9
>>> '\\U{:08x}'.format(ord(u'é'))
\U000000e9
because \U must be followed by exactly eight hex digits, while \u takes exactly four:

>>> u'\U00e9'
  File "<eta last command>", line 1
SyntaxError: (unicode error) 'unicodeescape' codec can't decode bytes in position 0-5: end of string in escape sequence
>>> u'\U000000e9'
é
>>> u'\u00e9'
é

From this:

>>> u'éléphant\N{EURO SIGN}'
éléphant€
>>> u = u'éléphant\N{EURO SIGN}'
>>> ''.join(['\\u{:04x}'.format(ord(c)) for c in u])
\u00e9\u006c\u00e9\u0070\u0068\u0061\u006e\u0074\u20ac

Skipping surrogate pairs makes little sense, because the whole purpose is to display code points!
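
To make the point about code points concrete, here is a minimal sketch that picks the escape form by code point value. It assumes Python 3.3 or later, where iterating over a str yields whole code points rather than surrogate halves; the function name escape_code_points is made up for this illustration and is not taken from any library.

```python
def escape_code_points(text):
    """Write every character of text as a \\uXXXX or \\UXXXXXXXX escape.

    Assumption: Python 3.3+, so ord(ch) is a full code point even
    for characters outside the Basic Multilingual Plane.
    """
    parts = []
    for ch in text:
        cp = ord(ch)
        if cp <= 0xFFFF:
            # Fits in four hex digits: use the short \u form.
            parts.append('\\u{:04x}'.format(cp))
        else:
            # Above U+FFFF: \u would be ambiguous, so use the
            # eight-digit \U form.
            parts.append('\\U{:08x}'.format(cp))
    return ''.join(parts)


if __name__ == '__main__':
    print(escape_code_points('\u00e9l\u00e9phant\N{EURO SIGN}'))
    print(escape_code_points('\U0001F600'))
```

With this, a character above U+FFFF comes out as one \U escape instead of being skipped or split, and the result can be turned back into the original string with the unicode_escape codec.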