Steven D'Aprano wrote:
> A few issues:
>
> (1) It doesn't seem to be reversible:
>
> >>> '&#169; and many more...'.decode('latin-1')
> u'&#169; and many more...'
>
> What should I do instead?

For reverse processing, you need to parse it with an SGML/XML parser.

> (2) Are XML entities guaranteed to be the same as HTML entities?

Please distinguish between the terms "entity", "entity reference", and
"character reference". An (external parsed) entity is a named piece of
text, such as the copyright character. An entity reference is a
reference to such a thing, e.g. &copy;. A character reference is a
reference to a character, not to an entity, e.g. &#169;.

xmlcharrefreplace generates character references, not entity references
(let alone generating entities). The character references in XML and
HTML both reference characters by Unicode ordinal, so in that respect
they are "the same".

> (3) Is there a way to find out at runtime what encoders/decoders/error
> handlers are available, and what they do?

Not through Python code. In C code, you can look at the
codec_error_registry field of the interpreter object.

Regards,
Martin
-- 
http://mail.python.org/mailman/listinfo/python-list
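
To illustrate point (1), here is a minimal sketch of the round trip. It assumes Python 3.4 or later, whereas the session quoted in the question is Python 2. It also uses the standard library's html.unescape as a lightweight stand-in for a full SGML/XML parser, which is an assumption of this sketch and not something the reply prescribes. The variable names are made up for illustration.

```python
import html

original = "\u00a9 and many more..."

# Encoding: characters that ASCII cannot represent are replaced by
# numeric character references (not entity references).
encoded = original.encode("ascii", "xmlcharrefreplace")

# Decoding the bytes does not undo this. To the codec, the reference
# is ordinary ASCII text, so it comes back unchanged.
still_escaped = encoded.decode("latin-1")

# Reversal needs something that understands the markup. html.unescape
# resolves numeric character references as well as HTML named entity
# references such as &copy;.
restored = html.unescape(still_escaped)

print(encoded)
print(still_escaped)
print(restored == original)
```

Because html.unescape also resolves HTML's named references, it would treat text that merely looks like a reference as markup. For real XML input, a proper parser is still the safer choice.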