New submission from Nick Coghlan:

Prompted by issue 18713 and http://lucumr.pocoo.org/2013/7/2/the-updated-guide-to-unicode/, here are some possible utilities we could add to the codecs module to help handle and debug issues related to surrogate escaped strings:

def has_escaped_bytes(s):
    """Returns true if string contains surrogate escaped bytes"""
    ...

def replace_escaped_bytes(s):
    """Replaces each surrogate escaped byte with a valid code point"""
    ...

def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
    """Reinterprets incorrectly decoded text using a new encoding"""
    return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)

----------
messages: 195937
nosy: ncoghlan
priority: normal
severity: normal
stage: needs patch
status: open
title: Add tools for "cleaning" surrogate escaped strings
type: enhancement
versions: Python 3.4

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue18814>
_______________________________________
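
For illustration only, the two stubbed helpers in the proposal might be filled in along these lines. This is a rough sketch, not a proposed patch: the regular-expression approach, the `_ESCAPED` name, the `replacement` parameter and its U+FFFD default are all assumptions that do not appear in the submission. It relies on the documented behaviour of the 'surrogateescape' error handler, which maps each undecodable byte 0x80-0xFF to a lone surrogate in the range U+DC80-U+DCFF.

```python
import re

# Assumption: an escaped byte is any lone surrogate in U+DC80-U+DCFF,
# which is what the 'surrogateescape' handler produces on decoding.
_ESCAPED = re.compile('[\udc80-\udcff]')


def has_escaped_bytes(s):
    """Return True if the string contains surrogate escaped bytes."""
    return _ESCAPED.search(s) is not None


def replace_escaped_bytes(s, replacement='\ufffd'):
    """Replace each surrogate escaped byte with a valid code point.

    The `replacement` parameter is hypothetical; the proposal does not
    say which code point should be substituted.
    """
    return _ESCAPED.sub(replacement, s)


def decode_escaped_bytes(s, nominal_encoding, actual_encoding):
    """Reinterpret incorrectly decoded text using a new encoding."""
    return s.encode(nominal_encoding, 'surrogateescape').decode(actual_encoding)


# Latin-1 bytes wrongly decoded as UTF-8: the 0xE9 byte gets escaped.
raw = b'caf\xe9'
text = raw.decode('utf-8', 'surrogateescape')

print(has_escaped_bytes(text))
print(ascii(replace_escaped_bytes(text)))
print(ascii(decode_escaped_bytes(text, 'utf-8', 'latin-1')))
```

Substituting U+FFFD makes the string safe to encode strictly but discards the original byte, so `decode_escaped_bytes` remains the better choice whenever the actual encoding is known.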