I would like to convert an 8-bit string (i.e., a str) into unicode, treating chars \x00-\x7f as ASCII and converting any chars \x80-\xff into backslashed escape sequences. I.e., I want something like this:
>>> decode_with_backslashreplace('abc \xff\xe8 def')
u'abc \\xff\\xe8 def'

The best I could come up with was:

def decode_with_backslashreplace(s):
    "str -> unicode"
    return (s.decode('latin1')
             .encode('ascii', 'backslashreplace')
             .decode('ascii'))

Surely there's a better way than converting back and forth three times? Is there a reason that the 'backslashreplace' error mode can't be used with codecs.decode?

>>> 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: don't know how to handle UnicodeDecodeError in error callback

-Edward
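
[For context, a minimal sketch (not part of the original question) of getting the same effect in one pass under Python 2 by registering a custom decode error handler with codecs.register_error. The handler name 'backslashreplace_decode' is my own choice, not a standard error name; the built-in 'backslashreplace' handler only knows how to handle encode errors, which is why the call above fails.]

import codecs

def backslashreplace_decode(exc):
    # For a decode error, return a unicode replacement string containing
    # a \xNN escape for each undecodable byte, plus the position at which
    # decoding should resume.
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return (u''.join([u'\\x%02x' % ord(c) for c in bad]), exc.end)
    raise exc

# Register under a made-up name so it can be used as an error mode.
codecs.register_error('backslashreplace_decode', backslashreplace_decode)

>>> 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace_decode')
u'abc \\xff\\xe8 def'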