Edward Loper <[EMAIL PROTECTED]> wrote:
>I would like to convert an 8-bit string (i.e., a str) into unicode,
>treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff
>into backslashed escape sequences. I.e., I want something like this:
>
>    >>> decode_with_backslashreplace('abc \xff\xe8 def')
>    u'abc \\xff\\xe8 def'
>
>The best I could come up with was:
>
>    def decode_with_backslashreplace(s):
>        "str -> unicode"
>        return (s.decode('latin1')
>                .encode('ascii', 'backslashreplace')
>                .decode('ascii'))
>
>Surely there's a better way than converting back and forth 3 times?
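For readers on Python 3.5 or later: the 'backslashreplace' error handler was extended to work when decoding, not just encoding, so the three-step round trip in the question collapses to a single decode of a bytes object. The sketch below assumes Python 3.5+ and takes bytes rather than a Python 2 str; it is not from the thread itself.

```python
def decode_with_backslashreplace(s: bytes) -> str:
    """Decode bytes as ASCII, escaping every byte >= 0x80 as \\xNN.

    Assumes Python 3.5+, where decode() accepts 'backslashreplace'.
    """
    return s.decode('ascii', 'backslashreplace')


print(decode_with_backslashreplace(b'abc \xff\xe8 def'))
# prints: abc \xff\xe8 def
```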
I didn't check whether this was faster, although I rather suspect it is not:

    # Python 2: keep chars below 0x80, turn the rest into '\xNN' escapes
    cvt = lambda x: ord(x) < 0x80 and x or '\\x%02x' % ord(x)

    def decode_with_backslashreplace(s):
        return u''.join(map(cvt, s))

--
- Tim Roberts, [EMAIL PROTECTED]
  Providenza & Boekelheide, Inc.
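The per-character mapping in the reply can be transcribed to Python 3 roughly as follows. This is a minimal sketch assuming the input is a bytes object (iterating over bytes yields ints in Python 3); like the original, it builds the result one byte at a time and is unlikely to beat the built-in codec in speed.

```python
def decode_with_backslashreplace(s: bytes) -> str:
    """Map each byte: ASCII bytes pass through, others become \\xNN."""
    # Iterating over bytes gives ints 0..255 in Python 3.
    # '\\x%02x' formats a high byte as a two-digit lowercase hex escape.
    return ''.join(chr(b) if b < 0x80 else '\\x%02x' % b for b in s)


print(decode_with_backslashreplace(b'abc \xff\xe8 def'))
# prints: abc \xff\xe8 def
```

Using a `%02x` format rather than `hex()` matters here: `hex(0xff)` returns `'0xff'`, which would produce `\x0xff` instead of `\xff`.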