I would like to convert an 8-bit string (i.e., a str) into unicode, treating chars \x00-\x7f as ASCII and converting any chars \x80-\xff into backslashed escape sequences. I.e., I want something like this:
>>> decode_with_backslashreplace('abc \xff\xe8 def')
u'abc \\xff\\xe8 def'

The best I could come up with was:

def decode_with_backslashreplace(s):
    "str -> unicode"
    return (s.decode('latin1')
             .encode('ascii', 'backslashreplace')
             .decode('ascii'))

Surely there's a better way than converting back and forth three times? Is there a reason that the 'backslashreplace' error mode can't be used with codecs.decode?

>>> 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace')
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
TypeError: don't know how to handle UnicodeDecodeError in error callback

-Edward
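
[For context, a minimal sketch (not part of the original question) of getting the same effect in one pass under Python 2 by registering a custom decode error handler with codecs.register_error. The handler name 'backslashreplace_decode' is my own choice, not a standard error name; the built-in 'backslashreplace' handler only knows how to handle encode errors, which is why the call above fails.]

import codecs

def backslashreplace_decode(exc):
    # For a decode error, return a unicode replacement string containing
    # a \xNN escape for each undecodable byte, plus the position at which
    # decoding should resume.
    if isinstance(exc, UnicodeDecodeError):
        bad = exc.object[exc.start:exc.end]
        return (u''.join([u'\\x%02x' % ord(c) for c in bad]), exc.end)
    raise exc

# Register under a made-up name so it can be used as an error mode.
codecs.register_error('backslashreplace_decode', backslashreplace_decode)

>>> 'abc \xff\xe8 def'.decode('ascii', 'backslashreplace_decode')
u'abc \\xff\\xe8 def'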