Re: unicode question

Tim Roberts Sat, 25 Feb 2006 00:05:41 -0800

Edward Loper <[EMAIL PROTECTED]> wrote:

>I would like to convert an 8-bit string (i.e., a str) into unicode,
>treating chars \x00-\x7f as ascii, and converting any chars \x80-\xff
>into backslashed escape sequences.  I.e., I want something like this:
>
> >>> decode_with_backslashreplace('abc \xff\xe8 def')
>u'abc \\xff\\xe8 def'
>
>The best I could come up with was:
>
>   def decode_with_backslashreplace(s):
>       "str -> unicode"
>       return (s.decode('latin1')
>                .encode('ascii', 'backslashreplace')
>                .decode('ascii'))
>
>Surely there's a better way than converting back and forth 3 times?


I didn't check whether this was faster, although I rather suspect it is
not:

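  # Use '%02x' rather than hex(): hex(0xff) is '0xff', which would give '\x0xff'.
  cvt = lambda x: ord(x)<0x80 and x or '\\x%02x' % ord(x)
  def decode_with_backslashreplace(s):
      "str -> unicode"
      # The joined result is pure ASCII, so unicode() converts it safely.
      return unicode(''.join(map(cvt,s)))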
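
For anyone who wants to test the speed suspicion rather than guess, here is a rough sketch that puts the three-step round trip next to a single-pass version and times both. The names three_step and single_pass are made up for this example, and iterating over bytearray(s) is my assumption for getting integer byte values on both Python 2 and Python 3. Timings depend on the interpreter and the input, so treat the numbers as indicative only.

```python
import timeit

def three_step(s):
    # The round trip from the question: latin-1 decode, ASCII encode
    # with backslash escapes, then ASCII decode.
    return (s.decode('latin1')
             .encode('ascii', 'backslashreplace')
             .decode('ascii'))

def single_pass(s):
    # Hypothetical alternative: bytearray yields integers on both
    # Python 2 and Python 3, so each byte is inspected exactly once.
    return u''.join([chr(b) if b < 0x80 else '\\x%02x' % b
                     for b in bytearray(s)])

sample = b'abc \xff\xe8 def'

# Both versions must agree before the timings mean anything.
assert three_step(sample) == single_pass(sample)

for fn in (three_step, single_pass):
    elapsed = timeit.timeit(lambda: fn(sample), number=10000)
    print('%s: %.4f s' % (fn.__name__, elapsed))
```

The codec round trip runs in C, while the single-pass version loops in Python, so it is worth measuring on realistic input before picking one.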
-- 
- Tim Roberts, [EMAIL PROTECTED]
  Providenza & Boekelheide, Inc.
-- 
http://mail.python.org/mailman/listinfo/python-list