On Dec 2, 2:33 am, Duncan Booth <[EMAIL PROTECTED]> wrote:
> slomo <[EMAIL PROTECTED]> wrote:
> >>>> print line
> > \u0050\u0079\u0074\u0068\u006f\u006e
>
> > But I want to get a string:
>
> > "\u0050\u0079\u0074\u0068\u006f\u006e"
>
> > How do you make it?
>
> line.decode('unicode-escape')

Amazing what you can find in obscure corners of the obscure docs! BTW,
how many folks know what "bijective" means?
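On Python 3, `str` is already Unicode and has no `.decode()` method, so the one-liner above does not carry over directly. A minimal sketch of the equivalent, assuming the input holds literal backslash escapes and is otherwise pure ASCII, goes via bytes first:

```python
# Literal backslash escapes, e.g. as read from a text file (assumed ASCII-only).
line = "\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e"

# Python 3 str has no .decode(); encode to bytes, then apply the unicode_escape codec.
decoded = line.encode("ascii").decode("unicode_escape")
print(decoded)  # Python
```

The `encode("ascii")` step deliberately fails on non-ASCII input: apart from the escapes, the `unicode_escape` decoder reads bytes as Latin-1, so feeding it UTF-8 text would silently garble any non-ASCII characters.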

Hmmm ... the encode is documented as "Produce a string that is
suitable as Unicode literal in Python source code", but it *isn't*
suitable. A Unicode literal is u'blah'; this gives just blah. Worse,
it leaves the caller to nut out how to escape apostrophes and quotes:

>>> test = u'Python\'\'\'\'\"\"\"\"\u1234\n'
>>> print repr(test)
u'Python\'\'\'\'""""\u1234\n'
>>> print test.encode('unicode-escape')
Python''''""""\u1234\n
>>>

Why would someone bother writing this codec when repr() does the job
properly?
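The same comparison can be sketched in Python 3, where `repr()` no longer escapes printable non-ASCII characters, so `ascii()` is the closer analogue of the Python 2 `repr()` used above. The codec output has no quotes and leaves `'` and `"` alone, while `ascii()` produces a literal that evaluates back to the original:

```python
import ast

test = 'Python\'\'\'\'""""\u1234\n'

# The codec escapes backslashes, control and non-ASCII characters,
# but adds no surrounding quotes and leaves ' and " untouched.
escaped = test.encode("unicode_escape").decode("ascii")
print(escaped)  # Python''''""""\u1234\n

# ascii() (repr() with non-ASCII escaped) yields a real, quoted literal.
literal = ascii(test)
print(literal)  # 'Python\'\'\'\'""""\u1234\n'

# Both are reversible here: the codec decodes back, the literal evaluates back.
assert escaped.encode("ascii").decode("unicode_escape") == test
assert ast.literal_eval(literal) == test
```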

Anyhow, here's a solution to the OP's stated problem from first
principles using basic building blocks:

>>> line = '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
>>> u''.join(unichr(int(x, 16)) for x in line.split(r'\u') if x and x != '\n') + u'\n'
u'Python\n'
>>>
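The split-based one-liner assumes the line consists of nothing but \u escapes plus a trailing newline. A different technique, a regex substitution, replaces only the \uXXXX sequences and leaves any other text alone. This sketch is written for Python 3 (where unichr became chr), and `expand_u_escapes` is a hypothetical name chosen for illustration. It does not combine surrogate pairs, so escapes for characters outside the BMP would come out as lone surrogates.

```python
import re

# Matches a literal backslash-u followed by exactly four hex digits.
_U_ESCAPE = re.compile(r"\\u([0-9a-fA-F]{4})")

def expand_u_escapes(text):
    """Replace each literal \\uXXXX sequence in text with the character it names.

    Anything that is not a \\uXXXX escape (other backslash sequences,
    plain text, a trailing newline) is left untouched.
    """
    return _U_ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), text)

line = "\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n"
print(repr(expand_u_escapes(line)))  # 'Python\n'
print(expand_u_escapes("caf\\u00e9 \\n stays"))  # café \n stays
```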
-- 
http://mail.python.org/mailman/listinfo/python-list