On Dec 2, 2:33 am, Duncan Booth <[EMAIL PROTECTED]> wrote:
> slomo <[EMAIL PROTECTED]> wrote:
> > >>> print line
> > \u0050\u0079\u0074\u0068\u006f\u006e
> >
> > But I want to get a string:
> >
> > "\u0050\u0079\u0074\u0068\u006f\u006e"
> >
> > How do you make it?
>
> line.decode('unicode-escape')

Amazing what you can find in obscure corners of the obscure docs! BTW,
how many folks know what "bijective" means?

Hmmm ... the encode is documented as "Produce a string that is suitable
as Unicode literal in Python source code", but it *isn't* suitable. A
Unicode literal is u'blah'; this gives just blah. Worse, it leaves the
caller to nut out how to escape apostrophes and quotes:

>>> test = u'Python\'\'\'\'\"\"\"\"\u1234\n'
>>> print repr(test)
u'Python\'\'\'\'""""\u1234\n'
>>> print test.encode('unicode-escape')
Python''''""""\u1234\n
>>>

Why would someone bother writing this codec when repr() does the job
properly?

Anyhow, here's a solution to the OP's stated problem from first
principles using basic building blocks:

>>> line = '\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n'
>>> u''.join(unichr(int(x, 16)) for x in line.split(r'\u') if x and x != '\n') + u'\n'
u'Python\n'
>>>
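
For anyone reading this on Python 3, where str has no .decode() and unichr() is gone, here is a rough sketch of the same two routes plus the quoting complaint. It is not from the thread: the helper name decode_u_escapes is made up for illustration, and it assumes the input is ASCII-only text containing four-digit \uXXXX escapes (surrogate pairs are left alone).

```python
import re

# The text as read from the file: literal backslash-u sequences, not the
# characters themselves, followed by a real newline.
line = "\\u0050\\u0079\\u0074\\u0068\\u006f\\u006e\n"

# Codec route: go through bytes first, then let the unicode_escape codec
# interpret the escapes. Assumes the text is pure ASCII.
via_codec = line.encode("ascii").decode("unicode_escape")


def decode_u_escapes(text):
    """Hypothetical helper: replace each \\uXXXX escape with its character.

    Only four-hex-digit escapes are handled; everything else, including
    the trailing newline, passes through unchanged.
    """
    return re.sub(
        r"\\u([0-9a-fA-F]{4})",
        lambda match: chr(int(match.group(1), 16)),
        text,
    )


# First-principles route, no codec involved.
via_regex = decode_u_escapes(line)

# The encode direction still leaves quote characters unescaped, so the
# result cannot be pasted between quotes as-is.
encoded = "Python'\"\u1234\n".encode("unicode_escape")
```

Both via_codec and via_regex should come out as the string Python followed by a newline. The regex version touches only the \uXXXX sequences, which may be preferable when the input can contain other backslashes that should stay as they are.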