Awesome, that works. Thank you so much! My confusion about the different formats made this harder than it should have been.

On Thu, Mar 6, 2008 at 4:53 PM, Gabriel Genellina <[EMAIL PROTECTED]> wrote:
> On Thu, 06 Mar 2008 22:43:58 -0200, Henry Chang <[EMAIL PROTECTED]> wrote:
>
> > Suppose I start out with a raw string of utf-8 code points.
>
> "utf-8 code points"???
> Looks like a utf-8 encoded string, and then written in hex format.
>
> > raw_string = "68656E727963"
> >
> > I can coerce it into proper unicode format by slicing out two
> > characters at a time.
> >
> > unicode_string = u"\x68\x65\x6E\x72\x79\x63"
> >
> > >>> print unicode_string
> > henryc
> >
> > My question: is there an existing function that can do this (without
> > having to manually slice the raw text string)?
>
> Two steps: first decode from hex to string, and then from utf8 string to
> unicode:
>
> py> raw_string = "68656E727963"
> py> raw_string.decode("hex")
> 'henryc'
> py> raw_string.decode("hex").decode("utf8")
> u'henryc'
>
> --
> Gabriel Genellina
>
> --
> http://mail.python.org/mailman/listinfo/python-list
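
The two steps in Gabriel's reply rely on Python 2, where str has a decode method and accepts the "hex" codec. For anyone reading this on Python 3, here is a minimal sketch of the same idea, assuming a Python 3 interpreter. The helper name hex_utf8_to_text is hypothetical and does not appear in the thread; bytes.fromhex is the standard library call standing in for decode("hex").

```python
def hex_utf8_to_text(hex_string):
    """Turn a hex dump of UTF-8 bytes into text (hypothetical helper name)."""
    # Step 1: hex digits -> raw bytes (Python 3 stand-in for .decode("hex")).
    raw_bytes = bytes.fromhex(hex_string)
    # Step 2: UTF-8 encoded bytes -> text (str in Python 3, unicode in Python 2).
    return raw_bytes.decode("utf-8")


raw_string = "68656E727963"
print(hex_utf8_to_text(raw_string))  # prints: henryc
```

As in the original two-step answer, hex decoding and UTF-8 decoding stay separate, so a malformed hex string and an invalid UTF-8 sequence fail at different steps (ValueError from bytes.fromhex, UnicodeDecodeError from decode).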