Can I get the 8bit-string representation of any unicode string
Hello, everyone.

I have a problem when processing unicode strings. Is it possible to get the 8-bit string representation of any unicode string?

Suppose I have the unicode string

    a = u'\xc8\xce\xcf\xcd\xc6\xeb'

Then a.encode('latin-1') gives me its 8-bit string representation, that is, the physical storage format of this string. But for another kind of unicode string, say

    b = u'\u4efb\u8d24\u9f50'

I have to call b.encode('utf-8') to get its 8-bit string form.

Since these unicode strings are returned by an external library function, I don't know which kind a given string is until I receive it at runtime. So I wonder whether there is a unified way to get the 8-bit string representation, byte by byte, of any unicode string?

Thank you very much.

--
http://mail.python.org/mailman/listinfo/python-list
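A minimal sketch of the situation described above, assuming Python 3 semantics (every str is a unicode string, encode returns bytes, and the u'' prefix is still accepted). It suggests that UTF-8 may serve as the single encoding being asked for, since it can represent every code point, whereas Latin-1 only covers U+0000 to U+00FF. All names other than a and b are illustrative and not from the original post.

```python
# The two strings from the question.
a = u'\xc8\xce\xcf\xcd\xc6\xeb'
b = u'\u4efb\u8d24\u9f50'

# UTF-8 can encode every code point, so it works for both strings.
a_utf8 = a.encode('utf-8')
b_utf8 = b.encode('utf-8')

# Latin-1 maps code points U+0000..U+00FF to single bytes,
# so it works for a but cannot represent the characters in b.
a_latin1 = a.encode('latin-1')
try:
    b.encode('latin-1')
    b_fits_latin1 = True
except UnicodeEncodeError:
    b_fits_latin1 = False

# Decoding with the same encoding recovers the original strings.
assert a_utf8.decode('utf-8') == a
assert b_utf8.decode('utf-8') == b

print(len(a), len(a_latin1), len(a_utf8))
print(len(b), len(b_utf8))
print(b_fits_latin1)
```

Note that the bytes differ by encoding: a unicode string has no single "physical" byte form until an encoding is chosen, and whoever reads the bytes must decode them with the same encoding.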
Re: Can I get the 8bit-string representation of any unicode string
Thank you all for your replies :-)

I may have misunderstood it. I will think about it carefully.

By the way, does Python have an interface like iconv in libc for C/C++? In other words, how can I convert a string from one encoding into another?

Thank you so much.
Re: Can I get the 8bit-string representation of any unicode string
Hi,

I see. Thank you for your help!

Regards,
hongzheng

Fredrik Lundh wrote:
> [EMAIL PROTECTED] wrote
>
> > I may have misunderstood it. I will think about it carefully.
> >
> > By the way, does Python have an interface like iconv in libc for
> > C/C++? In other words, how can I convert a string from one encoding
> > into another?
>
> if b is an 8-bit string containing an encoded unicode string,
>
>     u = b.decode(encoding)
>
> or
>
>     u = unicode(b, encoding)
>
> gives you a unicode string. to encode the unicode string back to another
> byte string, use the encode method.
>
>     b = u.encode(encoding)
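The decode-then-encode recipe quoted above can be wrapped into a small iconv-like helper. This is only a sketch: recode is a hypothetical name, not a standard library function, and the code assumes Python 3, where 8-bit strings are bytes objects and the unicode(b, encoding) spelling is not available (bytes.decode does the same job).

```python
def recode(data, from_encoding, to_encoding):
    """Convert encoded bytes from one encoding to another.

    recode is a hypothetical helper name used for illustration.
    """
    text = data.decode(from_encoding)   # bytes -> unicode string
    return text.encode(to_encoding)     # unicode string -> bytes


# Example input: the first string from the thread, stored as Latin-1 bytes.
latin1_bytes = u'\xc8\xce\xcf\xcd\xc6\xeb'.encode('latin-1')
utf8_bytes = recode(latin1_bytes, 'latin-1', 'utf-8')

# Converting back yields the original bytes.
assert recode(utf8_bytes, 'utf-8', 'latin-1') == latin1_bytes
```

The conversion raises UnicodeDecodeError if the input is not valid in from_encoding, and UnicodeEncodeError if the target encoding cannot represent some character, so the caller has to know the source encoding, just as with iconv.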