>willie wrote:
>>
>> Thanks for the thorough explanation. One last question
>> about terminology then I'll go away :)
>> What is the proper way to describe "ustr" below?
>>
>> >>> ustr = buf.decode('UTF-8')
>> >>> type(ustr)
>> <type 'unicode'>
>>
>> Is it a "unicode object that contains a UTF-8 encoded
>> string object?"

John Machin:

>No. It is a Python unicode object, period.
>
>1. If it did contain another object you would be (quite justifiably)
>screaming your peripherals off about the waste of memory :-)
>2. You don't need to concern yourself with the internals of a unicode
>object; however rest assured that it is *not* stored as UTF-8 -- so if
>you are hoping for a quick "number of UTF-8 bytes without actually
>producing a str object" method, you are out of luck.
>
>Consider this example: you have a str object which contains some
>Russian text, encoded in cp1251.
>
>str1 = russian_text
>unicode1 = str1.decode('cp1251')
>str2 = unicode1.encode('utf-8')
>unicode2 = str2.decode('utf-8')
>
>Then unicode2 == unicode1, repr(unicode2) == repr(unicode1), there is
>no way (without the above history) of determining how it was created --
>and you don't need to care how it was created.

Gabriel Genellina:

>ustr is a unicode object. Period. A unicode object contains
>characters (not bytes).
>buf, apparently, is a string - a string of bytes. Those bytes
>apparently represent some unicode characters encoded using the UTF-8
>encoding. So, you can decode them, using the decode() method, to get
>the unicode object.
>
>Very roughly, the difference is like that of an integer and its
>representations:
>
>w = 1
>x = 0x0001
>y = 001
>z = struct.unpack('>h','\x00\x01')[0]
>
>All four objects are the *same* integer, 1.
>There is no way of knowing *how* the integer was spelled, i.e.,
>which representation it came from - like the unicode object, it has
>no "encoding" by itself.
>You can go back and forth between an integer number and its decimal
>representation - like astring.decode() and ustring.encode()

I finally understand, much appreciated.

-- 
http://mail.python.org/mailman/listinfo/python-list
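For anyone who wants to try this out, here is a minimal sketch of the round trip John describes, followed by Gabriel's integer analogy. The sample word and all variable names are made up for illustration and are not from the posts above. It sticks to u"" and b"" literals so that it should behave the same on Python 2.6+ and on Python 3, where the two types are called bytes and str rather than str and unicode.

```python
import struct

# Hypothetical sample text: a six-letter Russian word, written with
# escapes so the source itself stays plain ASCII.
text = u"\u043f\u0440\u0438\u0432\u0435\u0442"

# Encode to cp1251 bytes, decode, re-encode as UTF-8, decode again.
cp1251_bytes = text.encode("cp1251")
unicode1 = cp1251_bytes.decode("cp1251")
utf8_bytes = unicode1.encode("utf-8")
unicode2 = utf8_bytes.decode("utf-8")

# The two decoded objects are indistinguishable, although the byte
# strings they were decoded from are different.
same_text = (unicode1 == unicode2) and (repr(unicode1) == repr(unicode2))
same_bytes = (cp1251_bytes == utf8_bytes)

# Counting UTF-8 bytes means producing the encoded byte string first.
utf8_length = len(unicode2.encode("utf-8"))

# The integer analogy: several spellings, one value.
# int("001", 8) stands in for the Python 2 octal literal 001.
spellings = [1, 0x0001, int("001", 8), struct.unpack(">h", b"\x00\x01")[0]]
all_one = all(n == 1 for n in spellings)
```

The decoded text has 6 characters and the cp1251 form has 6 bytes, while the UTF-8 form has 12, which is why the byte count cannot be read off the unicode object without encoding it.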