On Sunday, 9 June 2013 12:12:36 PM UTC+3, Cameron Simpson wrote:
> On 09Jun2013 02:00, Νίκος Γκρ33κ <nikos.gr...@gmail.com> wrote:
> | Steven wrote:
> | >> Since 1 byte can hold up to 256 chars, why not utf-8 use 1-byte for
> | >> values up to 256?
> | >
> | >Because then how do you tell when you need one byte, and when you need
> | >two? If you read two bytes, and see 0x4C 0xFA, does that mean two
> | >characters, with ordinal values 0x4C and 0xFA, or one character with
> | >ordinal value 0x4CFA?
> |
> | I mean utf-8 could use 1 byte for storing the 1st 256 characters. I meant
> | up to 256, not above 256.
>
> Then it would not be UTF-8. UTF-8 will encode a Unicode codepoint. Your
> suggestion will not.

I don't follow.

> | >> UTF-8 and UTF-16 and UTF-32
> | >> I thought the number beside UTF- was to declare how many bits the
> | >> character set was using to store a character into the hdd, no?
> | >
> | >Not exactly, but close. UTF-32 is completely 32-bit (4 byte) values.
> | >UTF-16 mostly uses 16-bit values, but sometimes it combines two 16-bit
> | >values to make a surrogate pair.
> |
> | A surrogate pair is like hitting, for example, Ctrl-A, which means it is a
> | combination character that consists of 2 different characters?
> | Is this what a surrogate is? A pair of 2 chars?
>
> Essentially. The combination represents a code point.
>
> | >UTF-8 uses 8-bit values, but sometimes
> | >it combines two, three or four of them to represent a single code-point.
> |
> | 'a' to be utf8 encoded needs 1 byte to be stored? (since ordinal = 97)
> | 'α΄' to be utf8 encoded needs 2 bytes to be stored? (since ordinal is
> | > 127)
> | 'a chinese ideogram' to be utf8 encoded needs 4 bytes to be stored? (since
> | ordinal > 65000)
> |
> | The amount of bytes needed to store a character solely depends on the
> | character's ordinal value in the Unicode table?
>
> Essentially. You can read up on the exact process in Wikipedia or the Unicode
> Standard.

When you say "essentially", does that mean you agree with my statements?
--
http://mail.python.org/mailman/listinfo/python-list
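
The byte counts asked about above can be checked directly in the interpreter. What follows is only a minimal sketch, assuming Python 3; the sample characters and the helper names `utf8_len` and `utf16_len` are made up for illustration and do not come from the thread. It suggests that the UTF-8 length does depend only on the code point (1 byte up to U+007F, 2 up to U+07FF, 3 up to U+FFFF, 4 above that), that a code point above U+FFFF takes two 16-bit units in UTF-16 (a surrogate pair), and that the "1 byte for the first 256 characters" scheme is what Latin-1 does, whose byte 0xFA is not valid UTF-8 on its own.

```python
# Illustrative sample characters (chosen for this sketch, not from the thread).
SAMPLES = {
    "latin small a": "a",                     # U+0061
    "latin small u with acute": "\u00fa",     # U+00FA, cf. the 0xFA in Steven's example
    "greek small alpha": "\u03b1",            # U+03B1
    "CJK ideograph (in the BMP)": "\u4e2d",   # U+4E2D
    "CJK ideograph (ext. B)": "\U00020000",   # U+20000, above U+FFFF
}


def utf8_len(ch):
    """Number of bytes UTF-8 uses for one character (hypothetical helper)."""
    return len(ch.encode("utf-8"))


def utf16_len(ch):
    """Number of bytes UTF-16 uses for one character, without a BOM."""
    return len(ch.encode("utf-16-be"))


for name, ch in SAMPLES.items():
    print(f"U+{ord(ch):04X} {name}: "
          f"utf-8 = {utf8_len(ch)} byte(s), utf-16 = {utf16_len(ch)} byte(s)")

# Latin-1 stores each of the first 256 code points in exactly one byte.
latin1_bytes = "\u00fa".encode("latin-1")

# The same single byte is rejected by a UTF-8 decoder, because UTF-8
# reserves every byte value above 0x7F for multi-byte sequences.
try:
    latin1_bytes.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    valid_as_utf8 = False

print("latin-1 bytes:", latin1_bytes.hex(), "| valid as UTF-8:", valid_as_utf8)
```

Under these assumptions, a Chinese ideograph inside the Basic Multilingual Plane comes out at 3 bytes in UTF-8, and only code points above U+FFFF need 4.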