A few questions about encoding

Νικόλαος Κούρας Sun, 09 Jun 2013 03:48:57 -0700

A few questions about encoding, please:

>> Since 1 byte can hold up to 256 values, why doesn't UTF-8 use 1 byte for
>> values up to 256?


>Because then how do you tell when you need one byte, and when you need 
>two? If you read two bytes, and see 0x4C 0xFA, does that mean two 
>characters, with ordinal values 0x4C and 0xFA, or one character with 
>ordinal value 0x4CFA? 

I mean that UTF-8 could use 1 byte to store the first 256 characters. I meant
values up to 256, not above 256.
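
For reference, a minimal Python 3 sketch of what UTF-8 actually does with the
first 256 code points. The helper names utf8_bytes and is_valid_utf8 are made
up for this illustration. It shows that only code points 0-127 fit in one
byte, and that the byte pair 0x4C 0xFA from the quoted example is not valid
UTF-8 at all, because bytes with the high bit set only ever appear as part of
a multi-byte sequence.

```python
# Python 3 sketch: how many bytes does UTF-8 use for the first 256 code points?
def utf8_bytes(cp):
    """Return the UTF-8 encoding of a single code point as bytes."""
    return chr(cp).encode("utf-8")

for cp in (0x4C, 0x7F, 0x80, 0xFA):
    encoded = utf8_bytes(cp)
    print("U+%04X -> %s (%d byte(s))" % (cp, encoded.hex(), len(encoded)))

# Bytes 0x80-0xFF are reserved as lead/continuation markers, so a lone
# 0xFA after 0x4C is rejected rather than read as a second character.
def is_valid_utf8(data):
    """Return True if data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False

print(is_valid_utf8(b"\x4c\xfa"))      # False
print(is_valid_utf8(b"\x4c\xc3\xba"))  # True: 'L' followed by U+00FA
```

An encoding that does store each of the first 256 code points in one byte
exists already: it is Latin-1 (ISO-8859-1), which cannot represent anything
above U+00FF.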


>> UTF-8 and UTF-16 and UTF-32 
>> I thought the number after UTF- declared how many bits the
>> character set uses to store a character on the hard disk, no?

>Not exactly, but close. UTF-32 is completely 32-bit (4 byte) values. 
>UTF-16 mostly uses 16-bit values, but sometimes it combines two 16-bit 
>values to make a surrogate pair. 

Is a surrogate pair like hitting, for example, Ctrl-A, meaning it is a
combination character that consists of 2 different characters?
Is this what a surrogate is? A pair of 2 chars?
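
For reference, a small Python 3 sketch of what a surrogate pair looks like in
practice. The function names utf16_units and surrogate_pair are hypothetical,
chosen only for this example. A surrogate pair is not a key combination: it
is two 16-bit code units that together encode one code point above U+FFFF.

```python
# Python 3 sketch: UTF-16 code units and surrogate pairs.
def utf16_units(ch):
    """Return the 16-bit code units of ch when encoded as UTF-16 (big-endian)."""
    data = ch.encode("utf-16-be")
    return [int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2)]

def surrogate_pair(cp):
    """Compute the (high, low) surrogates for a code point above U+FFFF."""
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # top 10 bits of the offset
    low = 0xDC00 + (offset & 0x3FF)   # bottom 10 bits of the offset
    return high, low

# A character inside the Basic Multilingual Plane is a single 16-bit unit.
print([hex(u) for u in utf16_units("a")])            # ['0x61']

# A character above U+FFFF needs two units, and both come from reserved ranges.
print([hex(u) for u in utf16_units("\U0001F600")])   # ['0xd83d', '0xde00']
print([hex(u) for u in surrogate_pair(0x1F600)])     # ['0xd83d', '0xde00']
```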


>UTF-8 uses 8-bit values, but sometimes 
>it combines two, three or four of them to represent a single code-point. 

Does 'a' need 1 byte to be stored when UTF-8 encoded? (since ordinal = 97)
Does 'α΄' need 2 bytes to be stored when UTF-8 encoded? (since ordinal is > 127)
Does a Chinese ideogram need 4 bytes to be stored when UTF-8 encoded? (since
ordinal > 65000)

Does the number of bytes needed to store a character depend solely on the
character's ordinal value in the Unicode table?
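
One way to check this is the Python 3 sketch below. The function utf8_length
is a hypothetical helper written for this example; it applies the standard
UTF-8 range boundaries and is compared against what the built-in encoder
produces. The sample characters are my own picks, not taken from the thread.

```python
# Python 3 sketch: UTF-8 length as a function of the code point alone.
def utf8_length(cp):
    """Number of UTF-8 bytes for code point cp (surrogates are not checked)."""
    if cp < 0x80:
        return 1        # U+0000..U+007F
    if cp < 0x800:
        return 2        # U+0080..U+07FF
    if cp < 0x10000:
        return 3        # U+0800..U+FFFF
    return 4            # U+10000..U+10FFFF

# 'a', Greek alpha, a common CJK ideograph, and a CJK Extension B ideograph.
for ch in ("a", "\u03b1", "\u4e2d", "\U00020000"):
    predicted = utf8_length(ord(ch))
    actual = len(ch.encode("utf-8"))
    print("U+%04X predicted=%d actual=%d" % (ord(ch), predicted, actual))
```

Under these boundaries the length depends only on the code point. Ideographs
below U+10000 take 3 bytes, and 4 bytes are needed only from U+10000 upward.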
-- 
http://mail.python.org/mailman/listinfo/python-list
