On Monday, 30 October 2006 19:45, Enrico Forestieri wrote:
> On Mon, Oct 30, 2006 at 06:02:18PM +0100, Georg Baum wrote:

> > I read somewhere that the highest possible number of bytes for a single
> > character in utf8 is 6, but I forgot where. Abdel reported the same, and
> > now I am unsure, because wikipedia says 4. Does anybody know what is
> > correct?
> 
> Maybe you got your info from here:
> http://www.cl.cam.ac.uk/~mgk25/unicode.html

Probably. At least I know this page.

> Indeed, if only 21 bits are used, 4 bytes should suffice. See the table
> a little down here:
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

OK, so it is like this: up to 4 bytes per code point are needed for the 
21 bits of UCS4 that are currently in use, but UTF8 is designed in such a 
way that all 31 bits of UCS4 can be encoded with at most 6 bytes per code 
point.
Everything is OK again :-)
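
For reference, here is a small sketch of the length rule from the table on
that page. The function name utf8_length is made up for illustration, and
the limits are the ones of the original 6-byte scheme as I understand it,
so treat it as a sanity check rather than a definitive implementation:

```python
# Upper bound of the code point range covered by 1, 2, ..., 6 bytes
# in the original UTF-8 design (the 6-byte form reaches 31 bits).
UTF8_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def utf8_length(code_point: int) -> int:
    """Return how many bytes the original UTF-8 scheme needs for code_point.

    Hypothetical helper for illustration; it only counts bytes and does
    not perform any encoding.
    """
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    for n_bytes, limit in enumerate(UTF8_LIMITS, start=1):
        if code_point <= limit:
            return n_bytes
    raise ValueError("code point does not fit into 31 bits")


if __name__ == "__main__":
    # U+10FFFF is the highest code point of the 21-bit range in use today.
    for cp in (0x41, 0xE9, 0x20AC, 0x10FFFF, 0x7FFFFFFF):
        print(f"U+{cp:04X}: {utf8_length(cp)} byte(s)")
```

For anything up to U+10FFFF the result should agree with what Python's own
encoder produces, e.g. len(chr(0x20AC).encode("utf-8")).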


Georg
