On Monday, 30 October 2006 19:45, Enrico Forestieri wrote:
> On Mon, Oct 30, 2006 at 06:02:18PM +0100, Georg Baum wrote:
> > I read somewhere that the highest possible number of bytes for a single
> > character in utf8 is 6, but I forgot where. Abdel reported the same, and
> > now I am unsure, because wikipedia says 4. Does anybody know what is
> > correct?
>
> Maybe you got your info from here:
> http://www.cl.cam.ac.uk/~mgk25/unicode.html

Probably. At least I know this page.

> Indeed, if only 21 bits are used, 4 bytes should suffice. See the table
> a little further down here:
> http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8

OK, so it is like this: up to 4 bytes per code point are used for the
currently defined 21 bits of UCS4, but UTF-8 is designed so that it can
encode all 31 bits of UCS4 with at most 6 bytes per code point.

Everything is OK again :-)

Georg
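A minimal Python sketch of the byte counts discussed above, assuming the original UTF-8 design in which an n-byte sequence carries 7, 11, 16, 21, 26 or 31 payload bits. The function name utf8_length is hypothetical. Python's own codec only accepts code points up to U+10FFFF, so the cross-check against it covers just the 1- to 4-byte range.

```python
def utf8_length(cp: int) -> int:
    """Return how many bytes the original (up to 31-bit) UTF-8 scheme uses for cp."""
    if cp < 0 or cp > 0x7FFFFFFF:
        raise ValueError("value outside the 31-bit UCS4 range")
    # Largest value (inclusive) that fits in a 1..6 byte sequence:
    # 7, 11, 16, 21, 26 and 31 payload bits respectively.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for nbytes, limit in enumerate(limits, start=1):
        if cp <= limit:
            return nbytes
    raise AssertionError("unreachable")


# Cross-check against Python's built-in UTF-8 codec (21-bit Unicode range only).
for cp in (0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))

print(utf8_length(0x10FFFF))    # highest Unicode code point
print(utf8_length(0x7FFFFFFF))  # highest 31-bit UCS4 value
```

Since RFC 3629, UTF-8 is restricted to U+10FFFF, so the 5- and 6-byte forms are no longer valid; they only exist in the original 31-bit design.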