On Mon, Oct 30, 2006 at 08:49:36PM +0100, Joost Verburg wrote:
> Georg Baum wrote:
> >OK, so it is like that: Up to 4 bytes per code point are used for the
> >currently defined 21 bits of UCS4, but UTF8 is designed in such a way that
> >it is possible to encode all 36 bits of UCS4 with at most 6 bytes per code
> >point.
>
> Not really. Some years ago there was not yet a real limit in the Unicode
> specification for the number of code points (the theoretical limit was
> 2^31 if I remember correctly).
>
> However, the limit has now been set to 2^20+2^16 code points. There is
> still a lot of space available, but there will _never_ be any more code
> points than 2^20+2^16 (also not in UCS-4!).
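A quick sketch of the numbers quoted above, for reference. It assumes the original UTF-8 design (RFC 2279), where 31-bit UCS-4 values take up to 6 bytes. The names `utf8_length` and `UNICODE_CODE_POINTS` are made up for illustration. It shows that 2^20+2^16 is exactly 17 planes of 65536 code points, ending at U+10FFFF, and that everything in that range fits in at most 4 bytes.

```python
# Hypothetical helper: UTF-8 byte length per code point under the original
# 6-byte design (31-bit UCS-4), compared with the current Unicode cap.

def utf8_length(cp: int) -> int:
    """Bytes needed to encode cp in the original (up to 6-byte) UTF-8 scheme."""
    if cp < 0 or cp >= 1 << 31:
        raise ValueError("outside the 31-bit UCS-4 range")
    # Each limit is the first code point that needs one more byte.
    for nbytes, limit in enumerate((1 << 7, 1 << 11, 1 << 16, 1 << 21, 1 << 26), start=1):
        if cp < limit:
            return nbytes
    return 6

UNICODE_CODE_POINTS = 2**20 + 2**16    # U+0000 .. U+10FFFF
assert UNICODE_CODE_POINTS == 17 * 65536 == 0x110000

print(UNICODE_CODE_POINTS)             # 1114112
print(utf8_length(0x10FFFF))           # 4: highest Unicode code point
print(utf8_length(0x7FFFFFFF))         # 6: highest 31-bit UCS-4 value
print(len(chr(0x10FFFF).encode("utf-8")))  # 4: Python's own codec agrees
```

So the 5- and 6-byte forms only come into play above U+1FFFFF, which the 2^20+2^16 limit rules out.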
Roughly a million characters does not sound overly excessive when there are languages already using several thousand of them...

Andre'