Re: [Patch] optimize utf8_to_ucs4

Andre Poenitz Sat, 11 Nov 2006 01:18:00 -0800

On Mon, Oct 30, 2006 at 08:49:36PM +0100, Joost Verburg wrote:
> Georg Baum wrote:
> >OK, so it is like that: Up to 4 bytes per code point are used for the 
> >currently defined 21 bits of UCS4, but UTF8 is designed in such a way that 
> >it is possible to encode all 36 bits of UCS4 with at most 6 bytes per code 
> >point.
> 
> Not really. Some years ago there was not yet a real limit in the Unicode 
> specification for the number of code points (the theoretical limit was 
> 2^31 if I remember correctly).
> 
> However, the limit has now been set to 2^20+2^16 code points. There is 
> still a lot of space available, but there will _never_ be any more code 
> points than 2^20+2^16 (also not in UCS-4!).


Roughly a million characters does not sound overly excessive when
there are languages already using several thousand of them...
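
For reference, the arithmetic behind the figures quoted above can be
checked with a small sketch. This is only an illustration and not code
from the patch: the names UNICODE_LIMIT and utf8_length are made up
here, and the byte-length thresholds assume the current UTF-8
definition (at most 4 bytes per code point). Surrogates are not treated
specially.

```python
# Illustrative sketch only; the names below are hypothetical and do not
# come from the utf8_to_ucs4 patch under discussion.

# Size of the Unicode code space: code points 0x0 .. 0x10FFFF.
UNICODE_LIMIT = 2**20 + 2**16


def utf8_length(code_point):
    """Return the number of bytes UTF-8 needs for one code point."""
    if not 0 <= code_point < UNICODE_LIMIT:
        raise ValueError("outside the Unicode code space")
    if code_point < 0x80:       # 7 bits of payload
        return 1
    if code_point < 0x800:      # 11 bits of payload
        return 2
    if code_point < 0x10000:    # 16 bits of payload
        return 3
    return 4                    # up to 21 bits of payload


print(UNICODE_LIMIT)              # 1114112
print(hex(UNICODE_LIMIT - 1))     # 0x10ffff
print(utf8_length(0x10FFFF))      # 4
```

So the highest code point under that limit still fits in 4 bytes, and
the 5- and 6-byte forms would only matter for values above it.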

Andre'
