Re: [Patch] optimize utf8_to_ucs4

Enrico Forestieri Mon, 30 Oct 2006 10:42:20 -0800

On Mon, Oct 30, 2006 at 06:02:18PM +0100, Georg Baum wrote:

> Joost Verburg wrote:
> 
> > Georg Baum wrote:
> >> For ucs4 -> utf8 we would have to use a result string with a length of 6
> >> times the input length, with the average length close to the input
> >> length if we want to be able to convert everything. That is probably too
> >> much to be efficient.
> > 
> > ucs4 uses 4 bytes per character and utf8 1-4 bytes. I don't understand
> > where you get this number from.
> 
> I read somewhere that the highest possible number of bytes for a single
> character in utf8 is 6, but I forgot where. Abdel reported the same, and
> now I am unsure, because wikipedia says 4. Does anybody know what is
> correct?


Maybe you got your info from here:
http://www.cl.cam.ac.uk/~mgk25/unicode.html

Indeed, if only 21 bits are used, 4 bytes should suffice. See the table
a little further down that page:
http://www.cl.cam.ac.uk/~mgk25/unicode.html#utf-8
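
To make the arithmetic concrete, here is a minimal sketch of the byte
counts in that table. It assumes the original UTF-8 scheme, which covers
31-bit values and therefore goes up to 6 bytes. The name utf8_len is
made up for this illustration and is not part of the patch. Code points
up to U+10FFFF (21 bits) never need more than 4 bytes, which is where
the two numbers in this thread come from.

```python
# Upper bound of the code point range for each encoded length (1..6 bytes)
# in the original 31-bit UTF-8 scheme. Hypothetical helper, not from the patch.
_LIMITS = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]


def utf8_len(cp: int) -> int:
    """Return the number of bytes needed to encode code point cp."""
    if cp < 0:
        raise ValueError("code point must be non-negative")
    for nbytes, limit in enumerate(_LIMITS, start=1):
        if cp <= limit:
            return nbytes
    raise ValueError("value does not fit in 31 bits")


if __name__ == "__main__":
    for cp in (0x41, 0xE9, 0x20AC, 0x10FFFF, 0x7FFFFFFF):
        print(hex(cp), utf8_len(cp))
```

So a ucs4 -> utf8 result buffer of 4 times the input length should be
enough as long as the input stays within U+10FFFF; the factor of 6 only
matters if values above that range have to be handled.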

-- 
Enrico
