At 1:50 PM +0000 12/16/02, Nicholas Clark wrote:
On Mon, Dec 16, 2002 at 01:07:36PM +0000, mcharity @ vendian. org wrote:

This question is actually independent of the patch (which looks good):

simply returns the C<INTVAL> it is passed; C<string_utf8_max_bytes>, on the
other hand, returns three times the value that it is passed because a
UTF8 character may occupy up to three bytes.
Should that really be the number 3? I thought that the UTF8 representation of
code points outside the base Unicode plane could get longer than that.
I think it should be at least 4, potentially 6. It looks like the Unicode Consortium has given up and admitted that they may use the entire 32-bit space at some point. (Or so my recent run-through of their online material seemed to indicate.)
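
To make the 3 vs. 4 vs. 6 question concrete, here is a rough sketch of how many bytes one code point needs. It assumes the original UTF-8 scheme from RFC 2279, which allows sequences of up to 6 bytes. The names utf8_encoded_length and utf8_max_bytes are made up for illustration and are not Parrot's actual functions.

```python
def utf8_encoded_length(code_point: int) -> int:
    """Bytes needed for one code point under the original (RFC 2279)
    UTF-8 scheme, which covers values up to 31 bits in 6 bytes."""
    if code_point < 0:
        raise ValueError("code point must be non-negative")
    # Largest code point that fits in a 1-, 2-, ... 6-byte sequence.
    limits = [0x7F, 0x7FF, 0xFFFF, 0x1FFFFF, 0x3FFFFFF, 0x7FFFFFFF]
    for nbytes, limit in enumerate(limits, start=1):
        if code_point <= limit:
            return nbytes
    raise ValueError("code point does not fit in 31 bits")


def utf8_max_bytes(nchars: int, bytes_per_char: int = 4) -> int:
    """Hypothetical worst-case buffer size for nchars characters.
    The default of 4 is an assumption: it covers everything up to
    U+10FFFF; pass 6 to cover the full 31-bit range."""
    return nchars * bytes_per_char


# U+FFFF is the last code point of the base plane and fits in 3 bytes;
# U+10000 is the first one outside it and already needs 4.
print(utf8_encoded_length(0xFFFF), utf8_encoded_length(0x10000))
```

With a multiplier of 3, a string made entirely of characters outside the base plane would overflow the estimate, which is the concern raised above.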
--
Dan

--------------------------------------"it's like this"-------------------
Dan Sugalski                          even samurai
[EMAIL PROTECTED]                     have teddy bears and even
                                      teddy bears get drunk
