Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

Ulf Zibis Sun, 02 Oct 2011 02:53:38 -0700

On 02.10.2011 08:29, Xueming Shen wrote:

http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf
Go to section 3.9, "Unicode Encoding Forms", or simply search for D93.

On 10/1/2011 2:21 PM, Ulf Zibis wrote:
On 30.09.2011 22:46, Xueming Shen wrote:
On 09/30/2011 07:09 AM, Ulf Zibis wrote:
(1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.malformedForLength(1)
It appears the Unicode Standard now explicitly recommends returning
malformed length 2 for this scenario, which is what UTF-8 does now.
My idea behind it was that, in the case of malformed length 1, a consecutive call to the decode loop would again return malformed length 1, ensuring 2 replacement chars in the output string. (Not sure if that is expected in this corner case.)
The Unicode Standard's "best practice" in D93a/b recommends returning 2 in this case.
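To make the D93a/b rule concrete, here is a minimal sketch of how the recommended malformed length (the "maximal subpart") could be computed from the well-formed UTF-8 byte ranges listed in chapter 3 of the Unicode Standard. The class and method names (MaximalSubpart, maximalSubpartLength) are hypothetical and are not taken from UTF_8.java; this is an illustration of the rule, not the JDK's implementation.

```java
public class MaximalSubpart {
    /**
     * Hypothetical helper: returns the length of the longest prefix of b[off..]
     * that is an initial subsequence of some well-formed UTF-8 sequence, or 1 if
     * b[off] cannot start one. For an ill-formed sequence this is the number of
     * bytes that a single U+FFFD should replace.
     */
    static int maximalSubpartLength(byte[] b, int off) {
        int b0 = b[off] & 0xFF;
        int need;                 // total length of a well-formed sequence starting with b0
        int lo = 0x80, hi = 0xBF; // allowed range for the second byte
        if (b0 >= 0xC2 && b0 <= 0xDF) {
            need = 2;
        } else if (b0 >= 0xE0 && b0 <= 0xEF) {
            need = 3;
            if (b0 == 0xE0) lo = 0xA0;      // excludes overlong forms
            else if (b0 == 0xED) hi = 0x9F; // excludes surrogates
        } else if (b0 >= 0xF0 && b0 <= 0xF4) {
            need = 4;
            if (b0 == 0xF0) lo = 0x90;      // excludes overlong forms
            else if (b0 == 0xF4) hi = 0x8F; // excludes code points above U+10FFFF
        } else {
            return 1; // ASCII, stray continuation byte, C0, C1 or F5..FF
        }
        int len = 1;
        while (len < need && off + len < b.length) {
            int bi = b[off + len] & 0xFF;
            if (bi < lo || bi > hi) break; // this byte is not part of the subpart
            len++;
            lo = 0x80; // third and fourth bytes use the plain continuation range
            hi = 0xBF;
        }
        return len;
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0xE1, (byte) 0x80, (byte) 0x42 };
        System.out.println(maximalSubpartLength(input, 0)); // prints 2
    }
}
```

For a complete, well-formed sequence the method simply returns its full length, so a decoder would only use the result as a malformed length after it has detected that the sequence is ill-formed.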

OK, I got it:
E1 80 42 --> malformed length 2 --> 1 replacement --> FFFD 0042
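For a quick check against the JDK itself, the following sketch decodes E1 80 42 with the standard java.nio.charset API: once with CodingErrorAction.REPORT to see the reported malformed length, and once via Charset.decode(), which replaces malformed input. The class name MalformedLengthDemo is made up for illustration. On a JDK whose UTF-8 decoder implements the length-2 behaviour discussed here, I would expect "malformed length = 2" and "FFFD 0042"; older decoders may report a different length.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public class MalformedLengthDemo {
    /** Decodes with REPORT and returns the length of the first malformed-input error, or -1 if none. */
    static int firstMalformedLength(byte[] input) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // Output buffer is large enough that an overflow cannot hide an error.
        CharBuffer out = CharBuffer.allocate(input.length + 1);
        CoderResult cr = dec.decode(ByteBuffer.wrap(input), out, true);
        return cr.isMalformed() ? cr.length() : -1;
    }

    /** Decodes with the charset's default replacement and returns the UTF-16 code units in hex. */
    static String decodeReplacingAsHex(byte[] input) {
        // Charset.decode() always uses CodingErrorAction.REPLACE (U+FFFD for UTF-8).
        CharBuffer cb = StandardCharsets.UTF_8.decode(ByteBuffer.wrap(input));
        StringBuilder sb = new StringBuilder();
        while (cb.hasRemaining()) {
            if (sb.length() > 0) sb.append(' ');
            sb.append(String.format("%04X", (int) cb.get()));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] input = { (byte) 0xE1, (byte) 0x80, (byte) 0x42 };
        System.out.println("malformed length = " + firstMalformedLength(input));
        System.out.println(decodeReplacingAsHex(input));
    }
}
```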

Because it could be difficult for others to find the right documents later on, it would be *very nice* to add this link to the source code of UTF_8.java, either as javadoc or as a simple comment.


-Ulf
