On 02.10.2011 08:29, Xueming Shen wrote:
http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf

Go to section 3.9, Unicode Encoding Forms, or simply search for D93.

On 10/1/2011 2:21 PM, Ulf Zibis wrote:
On 30.09.2011 22:46, Xueming Shen wrote:
On 09/30/2011 07:09 AM, Ulf Zibis wrote:

(1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.malformedForLength(1)
It appears the Unicode Standard now explicitly recommends returning malformed length 2 for this scenario, which is what UTF-8 does now.
My idea was that, with malformed length 1, a subsequent call to the decode loop would return another malformed length 1, ensuring 2 replacement chars in the output string. (I am not sure whether that is expected in this corner case.)

The Unicode Standard's "best practices" (D93a/b) recommend returning 2 in this case.
OK, I got it:
E1 80 42 --> malformed length 2 --> 1 replacement --> FFFD 0042
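
For anyone who wants to check this behaviour locally, here is a minimal sketch using only the public java.nio.charset API. The class name MalformedUtf8Demo and its two helper methods are made up for illustration and are not part of UTF_8.java. It assumes a JDK whose UTF-8 decoder follows the D93a/b recommendation as described above; a decoder that reports length 1 instead would give different results.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK sources.
public class MalformedUtf8Demo {

    // Returns the malformed length reported for the first malformed
    // sequence in the input, or 0 if no malformed input is reported.
    static int malformedLength(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(input.length);
        // endOfInput = true: no more bytes will follow this buffer.
        CoderResult result = decoder.decode(ByteBuffer.wrap(input), out, true);
        return result.isMalformed() ? result.length() : 0;
    }

    // Decodes with the default replacement behaviour of String:
    // each malformed sequence is replaced by U+FFFD.
    static String decodeWithReplacement(byte[] input) {
        return new String(input, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] input = {(byte) 0xE1, (byte) 0x80, (byte) 0x42};

        System.out.println("malformed length: " + malformedLength(input));

        // Print the decoded chars as hex code units for easy comparison
        // with the "FFFD 0042" notation used in this thread.
        StringBuilder hex = new StringBuilder();
        for (char c : decodeWithReplacement(input).toCharArray()) {
            hex.append(String.format("%04X ", (int) c));
        }
        System.out.println("decoded: " + hex.toString().trim());
    }
}
```

With malformed length 2, the bytes E1 80 are consumed as one unit and replaced by a single U+FFFD, and decoding resumes at 42. With malformed length 1, E1 and 80 would each be reported separately, giving two replacement chars.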

Because it could be difficult for others to find the right documents later, it would be *very nice* to add this link to the source code of UTF_8.java, either as javadoc or as a plain comment.

-Ulf
