On 02.10.2011 08:29, Xueming Shen wrote:
http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf
Go to Section 3.9, Unicode Encoding Forms, or simply search for D93.
On 10/1/2011 2:21 PM, Ulf Zibis wrote:
On 30.09.2011 22:46, Xueming Shen wrote:
On 09/30/2011 07:09 AM, Ulf Zibis wrote:
(1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} --->
CoderResult.malformedForLength(1)
It appears the Unicode Standard now explicitly recommends returning a
malformed length of 2 for this scenario,
which is what UTF-8 does now.
My idea was that, with a malformed length of 1, a subsequent call to the decode loop
would return another malformed length of 1, yielding 2 replacement chars in the output
string. (I'm not sure whether that is expected in this corner case.)
The Unicode Standard's "best practices" (D93a/b) recommend returning 2 in this case.
OK, I got it:
E1 80 42 --> malformed length 2 --> 1 replacement --> FFFD 0042
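The case above can be checked with a small test program. This is only a sketch, not code from UTF_8.java: the class name MalformedUtf8Demo and its two helper methods are made up for illustration. It assumes a JDK whose UTF-8 decoder follows the D93b recommendation, and it uses only the public java.nio.charset API.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of the JDK.
public class MalformedUtf8Demo {

    // Decodes with REPORT so the first malformed sequence stops the loop,
    // and returns the length the decoder reports for it (0 if none).
    public static int malformedLength(byte[] input) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        CharBuffer out = CharBuffer.allocate(input.length);
        CoderResult result = decoder.decode(ByteBuffer.wrap(input), out, true);
        return result.isMalformed() ? result.length() : 0;
    }

    // new String(byte[], Charset) always substitutes the replacement
    // character U+FFFD for malformed input instead of throwing.
    public static String decodeWithReplacement(byte[] input) {
        return new String(input, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // E1 80 is a truncated three-byte sequence; 42 is ASCII 'B'.
        byte[] input = {(byte) 0xE1, (byte) 0x80, (byte) 0x42};
        System.out.println("malformed length: " + malformedLength(input));
        for (char c : decodeWithReplacement(input).toCharArray()) {
            System.out.printf("%04X ", (int) c);
        }
        System.out.println();
    }
}
```

If the decoder behaves as described in this thread, the reported length should be 2 and the decoded string should be FFFD 0042, i.e. one replacement char followed by 'B'.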
Because it could be difficult for others to find the right documents later, it
would be *very nice* to add this link to the source code of UTF_8.java, either as javadoc or as a plain comment.
-Ulf