Hi again,
On 30.09.2011 00:27, Xueming Shen wrote:
On 09/29/2011 02:16 PM, Ulf Zibis wrote:
280 if (Character.isSurrogate(c))
281 return malformedForLength(src, sp, dst, dp, 3);
Shouldn't we return cr.length() = 1 to allow the remaining 2 bytes to be
interpreted again?
Forget it! If c is a surrogate, b2 is in the range A0..BF and b3 is in the
range 80..BF. Neither can be a well-formed first byte.
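As a quick sanity check of that range argument, here is a minimal sketch. The class and method names are hypothetical and not taken from the JDK source, and it assumes that a well-formed sequence can only start with 00..7F or C2..F4. It pushes every surrogate through the plain three-byte bit pattern and counts the cases where b2 or b3 could start a new sequence:

```java
// Hypothetical helper class; names are illustrative, not from the JDK.
class SurrogateTailBytes {

    // Assumption: a byte can start a well-formed UTF-8 sequence only if it
    // is ASCII (00..7F) or a lead byte in the range C2..F4.
    static boolean canStartSequence(int b) {
        b &= 0xFF;
        return b <= 0x7F || (b >= 0xC2 && b <= 0xF4);
    }

    // Applies the plain three-byte bit pattern to a char in U+0800..U+FFFF
    // without rejecting surrogates.
    static int[] threeByteForm(char c) {
        return new int[] {
            0xE0 | (c >> 12),
            0x80 | ((c >> 6) & 0x3F),
            0x80 | (c & 0x3F)
        };
    }

    // Counts surrogates whose second or third byte could start a sequence.
    static int countRestartableTails() {
        int count = 0;
        for (int c = Character.MIN_SURROGATE; c <= Character.MAX_SURROGATE; c++) {
            int[] b = threeByteForm((char) c);
            if (canStartSequence(b[1]) || canStartSequence(b[2])) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(countRestartableTails());
    }
}
```

If the count is zero, re-reading the two trailing bytes after reporting length 1 can only produce further malformed results, never a decoded character.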
Actually I don't know the answer. My reading of D93a/D93b suggests that we
might interpret it as a whole, given that the bytes are actually in the
well-formed byte pattern range listed in Table 3.7, and are "ill-formed"
simply because they are surrogate values, not scalar values. So I would
interpret the whole 3 bytes as a maximal subpart. Given that D93a/b is
"best practices for Using U+fffd", either way should be fine. We do have
Unicode experts on the list, so maybe they can share their opinion on what
the "desired"/recommended behavior is in this case, from the Standard's
point of view?
At line 102 you could insert:
// [E0] [A0..BF]
// [E1..EF] [80..BF]
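To spell out what those two comment lines say, here is a small sketch of a second-byte range check for three-byte sequences. The class and method names are hypothetical; this is not the code at line 102, only an illustration of the two ranges:

```java
// Hypothetical illustration of the two documented ranges.
class ThreeByteSecondByte {

    // [E0]     [A0..BF]
    // [E1..EF] [80..BF]
    static boolean isAcceptedSecondByte(int b1, int b2) {
        b1 &= 0xFF; // tolerate signed byte values
        b2 &= 0xFF;
        if (b1 == 0xE0) {
            return b2 >= 0xA0 && b2 <= 0xBF;
        }
        if (b1 >= 0xE1 && b1 <= 0xEF) {
            return b2 >= 0x80 && b2 <= 0xBF;
        }
        return false; // not a three-byte lead byte
    }
}
```

Note that these ranges accept ED followed by A0..BF, so surrogates pass this check and are only caught by the separate Character.isSurrogate(c) test.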
-Ulf