Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-11-07 Thread Ulf Zibis
I have thought that and it was easy to see, as it glowed in blue in Sdiffs. But I anyway wondered about the combination of protected with final, or is there some miracle I don't know? And the diff line is still glowing ... You can drop final as for class Decoder ;-) -Ulf Am 07.11.2011 22:28

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-11-07 Thread Xueming Shen
Hi Ulf, Good catch. It was the leftover of my first drift, in which CESU.de/encoder is the subclass of the UTF8 de/encoder. webrev has been updated accordingly. -Sherman On 11/07/2011 12:57 PM, Ulf Zibis wrote: Hi Sherman, 2 nits again: protected static final class Encoder extends CharsetE

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-11-07 Thread Ulf Zibis
Hi Sherman, 2 nits again: protected static final class Encoder extends CharsetEncoder - If final, protected doesn't make sense, as it can't be subclassed. (ideally this should be a compiler error) - Where is it used? Otherwise it could be private. -Ulf Am 07.11.2011 21:30, schrieb Xueming Sh

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-11-07 Thread Xueming Shen
Thanks Alan, The webrev has been updated accordingly to address your comments (the commented out isMalformed2 has been removed, copyright updated, bugid added...) And thanks for Ulf for the detailed review. -Sherman On 10/28/2011 09:14 AM, Alan Bateman wrote: On 28/09/2011 20:18, Xueming Shen

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-28 Thread Alan Bateman
On 28/09/2011 20:18, Xueming Shen wrote: Hi, [I combined the proposed charge for #7082884, in which no one appears to be interested:-) into this one] : http://cr.openjdk.java.net/~sherman/7096080/webrev/ I don't know if you are still lo

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-14 Thread Ulf Zibis
Am 14.10.2011 10:47, schrieb Ulf Zibis: My new guess for the reason: The unfolding of the bytes to int to serve the isNotContinuation / isMalformedxx methods. So those methods should be coded in byte logic too. + use the "bx <= (byte)abc" logic instead "shift" or "(bx & abc) != def". -Ulf

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-14 Thread Ulf Zibis
Am 30.09.2011 22:46, schrieb Xueming Shen: I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back to 2009(?) because the benchmark shows the "shift" version is slightly faster. Do you have any number shows any difference now. My non-scientific benchmark still suggests the "shift" type is f

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-14 Thread Ulf Zibis
Am 13.10.2011 21:13, schrieb Xueming Shen: On 10/13/2011 09:55 AM, Ulf Zibis wrote: Am 11.10.2011 19:49, schrieb Xueming Shen: I don't know which one is better, I did a run on private static boolean op1(int b) { return (b >> 6) != -2; } private static boolean op2(int b) {

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-13 Thread Xueming Shen
On 10/13/2011 09:55 AM, Ulf Zibis wrote: Am 11.10.2011 19:49, schrieb Xueming Shen: I don't know which one is better, I did a run on private static boolean op1(int b) { return (b >> 6) != -2; } private static boolean op2(int b) { return (b & 0xc0) != 0x80; }

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-13 Thread Ulf Zibis
Am 11.10.2011 19:49, schrieb Xueming Shen: I don't know which one is better, I did a run on private static boolean op1(int b) { return (b >> 6) != -2; } private static boolean op2(int b) { return (b & 0xc0) != 0x80; } private static boolean op3(byte b) {

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-11 Thread Xueming Shen
On 10/11/2011 04:36 AM, Ulf Zibis wrote: Am 30.09.2011 22:46, schrieb Xueming Shen: I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back to 2009(?) because the benchmark shows the "shift" version is slightly faster. Do you have any number shows any difference now. My non-scientific be

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-11 Thread Ulf Zibis
Am 11.10.2011 13:36, schrieb Ulf Zibis: I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back to 2009(?) because the benchmark shows the "shift" version is slightly faster. Do you have any number shows any difference now. My non-scientific benchmark still suggests the "shift" type is fa

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-11 Thread Ulf Zibis
Hi Sherman, I didn't read anything from you since longer time. You are in holidays? Am 30.09.2011 22:46, schrieb Xueming Shen: I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back to 2009(?) because the benchmark shows the "shift" version is slightly faster. Do you have any number sho

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-02 Thread Ulf Zibis
Hi again, Am 30.09.2011 00:27, schrieb Xueming Shen: On 09/29/2011 02:16 PM, Ulf Zibis wrote: 280 if (Character.isSurrogate(c)) 281 return malformedForLength(src, sp, dst, dp, 3); Shouldn't we return cr.length() = 1to allow remaining 2 bytes to be

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-02 Thread Ulf Zibis
Am 02.10.2011 08:29, schrieb Xueming Shen: http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf Go to 3.9 Unicode Encoding Forms. Or simply search D93 On 10/1/2011 2:21 PM, Ulf Zibis wrote: Am 30.09.2011 22:46, schrieb Xueming Shen: On 09/30/2011 07:09 AM, Ulf Zibis wrote: (1) new byte[]{(

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-01 Thread Xueming Shen
http://www.unicode.org/versions/Unicode6.0.0/ch03.pdf Go to 3.9 Unicode Encoding Forms. Or simply search D93 On 10/1/2011 2:21 PM, Ulf Zibis wrote: Am 30.09.2011 22:46, schrieb Xueming Shen: On 09/30/2011 07:09 AM, Ulf Zibis wrote: (1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> Cod

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-10-01 Thread Ulf Zibis
Am 30.09.2011 22:46, schrieb Xueming Shen: On 09/30/2011 07:09 AM, Ulf Zibis wrote: (1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.malformedForLength(1) It appears the Unicode Standard now explicitly recommends to return the malformed length 2, what UTF-8 is doing now, fo

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-09-30 Thread Xueming Shen
On 09/30/2011 07:09 AM, Ulf Zibis wrote: (1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.malformedForLength(1) It appears the Unicode Standard now explicitly recommends to return the malformed length 2, what UTF-8 is doing now, for this scenario My idea behind is, that i

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-09-30 Thread Ulf Zibis
Hi, Am 29.09.2011 05:27, schrieb Xueming Shen: Hi, On 9/28/2011 3:44 PM, Ulf Zibis wrote 3. Consider additionally 6795537 - UTF_8$Decoder returns wrong results (1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-09-29 Thread Xueming Shen
On 09/29/2011 02:16 PM, Ulf Zibis wrote: Please use spaces with ternary operators: Lines 155, 216 For short you could use sr instead srcRemaining, consistent to sa, sp, sl. 420 // returns -1 if there is malformed byte(s) and the better: 420 // returns -1 if there is/are malfor

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-09-28 Thread Xueming Shen
Hi, On 9/28/2011 3:44 PM, Ulf Zibis wrote: Hi Sherman, 1. bug 7096080 is not visible at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7096080 It might take couple days for it to show up on bugs.sun.com. But it has exactly the same content as my previous email. In fact I simply copy/pa

Re: Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-09-28 Thread Ulf Zibis
Hi Sherman, 1. bug 7096080 is not visible at http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=7096080 2. bug 7096080 seems to be a duplicate of 6798514 - Charset UTF-8 accepts CESU-8 codings which was closed. It should be reopened

Codereview request for 7096080: UTF8 update and new CESU-8 charset

2011-09-28 Thread Xueming Shen
Hi, [I combined the proposed charge for #7082884, in which no one appears to be interested:-) into this one] Unicode Standard added "Addition Constraints on conversion of ill-formed UTF-8" in version 5.1 [1] and updated in 6.0 again with further "clarification" [2] regarding how a "conformanc