On 09/30/2011 07:09 AM, Ulf Zibis wrote:


(1) new byte[]{(byte)0xE1, (byte)0x80, (byte)0x42} ---> CoderResult.malformedForLength(1)
It appears the Unicode Standard now explicitly recommends returning malformed length 2 for this scenario, which is what UTF-8 does now.
My idea behind this is that, with malformed length 1, a consecutive call to the decode loop would return another malformed length 1, ensuring 2 replacement chars in the output string. (Not sure if that is expected in this corner case.)

The Unicode Standard's "best practices" (D93a/b) recommend returning 2 in this case.
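
To make the case above concrete, here is a minimal sketch that feeds the three bytes to the JDK's UTF-8 decoder and reports the malformed length it returns. The class and method names (MalformedLengthDemo, firstMalformedLength, replacementCount) are made up for illustration; the result depends on the JDK version in use, so no expected output is stated.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not part of the JDK or of the patch under review.
final class MalformedLengthDemo {

    // Returns the length of the first malformed sequence reported by the
    // UTF-8 decoder, or 0 if the decoder reports no malformed input.
    static int firstMalformedLength(byte[] input) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(input.length);
        CoderResult cr = dec.decode(ByteBuffer.wrap(input), out, true);
        return cr.isMalformed() ? cr.length() : 0;
    }

    // Counts U+FFFD replacement chars produced when decoding in REPLACE mode
    // (new String(byte[], Charset) always replaces malformed input).
    static int replacementCount(byte[] input) {
        String s = new String(input, StandardCharsets.UTF_8);
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) == '\uFFFD') n++;
        }
        return n;
    }

    public static void main(String[] args) {
        byte[] bytes = {(byte) 0xE1, (byte) 0x80, (byte) 0x42};
        System.out.println("malformed length : " + firstMalformedLength(bytes));
        System.out.println("replacement chars: " + replacementCount(bytes));
    }
}
```

A malformed length of 2 should yield a single replacement char followed by 'B'; two consecutive results of length 1 would yield two replacement chars, which is the difference being discussed.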


3. Consider additionally 6795537 - UTF_8$Decoder returns wrong results <http://bugs.sun.com/bugdatabase/view_bug.do?bug_id=6795537>


I'm not sure I understand the suggested b1 < -0x3e patch; I don't see how we can simply replace
((b1 >> 5) == -2) with (b1 < -0x3e).
You must see the b1 < -0x3e in combination with the following b1 < -0x20. ;-)
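
One possible reading of that combination, as a sketch only: the first comparison rejects everything below 0xC2, and the second then selects the 2-byte lead bytes. The byShift method follows the ((b1 >> n) == -2) form quoted above; the 3-byte and 4-byte thresholds in byCompare (-0x10 and -0x08) and all names are assumptions for illustration, not taken from the actual patch.

```java
// Hypothetical comparison of two lead-byte classifiers; returns the expected
// sequence length (1..4) or 0 for a byte that cannot start a sequence.
final class LeadByteClassifiers {

    // "Shift" style: the sign-extended byte is shifted and compared with -2.
    static int byShift(byte b1) {
        if (b1 >= 0) return 1;              // 0xxxxxxx
        if ((b1 >> 5) == -2) return 2;      // 110xxxxx, 0xC0..0xDF
        if ((b1 >> 4) == -2) return 3;      // 1110xxxx, 0xE0..0xEF
        if ((b1 >> 3) == -2) return 4;      // 11110xxx, 0xF0..0xF7
        return 0;
    }

    // "if...else if" style: one signed comparison against one constant per case.
    static int byCompare(byte b1) {
        if (b1 >= 0) return 1;
        if (b1 < -0x3e) return 0;           // 0x80..0xC1: continuation or overlong lead
        if (b1 < -0x20) return 2;           // 0xC2..0xDF
        if (b1 < -0x10) return 3;           // 0xE0..0xEF (assumed threshold)
        if (b1 < -0x08) return 4;           // 0xF0..0xF7 (assumed threshold)
        return 0;                           // 0xF8..0xFF
    }

    public static void main(String[] args) {
        // Print every byte value on which the two classifiers disagree.
        for (int i = 0; i < 256; i++) {
            byte b = (byte) i;
            if (byShift(b) != byCompare(b)) {
                System.out.printf("0x%02X: shift=%d compare=%d%n", i, byShift(b), byCompare(b));
            }
        }
    }
}
```

Under these assumptions the two versions differ only for 0xC0 and 0xC1: the shift version treats them as 2-byte lead bytes (to be rejected later as overlong), while the comparison chain rejects them immediately.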

But now I have a better "if...else if" switch. :-)
- saves the shift operations
- only 1 comparison per case
- only 1 constant to load per case
- helps the compiler benefit from 1-byte constants and opcodes
- much more readable

I believe we changed from (b1 < xyz) to (b1 >> x) == -2 back in 2009(?) because the benchmark showed the "shift" version to be slightly faster. Do you have any numbers that show a difference now? My non-scientific benchmark still suggests the "shift"
version is faster on the -server VM, with no significant difference on the -client VM.

  ------------------  your new switch---------------
(1) -server
Method                      Millis  Ratio
Decoding 1b UTF-8 :            125  1.000
Decoding 2b UTF-8 :           2558 20.443
Decoding 3b UTF-8 :           3439 27.481
Decoding 4b UTF-8 :           2030 16.221
(2) -client
Decoding 1b UTF-8 :            335  1.000
Decoding 2b UTF-8 :           1041  3.105
Decoding 3b UTF-8 :           2245  6.694
Decoding 4b UTF-8 :           1254  3.741

  ------------------ existing "shift"---------------
(1) -server
Decoding 1b UTF-8 :            134  1.000
Decoding 2b UTF-8 :           1891 14.106
Decoding 3b UTF-8 :           2934 21.886
Decoding 4b UTF-8 :           2133 15.913
(2) -client
Decoding 1b UTF-8 :            341  1.000
Decoding 2b UTF-8 :            949  2.560
Decoding 3b UTF-8 :           2321  6.255
Decoding 4b UTF-8 :           1278  3.446
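
For anyone who wants to gather comparable numbers, a rough timing harness could look like the sketch below. It is not the benchmark that produced the tables above; the class name, buffer sizes, and round counts are arbitrary assumptions, and a loop like this is as non-scientific as the figures it imitates (no JIT-aware methodology).

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

// Hypothetical timing harness; run once with -server and once with -client.
final class Utf8DecodeTiming {

    // Builds `count` copies of one code point, encoded as UTF-8.
    static byte[] sample(int codePoint, int count) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < count; i++) sb.appendCodePoint(codePoint);
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }

    // Decodes `input` `rounds` times and returns the elapsed milliseconds.
    static long timeDecode(byte[] input, int rounds) {
        CharsetDecoder dec = StandardCharsets.UTF_8.newDecoder();
        ByteBuffer in = ByteBuffer.wrap(input);
        // UTF-8 never decodes to more chars than it has bytes.
        CharBuffer out = CharBuffer.allocate(input.length);
        long start = System.nanoTime();
        for (int r = 0; r < rounds; r++) {
            in.rewind();
            out.clear();
            dec.reset();
            CoderResult cr = dec.decode(in, out, true);
            if (cr.isError()) throw new IllegalStateException(cr.toString());
            dec.flush(out);
        }
        return (System.nanoTime() - start) / 1_000_000;
    }

    public static void main(String[] args) {
        // One representative code point per UTF-8 sequence length (1 to 4 bytes).
        int[] codePoints = {0x41, 0xE9, 0x20AC, 0x1F600};
        for (int i = 0; i < codePoints.length; i++) {
            byte[] input = sample(codePoints[i], 100_000);
            timeDecode(input, 200); // warm-up, result ignored
            System.out.printf("Decoding %db UTF-8 : %6d ms%n", i + 1, timeDecode(input, 1000));
        }
    }
}
```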



-sherman
