On 10/01/2012 06:05 PM, Dennis E. Hamilton wrote:
The mention that the latest Java VM is using UTF8 internally instead 
of unsigned short arrays is rather daunting.  There is an easy way to test it 
-- see if char values that are not admissible UTF16 codes can be used in 
construction of a string and then extracted correctly.  If they can, there is 
no way that transformation to and from UTF8 occurred.  If they can't, it is an 
interesting breaking change in Java.  With regard to string literals, it would 
be interesting to see what can be introduced into those via escape codes too.
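
A minimal sketch of the test described above, assuming nothing beyond the standard String API. The class and method names (LoneSurrogateCheck, roundTrips) are made up for illustration. It builds a string from char values that are not well-formed UTF-16 (unpaired surrogates in the wrong order, plus a noncharacter) and checks that the same values come back out.

```java
public class LoneSurrogateCheck {
    // Builds a String from raw char values and reports whether every
    // value can be read back unchanged through charAt().
    static boolean roundTrips(char[] units) {
        String s = new String(units);
        if (s.length() != units.length) {
            return false;
        }
        for (int i = 0; i < units.length; i++) {
            if (s.charAt(i) != units[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A low surrogate followed by a high surrogate does not form a valid
        // pair; 0xFFFF is a noncharacter. None of this is well-formed UTF-16.
        char[] units = { 'a', (char) 0xDC00, (char) 0xD800, (char) 0xFFFF, 'b' };
        System.out.println("round trip intact: " + roundTrips(units));
    }
}
```

If the round trip is intact, the string cannot have passed through a strict UTF8 conversion on the way. The same check could be repeated with a string literal containing the corresponding escape codes.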

Note that the JVM traditionally also uses a modified form of UTF-8 (encoding surrogate code points individually, and encoding \u0000 as 0xC0 0x80); see the JNI spec.
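
The difference can be seen from plain Java, without going through JNI. The sketch below relies on DataOutputStream.writeUTF, which is documented to write modified UTF-8 after a two-byte length prefix, and compares the result with standard UTF-8 from String.getBytes. The class and helper names (ModifiedUtf8Demo, modifiedUtf8, hex) are made up for illustration.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ModifiedUtf8Demo {
    // Encodes s with DataOutputStream.writeUTF and strips the
    // two-byte length prefix, leaving only the modified UTF-8 bytes.
    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buf)) {
            out.writeUTF(s);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = buf.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length);
    }

    // Formats bytes as space-separated upper-case hex pairs.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String nul = "\u0000";
        // U+10400 lies outside the BMP, so Java stores it as a surrogate pair.
        String supplementary = new String(Character.toChars(0x10400));

        System.out.println("NUL, modified UTF-8:     " + hex(modifiedUtf8(nul)));
        System.out.println("NUL, standard UTF-8:     " + hex(nul.getBytes(StandardCharsets.UTF_8)));
        System.out.println("U+10400, modified UTF-8: " + hex(modifiedUtf8(supplementary)));
        System.out.println("U+10400, standard UTF-8: " + hex(supplementary.getBytes(StandardCharsets.UTF_8)));
    }
}
```

In the modified form, the supplementary character comes out as two three-byte sequences, one per surrogate, rather than the single four-byte sequence of standard UTF-8.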

Stephan
_______________________________________________
LibreOffice mailing list
LibreOffice@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/libreoffice
