I was reading up on converting characters to UTF-8, and I now understand why it writes out UTF-8 (to support most of the world's languages in minimal space?). But after reading the conversion algorithm given below, it looks to me like the writeChars method does not handle the U+10000→U+10FFFF range. Is that right, or am I misreading the code?
Character Range       Bit Encoding
U+0000→U+007F         0xxxxxxx
U+0080→U+07FF         110xxxxx 10xxxxxx
U+0800→U+FFFF         1110xxxx 10xxxxxx 10xxxxxx
U+10000→U+10FFFF      11110xxx 10xxxxxx 10xxxxxx 10xxxxxx

    public void writeChars(String s, int start, int length) throws IOException {
      final int end = start + length;
      for (int i = start; i < end; i++) {
        final int code = (int)s.charAt(i);
        if (code >= 0x01 && code <= 0x7F)
          writeByte((byte)code);
        else if (((code >= 0x80) && (code <= 0x7FF)) || code == 0) {
          writeByte((byte)(0xC0 | (code >> 6)));
          writeByte((byte)(0x80 | (code & 0x3F)));
        } else {
          writeByte((byte)(0xE0 | (code >>> 12)));
          writeByte((byte)(0x80 | ((code >> 6) & 0x3F)));
          writeByte((byte)(0x80 | (code & 0x3F)));
        }
      }
    }
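For comparison, here is a minimal sketch of an encoder that covers all four ranges listed above, including U+10000→U+10FFFF. It is only an illustration under stated assumptions: the class `Utf8Sketch` and the method `encode` are hypothetical names, not part of the code in question. It walks the string by code point (`String.codePointAt`) rather than by `char`, so a surrogate pair is combined into one value before encoding. Unlike the method quoted above, it writes U+0000 as a single zero byte.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class Utf8Sketch {
    // Hypothetical helper: encodes a String following the four-row table.
    // Assumes well-formed input; an unpaired surrogate would simply be
    // written as three bytes, which is not valid standard UTF-8.
    public static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); ) {
            final int cp = s.codePointAt(i);   // combines a surrogate pair
            i += Character.charCount(cp);      // advance by 1 or 2 chars
            if (cp <= 0x7F) {
                out.write(cp);                          // 0xxxxxxx
            } else if (cp <= 0x7FF) {
                out.write(0xC0 | (cp >> 6));            // 110xxxxx
                out.write(0x80 | (cp & 0x3F));          // 10xxxxxx
            } else if (cp <= 0xFFFF) {
                out.write(0xE0 | (cp >> 12));           // 1110xxxx
                out.write(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
                out.write(0x80 | (cp & 0x3F));          // 10xxxxxx
            } else {
                out.write(0xF0 | (cp >> 18));           // 11110xxx
                out.write(0x80 | ((cp >> 12) & 0x3F));  // 10xxxxxx
                out.write(0x80 | ((cp >> 6) & 0x3F));   // 10xxxxxx
                out.write(0x80 | (cp & 0x3F));          // 10xxxxxx
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        // One character from each range; the last is U+1F600 as a surrogate pair.
        String s = "A\u00E9\u20AC\uD83D\uDE00";
        byte[] sketch = encode(s);
        byte[] jdk = s.getBytes(StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(sketch, jdk)); // prints "true"
    }
}
```

By contrast, a loop based on `charAt` sees the two halves of a surrogate pair as separate 16-bit values, each of which falls into the three-byte branch, so a character above U+FFFF would come out as six bytes rather than four.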