On 6/26/18, Thomas Wolff wrote:
> This encoding scheme is wrong; where did you get it from? Maybe it's the
> obsolete UTF-8...

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

I thought I saw something about UTF-8 being able to handle a 31-bit
value. Is that also obsolete/wrong?

How about this for the current encoding scheme:
http://www.unicode.org/versions/Unicode11.0.0/ch03.pdf

Table 3-6. UTF-8 Bit Distribution

Bits  Scalar Value                First Byte  Second Byte  Third Byte  Fourth Byte
7     00000000 0xxxxxxx           0xxxxxxx
11    00000yyy yyxxxxxx           110yyyyy    10xxxxxx
16    zzzzyyyy yyxxxxxx           1110zzzz    10yyyyyy     10xxxxxx
21    000uuuuu zzzzyyyy yyxxxxxx  11110uuu    10uuzzzz     10yyyyyy    10xxxxxx

Lee
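
P.S. For reference, the bit distribution in Table 3-6 can be sketched in Python roughly as follows. This is only an illustration: the function name utf8_encode is made up for this example, and the range checks assume that only Unicode scalar values (U+0000..U+10FFFF, excluding the surrogates U+D800..U+DFFF) are valid input, which is narrower than what the 21-bit row could physically hold.

```python
def utf8_encode(cp):
    """Encode one Unicode scalar value using the Table 3-6 bit layout.

    Hypothetical helper for illustration only, not taken from any library.
    """
    # Assumption: reject anything that is not a Unicode scalar value.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value: %#x" % cp)
    if cp < 0x80:
        # 7 bits: 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:
        # 11 bits: 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:
        # 16 bits: 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 21 bits: 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])
```

The result can be compared against Python's built-in encoder, e.g. utf8_encode(0x20AC) == "\u20ac".encode("utf-8").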