On 6/24/18, L A Walsh <cyg...@tlinx.org> wrote:
> Lee wrote:
>> So... keep it simple, set
>>   LANG=en_US.UTF-8
>> and use vi or something else that comes with cygwin to create the file
>> and I'll have a file with UTF-8 character encoding - correct?
> ---
> The first 127 characters of UTF-8 are identical to the
> first 127 characters of ASCII, and latin1 and iso-8859-1.
>
> If you don't use any characters that need accents or special symbols,
> then nothing will be encoded in UTF-8, because its only
> the characters OVER the first 127
> (see chart @ http://www.babelstone.co.uk/Unicode/babelmap.html).
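
One way to check the quoted claim is a short Python sketch. Nothing in it comes from the thread: the helper name `encodings_agree` is made up for illustration, and it only relies on Python's built-in "ascii", "latin-1" and "utf-8" codecs. It compares the bytes each codec produces for code points 0x00-0x7f, then shows one accented character for contrast.

```python
# Sketch: compare how ASCII, Latin-1 and UTF-8 encode the same characters.
# The helper name `encodings_agree` is hypothetical, chosen for this example.


def encodings_agree(text):
    """True if `text` encodes to the same bytes in ASCII, Latin-1 and UTF-8."""
    try:
        as_ascii = text.encode("ascii")
    except UnicodeEncodeError:
        return False  # not representable in ASCII at all
    return as_ascii == text.encode("latin-1") == text.encode("utf-8")


if __name__ == "__main__":
    plain = "".join(chr(c) for c in range(0x80))  # code points 0x00-0x7f
    print(encodings_agree(plain))                 # True

    e_acute = "\u00e9"                            # LATIN SMALL LETTER E WITH ACUTE
    print(e_acute.encode("latin-1"))              # b'\xe9'
    print(e_acute.encode("utf-8"))                # b'\xc3\xa9'
```

So a file containing only unaccented characters is byte-for-byte the same in all three encodings, while a character such as U+00E9 is one byte in Latin-1 and two bytes in UTF-8.
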
I'm still trying to figure utf-8 out, but it seems to me that 0x0 -
0xff is part of the utf-8 encoding.  This chart makes things clearer
... at least for me :)

http://www.cl.cam.ac.uk/~mgk25/ucs/utf-8-history.txt

  The proposed UCS transformation format encodes UCS values in the range
  [0,0x7fffffff] using multibyte characters of lengths 1, 2, 3, 4, and 5
  bytes.  For all encodings of more than one byte, the initial byte
  determines the number of bytes used and the high-order bit in each byte
  is set.

  An easy way to remember this transformation format is to note that the
  number of high-order 1's in the first byte is the same as the number of
  subsequent bytes in the multibyte character:

     Bits  Hex Min   Hex Max   Byte Sequence in Binary
  1    7   00000000  0000007f  0zzzzzzz
  2   13   00000080  0000207f  10zzzzzz 1yyyyyyy
  3   19   00002080  0008207f  110zzzzz 1yyyyyyy 1xxxxxxx
  4   25   00082080  0208207f  1110zzzz 1yyyyyyy 1xxxxxxx 1wwwwwww
  5   31   02082080  7fffffff  11110zzz 1yyyyyyy 1xxxxxxx 1wwwwwww 1vvvvvvv

Thanks
Lee
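
The arithmetic behind the chart above can be sketched in Python. This is only an illustration under stated assumptions: the function names are made up, and the rule that each length's range starts right after the previous one ends (with the 5-byte range capped at 0x7fffffff) is inferred from the chart's Hex Min and Hex Max columns rather than stated in the quoted text. The chart describes the proposal quoted from the history file; Python's built-in utf-8 codec uses a different byte layout (for example U+00E9 becomes c3 a9), so the sketch does its own arithmetic instead of calling it.

```python
# Sketch: rebuild the Bits / Hex Min / Hex Max columns of the quoted chart.
# Assumption (inferred from the chart): each length's range begins right
# after the previous range ends, and the last range is capped at UCS_MAX.
# All function names here are hypothetical.

UCS_MAX = 0x7FFFFFFF  # upper bound given in the quoted text


def payload_bits(length):
    """Value bits carried by a sequence of `length` bytes, per the chart."""
    subsequent = length - 1           # also the number of leading 1 bits
    first_byte_bits = 7 - subsequent  # bits left after the 1s and the 0
    return first_byte_bits + 7 * subsequent


def chart_rows():
    """Return (length, bits, hex_min, hex_max) for lengths 1 through 5."""
    rows = []
    low = 0
    for length in range(1, 6):
        bits = payload_bits(length)
        high = min(low + (1 << bits) - 1, UCS_MAX)
        rows.append((length, bits, low, high))
        low = high + 1
    return rows


def subsequent_bytes(first_byte):
    """Count the high-order 1 bits in the first byte of a sequence."""
    count = 0
    mask = 0x80
    while first_byte & mask:
        count += 1
        mask >>= 1
    return count


if __name__ == "__main__":
    for length, bits, low, high in chart_rows():
        print(f"{length} {bits:4d}  {low:08x}  {high:08x}")
```

Run as a script, this prints the same Bits, Hex Min and Hex Max values as the five rows of the chart. `subsequent_bytes(0b11110000)` gives 4, matching the 5-byte row's first byte 11110zzz.
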