Joe Ciccone wrote:
> The most interesting thing from a programmer's point of view is the way
> the characters are handled. This is the reason why incompatibilities
> exist. A non-UTF-8 character, char, is 4 bits whereas a UTF-8 character,
> wchar, is 32 bits. It's hard to write code to properly support both types
> of locales. Also, wchar processing code is slightly slower than char
> processing code. Most programmers try to avoid it, including myself.
Well, ASCII is technically 7 bits, but most systems recognize Latin-1, which is 8 bits. IIRC, UTF-8 characters are actually 1, 2, 3, or 4 bytes long, depending on the character. The first 128 UTF-8 code points are identical to ASCII. The vast majority of defined characters fit in 16 bits; there are somewhere around 30-40K character glyphs defined. Programmers do still have to allow for 4-byte characters when manipulating UTF-8.

-- Bruce