On Sat, Sep 01, 2012 at 07:32:48PM -0400, Dan B. wrote:
> In a locale setting such as en_US.UTF-8 (e.g., LANG=en_US.UTF-8),
> what exactly does the charset/character encoding part (UTF-8) affect?

This affects the character encoding that programs use for input and
output.  For example, if you want to print the character ‘á’ (Unicode
code point U+00E1), you will output it in UTF-8 as the byte sequence

  0xc3 0xa1

However, in a Latin 1 (ISO-8859-1) locale, it would be printed as

  0xe1

and in other encodings it will be a different byte sequence again.

> Which common programs (e.g., getty, xterm/etc., sed/grep?) do something
> different based on the charset portion of the locale setting?

All of them, in short.

When you run a terminal emulator such as xterm, it determines the
encoding to use inside the emulator by calling nl_langinfo(3) with the
CODESET item, which returns the name of the character encoding used in
the locale.  That tells it which encoding the programs running inside
it use, so that it can display their output correctly and encode the
input it sends to them in the same way.  If the encoding were wrong, it
would display garbage.

When you run sed/grep, the encoding affects how they process the text,
because it determines which byte sequences count as a single character.
It's therefore important to use the same encoding in your files as you
have set in your locale.  Before we had UTF-8, files in the old 8-bit
encodings didn't necessarily match your locale, and there was no
reliable way to tell which encoding a file was supposed to be in, so
using UTF-8 everywhere has been a massive improvement.

This is generally completely transparent.  For example, if you were to
write (in C) the following code:

  #include <stdio.h>
  #include <locale.h>

  int main(void)
  {
    setlocale(LC_ALL, "");
    printf("á\n");
    return 0;
  }

This prints correctly in a UTF-8 locale, provided the source file is
saved as UTF-8.  GCC uses UTF-8 as its default execution character set,
so the string literal is stored as the bytes 0xc3 0xa1 and printf()
writes them unchanged; narrow strings are not translated to the
locale's encoding on output.  If you want that translation, use a wide
string instead (e.g. printf("%ls\n", L"á")), which the C library
converts to the locale's encoding as it is written.

Nowadays, there's little reason to use any encoding other than UTF-8;
the others cover only a subset of the characters UTF-8 can represent,
and are present only for legacy and compatibility reasons.


Regards,
Roger

--
  .''`.  Roger Leigh
 : :' :  Debian GNU/Linux             http://people.debian.org/~rleigh/
 `. `'   schroot and sbuild  http://alioth.debian.org/projects/buildd-tools
   `-    GPG Public Key      F33D 281D 470A B443 6756 147C 07B3 C8BC 4083 E800

Archive: http://lists.debian.org/20120902095315.gd3...@codelibre.net
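
As an illustration of the nl_langinfo(3) lookup and the per-locale byte
sequences described in the reply above, here is a minimal sketch in C.
The helper names codeset_of() and encode_in_locale() are hypothetical
and not part of any library.  The sketch assumes a system where wchar_t
holds Unicode code points (as on glibc), and it changes the process-wide
locale as a side effect, so treat it as a demonstration rather than
code to reuse as-is.

```c
#include <langinfo.h>
#include <limits.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: switch to the named locale and return the name
 * of its character encoding, or NULL if the locale is not available.
 * Passing "" selects the locale from the environment (LANG, LC_*). */
const char *codeset_of(const char *locale_name)
{
    if (setlocale(LC_ALL, locale_name) == NULL)
        return NULL;
    return nl_langinfo(CODESET);
}

/* Hypothetical helper: encode one wide character in the encoding of
 * the named locale.  'out' must have room for MB_LEN_MAX bytes.
 * Returns the number of bytes written, or -1 if the locale is not
 * available or its encoding cannot represent the character.
 * Assumption: wchar_t values are Unicode code points. */
int encode_in_locale(const char *locale_name, wchar_t wc, unsigned char *out)
{
    char buf[MB_LEN_MAX];
    mbstate_t state;
    size_t n;

    if (setlocale(LC_ALL, locale_name) == NULL)
        return -1;

    memset(&state, 0, sizeof state);   /* start in the initial shift state */
    n = wcrtomb(buf, wc, &state);      /* wide char -> locale's encoding */
    if (n == (size_t)-1)
        return -1;

    memcpy(out, buf, n);
    return (int)n;
}
```

Calling encode_in_locale() with a UTF-8 locale name and the code point
0xE1 should produce the two bytes 0xc3 0xa1 quoted above; with an
ISO-8859-1 locale, if one is installed, it should produce the single
byte 0xe1.  Which locale names exist varies from system to system
("locale -a" lists them), which is why both helpers report failure
instead of assuming the locale is present.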