Mark H Weaver wrote:
> by convention they are
>supposed to encoded in the locale encoding.
This convention is bunk. The encoding aspect of the locale system is fundamentally broken: the model is that every string in the universe (every file's content, every filename, every command-line argument, etc.) is encoded in the same way, and the locale environment variable tells you which universe you're in. But in the real universe, files, filenames, and so on turn up encoded however their authors liked to encode them, and that's not always the same. In the real universe we have to cope with data that is not encoded in our preferred way.

> If that convention is
>violated, I don't see what a program could do about it.

If the convention is violated, then there is some difficulty in presenting correctly-encoded (or even consistently-encoded) output to the user, but it is not insuperable. Perhaps the program knows by some non-locale means how a string is encoded, and can explicitly convert. Perhaps it doesn't know the real encoding, but can trust that the user will understand the octet string if it is passed through with neither decoding of input nor encoding for output. Or perhaps the program doesn't need to put the string into textual output at all, but only to use it in some API or file format that expects an encodingless octet string. So there are many things a program can reasonably do about it, and which one to do depends on the application.

>Can someone show me a realistic example of how this would be used in
>practice?

Looking specifically at environment variables: an environment variable could give the name of a file that is to be consulted under specified circumstances, and the right file may happen to have a name that is inconsistent with the encoding used by the user's terminal. (The filename is not required for output; it only needs to be passed as an uninterpreted octet string to the open(2) syscall.)
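To make the filename case concrete, here is a minimal Python sketch. It assumes a POSIX system (where `os.environb` exists) and uses a hypothetical variable name, `MYPROG_CONFIG`. The value is read as bytes and handed to `open()` without ever being decoded, so a name that is invalid in the locale encoding still reaches open(2) intact.

```python
import os

def open_from_env(var: bytes):
    """Open the file named by environment variable `var` (e.g. the
    hypothetical b"MYPROG_CONFIG"), treating the name as an uninterpreted
    octet string. Returns None if the variable is unset."""
    # os.environb (POSIX only) maps bytes -> bytes, so the value is never
    # run through the locale or filesystem encoding.
    path = os.environb.get(var)
    if path is None:
        return None
    # open() accepts a bytes path and passes those octets to open(2) as-is.
    return open(path, "rb")
```

Python's `os.environ` would also round-trip such a value via surrogate escapes. Staying in bytes simply keeps the program from pretending to know an encoding it doesn't.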
An environment variable could specify the name of a language module to be loaded, where that name uses Unicode characters, while the user doesn't otherwise use Unicode, or doesn't use an encoding that covers enough of it. (Again, the name is not required for output; it will either be transformed into a filename or looked up in a file format that specifies its own encoding.)

The program could be env(1), not interpreting the environment but needing to output the octets correctly.

The program could be saving an uninterpreted environment, for a cron job to later run some other program with equivalent settings.

-zefram