    Date:        Wed, 26 Mar 2025 16:05:57 -0400
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <e478edbe-68a2-403f-9b3e-9a8bc5b44...@case.edu>

  | There is a precedence hierarchy associated with locale environment
  | variables, since setting and unsetting environment variables is under
  | the user's control.

This brings up an unrelated, but related, issue (unrelated to the OP's
issue, but related to LC_* vars and shell behaviour), which is what the
shell should do (for itself) when the user sets one of the LC_* variables.

One answer would be "nothing" - they're just variables, usually exported,
and get set for applications to use (just as the shell would do when
receiving settings for those vars in the environment when it starts).
That's simple, and easy to explain, but probably isn't what users expect.

The next possibility is just to do setlocale(3) using the category implied
by the variable name, and the value the user assigned to it. This is
also easy, and probably what the user actually wants, but means the shell
ends up operating in a way that is different from how any other
application behaves.

Eg: if I were to do (in the relevant shell)

	export LC_ALL=C

that's simple enough, setlocale() causes all the categories to be set to C
which is what we'd expect. But then I do

	export LC_CTYPE=en_AU.UTF-8

If I do setlocale() now with that data, the locale in which the shell is
operating will have C for all categories except CTYPE, which will be
Aussie English (UTF-8 encoded). Any other application (including a new
instance of the shell, but not just a subshell environment) would ignore
the LC_CTYPE setting, as LC_ALL overrides everything, but setlocale(3)
doesn't work like that, it operates as "most recent call wins".

And just to be more weird, what is the shell to do if I now do

	export LANG=fr

? LANG is generally just the fallback for categories that haven't been
set to something else. For this, assume that LC_ALL had never been set
(setting LANG after LC_ALL is set should probably just do nothing), so we
have explicitly set LC_CTYPE, and then LANG. What should the shell do
with that one?
Look up all the category vars (how does the shell author even know what
they all are - we know the basic ones, but I believe locales can add more
categories) and set those for which no var is set to the locale being
assigned to LANG? Really? If not, then what?

Things get even weirder when we start unsetting these things, as there's
no unsetlocale(3) to call to make it work. Take the above settings, then
have the user execute "unset LC_ALL" - what does the shell do now?

So, what does everyone believe the shell should really be doing in these
situations? What does bash do? Anyone know about other shells?

Further, does it make any difference if these vars are being set in the
shell, but not exported?

Just in case there's any doubt, I'm asking about the effects of these
settings on the shell's internal operations: what it means when a glob
expression uses [[:alpha:]] as one of its elements, for example, and the
collating sequence used when sorting the var names when "set" demands
they be sorted, etc etc.

Also, please consider what built-in utilities are supposed to do with all
of this. Do those just use the shell's locale settings? If so, then
depending upon the answers to the questions above, they might not behave
the same as the equivalent external utility would behave, and that's what
is expected (as much as it can be).

For example, if in the above case I set LC_NUMERIC as well (on an export
command, just like the others above) and run printf, does it use the
LC_NUMERIC that I just set, or does the LC_ALL setting override it as it
would if the external command were run instead? Including if I explicitly
do:

	LC_NUMERIC=xxx printf ...

(with LC_ALL still set in the environment).
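
For comparison, here is a minimal sketch of the precedence an external
utility ends up with when it starts and calls setlocale(LC_ALL, ""):
LC_ALL first, then the category's own variable, then LANG. The function
name resolve_locale and its argument order are invented for illustration,
and printing "C" when nothing is set is an assumption (POSIX leaves that
default to the implementation). It takes the values as arguments rather
than reading the environment, so running it does not disturb the locale
of the shell executing it.

```shell
# resolve_locale LC_ALL_VALUE CATEGORY_VALUE LANG_VALUE
# Hypothetical helper: prints the locale a freshly started utility would
# select for one category. Pass "" for a variable that is unset or empty.
resolve_locale() {
    if [ -n "$1" ]; then
        printf '%s\n' "$1"    # LC_ALL overrides everything
    elif [ -n "$2" ]; then
        printf '%s\n' "$2"    # then the category's own variable
    elif [ -n "$3" ]; then
        printf '%s\n' "$3"    # then LANG as the fallback
    else
        printf '%s\n' "C"     # assumed stand-in for the implementation default
    fi
}

# LC_ALL=C followed by LC_CTYPE=en_AU.UTF-8, as in the example above:
resolve_locale "C" "en_AU.UTF-8" ""      # prints C

# LC_ALL never set, LC_CTYPE=en_AU.UTF-8, then LANG=fr:
resolve_locale "" "en_AU.UTF-8" "fr"     # prints en_AU.UTF-8 (for CTYPE)
resolve_locale "" "" "fr"                # prints fr (any other category)
```

A shell that simply calls setlocale() on each assignment, in order, would
instead end up with en_AU.UTF-8 for CTYPE in the first case, which is
exactly the divergence being asked about.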

If the builtins should act just as if they were external commands, and
given that a major purpose of being builtin is to avoid forking (and some
operations of builtins cannot work if they do fork), then how is the shell
intended to save and restore its locale environment so the builtin can
set its own? Is it really necessary to query every category, save the
results, and then restore them all again, around every builtin command
execution?

For this, do remember that things like "break" "continue" "return"
"local" etc, are all just built in commands technically. And yes, it
can make a difference. I can do "break 2", but that there 2 is an arg
to the break utility, and is required to be a number. Fine. But what
if I have LC_CTYPE set to a locale with additional characters which are
digits? Can I then use those digits as the argument to the break utility
(or continue/return/shift/exit/...)? If I also (as above) had LC_ALL=C,
does that override the LC_CTYPE setting for these utilities, or does it
(assuming the shell acts as users most likely intend when they set
LC_CTYPE after LC_ALL and just calls setlocale() on each, in order) use
the shell's current settings, and if so, what justifies that?

All this is a big mess!

The locale system is really a disaster. It was fine for what it was
originally designed for - so manufacturers in the late 1970's and 1980's
(pre internet) could sell unix based systems in non-English speaking
countries. There, there would basically be one local locale (and the
default C locale - no POSIX then of course), everyone would simply set up
the environment to make their local locale work, and it would. Except
for the occasional need to force LC_ALL=C for some particular operations,
nothing would ever alter anything in the local locale settings.

The world has moved on since then.

kre