On 3/26/25 6:29 PM, Robert Elz wrote:
>     Date:        Wed, 26 Mar 2025 16:05:57 -0400
>     From:        Chet Ramey <chet.ra...@case.edu>
>     Message-ID:  <e478edbe-68a2-403f-9b3e-9a8bc5b44...@case.edu>
>
>   | There is a precedence hierarchy associated with locale environment
>   | variables, since setting and unsetting environment variables is under
>   | the user's control.
>
> This brings up an unrelated, but related, issue (unrelated to the OP's issue, but related to LC_* vars and shell behaviour), which is what the shell should do (for itself) when the user sets one of the LC_* variables.
The shell should assume that setting a shell variable means the user wants to modify the shell's locale settings.
> One answer would be "nothing" - they're just variables, usually exported, and get set for applications to use (just as the shell would do when receiving settings for those vars in the environment when it starts).
A bad choice.
> The next possibility is just to do setlocale(3) using the category implied by the variable name, and the value the user assigned to it. This is also easy, and probably what the user actually wants, but means the shell ends up operating in a way that is different than any other application behaves.
The shell does quite a lot of things that are different than any other application, including allowing the user to change locale environment variables.
> Eg: if I were to do (in the relevant shell)
>
>     export LC_ALL=C
>
> that's simple enough, setlocale() causes all the categories to be set to C which is what we'd expect. But then I do
>
>     export LC_CTYPE=en_AU.UTF-8
>
> If I do setlocale() now with that data, the locale in which the shell is operating will have C for all categories except CTYPE, which will be Aussie English (UTF-8 encoded).
That doesn't have to be all the shell does. The precedence hierarchy is well-understood; there's nothing stopping the shell from implementing it: noting that LC_ALL is set as a shell variable and making the right call to setlocale() to make sure it overrides LC_CTYPE. I'd argue that having just set LC_ALL, this is what the user expects here.
> Any other application (including a new instance of the shell, but not just a subshell environment) would ignore the LC_CTYPE setting, as LC_ALL overrides everything, but setlocale(3) doesn't work like that, it operates as "most recent call wins".
You're assuming a certain behavior and going on from there. The shell doesn't have to do it that way.
> And just to be more weird, what is the shell to do if I now do
>
>     export LANG=fr
Nothing. LC_ALL and the relevant other LC_ variables take precedence, as they should.
> LANG is generally just the fallback for categories that haven't been set to something else. For this, assume the LC_ALL had never been set (setting LANG after that is set should probably just do nothing) so we have explicitly set LC_CTYPE, and then LANG, what should the shell do with that one? Look up all the category vars (how does the shell author even know what they all are - we know the basic ones, but I believe locales can add more categories) and set the ones for which no var is set to the locale being assigned to LANG?
I'd argue that the shell should modify the locale categories that affect its behavior. That's a tricky business, no doubt, but it bounds the effects (or you could just pay attention to all the categories that POSIX defines). Plus there's nothing in POSIX that I can see that allows locale definitions to add additional categories.
> Things get even weirder when we start unsetting these things, as there's no unsetlocale(3) to call to make it work. Take the above settings, then have the user execute "unset LC_ALL" - what does the shell do now?
Note that one of the LC_ variables is being unset and act appropriately? Since LC_ALL is being unset, you can go through all the locale categories you know about and set them appropriately. If it's one of the other LC_ variables being unset, you can just change that one.
> So, what does everyone believe the shell should really be doing in these situations? What does bash do?
Pretty much what I described above.
> Further does it make any difference if these vars are being set in the shell, but not exported?
I'd argue that the user wants to change the shell's behavior.
> Just in case there's any doubt, I'm asking about the effects of these settings on the shell's internal operations, what it means when a glob expression uses [[:alpha:]] as one of its elements, for example, and the collating sequence used when sorting the var names when "set" demands they be sorted, etc etc.
OK.
> Also, please consider what built-in utilities are supposed to do with all of this. Do those just use the shell's locale settings?
Yes, they are builtins and documented as such.
> If so, then depending upon the answers to the questions above, then they might not behave the same as the equivalent external utility would behave, and that's what is expected (as much as it can be).
Yes, if the user sets it up this way, that is what will happen, but it's unlikely.
> For example, in the above case, if I set LC_NUMERIC as well (on an export command, just like the others above) and run printf, does it use the LC_NUMERIC that I just set, or does the LC_ALL setting override it as it would if the external command were run instead?
The value of LC_ALL should be used, since its precedence is higher.
> Including if I explicitly do:
>
>     LC_NUMERIC=xxx printf ...
>
> (with LC_ALL still set in the environment).
Nope. Even the external version of the command would have LC_ALL override the temporary assignment to LC_NUMERIC.
> If the builtins should act just as if they were external commands, and given a major purpose of being builtin is to avoid forking (and some operations of builtins cannot work if they do fork) then how is the shell intended to save and restore its locale environment so the builtin can set its own?
I say you don't bother. Users expect the variables they set in a shell session to affect that shell session.
> Is it really necessary to query every category, save the results, and then restore them all again, around every builtin command execution?
I don't think so.
> For this do remember that things like "break" "continue" "return" "local" etc, are all just built in commands technically. And yes, it can make a difference. I can do "break 2", but that there 2 is an arg to the break utility, and is required to be a number. Fine. But what if I have LC_CTYPE set to a locale with additional characters which are digits? Can I then use those digits as the argument to the break utility (or continue/return/shift/exit/...)?
I'd say that's up to the implementation. (And are you really saying that these hypothetical extra digits affect `break's treatment of its argument as a "positive decimal integer?")
> If I also (as above) had LC_ALL=C does that override the LC_CTYPE setting for these utilities, or does it (assuming the shell acts as users most likely intend when they set LC_CTYPE after LC_ALL and just calls setlocale() on each, in order) use the shell's current settings, and if so, what justifies that?
See above. It's easy to explain to a user that setting LC_ALL overrides everything else.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
		 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/