On 3/26/25 6:29 PM, Robert Elz wrote:
>     Date:        Wed, 26 Mar 2025 16:05:57 -0400
>     From:        Chet Ramey <chet.ra...@case.edu>
>     Message-ID:  <e478edbe-68a2-403f-9b3e-9a8bc5b44...@case.edu>
>
>   | There is a precedence hierarchy associated with locale environment
>   | variables, since setting and unsetting environment variables is under
>   | the user's control.

> This brings up an unrelated, but related, issue (unrelated to the OP's issue,
> but related to LC_* vars and shell behaviour), which is what the shell should
> do (for itself) when the user sets one of the LC_* variables.

The shell should assume that setting a shell variable means the user wants
to modify the shell's locale settings.


> One answer would be "nothing" - they're just variables, usually exported,
> and get set for applications to use (just as the shell would do when receiving
> settings for those vars in the environment when it starts).

A bad choice.

> The next possibility is just to do setlocale(3) using the category implied
> by the variable name, and the value the user assigned to it.   This is also
> easy, and probably what the user actually wants, but means the shell ends
> up operating in a way that is different than any other application behaves.

The shell does quite a lot of things that are different than any other
application, including allowing the user to change locale environment
variables.


> Eg: if I were to do (in the relevant shell)
>
>         export LC_ALL=C
>
> that's simple enough, setlocale() causes all the categories to be set to C
> which is what we'd expect.   But then I do
>
>         export LC_CTYPE=en_AU.UTF-8
>
> If I do setlocale() now with that data, the locale in which the shell is
> operating will have C for all categories except CTYPE, which will be
> Aussie English (UTF-8 encoded).

That doesn't have to be all the shell does. The precedence hierarchy is
well-understood; there's nothing stopping the shell from implementing it:
noting that LC_ALL is set as a shell variable and making the right call
to setlocale() to make sure it overrides LC_CTYPE. I'd argue that having
just set LC_ALL, this is what the user expects here.
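The hierarchy itself is small. What follows is a minimal C sketch, not bash's code. `resolve_locale` and `sync_category` are hypothetical names. The caller is assumed to pass in the current values of the shell's LC_ALL, LC_<category> and LANG variables. When nothing is set, the fallback to "C" stands in for POSIX's implementation-defined default.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>

/* Hypothetical helper: choose the value that governs one category using
 * the POSIX order LC_ALL > LC_<category> > LANG.  NULL and "" both count
 * as "not set".  Returns NULL if none of the three is set. */
const char *resolve_locale(const char *lc_all, const char *lc_cat,
                           const char *lang)
{
    if (lc_all != NULL && *lc_all != '\0')
        return lc_all;
    if (lc_cat != NULL && *lc_cat != '\0')
        return lc_cat;
    if (lang != NULL && *lang != '\0')
        return lang;
    return NULL;
}

/* Re-derive one category from the three variables and hand the result to
 * setlocale().  "C" stands in for the implementation-defined default used
 * when nothing is set (an assumption).  Returns setlocale()'s result, which
 * is NULL if the chosen locale is not available. */
const char *sync_category(int category, const char *lc_all,
                          const char *lc_cat, const char *lang)
{
    assert(category != LC_ALL);   /* resolution is done per category */
    const char *value = resolve_locale(lc_all, lc_cat, lang);
    return setlocale(category, value != NULL ? value : "C");
}
```

Suppose LC_ALL=C and LC_CTYPE=en_AU.UTF-8 are both set. Then `resolve_locale` picks "C" for LC_CTYPE no matter which variable was assigned last, so the order of the `export` commands stops mattering.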

> Any other application (including a new instance of the shell, but not
> just a subshell environment) would ignore the LC_CTYPE setting, as LC_ALL
> overrides everything, but setlocale(3) doesn't work like that; it operates
> as "most recent call wins".

You're assuming a certain behavior and going on from there. The shell
doesn't have to do it that way.
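For contrast, the "most recent call wins" behaviour described above is what you get when each assignment maps straight onto one setlocale() call. Below is a minimal C sketch of that naive policy. The name-to-category table and both function names are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical table mapping variable names to setlocale() categories. */
struct lc_var {
    const char *name;
    int category;
};

static const struct lc_var lc_vars[] = {
    { "LC_ALL", LC_ALL },           { "LC_COLLATE", LC_COLLATE },
    { "LC_CTYPE", LC_CTYPE },       { "LC_MESSAGES", LC_MESSAGES },
    { "LC_MONETARY", LC_MONETARY }, { "LC_NUMERIC", LC_NUMERIC },
    { "LC_TIME", LC_TIME },
};

/* Store NAME's category in *CATEGORY; return 0 if NAME is not an LC_ var. */
int lc_category(const char *name, int *category)
{
    assert(name != NULL && category != NULL);
    for (size_t i = 0; i < sizeof lc_vars / sizeof lc_vars[0]; i++) {
        if (strcmp(name, lc_vars[i].name) == 0) {
            *category = lc_vars[i].category;
            return 1;
        }
    }
    return 0;
}

/* Naive policy: every assignment becomes exactly one setlocale() call, so
 * whichever variable was assigned last wins, even over LC_ALL. */
const char *assign_naively(const char *name, const char *value)
{
    int category;
    if (!lc_category(name, &category))
        return NULL;                /* not a locale variable */
    return setlocale(category, value);
}
```

Take `assign_naively("LC_ALL", "C")` followed by `assign_naively("LC_CTYPE", "en_AU.UTF-8")`, on a system where that locale is installed. Afterwards, `setlocale(LC_CTYPE, NULL)` reports the Australian locale while every other category stays at C. That is exactly the mixed state an external program would never see.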


> And just to be more weird, what is the shell to do if I now do
>
>         export LANG=fr

Nothing. LC_ALL and the relevant other LC_ variables take precedence, as
they should.


> LANG is generally just the fallback for categories that haven't been
> set to something else.   For this, assume LC_ALL had never been
> set (setting LANG after that is set should probably just do nothing),
> so we have explicitly set LC_CTYPE, and then LANG; what should the
> shell do with that one?   Look up all the category vars (how does the
> shell author even know what they all are - we know the basic ones,
> but I believe locales can add more categories) and set the ones for
> which no var is set to the locale being assigned to LANG?

I'd argue that the shell should modify the locale categories that affect
its behavior. That's a tricky business, no doubt, but it bounds the
effects (or you could just pay attention to all the categories that POSIX
defines). Plus there's nothing in POSIX that I can see that allows locale
definitions to add additional categories.


> Things get even weirder when we start unsetting these things, as
> there's no unsetlocale(3) to call to make it work.   Take the above
> settings, then have the user execute "unset LC_ALL" - what does
> the shell do now?

Note that one of the LC_ variables is being unset and act appropriately?
Since LC_ALL is being unset, you can go through all the locale categories
you know about and set them appropriately. If it's one of the other LC_
variables being unset, you can just change that one.
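One way to handle both the LANG question and the unset question is to recompute rather than replay. A change to LC_ALL or LANG, whether assignment or unset, recomputes every POSIX category. A change to a single LC_ variable recomputes only that category. The C sketch below rests on several assumptions. `getenv()` stands in for the shell's own variable lookup, which would also see unexported variables. `locale_var_changed` is a hypothetical hook name. "C" is the fallback default.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

struct lc_var {
    const char *name;
    int category;
};

/* The categories POSIX defines, excluding LC_ALL itself. */
static const struct lc_var posix_categories[] = {
    { "LC_COLLATE", LC_COLLATE },   { "LC_CTYPE", LC_CTYPE },
    { "LC_MESSAGES", LC_MESSAGES }, { "LC_MONETARY", LC_MONETARY },
    { "LC_NUMERIC", LC_NUMERIC },   { "LC_TIME", LC_TIME },
};
#define N_CATEGORIES (sizeof posix_categories / sizeof posix_categories[0])

/* getenv() stands in for the shell's variable table (an assumption);
 * NULL and "" both count as unset. */
static const char *lookup(const char *name)
{
    const char *v = getenv(name);
    return (v != NULL && *v != '\0') ? v : NULL;
}

/* Recompute one category from LC_ALL > LC_<cat> > LANG > "C". */
static int refresh_category(const struct lc_var *v)
{
    const char *value = lookup("LC_ALL");
    if (value == NULL)
        value = lookup(v->name);
    if (value == NULL)
        value = lookup("LANG");
    if (value == NULL)
        value = "C";                /* stand-in for the default locale */
    return setlocale(v->category, value) != NULL;
}

/* Hypothetical hook run after NAME is assigned or unset.  Returns the
 * number of categories whose locale could not be installed. */
int locale_var_changed(const char *name)
{
    assert(name != NULL);
    int all = strcmp(name, "LC_ALL") == 0 || strcmp(name, "LANG") == 0;
    int failures = 0;
    for (size_t i = 0; i < N_CATEGORIES; i++)
        if (all || strcmp(name, posix_categories[i].name) == 0)
            failures += !refresh_category(&posix_categories[i]);
    return failures;
}
```

Because each category is always re-derived from the full hierarchy, "unset LC_ALL" needs no unsetlocale(3). Every category simply falls back to its own LC_ variable, then LANG, then the default.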


> So, what does everyone believe the shell should really be doing in
> these situations? What does bash do?

Pretty much what I described above.


> Further, does it make any difference if these vars are being set in
> the shell, but not exported?

I'd argue that the user wants to change the shell's behavior.

> Just in case there's any doubt, I'm asking about the effects of these
> settings on the shell's internal operations, what it means when a
> glob expression uses [[:alpha:]] as one of its elements, for example,
> and the collating sequence used when sorting the var names when "set"
> demands they be sorted, etc etc.

OK.
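The collating-sequence case is a concrete example of an internal operation driven by the shell's own locale. Suppose the variable names are sorted with strcoll(). The order `set` prints then follows whatever LC_COLLATE the shell has installed. The C sketch below uses a hypothetical function name.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* qsort() comparator: order names by the current LC_COLLATE. */
static int by_collation(const void *a, const void *b)
{
    const char *const *x = a;
    const char *const *y = b;
    return strcoll(*x, *y);
}

/* Sort NAMES in place using whatever LC_COLLATE the shell has set. */
void sort_var_names(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    qsort(names, n, sizeof *names, by_collation);
}
```

In the C locale, strcoll() behaves like strcmp(), so "B" sorts before "a". In many other locales the resulting order differs. That is why it matters which of LC_ALL, LC_COLLATE or LANG the shell honours.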


> Also, please consider what built-in utilities are supposed to do
> with all of this.  Do those just use the shell's locale settings?

Yes, they are builtins and documented as such.

> If so, then depending upon the answers to the questions above, they
> might not behave the same as the equivalent external utility would
> behave, and that's what is expected (as much as it can be).

Yes, if the user sets it up this way, that is what will happen, but it's
unlikely.

> For example, in the above case, if I set LC_NUMERIC as well (on an
> export command, just like the others above) and run printf, does
> it use the LC_NUMERIC that I just set, or does the LC_ALL setting
> override it as it would if the external command were run instead?

The value of LC_ALL should be used, since its precedence is higher.

> Including if I explicitly do:
>         LC_NUMERIC=xxx printf ...
> (with LC_ALL still set in the environment).

Nope. Even the external version of the command would have LC_ALL
override the temporary assignment to LC_NUMERIC.
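A builtin printf runs inside the shell process, so its number formatting goes through the C library with whatever LC_NUMERIC the shell last installed. If the shell has applied LC_ALL above LC_NUMERIC, the temporary assignment therefore changes nothing, which matches the external command. The C sketch below assumes the builtin formats via snprintf(); both function names are hypothetical.

```c
#include <assert.h>
#include <locale.h>
#include <stdio.h>

/* Format a number the way a builtin printf might: snprintf() consults
 * the process's current LC_NUMERIC, i.e. whatever the shell last set. */
int format_number(char *buf, size_t size, double x)
{
    assert(buf != NULL && size > 0);
    return snprintf(buf, size, "%.2f", x);
}

/* The radix character the shell process is currently using. */
const char *current_decimal_point(void)
{
    return localeconv()->decimal_point;
}
```

Under the C locale, `format_number(buf, sizeof buf, 3.5)` writes "3.50". In a locale whose radix character is a comma, the same call would use the comma instead.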

> If the builtins should act just as if they were external commands,
> and given a major purpose of being builtin is to avoid forking
> (and some operations of builtins cannot work if they do fork)
> then how is the shell intended to save and restore its locale
> environment so the builtin can set its own?

I say you don't bother. Users expect the variables they set in a shell
session to affect that shell session.

> Is it really necessary
> to query every category, save the results, and then restore them
> all again, around every builtin command execution?

I don't think so.
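If an implementation did want to do this, it would not need a per-category loop. ISO C guarantees that the string returned by `setlocale(LC_ALL, NULL)` can be passed back to `setlocale(LC_ALL, ...)` to restore every category. That string may be overwritten by the next setlocale() call, so it has to be copied. The C sketch below uses hypothetical function names. How expensive the restore call is depends on the C library.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Capture the whole locale state in one query.  The returned buffer is a
 * private copy, because setlocale() may overwrite its own string. */
char *save_locale(void)
{
    const char *cur = setlocale(LC_ALL, NULL);
    if (cur == NULL)
        return NULL;
    size_t len = strlen(cur) + 1;
    char *copy = malloc(len);
    if (copy != NULL)
        memcpy(copy, cur, len);
    return copy;
}

/* Reinstall a saved state and free it; returns 0 on failure. */
int restore_locale(char *saved)
{
    assert(saved != NULL);
    int ok = setlocale(LC_ALL, saved) != NULL;
    free(saved);
    return ok;
}
```

A shell taking this route would call `save_locale()` before applying a builtin's temporary LC_ assignments and `restore_locale()` afterwards. The cost is one query and one restore per such builtin, not one per category.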

> For this do remember that things like "break" "continue" "return"
> "local" etc, are all just built in commands technically.  And yes,
> it can make a difference.  I can do "break 2", but that there 2 is
> an arg to the break utility, and is required to be a number.  Fine.
> But what if I have LC_CTYPE set to a locale with additional
> characters which are digits?   Can I then use those digits as
> the argument to the break utility (or continue/return/shift/exit/...)?

I'd say that's up to the implementation. (And are you really saying that
these hypothetical extra digits affect `break's treatment of its
argument as a "positive decimal integer?")

> If I also (as above) had LC_ALL=C does that override the
> LC_CTYPE setting for these utilities, or does it (assuming the
> shell acts as users most likely intend when they set LC_CTYPE
> after LC_ALL and just calls setlocale() on each, in order) use
> the shell's current settings, and if so, what justifies that?

See above. It's easy to explain to a user that setting LC_ALL overrides
everything else.

--
``The lyf so short, the craft so long to lerne.'' - Chaucer
                 ``Ars longa, vita brevis'' - Hippocrates
Chet Ramey, UTech, CWRU    c...@case.edu    http://tiswww.cwru.edu/~chet/
