    Date:        Wed, 26 Mar 2025 16:05:57 -0400
    From:        Chet Ramey <chet.ra...@case.edu>
    Message-ID:  <e478edbe-68a2-403f-9b3e-9a8bc5b44...@case.edu>

  | There is a precedence hierarchy associated with locale environment 
  | variables, since setting and unsetting environment variables is under
  | the user's control.

This brings up an unrelated, but related, issue (unrelated to the OP's issue,
but related to LC_* vars and shell behaviour), which is what the shell should
do (for itself) when the user sets one of the LC_* variables.

One answer would be "nothing" - they're just variables, usually exported,
and get set for applications to use (just as the shell would do when receiving
settings for those vars in the environment when it starts).

That's simple, and easy to explain, but probably isn't what users expect.

The next possibility is just to do setlocale(3) using the category implied
by the variable name, and the value the user assigned to it.   This is also
easy, and probably what the user actually wants, but means the shell ends
up operating differently from the way any other application behaves.

Eg: if I were to do (in the relevant shell)

        export LC_ALL=C

that's simple enough: setlocale() causes all the categories to be set to C,
which is what we'd expect.   But then I do

        export LC_CTYPE=en_AU.UTF-8

If I do setlocale() now with that data, the locale in which the shell is
operating will have C for all categories except CTYPE, which will be
Aussie English (UTF-8 encoded).

Any other application (including a new instance of the shell, but not
just a subshell environment) would ignore the LC_CTYPE setting, as LC_ALL
overrides everything.   But setlocale(3) doesn't work like that; it operates
as "most recent call wins".
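
To make the contrast concrete, here is a minimal sketch of the lookup order
a freshly started program effectively applies (LC_ALL, then the category's
own variable, then LANG).   The names pick_locale() and effective_locale()
are hypothetical, not taken from any shell's source, and falling back to "C"
when nothing is set is an assumption: the real default is up to the
implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* A variable counts as "set" only if it exists and is not empty. */
static int
is_set(const char *v)
{
    return v != NULL && v[0] != '\0';
}

/*
 * Hypothetical helper: given the values of LC_ALL, one category
 * variable (e.g. LC_CTYPE) and LANG, return the value a newly
 * started program would use for that category.
 */
const char *
pick_locale(const char *lc_all, const char *lc_category, const char *lang)
{
    if (is_set(lc_all))
        return lc_all;          /* overrides everything */
    if (is_set(lc_category))
        return lc_category;     /* the category's own variable */
    if (is_set(lang))
        return lang;            /* fallback for unset categories */
    return "C";                 /* assumed default, see the note above */
}

/* Same lookup, but reading the current environment. */
const char *
effective_locale(const char *category_var)
{
    assert(category_var != NULL);
    return pick_locale(getenv("LC_ALL"), getenv(category_var),
        getenv("LANG"));
}
```

With LC_ALL=C and LC_CTYPE=en_AU.UTF-8 both in the environment,
effective_locale("LC_CTYPE") yields "C", whereas a shell that simply called
setlocale() on each assignment in order would be running with en_AU.UTF-8
for that category.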

And just to be more weird, what is the shell to do if I now do

        export LANG=fr

?

LANG is generally just the fallback for categories that haven't been
set to something else.   For this, assume that LC_ALL had never been
set (setting LANG while LC_ALL is set should probably just do nothing),
so we have explicitly set LC_CTYPE, and then LANG.   What should the
shell do with that one?   Look up all the category vars (how does the
shell author even know what they all are - we know the basic ones,
but I believe locales can add more categories) and set the categories
for which no var is set to the locale being assigned to LANG?   Really?
If not, then what?

Things get even weirder when we start unsetting these things, as
there's no unsetlocale(3) to call to make it work.   Take the above
settings, then have the user execute "unset LC_ALL" - what does
the shell do now?

So, what does everyone believe the shell should really be doing in
these situations?  What does bash do?   Anyone know about other shells?
Further, does it make any difference if these vars are being set in
the shell, but not exported?

Just in case there's any doubt, I'm asking about the effects of these
settings on the shell's internal operations, what it means when a
glob expression uses [[:alpha:]] as one of its elements, for example,
and the collating sequence used when sorting the var names when "set"
demands they be sorted, etc etc.

Also, please consider what built-in utilities are supposed to do
with all of this.  Do those just use the shell's locale settings?
If so, then depending upon the answers to the questions above,
they might not behave the same as the equivalent external utility would,
and behaving the same is what is expected (as much as it can be).  For
example, if in the above case I set LC_NUMERIC as well (with an
export command, just like the others above) and run printf, does
it use the LC_NUMERIC that I just set, or does the LC_ALL setting
override it as it would if the external command were run instead?
Including if I explicitly do:
        LC_NUMERIC=xxx printf ...
(with LC_ALL still set in the environment).
If the builtins should act just as if they were external commands,
and given a major purpose of being builtin is to avoid forking
(and some operations of builtins cannot work if they do fork)
then how is the shell intended to save and restore its locale
environment so the builtin can set its own?   Is it really necessary
to query every category, save the results, and then restore them
all again, around every builtin command execution?
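
On the narrow question of querying every category: ISO C says the string
returned by setlocale(LC_ALL, NULL) can be handed back to setlocale() with
the same category to restore that state, so one query may be enough.   The
sketch below relies on that; locale_save() and locale_restore() are
hypothetical names, and it says nothing about how expensive the two extra
setlocale() calls are around each builtin.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: return a malloc'd copy of the string that
 * describes the whole current locale, or NULL on failure.  The copy
 * is needed because later setlocale() calls may overwrite the
 * storage the returned pointer refers to.  The caller frees it.
 */
char *
locale_save(void)
{
    const char *cur = setlocale(LC_ALL, NULL);  /* query only */
    char *copy;

    if (cur == NULL)
        return NULL;
    copy = malloc(strlen(cur) + 1);
    if (copy != NULL)
        strcpy(copy, cur);
    return copy;
}

/*
 * Hypothetical helper: put back a locale obtained from locale_save().
 * Returns 0 on success, -1 if setlocale() rejected the string.
 */
int
locale_restore(const char *saved)
{
    assert(saved != NULL);
    return setlocale(LC_ALL, saved) != NULL ? 0 : -1;
}
```

A builtin wrapper would call locale_save(), switch to whatever locale the
builtin is meant to see, run it, then call locale_restore() and free() the
saved string.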

For this, do remember that things like "break" "continue" "return"
"local" etc, are all just built in commands technically.  And yes,
it can make a difference.  I can do "break 2", but that there 2 is
an arg to the break utility, and is required to be a number.  Fine.
But what if I have LC_CTYPE set to a locale with additional
characters which are digits?   Can I then use those digits as
the argument to the break utility (or continue/return/shift/exit/...)?
If I also (as above) had LC_ALL=C does that override the
LC_CTYPE setting for these utilities, or does it (assuming the
shell acts as users most likely intend when they set LC_CTYPE
after LC_ALL and just calls setlocale() on each, in order) use
the shell's current settings, and if so, what justifies that?
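
One way a shell could sidestep the question for these numeric arguments is
not to consult the locale at all and accept only the characters '0' to '9'.
The sketch below shows that policy only as an illustration; parse_count() is
a hypothetical name, and nothing here claims that any shell does this or
that the standard requires it.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical helper: parse the count argument of break, continue,
 * shift and friends, accepting only '0'..'9' whatever LC_CTYPE says.
 * Returns the value, or -1 for an empty string, a non-digit, or
 * overflow.
 */
long
parse_count(const char *s)
{
    long n = 0;
    size_t i;

    assert(s != NULL);
    if (s[0] == '\0')
        return -1;
    for (i = 0; s[i] != '\0'; i++) {
        int d;

        /* '0'..'9' are contiguous in every C character set */
        if (s[i] < '0' || s[i] > '9')
            return -1;
        d = s[i] - '0';
        if (n > (LONG_MAX - d) / 10)
            return -1;          /* would overflow */
        n = n * 10 + d;
    }
    return n;
}
```

With this policy "break 2" works the same under every locale, and a digit
from another script is rejected even when the current LC_CTYPE classifies it
as a digit.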

All this is a big mess!    The locale system is really a disaster.
It was fine for what it was originally designed for - so manufacturers
in the late 1970s and 1980s (pre-internet) could sell Unix-based
systems in non-English speaking countries.   There, there would
basically be one local locale (and the default C locale - no POSIX
then of course); everyone would simply set up the environment to make
their local locale work, and it would.   Except for the occasional
need to force LC_ALL=C for some particular operations, nothing would
ever alter anything in the local locale settings.

The world has moved on since then.

kre
