Hi Ingo,

Thanks for your thoughts. I appreciate that you give true consideration to arguments.
> But i guess discussing such considerations in detail would be
> off-topic on this mailing list

We can stay on this mailing list. I'm not going to go deep into OpenBSD-specific system design arguments.

> So, you suggest to store this string in the library (where it has
> no effect) even though POSIX does not define a method to retrieve
> it again once it is stored?

I have now submitted a request to add such a method to POSIX:
http://austingroupbugs.net/view.php?id=1220
I used OpenBSD 6.2 as an example there, not to bash OpenBSD, but to prove that POSIX, as it stands, is incomplete. I should probably have done this as early as 2005, when I noticed that the API is incomplete regarding GNU libc:
https://sourceware.org/ml/libc-alpha/2005-03/msg00125.html

> Why would any programmer call a library API for that rather
> than simply storing the selected language in a variable?
...
> setlocale(3) supports storing a string
> in the library that the application program could just as easily,
> or arguably even more easily, store itself.

Many application programs are not small pieces of code written by a small group of programmers; rather, they are assembled from libraries written by different groups of programmers. The following libc APIs exist not primarily to make system calls to the kernel, but to let information flow from one place in the application to another:

  <locale.h>   setlocale, uselocale
  <setjmp.h>   setjmp, longjmp
  <stdio.h>    setbuf, setvbuf, clearerr
  <syslog.h>   setlogmask
  <libintl.h>  textdomain, bindtextdomain

Going even further, applications can dynamically load libraries through <dlfcn.h>. For example, 'ldd /usr/bin/emacs' displays 110 libraries on my system, and a running 'kate' process has 46 dynamically loaded .so files open. That's where libc (or libstdc++, in the second case) becomes important as an information dispatcher between different parts of the application.
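To illustrate the "information dispatcher" role, here is a minimal C sketch, assuming a POSIX system. The two modules and their function names (app_select_locale, lib_current_messages_locale) are hypothetical: one part of the program selects the locale, and a separately written library later asks libc for it via setlocale() with a NULL argument, without the two ever sharing a variable.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>

/* Hypothetical module A (e.g. the main program): selects the global
   locale once, typically with setlocale(LC_ALL, "") to honor the
   user's environment.  Returns nonzero on success. */
int app_select_locale(const char *name)
{
    return setlocale(LC_ALL, name) != NULL;
}

/* Hypothetical module B (e.g. a library written by another group):
   it was never passed the locale name, but can ask libc for it.
   A NULL second argument makes setlocale() a pure query.  The returned
   string may be overwritten by later setlocale() calls, so a caller
   that wants to keep it must copy it.  This only reports the global
   locale; the name of a per-thread locale installed with uselocale()
   cannot be queried this way, which is the gap discussed above. */
const char *lib_current_messages_locale(void)
{
    const char *name = setlocale(LC_MESSAGES, NULL);
    return name != NULL ? name : "C";
}
```

The design point is that neither module needs to know about the other: libc holds the state, so any number of independently written libraries can consult the same setting.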
> For comparison, the point of using {set,new,use}locale(3) with
> LC_CTYPE is not merely remembering which character set the user
> asked for, but also changing the behaviour of many *wc*(3) and
> *mb*(3) library functions. LC_MESSAGES, on the other hand, will
> never have any effect on the behaviour of any library function
> in the OpenBSD libc.

The *mb* and *wc* functions are only one consumer of this information (the locale name of a category). Other parts of the application want to consume it as well.

> Also, in your web server example, you certainly don't want syslog
> messages in languages requested by clients, so calling uselocale(3)
> would merely be asking for trouble... (Of course it's still possible
> to write correct code, but harder.)

You make a good (and often overlooked) point. In a properly internationalized system, the audience of a message (which user? the web server administrator? the database administrator? the browser user?) needs to be considered *already* at the point where the message is generated. In my web server example, one could use
- the global locale, set by setlocale(), for messages that go to the administrator,
- a '__thread locale_t browser_use_locale;' object, or uselocale(), for messages that go to the browser user.

> But LC_NUMERIC is certainly dangerous, it can
> break parsers in subtle and surprising ways

Yes, here too, consideration needs to be given to the question: who will parse the decimal number? A human user (supposed to be using which locale?) or a language-neutral parser?

That such distinctions can be made in practice is shown by the category LC_TIME. Here, parsing locale-dependent output is so complex and buggy (see e.g. the ill-designed attempt in https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdate.html) that real-world software is forced to distinguish between localized and non-localized time representations.
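To make the web server split concrete, here is a minimal C sketch of the uselocale() variant, assuming a POSIX.1-2008 system. The function name format_for_browser and its parameters are hypothetical. Output for the browser user is produced under a per-thread locale, while the global locale (used for administrator messages) stays untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stdio.h>

/* Hypothetical helper: format a value for the browser user in the
   locale that browser requested.  Returns the snprintf() result,
   or -1 if the requested locale is not available. */
int format_for_browser(char *buf, size_t size, double value,
                       const char *browser_locale)
{
    locale_t loc = newlocale(LC_ALL_MASK, browser_locale, (locale_t) 0);
    if (loc == (locale_t) 0)
        return -1;

    /* uselocale() switches the locale of the calling thread only;
       other threads and the global locale are unaffected. */
    locale_t previous = uselocale(loc);
    int n = snprintf(buf, size, "%.1f", value);  /* honors LC_NUMERIC */

    /* Restore the thread's previous setting (LC_GLOBAL_LOCALE or an
       earlier locale object) before freeing ours. */
    uselocale(previous);
    freelocale(loc);
    return n;
}
```

With the "C" locale this yields "8080.5". If a locale such as de_DE.UTF-8 is installed, the same call would be expected to use a decimal comma instead, which is exactly why such output should go only to a human reader and never to a language-neutral parser.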
For non-localized representations, software usually standardizes on the ISO 8601 style format produced by date +"%Y-%m-%dT%H:%M:%S" (with the Gregorian calendar). When you apply similar thought to LC_NUMERIC functionality, you can achieve good results. But I agree it's easy to introduce bugs in this area. Just last week, by mistake, I wrote code that prints a port number in a localized way: 8,080 or 8.080 depending on the locale. Ouch.

> I hoped to understand better what your point is by looking at the
> HEAD of the master branch of the git repo of GNU grep because you
> mentioned a test failure there

You would do better to look at a GNU gettext release:
https://ftp.gnu.org/gnu/gettext/gettext-0.19.8.1.tar.gz
There, in the gettext-runtime/intl/ directory, you will find the file localename.c, which is my attempt at overcoming the lack of a locale name getter function in POSIX, and its use for gettext(). GNU grep does happen to include the localename test, but since 'grep' is not a multithreaded program, inspecting its code will not give you insights into this issue.

Bruno