Re: OpenBSD locale system

Bruno Haible Thu, 20 Dec 2018 12:55:46 -0800

Hi Ingo,

Thanks for your thoughts. I appreciate that you give true consideration
to arguments.


> But i guess discussing such considerations in detail would be
> off-topic on this mailing list

We can stay on this mailing list. I'm not going to go deep into OpenBSD
specific system design arguments.

> So, you suggest to store this string in the library (where it has
> no effect) even though POSIX does not define a method to retrieve
> it again once it is stored?

I have now submitted a request to add such a method to POSIX. Here:
http://austingroupbugs.net/view.php?id=1220

I used OpenBSD 6.2 as an example in there, not to bash OpenBSD, but
to prove that POSIX is incomplete so far. Which I should probably
have done as early as 2005, when I noticed that the API is incomplete
regarding GNU libc:
https://sourceware.org/ml/libc-alpha/2005-03/msg00125.html

> Why would any programmer call a library API for that rather
> than simply storing the selected language in a variable?
...
> setlocale(3) supports storing a string
> in the library that the application program could just as easily,
> or arguably even more easily, store itself.

Many application programs are not small pieces of code written by
a small group of programmers, but are rather assembled from
libraries written by different groups of programmers.

The following libc APIs exist, not primarily in order to make system calls
to the kernel, but to let information flow from one part of the
application to another part of the application:
  <locale.h>   setlocale, uselocale
  <setjmp.h>   setjmp, longjmp
  <stdio.h>    setbuf, setvbuf, clearerr
  <syslog.h>   setlogmask
  <libintl.h>  textdomain, bindtextdomain

Going even further, applications can even dynamically load libraries,
through <dlfcn.h>.

For example, 'ldd /usr/bin/emacs' displays 110 libraries on my system,
and a running 'kate' process has 46 dynamically loaded .so files open.
That's where libc (or libstdc++, in the second case) becomes important
as an information dispatcher between different parts of the application.
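
A minimal sketch of this kind of information flow, assuming a POSIX
system that provides LC_MESSAGES in <locale.h>. The function names
app_select_locale and lib_current_messages_locale are hypothetical;
only setlocale() itself is a real API. The "library" side has no access
to the application's variables, so it asks libc instead:

```c
#include <locale.h>
#include <stddef.h>

/* Hypothetical application code: selects the message locale once,
   typically at startup. Returns 1 on success, 0 if libc rejects the name. */
int app_select_locale(const char *name)
{
    return setlocale(LC_MESSAGES, name) != NULL;
}

/* Hypothetical library code, possibly written by a different group of
   programmers. Passing NULL as the second argument queries the global
   locale without modifying it. Falls back to "C" if the query fails. */
const char *lib_current_messages_locale(void)
{
    const char *name = setlocale(LC_MESSAGES, NULL);
    return name != NULL ? name : "C";
}
```

Note that this query works only for the global locale; it does not see
a per-thread locale installed with uselocale().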

> For comparison, the point of using {set,new,use}locale(3) with
> LC_CTYPE is not merely remembering which character set the user
> asked for, but also changing the behaviour of many *wc*(3) and
> *mb*(3) library functions.  LC_MESSAGES, on the other hand, will
> never have any effect on the behaviour of any library function
> in the OpenBSD libc.

The *mb* and *wc* functions are only one consumer of the information
(the name of the locale_t category). Other parts of the application
want to consume this information as well.

> Also, in your web server example, you certainly don't want syslog
> messages in languages requested by clients, so calling uselocale(3)
> would merely be asking for trouble...  (Of course it's still possible
> to write correct code, but harder.)

You make a good (and often overlooked) point: In a properly internationalized
system, the audience of a message (which user? the web server administrator?
the database administrator? the browser user?) needs to be considered
*already* at the point where the message is generated. In my web server
example, one could use
  - the global locale, set by setlocale(), for messages that go to the
    administrator,
  - a '__thread locale_t browser_use_locale;' object, or uselocale(),
    for messages that go to the browser user.
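
The two options above could be combined roughly as follows. This is
only a sketch, assuming a system with the POSIX 2008 per-thread locale
functions (newlocale, uselocale, freelocale) and a compiler that
supports '__thread'. The functions begin_request, end_request,
with_browser_locale and is_browser_locale_active are hypothetical names
for illustration:

```c
#define _POSIX_C_SOURCE 200809L
#include <locale.h>
#include <stddef.h>

/* Per-thread locale of the browser user served by this thread. */
static __thread locale_t browser_use_locale;

/* Create the browser user's locale for this request.
   Returns 0 on success, -1 if the locale name is not supported. */
int begin_request(const char *requested)
{
    locale_t loc = newlocale(LC_ALL_MASK, requested, (locale_t)0);
    if (loc == (locale_t)0)
        return -1;
    browser_use_locale = loc;
    return 0;
}

/* Release the per-request locale. */
void end_request(void)
{
    if (browser_use_locale != (locale_t)0) {
        freelocale(browser_use_locale);
        browser_use_locale = (locale_t)0;
    }
}

/* Run fn with the browser user's locale installed for this thread only,
   then restore the previous state. Code outside this call, such as
   messages for the administrator, keeps seeing the global locale. */
int with_browser_locale(int (*fn)(void *), void *arg)
{
    locale_t old = uselocale(browser_use_locale);
    int result = fn(arg);
    uselocale(old);
    return result;
}

/* Sample callback: returns 1 if the browser locale is the thread's
   current locale, 0 otherwise. */
int is_browser_locale_active(void *unused)
{
    (void)unused;
    return uselocale((locale_t)0) == browser_use_locale;
}
```

Keeping the uselocale() window small is a deliberate choice here: it
limits the code that runs under the browser user's locale to the part
that actually generates messages for that user.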

> But LC_NUMERIC is certainly dangerous, it can
> break parsers in subtle and surprising ways

Yes, here too, consideration needs to be given to the question: who
will parse the decimal number? A human user (supposed to be using
which locale?) or a language neutral parser?

That this distinction can be made in practice is shown by the
category LC_TIME. Here, parsing locale-dependent output is so
complex and buggy (see e.g. the ill-designed attempt in
https://pubs.opengroup.org/onlinepubs/9699919799/functions/getdate.html)
that real-world software is forced to make the distinction between
localized and non-localized time representations. For the non-localized
representations, software usually standardizes on the
  date +"%Y-%m-%dT%H:%M:%S"
format (with Gregorian calendar).
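
In C, the same non-localized format can be produced with strftime().
A minimal sketch, assuming POSIX gmtime_r() is available and that UTC
is wanted (the 'date' command above prints local time instead); the
function name format_timestamp_utc is hypothetical. The conversions
used here produce plain decimal digits, so the result is meant for
language neutral parsers:

```c
#define _POSIX_C_SOURCE 200809L
#include <stddef.h>
#include <time.h>

/* Format t as "YYYY-MM-DDTHH:MM:SS" in UTC.
   Returns the number of characters written (19 on success),
   or 0 if the conversion fails or the buffer is too small. */
size_t format_timestamp_utc(time_t t, char *buf, size_t size)
{
    struct tm tm;
    if (gmtime_r(&t, &tm) == NULL)
        return 0;
    return strftime(buf, size, "%Y-%m-%dT%H:%M:%S", &tm);
}
```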

When you apply similar thought to LC_NUMERIC functionality, you can
achieve good results. But I agree it's easy to introduce bugs in this
area. Just last week, by mistake, I wrote code that prints a port number
in a localized way: 8,080 or 8.080 depending on locale. Ouch.
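
For illustration, here is how such a mistake can be avoided in C. This
is a sketch only, not the code from the incident mentioned above, and
the function name format_port is hypothetical. A plain "%u" conversion
never inserts grouping characters; it is the POSIX ' flag (as in
"%'u") that requests the locale's thousands grouping and therefore
must not be used for values read by programs:

```c
#include <stdio.h>
#include <stddef.h>

/* Format a TCP/UDP port number for a language neutral consumer.
   Returns the number of characters written, or -1 if the port is
   out of range or the buffer is too small. */
int format_port(unsigned int port, char *buf, size_t size)
{
    if (port > 65535u)
        return -1;
    /* No ' flag here: grouping such as 8,080 or 8.080 is never applied. */
    int n = snprintf(buf, size, "%u", port);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```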

> I hoped to understand better what your point is by looking at the
> HEAD of the master branch of the git repo of GNU grep because you
> mentioned a test failure there

You would do better to look at a GNU gettext release:
https://ftp.gnu.org/gnu/gettext/gettext-0.19.8.1.tar.gz
There, in the gettext-runtime/intl/ directory, you will find the
localename.c file - which is my attempt at overcoming the lack of
a locale name getter function in POSIX - and its use for gettext().

GNU grep indeed happens to include the localename test, but since
'grep' is not a multithreaded program, inspection of this code will
not give you insights on this issue.

Bruno
