On Mon, Dec 16, 2024 at 11:31:31PM +0100, Patrice Dumas wrote:
> Hello,
> 
> I tested GNU Texinfo on Alpine Linux, which uses musl and not glibc in
> cfarm94.cfarm.net.  Tests with translations of strings in output in C
> fail, most probably because musl libintl does not use LANGUAGE to set
> the locale.  In that case, setting the configure
> --enable-xs-perl-libintl flag fixes the tests and the build.  Should
> that situation be documented in the manual?

Can you point me to any documentation about the behaviour of
musl gettext and its treatment of LANGUAGE?

I found the musl manual at https://musl.libc.org/doc/1.1.24/manual.html;
it lists these variables, but not LANGUAGE:

  LC_ALL, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY,
  LC_MESSAGES, and LANG -
  Used by setlocale and newlocale to determine a
  locale name to use when a zero-length string is passed. The precedence
  rules follow POSIX: LC_ALL overrides category-specific variables, and
  LANG provides a default for any category not set.
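
As a rough sketch of the precedence described in that excerpt (not
musl's actual code), the lookup for a single category could be written
as below.  The name env_locale_name is made up for illustration, and
treating an empty variable as unset is the POSIX rule:

```c
#include <stdlib.h>

/* Hypothetical helper, for illustration only: return the locale name
   that the environment selects for one category, checking LC_ALL,
   then the category's own variable (e.g. "LC_MESSAGES"), then LANG.
   Empty values count as unset.  Returns NULL if none is set, in
   which case the implementation's default locale applies. */
const char *
env_locale_name (const char *category_var)
{
  const char *vars[3];

  vars[0] = "LC_ALL";
  vars[1] = category_var;
  vars[2] = "LANG";

  for (int i = 0; i < 3; i++)
    {
      const char *value = getenv (vars[i]);
      if (value && *value)
        return value;
    }
  return NULL;
}
```

With LC_ALL=C in the environment this returns "C" for every category,
whatever LC_MESSAGES or LANG are set to.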

I found another page which was a release announcement from 2014:

  The features presently supported are:
  
  - The setting of the LC_MESSAGES locale category is recorded
    regardless of whether a libc locale file is available to be loaded.
    This will be used by the gettext interfaces if the application uses
    gettext message translation and can be retrieved by the application
    by calling setlocale(LC_MESSAGES, 0).

https://www.openwall.com/lists/musl/2014/08/01/1

I don't know if this is out of date, but it appears to agree with
the testing I did (see below) where setting LC_MESSAGES gives the
desired translations, even though it is not apparent that locales
for these translations are installed.

Later on in that message, the author says:

  Using gettext:
  
  The gettext translation functions are largely compatible with the
  documented interfaces in the GNU gettext manual. This does not include
  some more recent, undocumented, ill-designed features in GNU gettext
  which are used mostly (only?) by some GNU packages so far. The main
  deviation from GNU gettext in the outward behavior is that the
  LANGUAGE environment variable is not honored; that topic is covered in
  a separate message to the musl list. Also, there is no default path
  for translation files, but this should not affect applications since
  the documented usage is that calling bindtextdomain is required.

I didn't find any mention of gettext in any other documentation though,
so I don't know how you are supposed to know about the gettext behaviour.

There is a POSIX specification for gettext that mentions LANGUAGE,
but this may be newer than the musl implementation:
https://pubs.opengroup.org/onlinepubs/9799919799/functions/gettext.html

On this machine there is no "locale" command installed, so it is
hard to know which locales are available.

I got translations to work in a test program:

--cut
#include <libintl.h>
#include <locale.h>

#include <stdio.h>

int
main (void)
{
        char *cur_locale = setlocale (LC_ALL, "");
        printf ("locale is %s\n", cur_locale);
        bindtextdomain ("texinfo_document",
                        "texinfo-7.2/build/tp/LocaleData");
        textdomain("texinfo_document");
        printf ("%s\n", gettext ("Table of contents"));
}
--cut

Output:

cfarm94:~$ cc test.c
cfarm94:~$ ./a.out
locale is C.UTF-8;C;C;C;C;C
Table of contents
cfarm94:~$ LANG=de ./a.out 
locale is de;de;de;C;de;de
Inhaltsverzeichnis
cfarm94:~$ LC_MESSAGES=de_DE ./a.out
locale is C.UTF-8;C;C;C;C;de_DE
Inhaltsverzeichnis
cfarm94:~$ LC_MESSAGES=pl ./a.out 
locale is C.UTF-8;C;C;C;C;pl
Spis treści
cfarm94:~$ 

As you say, LANGUAGE doesn't work:

cfarm94:~$ LANGUAGE=de_DE ./a.out 
locale is C.UTF-8;C;C;C;C;C
Table of contents
cfarm94:~$ 
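
Since GNU gettext treats LANGUAGE as a colon-separated priority list,
one conceivable fallback for a libintl that ignores it is to take the
first entry and use that as the LC_MESSAGES value.  A minimal sketch of
the parsing step only; first_language_entry is a hypothetical name, not
something that exists in Texinfo or musl:

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, for illustration only: copy the first entry of
   a colon-separated LANGUAGE value (e.g. "de_DE:fr") into BUF, which
   holds SIZE bytes.  Returns 0 on success, -1 if the value is missing,
   starts with an empty entry, or does not fit. */
int
first_language_entry (const char *language, char *buf, size_t size)
{
  size_t len;

  if (!language || !buf || size == 0)
    return -1;

  len = strcspn (language, ":");   /* length up to the first ':' */
  if (len == 0 || len >= size)
    return -1;

  memcpy (buf, language, len);
  buf[len] = '\0';
  return 0;
}
```

The result could then be passed to setenv ("LC_MESSAGES", buf, 1)
followed by setlocale (LC_MESSAGES, "").  The remaining entries of the
list are simply dropped here, which loses the fallback order that
LANGUAGE normally provides.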

For the test suite, we set LC_ALL=C for predictable results, which we
need to override to get translations.  After a huge amount of trial
and error, I was able to get a program which translated strings after
setting LC_ALL=C on the command line:


--cut
#include <libintl.h>
#include <locale.h>

#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
        char *cur_locale;
        cur_locale = setlocale (LC_ALL, "");
        printf ("locale is %s\n", cur_locale);

        bindtextdomain ("texinfo_document",
                        "texinfo-7.2/build/tp/LocaleData");
        textdomain("texinfo_document");

        /* LC_ALL overrides LC_MESSAGES, so remove it before selecting
           the message language. */
        if (unsetenv ("LC_ALL") != 0
            || setenv ("LC_MESSAGES", "de", 1) != 0)
          perror ("setenv");
        setlocale (LC_MESSAGES, "");

        printf ("%s\n", gettext ("Table of contents"));
}
--cut

cfarm94:~$ LC_ALL=C ./a.out 
locale is C
Inhaltsverzeichnis

Note I had to unset LC_ALL with "unsetenv", as LC_ALL takes priority
over LANG and LC_MESSAGES (unlike the unsupported LANGUAGE).
(Setting LC_ALL to "" in the environment had the same effect.)

Previously at least 4 tests under tp/tests, including
test_scripts/formatting_documentlanguage_cmdline.sh, didn't pass
(the tp/t tests were disabled).  I was able to get all tests to pass
by adding these lines to tp/Texinfo/XS/main/translations.c:


diff --git a/tp/Texinfo/XS/main/translations.c b/tp/Texinfo/XS/main/translations.c
index 4a904e1fff..0bcaf6d46f 100644
--- a/tp/Texinfo/XS/main/translations.c
+++ b/tp/Texinfo/XS/main/translations.c
@@ -310,6 +310,20 @@ translate_string (const char *string, const char *in_lang,
               language_locales.text, string, strerror (errno));
     }
 
+#ifndef _WIN32
+  if (setenv ("LC_MESSAGES", language_locales.text, 1) != 0
+          || unsetenv ("LC_ALL") != 0)
+    {
+      fprintf (stderr,
+              "translate_string: setenv `%s' error for string `%s': %s\n",
+              language_locales.text, string, strerror (errno));
+    }
+  else
+    {
+      char *cur = setlocale (LC_MESSAGES, "");
+    }
+#endif
+
   /* pgettext only works with string litterals, so use pgettext_expr */
   if (translation_context)
     translated_string = strdup (pgettext_expr (translation_context, string));

I then checked that this worked well on my own machine, which it did.

With this change, the call to switch_messages_locale is not actually
needed, as LC_MESSAGES is set directly.  This means we don't have to go
looking for a working LC_ALL setting.

Any thoughts on whether we should adopt this approach for Alpine Linux
and musl, or even by default?
