On Mon, Dec 16, 2024 at 11:31:31PM +0100, Patrice Dumas wrote:
> Hello,
>
> I tested GNU Texinfo on Alpine Linux, which uses musl and not glibc in
> cfarm94.cfarm.net. Tests with translations of strings in output in C
> fail, most probably because musl libintl does not use LANGUAGE to set
> the locale. In that case, setting the configure
> --enable-xs-perl-libintl flag fixes the tests and the build. Should
> that situation be documented in the manual?

Can you point me to any documentation about the behaviour of musl
gettext and its treatment of LANGUAGE?

I found the musl manual at
https://musl.libc.org/doc/1.1.24/manual.html. It lists the following
variables, but not LANGUAGE:

    LC_ALL, LC_CTYPE, LC_NUMERIC, LC_TIME, LC_COLLATE, LC_MONETARY,
    LC_MESSAGES, and LANG - Used by setlocale and newlocale to
    determine a locale name to use when a zero-length string is
    passed. The precedence rules follow POSIX: LC_ALL overrides
    category-specific variables, and LANG provides a default for any
    category not set.

I found another page, a release announcement from 2014:

    The features presently supported are:

    - The setting of the LC_MESSAGES locale category is recorded
      regardless of whether a libc locale file is available to be
      loaded. This will be used by the gettext interfaces if the
      application uses gettext message translation and can be
      retrieved by the application by calling
      setlocale(LC_MESSAGES, 0).

https://www.openwall.com/lists/musl/2014/08/01/1

I don't know if this is out of date, but it agrees with the testing I
did (see below), where setting LC_MESSAGES gives the desired
translations even though it is not apparent that locales for these
translations are installed.

Later on that page, the author says:

    Using gettext:

    The gettext translation functions are largely compatible with the
    documented interfaces in the GNU gettext manual. This does not
    include some more recent, undocumented, ill-designed features in
    GNU gettext which are used mostly (only?) by some GNU packages so
    far. The main deviation from GNU gettext in the outward behavior
    is that the LANGUAGE environment variable is not honored; that
    topic is covered in a separate message to the musl list. Also,
    there is no default path for translation files, but this should
    not affect applications since the documented usage is that
    calling bindtextdomain is required.

I didn't find any mention of gettext in any other documentation
though, so I don't know how you are supposed to find out about the
gettext behaviour.

There is a POSIX specification for gettext that mentions LANGUAGE, but
this may be newer than the musl implementation:

https://pubs.opengroup.org/onlinepubs/9799919799/functions/gettext.html

There is no "locale" command installed on this machine, so it is hard
to know which locales are available.

I got translations to work in a test program:

--cut
#include <libintl.h>
#include <locale.h>
#include <stdio.h>

int
main (void)
{
  /* Take the locale from the environment. */
  char *cur_locale = setlocale (LC_ALL, "");
  printf ("locale is %s\n", cur_locale);

  /* Use the message catalogs from the Texinfo build tree. */
  bindtextdomain ("texinfo_document", "texinfo-7.2/build/tp/LocaleData");
  textdomain ("texinfo_document");

  printf ("%s\n", gettext ("Table of contents"));
}
--cut

Output:

cfarm94:~$ cc test.c
cfarm94:~$ ./a.out
locale is C.UTF-8;C;C;C;C;C
Table of contents
cfarm94:~$ LANG=de ./a.out
locale is de;de;de;C;de;de
Inhaltsverzeichnis
cfarm94:~$ LC_MESSAGES=de_DE ./a.out
locale is C.UTF-8;C;C;C;C;de_DE
Inhaltsverzeichnis
cfarm94:~$ LC_MESSAGES=pl ./a.out
locale is C.UTF-8;C;C;C;C;pl
Spis treści
cfarm94:~$

As you say, LANGUAGE doesn't work:

cfarm94:~$ LANGUAGE=de_DE ./a.out
locale is C.UTF-8;C;C;C;C;C
Table of contents
cfarm94:~$

For the test suite, we set LC_ALL=C for predictable results, which we
need to override to get translations.

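To spell out why LC_ALL=C gets in the way: under the POSIX precedence
rules quoted from the musl manual, the first non-empty variable among
LC_ALL, LC_MESSAGES and LANG decides the messages locale, and LANGUAGE
is not consulted at all. A rough sketch of that lookup follows.
effective_messages_locale is a hypothetical helper written only for
illustration. It is not code from musl or Texinfo, and treating an
empty value as unset is an assumption based on the tests here.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: return the value that would name the messages
   locale under POSIX precedence (LC_ALL, then LC_MESSAGES, then LANG).
   Empty values are treated as unset (assumption, see above).
   LANGUAGE is deliberately not looked at.
   Returns NULL if none of the variables is set, in which case the
   implementation picks its own default. */
static const char *
effective_messages_locale (void)
{
  static const char *const vars[] = { "LC_ALL", "LC_MESSAGES", "LANG" };
  size_t i;

  for (i = 0; i < sizeof vars / sizeof vars[0]; i++)
    {
      const char *value = getenv (vars[i]);
      if (value && *value)
        return value;
    }
  return NULL;
}
```

With LC_ALL=C in the environment this returns "C" whatever LC_MESSAGES
or LANG say, which is consistent with the test suite getting no
translations until LC_ALL is removed.
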
After a lot of trial and error, I got a program to translate strings
even with LC_ALL=C set on the command line:

--cut
#include <libintl.h>
#include <locale.h>
#include <stdio.h>
#include <stdlib.h>

int
main (void)
{
  char *cur_locale;

  cur_locale = setlocale (LC_ALL, "");
  printf ("locale is %s\n", cur_locale);

  bindtextdomain ("texinfo_document", "texinfo-7.2/build/tp/LocaleData");
  textdomain ("texinfo_document");

  /* Remove LC_ALL so that it no longer overrides LC_MESSAGES, then
     re-read the messages locale from the environment. */
  int setenv_status3 = unsetenv ("LC_ALL");
  int setenv_status2 = setenv ("LC_MESSAGES", "de", 1);
  char *locale = setlocale (LC_MESSAGES, "");

  printf ("%s\n", gettext ("Table of contents"));
}
--cut

cfarm94:~$ LC_ALL=C ./a.out
locale is C
Inhaltsverzeichnis

Note I had to unset LC_ALL with "unsetenv", as LC_ALL takes priority
over LANG and LC_MESSAGES (unlike the unsupported LANGUAGE). (Setting
LC_ALL to "" in the environment had the same effect.)

Previously, at least 4 tests under tp/tests, including
test_scripts/formatting_documentlanguage_cmdline.sh, did not pass
(tp/t tests were disabled). With the following lines added to
tp/Texinfo/XS/main/translations.c, all tests pass:

diff --git a/tp/Texinfo/XS/main/translations.c b/tp/Texinfo/XS/main/translations.c
index 4a904e1fff..0bcaf6d46f 100644
--- a/tp/Texinfo/XS/main/translations.c
+++ b/tp/Texinfo/XS/main/translations.c
@@ -310,6 +310,20 @@ translate_string (const char *string, const char *in_lang,
                language_locales.text, string, strerror (errno));
     }
 
+#ifndef _WIN32
+  if (setenv ("LC_MESSAGES", language_locales.text, 1) != 0
+      || unsetenv ("LC_ALL") != 0)
+    {
+      fprintf (stderr,
+               "translate_string: setenv `%s' error for string `%s': %s\n",
+               language_locales.text, string, strerror (errno));
+    }
+  else
+    {
+      char *cur = setlocale (LC_MESSAGES, "");
+    }
+#endif
+
   /* pgettext only works with string litterals, so use pgettext_expr */
   if (translation_context)
     translated_string = strdup (pgettext_expr (translation_context, string));

I then checked this on my own machine, where it also worked well.

With this change, the call to switch_messages_locale is not actually
needed, as LC_MESSAGES is set directly. This means we don't have to go
looking for a working LC_ALL setting.

Any thoughts on whether we should adopt this approach for Alpine Linux
and musl, or even by default?
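
For discussion, here is the same idea pulled out into a standalone
function. This is only a sketch: force_messages_locale is a
hypothetical name, not an existing Texinfo function, and it leaves out
the _WIN32 guard from the patch (setenv and unsetenv are POSIX, so a
real version would still need it).

```c
#include <locale.h>
#include <stdlib.h>

/* Hypothetical helper, not part of Texinfo: make LC_MESSAGES the
   variable that decides the messages locale by setting it to LANG_NAME
   and removing LC_ALL, then re-reading the category from the
   environment.

   Returns 0 if the environment was updated, -1 otherwise (errno is set
   by setenv/unsetenv).  If LOCALE_NAME is not NULL, it receives the
   result of setlocale, which may be NULL if the C library rejects the
   name. */
static int
force_messages_locale (const char *lang_name, const char **locale_name)
{
  const char *name;

  if (setenv ("LC_MESSAGES", lang_name, 1) != 0
      || unsetenv ("LC_ALL") != 0)
    return -1;

  name = setlocale (LC_MESSAGES, "");
  if (locale_name)
    *locale_name = name;
  return 0;
}
```

Note that this changes the process environment for good. Whether
LC_ALL should be saved and restored afterwards is an open question I
have not looked into.
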