Lasse Collin reported in <https://lists.gnu.org/archive/html/bug-gettext/2024-12/msg00111.html> that the setlocale() override from GNU libintl does not support the UTF-8 environment of native Windows correctly. That setlocale() override is based on the setlocale() override from gnulib. So let me add that support here.
What I call the "UTF-8 environment of native Windows" is a way of packaging an application (details are in [1]) such that GetACP() returns 65001, the codepage number for UTF-8.

In fact, there are apparently two variants of this mode:
  - the legacy Windows settings variant: when you haven't ever (or recently?) changed the system default locale of Windows 10,
  - the modern Windows settings variant: when you have changed the system default locale of Windows 10.

With the legacy Windows settings, the setlocale() function produces locale names such as "English_United States.65001" or "English_United States.utf8". With the modern Windows settings, it produces "en_US.UTF-8" instead. (This is with both mingw and MSVC, according to my testing.)

The various locale-related modules of gnulib were never tested in the UTF-8 environment. This series of patches adds support for it, with unit tests.

[1] <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page>

2024-12-23  Bruno Haible  <br...@clisp.org>

	mbrtowc tests: Test in the UTF-8 environment on native Windows.
	* tests/test-mbrtowc-w32utf8.sh: New file.
	* tests/test-mbrtowc-w32utf8.c: New file.
	* modules/mbrtowc-tests (Files): Add these files and m4/windows-rc.m4,
	tests/windows-utf8.rc, tests/windows-utf8.manifest.
	(Depends-on): Add test-xfail.
	(configure.ac): Invoke gl_WINDOWS_RC.
	(Makefile.am): Arrange to compile test-mbrtowc-w32utf8 and run
	test-mbrtowc-w32utf8.sh.

2024-12-23  Bruno Haible  <br...@clisp.org>

	setlocale tests: Test in the UTF-8 environment on native Windows.
	* tests/test-setlocale-w32utf8.sh: New file.
	* tests/test-setlocale-w32utf8.c: New file.
	* modules/setlocale-tests (Files): Add these files and m4/windows-rc.m4,
	tests/windows-utf8.rc, tests/windows-utf8.manifest.
	(Depends-on): Add test-xfail.
	(configure.ac): Invoke gl_WINDOWS_RC.
	(Makefile.am): Arrange to compile test-setlocale-w32utf8 and run
	test-setlocale-w32utf8.sh.

	setlocale: Support the UTF-8 environment on native Windows.
	* lib/setlocale.c: Include <windows.h>.
	(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
	to the locale names passed to the native setlocale().

2024-12-23  Bruno Haible  <br...@clisp.org>

	localename tests: Test in the UTF-8 environment on native Windows.
	* tests/test-localename-w32utf8.sh: New file.
	* tests/test-localename-w32utf8.c: New file.
	* modules/localename-tests (Files): Add these files and m4/windows-rc.m4,
	tests/windows-utf8.rc, tests/windows-utf8.manifest.
	(Depends-on): Add test-xfail.
	(configure.ac): Invoke gl_WINDOWS_RC.
	(Makefile.am): Arrange to compile test-localename-w32utf8 and run
	test-localename-w32utf8.sh.

	localename-unsafe: Support the UTF-8 environment on native Windows.
	* lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a
	suffix ".UTF-8" to the result if GetACP() is UTF-8.

2024-12-23  Bruno Haible  <br...@clisp.org>

	localcharset tests: Test in the UTF-8 environment on native Windows.
	* m4/windows-rc.m4: New file.
	* tests/test-localcharset-w32utf8.sh: New file.
	* tests/test-localcharset-w32utf8.c: New file.
	* tests/windows-utf8.rc: New file.
	* tests/windows-utf8.manifest: New file.
	* modules/localcharset-tests (Files): Add these files.
	(Depends-on): Add test-xfail.
	(configure.ac): Invoke gl_WINDOWS_RC.
	(Makefile.am): Arrange to compile test-localcharset-w32utf8 and run
	test-localcharset-w32utf8.sh.

	localcharset: Support the UTF-8 environment on native Windows.
	* lib/localcharset.c (locale_charset): Recognize also the special case
	of a setlocale() result that ends in ".UTF-8".
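
For illustration only, the three spellings of the UTF-8 codeset that setlocale() can produce in this environment ("65001", "utf8", "UTF-8") can be recognized with a small portable check. This is a minimal sketch, not the code from the patches: the helper name locale_name_is_utf8 is hypothetical, and it assumes a plain locale name for a single category, not a composite LC_ALL string of the "LC_COLLATE=...;LC_CTYPE=..." kind.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib.
   Return true if the codeset part of NAME (the text after the last '.')
   is one of the spellings of UTF-8 described above.
   Assumes NAME is the name of a single locale category.  */
static bool
locale_name_is_utf8 (const char *name)
{
  assert (name != NULL);

  /* The codeset, if present, follows the last dot.  */
  const char *dot = strrchr (name, '.');
  if (dot == NULL)
    return false;

  const char *codeset = dot + 1;
  return strcmp (codeset, "65001") == 0   /* legacy settings, numeric */
         || strcmp (codeset, "utf8") == 0 /* legacy settings, Windows 10 */
         || strcmp (codeset, "UTF-8") == 0; /* modern settings */
}
```

With this sketch, "English_United States.65001", "English_United States.utf8" and "en_US.UTF-8" are all accepted, whereas "French_France.1252" and "C" are not. The comparison is deliberately case-sensitive, since only these exact spellings are mentioned above.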
From 927a70e0853345315570f051fd6996cfeb7b4d96 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:56:15 +0100
Subject: [PATCH 1/7] localcharset: Support the UTF-8 environment on native Windows.

* lib/localcharset.c (locale_charset): Recognize also the special case
of a setlocale() result that ends in ".UTF-8".
---
 ChangeLog          | 6 ++++++
 lib/localcharset.c | 6 ++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c294898828..1ac323da3e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	localcharset: Support the UTF-8 environment on native Windows.
+	* lib/localcharset.c (locale_charset): Recognize also the special case
+	of a setlocale() result that ends in ".UTF-8".
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	setlocale tests: Add unit test for LC_MESSAGES handling.
diff --git a/lib/localcharset.c b/lib/localcharset.c
index bd3367477d..755645763d 100644
--- a/lib/localcharset.c
+++ b/lib/localcharset.c
@@ -939,8 +939,10 @@ locale_charset (void)
       sprintf (buf, "CP%u", GetACP ());
     }
   /* For a locale name such as "French_France.65001", in Windows 10,
-     setlocale now returns "French_France.utf8" instead.  */
-  if (strcmp (buf + 2, "65001") == 0 || strcmp (buf + 2, "utf8") == 0)
+     setlocale now returns "French_France.utf8" instead, or in the UTF-8
+     environment (with modern system settings) "fr_FR.UTF-8".  */
+  if (strcmp (buf + 2, "65001") == 0 || strcmp (buf + 2, "utf8") == 0
+      || strcmp (buf + 2, "UTF-8") == 0)
     codeset = "UTF-8";
   else
     {
-- 
2.43.0
From a5c87eca2b85c624582eabeb6b409dc6fb50bfbd Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:56:37 +0100
Subject: [PATCH 2/7] localcharset tests: Test in the UTF-8 environment on native Windows.

* m4/windows-rc.m4: New file.
* tests/test-localcharset-w32utf8.sh: New file.
* tests/test-localcharset-w32utf8.c: New file.
* tests/windows-utf8.rc: New file.
* tests/windows-utf8.manifest: New file.
* modules/localcharset-tests (Files): Add these files.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-localcharset-w32utf8 and run
test-localcharset-w32utf8.sh.
---
 ChangeLog                          | 12 ++++++
 m4/windows-rc.m4                   | 21 ++++++++++
 modules/localcharset-tests         | 16 ++++++++
 tests/test-localcharset-w32utf8.c  | 61 ++++++++++++++++++++++++++++++
 tests/test-localcharset-w32utf8.sh |  7 ++++
 tests/windows-utf8.manifest        | 20 ++++++++++
 tests/windows-utf8.rc              |  9 +++++
 7 files changed, 146 insertions(+)
 create mode 100644 m4/windows-rc.m4
 create mode 100644 tests/test-localcharset-w32utf8.c
 create mode 100755 tests/test-localcharset-w32utf8.sh
 create mode 100644 tests/windows-utf8.manifest
 create mode 100644 tests/windows-utf8.rc

diff --git a/ChangeLog b/ChangeLog
index 1ac323da3e..bb9f076353 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
+	localcharset tests: Test in the UTF-8 environment on native Windows.
+	* m4/windows-rc.m4: New file.
+	* tests/test-localcharset-w32utf8.sh: New file.
+	* tests/test-localcharset-w32utf8.c: New file.
+	* tests/windows-utf8.rc: New file.
+	* tests/windows-utf8.manifest: New file.
+	* modules/localcharset-tests (Files): Add these files.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-localcharset-w32utf8 and run
+	test-localcharset-w32utf8.sh.
+
 	localcharset: Support the UTF-8 environment on native Windows.
 	* lib/localcharset.c (locale_charset): Recognize also the special case
 	of a setlocale() result that ends in ".UTF-8".
diff --git a/m4/windows-rc.m4 b/m4/windows-rc.m4
new file mode 100644
index 0000000000..8a4deb14b8
--- /dev/null
+++ b/m4/windows-rc.m4
@@ -0,0 +1,21 @@
+# windows-rc.m4
+# serial 1
+dnl Copyright (C) 2024 Free Software Foundation, Inc.
+dnl This file is free software; the Free Software Foundation
+dnl gives unlimited permission to copy and/or distribute it,
+dnl with or without modifications, as long as this notice is preserved.
+dnl This file is offered as-is, without any warranty.
+
+dnl Find the tool that "compiles" a Windows resource file (.rc) to an
+dnl object file.
+
+AC_DEFUN_ONCE([gl_WINDOWS_RC],
+[
+  AC_REQUIRE([AC_CANONICAL_HOST])
+  case "$host_os" in
+    mingw* | windows*)
+      dnl Check for a program that compiles Windows resource files.
+      AC_CHECK_TOOL([WINDRES], [windres])
+      ;;
+  esac
+])
diff --git a/modules/localcharset-tests b/modules/localcharset-tests
index 3f2dde6dfd..a171c0cfbf 100644
--- a/modules/localcharset-tests
+++ b/modules/localcharset-tests
@@ -1,11 +1,27 @@
 Files:
 tests/test-localcharset.c
+tests/test-localcharset-w32utf8.sh
+tests/test-localcharset-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
+m4/windows-rc.m4
 
 Depends-on:
 setlocale
+test-xfail
 
 configure.ac:
+gl_WINDOWS_RC
 
 Makefile.am:
 noinst_PROGRAMS += test-localcharset
 test_localcharset_LDADD = $(LDADD) $(SETLOCALE_LIB)
+
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-localcharset-w32utf8.sh
+noinst_PROGRAMS += test-localcharset-w32utf8
+test_localcharset_w32utf8_LDADD = $(LDADD) test-localcharset-windows-utf8.res $(SETLOCALE_LIB)
+test-localcharset-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-localcharset-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-localcharset-windows-utf8.res
+endif
diff --git a/tests/test-localcharset-w32utf8.c b/tests/test-localcharset-w32utf8.c
new file mode 100644
index 0000000000..f40db9c397
--- /dev/null
+++ b/tests/test-localcharset-w32utf8.c
@@ -0,0 +1,61 @@
+/* Test of localcharset() function
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include "localcharset.h"
+
+#include <locale.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+
+int
+main (void)
+{
+#ifdef _UCRT
+  unsigned int active_codepage = GetACP ();
+  if (!(active_codepage == 65001))
+    {
+      fprintf (stderr,
+               "The active codepage is %u, not 65001 as expected.\n"
+               "(This is normal on Windows older than Windows 10.)\n",
+               active_codepage);
+      exit (1);
+    }
+
+  setlocale (LC_ALL, "");
+  const char *lc = locale_charset ();
+  if (!(strcmp (lc, "UTF-8") == 0))
+    {
+      fprintf (stderr,
+               "locale_charset () is \"%s\", not \"UTF-8\" as expected.\n",
+               lc);
+      exit (1);
+    }
+
+  return 0;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-localcharset-w32utf8.sh b/tests/test-localcharset-w32utf8.sh
new file mode 100755
index 0000000000..1e6a95b545
--- /dev/null
+++ b/tests/test-localcharset-w32utf8.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LANG
+${CHECKER} ./test-localcharset-w32utf8${EXEEXT}
diff --git a/tests/windows-utf8.manifest b/tests/windows-utf8.manifest
new file mode 100644
index 0000000000..3a43a70c6d
--- /dev/null
+++ b/tests/windows-utf8.manifest
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<!-- This file is in the public domain. -->
+
+<!-- This file is an application manifest that has the effect that in the
+     application, GetACP () == 65001 instead of e.g. 1252.
+     Documentation:
+       https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activeCodePage
+       https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
+     XML schema that this file needs to obey:
+       https://learn.microsoft.com/en-us/windows/win32/sbscs/manifest-file-schema
+     It is supposed to work in Windows 10 version 1903 or newer,
+     when the UCRT runtime is in use (as opposed to old MSVCRT).
+-->
+<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
+  <application xmlns="urn:schemas-microsoft-com:asm.v3">
+    <windowsSettings>
+      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+    </windowsSettings>
+  </application>
+</assembly>
diff --git a/tests/windows-utf8.rc b/tests/windows-utf8.rc
new file mode 100644
index 0000000000..110241aa16
--- /dev/null
+++ b/tests/windows-utf8.rc
@@ -0,0 +1,9 @@
+/* This file is in the public domain.  */
+
+/* This file is a resource definition file.
+   When compiled to an object file, it embeds the windows-utf8.manifest file,
+   that has the effect that in the application, GetACP () == 65001 instead
+   of e.g. 1252.  */
+
+#include <winresrc.h> /* includes <winuser.h>, <winver.h> */
+CREATEPROCESS_MANIFEST_RESOURCE_ID RT_MANIFEST "windows-utf8.manifest"
-- 
2.43.0
>From 9f7ff4f423cd805866cd4edef806c32393621df0 Mon Sep 17 00:00:00 2001 From: Bruno Haible <br...@clisp.org> Date: Mon, 23 Dec 2024 16:56:57 +0100 Subject: [PATCH 3/7] localename-unsafe: Support the UTF-8 environment on native Windows. * lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a suffix ".UTF-8" to the result if GetACP() is UTF-8. --- ChangeLog | 6 + lib/localename-unsafe.c | 848 ++++++++++++++++++++-------------------- 2 files changed, 433 insertions(+), 421 deletions(-) diff --git a/ChangeLog b/ChangeLog index bb9f076353..d9f282c21e 100644 --- a/ChangeLog +++ b/ChangeLog @@ -1,3 +1,9 @@ +2024-12-23 Bruno Haible <br...@clisp.org> + + localename-unsafe: Support the UTF-8 environment on native Windows. + * lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a + suffix ".UTF-8" to the result if GetACP() is UTF-8. + 2024-12-23 Bruno Haible <br...@clisp.org> localcharset tests: Test in the UTF-8 environment on native Windows. diff --git a/lib/localename-unsafe.c b/lib/localename-unsafe.c index 0a2654d8a3..7088616892 100644 --- a/lib/localename-unsafe.c +++ b/lib/localename-unsafe.c @@ -1502,6 +1502,8 @@ static const char * gl_locale_name_from_win32_LANGID (LANGID langid) { + int is_utf8 = (GetACP () == 65001); + /* Activate the new code only when the GETTEXT_MUI environment variable is set, for the time being, since the new code is not well tested. */ if (getenv ("GETTEXT_MUI") != NULL) @@ -1512,10 +1514,12 @@ gl_locale_name_from_win32_LANGID (LANGID langid) On Windows95/98/ME, GetLocaleInfoA returns some incorrect results. But we don't need to support systems that are so old. */ if (GetLocaleInfoA (MAKELCID (langid, SORT_DEFAULT), LOCALE_SNAME, - namebuf, sizeof (namebuf) - 1)) + namebuf, sizeof (namebuf) - 1 - 6)) { /* Convert it to a Unix locale name. 
*/ gl_locale_name_canonicalize (namebuf); + if (is_utf8) + strcat (namebuf, ".UTF-8"); return namebuf; } } @@ -1525,6 +1529,7 @@ gl_locale_name_from_win32_LANGID (LANGID langid) Windows base (e.g. they have different character conversion facilities that produce different results). */ /* Use our own table. */ + #define N(name) (is_utf8 ? name ".UTF-8" : name) { int primary, sub; @@ -1540,146 +1545,146 @@ gl_locale_name_from_win32_LANGID (LANGID langid) case LANG_AFRIKAANS: switch (sub) { - case SUBLANG_AFRIKAANS_SOUTH_AFRICA: return "af_ZA"; + case SUBLANG_AFRIKAANS_SOUTH_AFRICA: return N("af_ZA"); } - return "af"; + return N("af"); case LANG_ALBANIAN: switch (sub) { - case SUBLANG_ALBANIAN_ALBANIA: return "sq_AL"; + case SUBLANG_ALBANIAN_ALBANIA: return N("sq_AL"); } - return "sq"; + return N("sq"); case LANG_ALSATIAN: switch (sub) { - case SUBLANG_ALSATIAN_FRANCE: return "gsw_FR"; + case SUBLANG_ALSATIAN_FRANCE: return N("gsw_FR"); } - return "gsw"; + return N("gsw"); case LANG_AMHARIC: switch (sub) { - case SUBLANG_AMHARIC_ETHIOPIA: return "am_ET"; + case SUBLANG_AMHARIC_ETHIOPIA: return N("am_ET"); } - return "am"; + return N("am"); case LANG_ARABIC: switch (sub) { - case SUBLANG_ARABIC_SAUDI_ARABIA: return "ar_SA"; - case SUBLANG_ARABIC_IRAQ: return "ar_IQ"; - case SUBLANG_ARABIC_EGYPT: return "ar_EG"; - case SUBLANG_ARABIC_LIBYA: return "ar_LY"; - case SUBLANG_ARABIC_ALGERIA: return "ar_DZ"; - case SUBLANG_ARABIC_MOROCCO: return "ar_MA"; - case SUBLANG_ARABIC_TUNISIA: return "ar_TN"; - case SUBLANG_ARABIC_OMAN: return "ar_OM"; - case SUBLANG_ARABIC_YEMEN: return "ar_YE"; - case SUBLANG_ARABIC_SYRIA: return "ar_SY"; - case SUBLANG_ARABIC_JORDAN: return "ar_JO"; - case SUBLANG_ARABIC_LEBANON: return "ar_LB"; - case SUBLANG_ARABIC_KUWAIT: return "ar_KW"; - case SUBLANG_ARABIC_UAE: return "ar_AE"; - case SUBLANG_ARABIC_BAHRAIN: return "ar_BH"; - case SUBLANG_ARABIC_QATAR: return "ar_QA"; - } - return "ar"; + case SUBLANG_ARABIC_SAUDI_ARABIA: return N("ar_SA"); + 
case SUBLANG_ARABIC_IRAQ: return N("ar_IQ"); + case SUBLANG_ARABIC_EGYPT: return N("ar_EG"); + case SUBLANG_ARABIC_LIBYA: return N("ar_LY"); + case SUBLANG_ARABIC_ALGERIA: return N("ar_DZ"); + case SUBLANG_ARABIC_MOROCCO: return N("ar_MA"); + case SUBLANG_ARABIC_TUNISIA: return N("ar_TN"); + case SUBLANG_ARABIC_OMAN: return N("ar_OM"); + case SUBLANG_ARABIC_YEMEN: return N("ar_YE"); + case SUBLANG_ARABIC_SYRIA: return N("ar_SY"); + case SUBLANG_ARABIC_JORDAN: return N("ar_JO"); + case SUBLANG_ARABIC_LEBANON: return N("ar_LB"); + case SUBLANG_ARABIC_KUWAIT: return N("ar_KW"); + case SUBLANG_ARABIC_UAE: return N("ar_AE"); + case SUBLANG_ARABIC_BAHRAIN: return N("ar_BH"); + case SUBLANG_ARABIC_QATAR: return N("ar_QA"); + } + return N("ar"); case LANG_ARMENIAN: switch (sub) { - case SUBLANG_ARMENIAN_ARMENIA: return "hy_AM"; + case SUBLANG_ARMENIAN_ARMENIA: return N("hy_AM"); } - return "hy"; + return N("hy"); case LANG_ASSAMESE: switch (sub) { - case SUBLANG_ASSAMESE_INDIA: return "as_IN"; + case SUBLANG_ASSAMESE_INDIA: return N("as_IN"); } - return "as"; + return N("as"); case LANG_AZERI: switch (sub) { - case 0x1e: return "az"; - case SUBLANG_AZERI_LATIN: return "az_AZ"; - case 0x1d: return "az@cyrillic"; - case SUBLANG_AZERI_CYRILLIC: return "az_AZ@cyrillic"; + case 0x1e: return N("az"); + case SUBLANG_AZERI_LATIN: return N("az_AZ"); + case 0x1d: return N("az@cyrillic"); + case SUBLANG_AZERI_CYRILLIC: return N("az_AZ@cyrillic"); } - return "az"; + return N("az"); case LANG_BASHKIR: switch (sub) { - case SUBLANG_BASHKIR_RUSSIA: return "ba_RU"; + case SUBLANG_BASHKIR_RUSSIA: return N("ba_RU"); } - return "ba"; + return N("ba"); case LANG_BASQUE: switch (sub) { - case SUBLANG_BASQUE_BASQUE: return "eu_ES"; + case SUBLANG_BASQUE_BASQUE: return N("eu_ES"); } - return "eu"; /* Ambiguous: could be "eu_ES" or "eu_FR". */ + return N("eu"); /* Ambiguous: could be "eu_ES" or "eu_FR". 
*/ case LANG_BELARUSIAN: switch (sub) { - case SUBLANG_BELARUSIAN_BELARUS: return "be_BY"; + case SUBLANG_BELARUSIAN_BELARUS: return N("be_BY"); } - return "be"; + return N("be"); case LANG_BENGALI: switch (sub) { - case SUBLANG_BENGALI_INDIA: return "bn_IN"; - case SUBLANG_BENGALI_BANGLADESH: return "bn_BD"; + case SUBLANG_BENGALI_INDIA: return N("bn_IN"); + case SUBLANG_BENGALI_BANGLADESH: return N("bn_BD"); } - return "bn"; + return N("bn"); case LANG_BRETON: switch (sub) { - case SUBLANG_BRETON_FRANCE: return "br_FR"; + case SUBLANG_BRETON_FRANCE: return N("br_FR"); } - return "br"; + return N("br"); case LANG_BULGARIAN: switch (sub) { - case SUBLANG_BULGARIAN_BULGARIA: return "bg_BG"; + case SUBLANG_BULGARIAN_BULGARIA: return N("bg_BG"); } - return "bg"; + return N("bg"); case LANG_BURMESE: switch (sub) { - case SUBLANG_DEFAULT: return "my_MM"; + case SUBLANG_DEFAULT: return N("my_MM"); } - return "my"; + return N("my"); case LANG_CAMBODIAN: switch (sub) { - case SUBLANG_CAMBODIAN_CAMBODIA: return "km_KH"; + case SUBLANG_CAMBODIAN_CAMBODIA: return N("km_KH"); } - return "km"; + return N("km"); case LANG_CATALAN: switch (sub) { - case SUBLANG_CATALAN_SPAIN: return "ca_ES"; + case SUBLANG_CATALAN_SPAIN: return N("ca_ES"); } - return "ca"; + return N("ca"); case LANG_CHEROKEE: switch (sub) { - case SUBLANG_DEFAULT: return "chr_US"; + case SUBLANG_DEFAULT: return N("chr_US"); } - return "chr"; + return N("chr"); case LANG_CHINESE: switch (sub) { - case SUBLANG_CHINESE_TRADITIONAL: case 0x1f: return "zh_TW"; - case SUBLANG_CHINESE_SIMPLIFIED: case 0x00: return "zh_CN"; - case SUBLANG_CHINESE_HONGKONG: return "zh_HK"; /* traditional */ - case SUBLANG_CHINESE_SINGAPORE: return "zh_SG"; /* simplified */ - case SUBLANG_CHINESE_MACAU: return "zh_MO"; /* traditional */ + case SUBLANG_CHINESE_TRADITIONAL: case 0x1f: return N("zh_TW"); + case SUBLANG_CHINESE_SIMPLIFIED: case 0x00: return N("zh_CN"); + case SUBLANG_CHINESE_HONGKONG: return N("zh_HK"); /* traditional */ + 
case SUBLANG_CHINESE_SINGAPORE: return N("zh_SG"); /* simplified */ + case SUBLANG_CHINESE_MACAU: return N("zh_MO"); /* traditional */ } - return "zh"; + return N("zh"); case LANG_CORSICAN: switch (sub) { - case SUBLANG_CORSICAN_FRANCE: return "co_FR"; + case SUBLANG_CORSICAN_FRANCE: return N("co_FR"); } - return "co"; + return N("co"); case LANG_CROATIAN: /* LANG_CROATIAN == LANG_SERBIAN == LANG_BOSNIAN * What used to be called Serbo-Croatian * should really now be two separate @@ -1691,68 +1696,68 @@ gl_locale_name_from_win32_LANGID (LANGID langid) switch (sub) { /* Croatian */ - case 0x00: return "hr"; - case SUBLANG_CROATIAN_CROATIA: return "hr_HR"; - case SUBLANG_CROATIAN_BOSNIA_HERZEGOVINA_LATIN: return "hr_BA"; + case 0x00: return N("hr"); + case SUBLANG_CROATIAN_CROATIA: return N("hr_HR"); + case SUBLANG_CROATIAN_BOSNIA_HERZEGOVINA_LATIN: return N("hr_BA"); /* Serbian */ - case 0x1f: return "sr"; - case 0x1c: return "sr"; /* latin */ - case SUBLANG_SERBIAN_LATIN: return "sr_CS"; /* latin */ - case 0x09: return "sr_RS"; /* latin */ - case 0x0b: return "sr_ME"; /* latin */ - case 0x06: return "sr_BA"; /* latin */ - case 0x1b: return "sr@cyrillic"; - case SUBLANG_SERBIAN_CYRILLIC: return "sr_CS@cyrillic"; - case 0x0a: return "sr_RS@cyrillic"; - case 0x0c: return "sr_ME@cyrillic"; - case 0x07: return "sr_BA@cyrillic"; + case 0x1f: return N("sr"); + case 0x1c: return N("sr"); /* latin */ + case SUBLANG_SERBIAN_LATIN: return N("sr_CS"); /* latin */ + case 0x09: return N("sr_RS"); /* latin */ + case 0x0b: return N("sr_ME"); /* latin */ + case 0x06: return N("sr_BA"); /* latin */ + case 0x1b: return N("sr@cyrillic"); + case SUBLANG_SERBIAN_CYRILLIC: return N("sr_CS@cyrillic"); + case 0x0a: return N("sr_RS@cyrillic"); + case 0x0c: return N("sr_ME@cyrillic"); + case 0x07: return N("sr_BA@cyrillic"); /* Bosnian */ - case 0x1e: return "bs"; - case 0x1a: return "bs"; /* latin */ - case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_LATIN: return "bs_BA"; /* latin */ - case 0x19: 
return "bs@cyrillic"; - case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_CYRILLIC: return "bs_BA@cyrillic"; + case 0x1e: return N("bs"); + case 0x1a: return N("bs"); /* latin */ + case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_LATIN: return N("bs_BA"); /* latin */ + case 0x19: return N("bs@cyrillic"); + case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_CYRILLIC: return N("bs_BA@cyrillic"); } - return "hr"; + return N("hr"); case LANG_CZECH: switch (sub) { - case SUBLANG_CZECH_CZECH_REPUBLIC: return "cs_CZ"; + case SUBLANG_CZECH_CZECH_REPUBLIC: return N("cs_CZ"); } - return "cs"; + return N("cs"); case LANG_DANISH: switch (sub) { - case SUBLANG_DANISH_DENMARK: return "da_DK"; + case SUBLANG_DANISH_DENMARK: return N("da_DK"); } - return "da"; + return N("da"); case LANG_DARI: /* FIXME: Adjust this when such locales appear on Unix. */ switch (sub) { - case SUBLANG_DARI_AFGHANISTAN: return "prs_AF"; + case SUBLANG_DARI_AFGHANISTAN: return N("prs_AF"); } - return "prs"; + return N("prs"); case LANG_DIVEHI: switch (sub) { - case SUBLANG_DIVEHI_MALDIVES: return "dv_MV"; + case SUBLANG_DIVEHI_MALDIVES: return N("dv_MV"); } - return "dv"; + return N("dv"); case LANG_DUTCH: switch (sub) { - case SUBLANG_DUTCH: return "nl_NL"; - case SUBLANG_DUTCH_BELGIAN: /* FLEMISH, VLAAMS */ return "nl_BE"; - case SUBLANG_DUTCH_SURINAM: return "nl_SR"; + case SUBLANG_DUTCH: return N("nl_NL"); + case SUBLANG_DUTCH_BELGIAN: /* FLEMISH, VLAAMS */ return N("nl_BE"); + case SUBLANG_DUTCH_SURINAM: return N("nl_SR"); } - return "nl"; + return N("nl"); case LANG_EDO: switch (sub) { - case SUBLANG_DEFAULT: return "bin_NG"; + case SUBLANG_DEFAULT: return N("bin_NG"); } - return "bin"; + return N("bin"); case LANG_ENGLISH: switch (sub) { @@ -1760,541 +1765,541 @@ gl_locale_name_from_win32_LANGID (LANGID langid) * English was the language spoken in England. * Oh well. 
*/ - case SUBLANG_ENGLISH_US: return "en_US"; - case SUBLANG_ENGLISH_UK: return "en_GB"; - case SUBLANG_ENGLISH_AUS: return "en_AU"; - case SUBLANG_ENGLISH_CAN: return "en_CA"; - case SUBLANG_ENGLISH_NZ: return "en_NZ"; - case SUBLANG_ENGLISH_EIRE: return "en_IE"; - case SUBLANG_ENGLISH_SOUTH_AFRICA: return "en_ZA"; - case SUBLANG_ENGLISH_JAMAICA: return "en_JM"; - case SUBLANG_ENGLISH_CARIBBEAN: return "en_GD"; /* Grenada? */ - case SUBLANG_ENGLISH_BELIZE: return "en_BZ"; - case SUBLANG_ENGLISH_TRINIDAD: return "en_TT"; - case SUBLANG_ENGLISH_ZIMBABWE: return "en_ZW"; - case SUBLANG_ENGLISH_PHILIPPINES: return "en_PH"; - case SUBLANG_ENGLISH_INDONESIA: return "en_ID"; - case SUBLANG_ENGLISH_HONGKONG: return "en_HK"; - case SUBLANG_ENGLISH_INDIA: return "en_IN"; - case SUBLANG_ENGLISH_MALAYSIA: return "en_MY"; - case SUBLANG_ENGLISH_SINGAPORE: return "en_SG"; - } - return "en"; + case SUBLANG_ENGLISH_US: return N("en_US"); + case SUBLANG_ENGLISH_UK: return N("en_GB"); + case SUBLANG_ENGLISH_AUS: return N("en_AU"); + case SUBLANG_ENGLISH_CAN: return N("en_CA"); + case SUBLANG_ENGLISH_NZ: return N("en_NZ"); + case SUBLANG_ENGLISH_EIRE: return N("en_IE"); + case SUBLANG_ENGLISH_SOUTH_AFRICA: return N("en_ZA"); + case SUBLANG_ENGLISH_JAMAICA: return N("en_JM"); + case SUBLANG_ENGLISH_CARIBBEAN: return N("en_GD"); /* Grenada? 
*/ + case SUBLANG_ENGLISH_BELIZE: return N("en_BZ"); + case SUBLANG_ENGLISH_TRINIDAD: return N("en_TT"); + case SUBLANG_ENGLISH_ZIMBABWE: return N("en_ZW"); + case SUBLANG_ENGLISH_PHILIPPINES: return N("en_PH"); + case SUBLANG_ENGLISH_INDONESIA: return N("en_ID"); + case SUBLANG_ENGLISH_HONGKONG: return N("en_HK"); + case SUBLANG_ENGLISH_INDIA: return N("en_IN"); + case SUBLANG_ENGLISH_MALAYSIA: return N("en_MY"); + case SUBLANG_ENGLISH_SINGAPORE: return N("en_SG"); + } + return N("en"); case LANG_ESTONIAN: switch (sub) { - case SUBLANG_ESTONIAN_ESTONIA: return "et_EE"; + case SUBLANG_ESTONIAN_ESTONIA: return N("et_EE"); } - return "et"; + return N("et"); case LANG_FAEROESE: switch (sub) { - case SUBLANG_FAEROESE_FAROE_ISLANDS: return "fo_FO"; + case SUBLANG_FAEROESE_FAROE_ISLANDS: return N("fo_FO"); } - return "fo"; + return N("fo"); case LANG_FARSI: switch (sub) { - case SUBLANG_FARSI_IRAN: return "fa_IR"; + case SUBLANG_FARSI_IRAN: return N("fa_IR"); } - return "fa"; + return N("fa"); case LANG_FINNISH: switch (sub) { - case SUBLANG_FINNISH_FINLAND: return "fi_FI"; + case SUBLANG_FINNISH_FINLAND: return N("fi_FI"); } - return "fi"; + return N("fi"); case LANG_FRENCH: switch (sub) { - case SUBLANG_FRENCH: return "fr_FR"; - case SUBLANG_FRENCH_BELGIAN: /* WALLOON */ return "fr_BE"; - case SUBLANG_FRENCH_CANADIAN: return "fr_CA"; - case SUBLANG_FRENCH_SWISS: return "fr_CH"; - case SUBLANG_FRENCH_LUXEMBOURG: return "fr_LU"; - case SUBLANG_FRENCH_MONACO: return "fr_MC"; - case SUBLANG_FRENCH_WESTINDIES: return "fr"; /* Caribbean? 
*/ - case SUBLANG_FRENCH_REUNION: return "fr_RE"; - case SUBLANG_FRENCH_CONGO: return "fr_CG"; - case SUBLANG_FRENCH_SENEGAL: return "fr_SN"; - case SUBLANG_FRENCH_CAMEROON: return "fr_CM"; - case SUBLANG_FRENCH_COTEDIVOIRE: return "fr_CI"; - case SUBLANG_FRENCH_MALI: return "fr_ML"; - case SUBLANG_FRENCH_MOROCCO: return "fr_MA"; - case SUBLANG_FRENCH_HAITI: return "fr_HT"; - } - return "fr"; + case SUBLANG_FRENCH: return N("fr_FR"); + case SUBLANG_FRENCH_BELGIAN: /* WALLOON */ return N("fr_BE"); + case SUBLANG_FRENCH_CANADIAN: return N("fr_CA"); + case SUBLANG_FRENCH_SWISS: return N("fr_CH"); + case SUBLANG_FRENCH_LUXEMBOURG: return N("fr_LU"); + case SUBLANG_FRENCH_MONACO: return N("fr_MC"); + case SUBLANG_FRENCH_WESTINDIES: return N("fr"); /* Caribbean? */ + case SUBLANG_FRENCH_REUNION: return N("fr_RE"); + case SUBLANG_FRENCH_CONGO: return N("fr_CG"); + case SUBLANG_FRENCH_SENEGAL: return N("fr_SN"); + case SUBLANG_FRENCH_CAMEROON: return N("fr_CM"); + case SUBLANG_FRENCH_COTEDIVOIRE: return N("fr_CI"); + case SUBLANG_FRENCH_MALI: return N("fr_ML"); + case SUBLANG_FRENCH_MOROCCO: return N("fr_MA"); + case SUBLANG_FRENCH_HAITI: return N("fr_HT"); + } + return N("fr"); case LANG_FRISIAN: switch (sub) { - case SUBLANG_FRISIAN_NETHERLANDS: return "fy_NL"; + case SUBLANG_FRISIAN_NETHERLANDS: return N("fy_NL"); } - return "fy"; + return N("fy"); case LANG_FULFULDE: /* Spoken in Nigeria, Guinea, Senegal, Mali, Niger, Cameroon, Benin. 
*/ switch (sub) { - case SUBLANG_DEFAULT: return "ff_NG"; + case SUBLANG_DEFAULT: return N("ff_NG"); } - return "ff"; + return N("ff"); case LANG_GAELIC: switch (sub) { case 0x01: /* SCOTTISH */ /* old, superseded by LANG_SCOTTISH_GAELIC */ - return "gd_GB"; - case SUBLANG_IRISH_IRELAND: return "ga_IE"; + return N("gd_GB"); + case SUBLANG_IRISH_IRELAND: return N("ga_IE"); } - return "ga"; + return N("ga"); case LANG_GALICIAN: switch (sub) { - case SUBLANG_GALICIAN_SPAIN: return "gl_ES"; + case SUBLANG_GALICIAN_SPAIN: return N("gl_ES"); } - return "gl"; + return N("gl"); case LANG_GEORGIAN: switch (sub) { - case SUBLANG_GEORGIAN_GEORGIA: return "ka_GE"; + case SUBLANG_GEORGIAN_GEORGIA: return N("ka_GE"); } - return "ka"; + return N("ka"); case LANG_GERMAN: switch (sub) { - case SUBLANG_GERMAN: return "de_DE"; - case SUBLANG_GERMAN_SWISS: return "de_CH"; - case SUBLANG_GERMAN_AUSTRIAN: return "de_AT"; - case SUBLANG_GERMAN_LUXEMBOURG: return "de_LU"; - case SUBLANG_GERMAN_LIECHTENSTEIN: return "de_LI"; + case SUBLANG_GERMAN: return N("de_DE"); + case SUBLANG_GERMAN_SWISS: return N("de_CH"); + case SUBLANG_GERMAN_AUSTRIAN: return N("de_AT"); + case SUBLANG_GERMAN_LUXEMBOURG: return N("de_LU"); + case SUBLANG_GERMAN_LIECHTENSTEIN: return N("de_LI"); } - return "de"; + return N("de"); case LANG_GREEK: switch (sub) { - case SUBLANG_GREEK_GREECE: return "el_GR"; + case SUBLANG_GREEK_GREECE: return N("el_GR"); } - return "el"; + return N("el"); case LANG_GREENLANDIC: switch (sub) { - case SUBLANG_GREENLANDIC_GREENLAND: return "kl_GL"; + case SUBLANG_GREENLANDIC_GREENLAND: return N("kl_GL"); } - return "kl"; + return N("kl"); case LANG_GUARANI: switch (sub) { - case SUBLANG_DEFAULT: return "gn_PY"; + case SUBLANG_DEFAULT: return N("gn_PY"); } - return "gn"; + return N("gn"); case LANG_GUJARATI: switch (sub) { - case SUBLANG_GUJARATI_INDIA: return "gu_IN"; + case SUBLANG_GUJARATI_INDIA: return N("gu_IN"); } - return "gu"; + return N("gu"); case LANG_HAUSA: switch (sub) { - 
case 0x1f: return "ha"; - case SUBLANG_HAUSA_NIGERIA_LATIN: return "ha_NG"; + case 0x1f: return N("ha"); + case SUBLANG_HAUSA_NIGERIA_LATIN: return N("ha_NG"); } - return "ha"; + return N("ha"); case LANG_HAWAIIAN: /* FIXME: Do they mean Hawaiian ("haw_US", 1000 speakers) or Hawaii Creole English ("cpe_US", 600000 speakers)? */ switch (sub) { - case SUBLANG_DEFAULT: return "cpe_US"; + case SUBLANG_DEFAULT: return N("cpe_US"); } - return "cpe"; + return N("cpe"); case LANG_HEBREW: switch (sub) { - case SUBLANG_HEBREW_ISRAEL: return "he_IL"; + case SUBLANG_HEBREW_ISRAEL: return N("he_IL"); } - return "he"; + return N("he"); case LANG_HINDI: switch (sub) { - case SUBLANG_HINDI_INDIA: return "hi_IN"; + case SUBLANG_HINDI_INDIA: return N("hi_IN"); } - return "hi"; + return N("hi"); case LANG_HUNGARIAN: switch (sub) { - case SUBLANG_HUNGARIAN_HUNGARY: return "hu_HU"; + case SUBLANG_HUNGARIAN_HUNGARY: return N("hu_HU"); } - return "hu"; + return N("hu"); case LANG_IBIBIO: switch (sub) { - case SUBLANG_DEFAULT: return "nic_NG"; + case SUBLANG_DEFAULT: return N("nic_NG"); } - return "nic"; + return N("nic"); case LANG_ICELANDIC: switch (sub) { - case SUBLANG_ICELANDIC_ICELAND: return "is_IS"; + case SUBLANG_ICELANDIC_ICELAND: return N("is_IS"); } - return "is"; + return N("is"); case LANG_IGBO: switch (sub) { - case SUBLANG_IGBO_NIGERIA: return "ig_NG"; + case SUBLANG_IGBO_NIGERIA: return N("ig_NG"); } - return "ig"; + return N("ig"); case LANG_INDONESIAN: switch (sub) { - case SUBLANG_INDONESIAN_INDONESIA: return "id_ID"; + case SUBLANG_INDONESIAN_INDONESIA: return N("id_ID"); } - return "id"; + return N("id"); case LANG_INUKTITUT: switch (sub) { - case 0x1e: return "iu"; /* syllabic */ - case SUBLANG_INUKTITUT_CANADA: return "iu_CA"; /* syllabic */ - case 0x1f: return "iu@latin"; - case SUBLANG_INUKTITUT_CANADA_LATIN: return "iu_CA@latin"; + case 0x1e: return N("iu"); /* syllabic */ + case SUBLANG_INUKTITUT_CANADA: return N("iu_CA"); /* syllabic */ + case 0x1f: return 
N("iu@latin"); + case SUBLANG_INUKTITUT_CANADA_LATIN: return N("iu_CA@latin"); } - return "iu"; + return N("iu"); case LANG_ITALIAN: switch (sub) { - case SUBLANG_ITALIAN: return "it_IT"; - case SUBLANG_ITALIAN_SWISS: return "it_CH"; + case SUBLANG_ITALIAN: return N("it_IT"); + case SUBLANG_ITALIAN_SWISS: return N("it_CH"); } - return "it"; + return N("it"); case LANG_JAPANESE: switch (sub) { - case SUBLANG_JAPANESE_JAPAN: return "ja_JP"; + case SUBLANG_JAPANESE_JAPAN: return N("ja_JP"); } - return "ja"; + return N("ja"); case LANG_KANNADA: switch (sub) { - case SUBLANG_KANNADA_INDIA: return "kn_IN"; + case SUBLANG_KANNADA_INDIA: return N("kn_IN"); } - return "kn"; + return N("kn"); case LANG_KANURI: switch (sub) { - case SUBLANG_DEFAULT: return "kr_NG"; + case SUBLANG_DEFAULT: return N("kr_NG"); } - return "kr"; + return N("kr"); case LANG_KASHMIRI: switch (sub) { - case SUBLANG_DEFAULT: return "ks_PK"; - case SUBLANG_KASHMIRI_INDIA: return "ks_IN"; + case SUBLANG_DEFAULT: return N("ks_PK"); + case SUBLANG_KASHMIRI_INDIA: return N("ks_IN"); } - return "ks"; + return N("ks"); case LANG_KAZAK: switch (sub) { - case SUBLANG_KAZAK_KAZAKHSTAN: return "kk_KZ"; + case SUBLANG_KAZAK_KAZAKHSTAN: return N("kk_KZ"); } - return "kk"; + return N("kk"); case LANG_KICHE: /* FIXME: Adjust this when such locales appear on Unix. 
*/ switch (sub) { - case SUBLANG_KICHE_GUATEMALA: return "qut_GT"; + case SUBLANG_KICHE_GUATEMALA: return N("qut_GT"); } - return "qut"; + return N("qut"); case LANG_KINYARWANDA: switch (sub) { - case SUBLANG_KINYARWANDA_RWANDA: return "rw_RW"; + case SUBLANG_KINYARWANDA_RWANDA: return N("rw_RW"); } - return "rw"; + return N("rw"); case LANG_KONKANI: switch (sub) { - case SUBLANG_KONKANI_INDIA: return "kok_IN"; + case SUBLANG_KONKANI_INDIA: return N("kok_IN"); } - return "kok"; + return N("kok"); case LANG_KOREAN: switch (sub) { - case SUBLANG_DEFAULT: return "ko_KR"; + case SUBLANG_DEFAULT: return N("ko_KR"); } - return "ko"; + return N("ko"); case LANG_KYRGYZ: switch (sub) { - case SUBLANG_KYRGYZ_KYRGYZSTAN: return "ky_KG"; + case SUBLANG_KYRGYZ_KYRGYZSTAN: return N("ky_KG"); } - return "ky"; + return N("ky"); case LANG_LAO: switch (sub) { - case SUBLANG_LAO_LAOS: return "lo_LA"; + case SUBLANG_LAO_LAOS: return N("lo_LA"); } - return "lo"; + return N("lo"); case LANG_LATIN: switch (sub) { - case SUBLANG_DEFAULT: return "la_VA"; + case SUBLANG_DEFAULT: return N("la_VA"); } - return "la"; + return N("la"); case LANG_LATVIAN: switch (sub) { - case SUBLANG_LATVIAN_LATVIA: return "lv_LV"; + case SUBLANG_LATVIAN_LATVIA: return N("lv_LV"); } - return "lv"; + return N("lv"); case LANG_LITHUANIAN: switch (sub) { - case SUBLANG_LITHUANIAN_LITHUANIA: return "lt_LT"; + case SUBLANG_LITHUANIAN_LITHUANIA: return N("lt_LT"); } - return "lt"; + return N("lt"); case LANG_LUXEMBOURGISH: switch (sub) { - case SUBLANG_LUXEMBOURGISH_LUXEMBOURG: return "lb_LU"; + case SUBLANG_LUXEMBOURGISH_LUXEMBOURG: return N("lb_LU"); } - return "lb"; + return N("lb"); case LANG_MACEDONIAN: switch (sub) { - case SUBLANG_MACEDONIAN_MACEDONIA: return "mk_MK"; + case SUBLANG_MACEDONIAN_MACEDONIA: return N("mk_MK"); } - return "mk"; + return N("mk"); case LANG_MALAY: switch (sub) { - case SUBLANG_MALAY_MALAYSIA: return "ms_MY"; - case SUBLANG_MALAY_BRUNEI_DARUSSALAM: return "ms_BN"; + case 
SUBLANG_MALAY_MALAYSIA: return N("ms_MY"); + case SUBLANG_MALAY_BRUNEI_DARUSSALAM: return N("ms_BN"); } - return "ms"; + return N("ms"); case LANG_MALAYALAM: switch (sub) { - case SUBLANG_MALAYALAM_INDIA: return "ml_IN"; + case SUBLANG_MALAYALAM_INDIA: return N("ml_IN"); } - return "ml"; + return N("ml"); case LANG_MALTESE: switch (sub) { - case SUBLANG_MALTESE_MALTA: return "mt_MT"; + case SUBLANG_MALTESE_MALTA: return N("mt_MT"); } - return "mt"; + return N("mt"); case LANG_MANIPURI: switch (sub) { - case SUBLANG_DEFAULT: return "mni_IN"; + case SUBLANG_DEFAULT: return N("mni_IN"); } - return "mni"; + return N("mni"); case LANG_MAORI: switch (sub) { - case SUBLANG_MAORI_NEW_ZEALAND: return "mi_NZ"; + case SUBLANG_MAORI_NEW_ZEALAND: return N("mi_NZ"); } - return "mi"; + return N("mi"); case LANG_MAPUDUNGUN: switch (sub) { - case SUBLANG_MAPUDUNGUN_CHILE: return "arn_CL"; + case SUBLANG_MAPUDUNGUN_CHILE: return N("arn_CL"); } - return "arn"; + return N("arn"); case LANG_MARATHI: switch (sub) { - case SUBLANG_MARATHI_INDIA: return "mr_IN"; + case SUBLANG_MARATHI_INDIA: return N("mr_IN"); } - return "mr"; + return N("mr"); case LANG_MOHAWK: switch (sub) { - case SUBLANG_MOHAWK_CANADA: return "moh_CA"; + case SUBLANG_MOHAWK_CANADA: return N("moh_CA"); } - return "moh"; + return N("moh"); case LANG_MONGOLIAN: switch (sub) { - case SUBLANG_MONGOLIAN_CYRILLIC_MONGOLIA: case 0x1e: return "mn_MN"; - case SUBLANG_MONGOLIAN_PRC: case 0x1f: return "mn_CN"; + case SUBLANG_MONGOLIAN_CYRILLIC_MONGOLIA: case 0x1e: return N("mn_MN"); + case SUBLANG_MONGOLIAN_PRC: case 0x1f: return N("mn_CN"); } - return "mn"; /* Ambiguous: could be "mn_CN" or "mn_MN". */ + return N("mn"); /* Ambiguous: could be "mn_CN" or "mn_MN". 
*/ case LANG_NEPALI: switch (sub) { - case SUBLANG_NEPALI_NEPAL: return "ne_NP"; - case SUBLANG_NEPALI_INDIA: return "ne_IN"; + case SUBLANG_NEPALI_NEPAL: return N("ne_NP"); + case SUBLANG_NEPALI_INDIA: return N("ne_IN"); } - return "ne"; + return N("ne"); case LANG_NORWEGIAN: switch (sub) { - case 0x1f: return "nb"; - case SUBLANG_NORWEGIAN_BOKMAL: return "nb_NO"; - case 0x1e: return "nn"; - case SUBLANG_NORWEGIAN_NYNORSK: return "nn_NO"; + case 0x1f: return N("nb"); + case SUBLANG_NORWEGIAN_BOKMAL: return N("nb_NO"); + case 0x1e: return N("nn"); + case SUBLANG_NORWEGIAN_NYNORSK: return N("nn_NO"); } - return "no"; + return N("no"); case LANG_OCCITAN: switch (sub) { - case SUBLANG_OCCITAN_FRANCE: return "oc_FR"; + case SUBLANG_OCCITAN_FRANCE: return N("oc_FR"); } - return "oc"; + return N("oc"); case LANG_ORIYA: switch (sub) { - case SUBLANG_ORIYA_INDIA: return "or_IN"; + case SUBLANG_ORIYA_INDIA: return N("or_IN"); } - return "or"; + return N("or"); case LANG_OROMO: switch (sub) { - case SUBLANG_DEFAULT: return "om_ET"; + case SUBLANG_DEFAULT: return N("om_ET"); } - return "om"; + return N("om"); case LANG_PAPIAMENTU: switch (sub) { - case SUBLANG_DEFAULT: return "pap_AN"; + case SUBLANG_DEFAULT: return N("pap_AN"); } - return "pap"; + return N("pap"); case LANG_PASHTO: switch (sub) { - case SUBLANG_PASHTO_AFGHANISTAN: return "ps_AF"; + case SUBLANG_PASHTO_AFGHANISTAN: return N("ps_AF"); } - return "ps"; /* Ambiguous: could be "ps_PK" or "ps_AF". */ + return N("ps"); /* Ambiguous: could be "ps_PK" or "ps_AF". */ case LANG_POLISH: switch (sub) { - case SUBLANG_POLISH_POLAND: return "pl_PL"; + case SUBLANG_POLISH_POLAND: return N("pl_PL"); } - return "pl"; + return N("pl"); case LANG_PORTUGUESE: switch (sub) { /* Hmm. SUBLANG_PORTUGUESE_BRAZILIAN == SUBLANG_DEFAULT. Same phenomenon as SUBLANG_ENGLISH_US == SUBLANG_DEFAULT. 
*/ - case SUBLANG_PORTUGUESE_BRAZILIAN: return "pt_BR"; - case SUBLANG_PORTUGUESE: return "pt_PT"; + case SUBLANG_PORTUGUESE_BRAZILIAN: return N("pt_BR"); + case SUBLANG_PORTUGUESE: return N("pt_PT"); } - return "pt"; + return N("pt"); case LANG_PUNJABI: switch (sub) { - case SUBLANG_PUNJABI_INDIA: return "pa_IN"; /* Gurmukhi script */ - case SUBLANG_PUNJABI_PAKISTAN: return "pa_PK"; /* Arabic script */ + case SUBLANG_PUNJABI_INDIA: return N("pa_IN"); /* Gurmukhi script */ + case SUBLANG_PUNJABI_PAKISTAN: return N("pa_PK"); /* Arabic script */ } - return "pa"; + return N("pa"); case LANG_QUECHUA: /* Note: Microsoft uses the non-ISO language code "quz". */ switch (sub) { - case SUBLANG_QUECHUA_BOLIVIA: return "qu_BO"; - case SUBLANG_QUECHUA_ECUADOR: return "qu_EC"; - case SUBLANG_QUECHUA_PERU: return "qu_PE"; + case SUBLANG_QUECHUA_BOLIVIA: return N("qu_BO"); + case SUBLANG_QUECHUA_ECUADOR: return N("qu_EC"); + case SUBLANG_QUECHUA_PERU: return N("qu_PE"); } - return "qu"; + return N("qu"); case LANG_ROMANIAN: switch (sub) { - case SUBLANG_ROMANIAN_ROMANIA: return "ro_RO"; - case SUBLANG_ROMANIAN_MOLDOVA: return "ro_MD"; + case SUBLANG_ROMANIAN_ROMANIA: return N("ro_RO"); + case SUBLANG_ROMANIAN_MOLDOVA: return N("ro_MD"); } - return "ro"; + return N("ro"); case LANG_ROMANSH: switch (sub) { - case SUBLANG_ROMANSH_SWITZERLAND: return "rm_CH"; + case SUBLANG_ROMANSH_SWITZERLAND: return N("rm_CH"); } - return "rm"; + return N("rm"); case LANG_RUSSIAN: switch (sub) { - case SUBLANG_RUSSIAN_RUSSIA: return "ru_RU"; - case SUBLANG_RUSSIAN_MOLDAVIA: return "ru_MD"; + case SUBLANG_RUSSIAN_RUSSIA: return N("ru_RU"); + case SUBLANG_RUSSIAN_MOLDAVIA: return N("ru_MD"); } - return "ru"; /* Ambiguous: could be "ru_RU" or "ru_UA" or "ru_MD". */ + return N("ru"); /* Ambiguous: could be "ru_RU" or "ru_UA" or "ru_MD". 
*/ case LANG_SAMI: switch (sub) { /* Northern Sami */ - case 0x00: return "se"; - case SUBLANG_SAMI_NORTHERN_NORWAY: return "se_NO"; - case SUBLANG_SAMI_NORTHERN_SWEDEN: return "se_SE"; - case SUBLANG_SAMI_NORTHERN_FINLAND: return "se_FI"; + case 0x00: return N("se"); + case SUBLANG_SAMI_NORTHERN_NORWAY: return N("se_NO"); + case SUBLANG_SAMI_NORTHERN_SWEDEN: return N("se_SE"); + case SUBLANG_SAMI_NORTHERN_FINLAND: return N("se_FI"); /* Lule Sami */ - case 0x1f: return "smj"; - case SUBLANG_SAMI_LULE_NORWAY: return "smj_NO"; - case SUBLANG_SAMI_LULE_SWEDEN: return "smj_SE"; + case 0x1f: return N("smj"); + case SUBLANG_SAMI_LULE_NORWAY: return N("smj_NO"); + case SUBLANG_SAMI_LULE_SWEDEN: return N("smj_SE"); /* Southern Sami */ - case 0x1e: return "sma"; - case SUBLANG_SAMI_SOUTHERN_NORWAY: return "sma_NO"; - case SUBLANG_SAMI_SOUTHERN_SWEDEN: return "sma_SE"; + case 0x1e: return N("sma"); + case SUBLANG_SAMI_SOUTHERN_NORWAY: return N("sma_NO"); + case SUBLANG_SAMI_SOUTHERN_SWEDEN: return N("sma_SE"); /* Skolt Sami */ - case 0x1d: return "sms"; - case SUBLANG_SAMI_SKOLT_FINLAND: return "sms_FI"; + case 0x1d: return N("sms"); + case SUBLANG_SAMI_SKOLT_FINLAND: return N("sms_FI"); /* Inari Sami */ - case 0x1c: return "smn"; - case SUBLANG_SAMI_INARI_FINLAND: return "smn_FI"; + case 0x1c: return N("smn"); + case SUBLANG_SAMI_INARI_FINLAND: return N("smn_FI"); } - return "se"; /* or "smi"? */ + return N("se"); /* or "smi"? 
*/ case LANG_SANSKRIT: switch (sub) { - case SUBLANG_SANSKRIT_INDIA: return "sa_IN"; + case SUBLANG_SANSKRIT_INDIA: return N("sa_IN"); } - return "sa"; + return N("sa"); case LANG_SCOTTISH_GAELIC: switch (sub) { - case SUBLANG_DEFAULT: return "gd_GB"; + case SUBLANG_DEFAULT: return N("gd_GB"); } - return "gd"; + return N("gd"); case LANG_SINDHI: switch (sub) { - case SUBLANG_SINDHI_INDIA: return "sd_IN"; - case SUBLANG_SINDHI_PAKISTAN: return "sd_PK"; - /*case SUBLANG_SINDHI_AFGHANISTAN: return "sd_AF";*/ + case SUBLANG_SINDHI_INDIA: return N("sd_IN"); + case SUBLANG_SINDHI_PAKISTAN: return N("sd_PK"); + /*case SUBLANG_SINDHI_AFGHANISTAN: return N("sd_AF");*/ } - return "sd"; + return N("sd"); case LANG_SINHALESE: switch (sub) { - case SUBLANG_SINHALESE_SRI_LANKA: return "si_LK"; + case SUBLANG_SINHALESE_SRI_LANKA: return N("si_LK"); } - return "si"; + return N("si"); case LANG_SLOVAK: switch (sub) { - case SUBLANG_SLOVAK_SLOVAKIA: return "sk_SK"; + case SUBLANG_SLOVAK_SLOVAKIA: return N("sk_SK"); } - return "sk"; + return N("sk"); case LANG_SLOVENIAN: switch (sub) { - case SUBLANG_SLOVENIAN_SLOVENIA: return "sl_SI"; + case SUBLANG_SLOVENIAN_SLOVENIA: return N("sl_SI"); } - return "sl"; + return N("sl"); case LANG_SOMALI: switch (sub) { - case SUBLANG_DEFAULT: return "so_SO"; + case SUBLANG_DEFAULT: return N("so_SO"); } - return "so"; + return N("so"); case LANG_SORBIAN: switch (sub) { /* Upper Sorbian */ - case 0x00: return "hsb"; - case SUBLANG_UPPER_SORBIAN_GERMANY: return "hsb_DE"; + case 0x00: return N("hsb"); + case SUBLANG_UPPER_SORBIAN_GERMANY: return N("hsb_DE"); /* Lower Sorbian */ - case 0x1f: return "dsb"; - case SUBLANG_LOWER_SORBIAN_GERMANY: return "dsb_DE"; + case 0x1f: return N("dsb"); + case SUBLANG_LOWER_SORBIAN_GERMANY: return N("dsb_DE"); } - return "wen"; + return N("wen"); case LANG_SOTHO: /* <https://docs.microsoft.com/en-us/windows/desktop/Intl/language-identifier-constants-and-strings> calls it "Sesotho sa Leboa"; according to @@ -2303,240 
+2308,241 @@ gl_locale_name_from_win32_LANGID (LANGID langid) it's the same as Northern Sotho. */ switch (sub) { - case SUBLANG_SOTHO_SOUTH_AFRICA: return "nso_ZA"; + case SUBLANG_SOTHO_SOUTH_AFRICA: return N("nso_ZA"); } - return "nso"; + return N("nso"); case LANG_SPANISH: switch (sub) { - case SUBLANG_SPANISH: return "es_ES"; - case SUBLANG_SPANISH_MEXICAN: return "es_MX"; + case SUBLANG_SPANISH: return N("es_ES"); + case SUBLANG_SPANISH_MEXICAN: return N("es_MX"); case SUBLANG_SPANISH_MODERN: - return "es_ES@modern"; /* not seen on Unix */ - case SUBLANG_SPANISH_GUATEMALA: return "es_GT"; - case SUBLANG_SPANISH_COSTA_RICA: return "es_CR"; - case SUBLANG_SPANISH_PANAMA: return "es_PA"; - case SUBLANG_SPANISH_DOMINICAN_REPUBLIC: return "es_DO"; - case SUBLANG_SPANISH_VENEZUELA: return "es_VE"; - case SUBLANG_SPANISH_COLOMBIA: return "es_CO"; - case SUBLANG_SPANISH_PERU: return "es_PE"; - case SUBLANG_SPANISH_ARGENTINA: return "es_AR"; - case SUBLANG_SPANISH_ECUADOR: return "es_EC"; - case SUBLANG_SPANISH_CHILE: return "es_CL"; - case SUBLANG_SPANISH_URUGUAY: return "es_UY"; - case SUBLANG_SPANISH_PARAGUAY: return "es_PY"; - case SUBLANG_SPANISH_BOLIVIA: return "es_BO"; - case SUBLANG_SPANISH_EL_SALVADOR: return "es_SV"; - case SUBLANG_SPANISH_HONDURAS: return "es_HN"; - case SUBLANG_SPANISH_NICARAGUA: return "es_NI"; - case SUBLANG_SPANISH_PUERTO_RICO: return "es_PR"; - case SUBLANG_SPANISH_US: return "es_US"; - } - return "es"; + return N("es_ES@modern"); /* not seen on Unix */ + case SUBLANG_SPANISH_GUATEMALA: return N("es_GT"); + case SUBLANG_SPANISH_COSTA_RICA: return N("es_CR"); + case SUBLANG_SPANISH_PANAMA: return N("es_PA"); + case SUBLANG_SPANISH_DOMINICAN_REPUBLIC: return N("es_DO"); + case SUBLANG_SPANISH_VENEZUELA: return N("es_VE"); + case SUBLANG_SPANISH_COLOMBIA: return N("es_CO"); + case SUBLANG_SPANISH_PERU: return N("es_PE"); + case SUBLANG_SPANISH_ARGENTINA: return N("es_AR"); + case SUBLANG_SPANISH_ECUADOR: return N("es_EC"); + case 
SUBLANG_SPANISH_CHILE: return N("es_CL"); + case SUBLANG_SPANISH_URUGUAY: return N("es_UY"); + case SUBLANG_SPANISH_PARAGUAY: return N("es_PY"); + case SUBLANG_SPANISH_BOLIVIA: return N("es_BO"); + case SUBLANG_SPANISH_EL_SALVADOR: return N("es_SV"); + case SUBLANG_SPANISH_HONDURAS: return N("es_HN"); + case SUBLANG_SPANISH_NICARAGUA: return N("es_NI"); + case SUBLANG_SPANISH_PUERTO_RICO: return N("es_PR"); + case SUBLANG_SPANISH_US: return N("es_US"); + } + return N("es"); case LANG_SUTU: switch (sub) { - case SUBLANG_DEFAULT: return "bnt_TZ"; /* or "st_LS" or "nso_ZA"? */ + case SUBLANG_DEFAULT: return N("bnt_TZ"); /* or "st_LS" or "nso_ZA"? */ } - return "bnt"; + return N("bnt"); case LANG_SWAHILI: switch (sub) { - case SUBLANG_SWAHILI_KENYA: return "sw_KE"; + case SUBLANG_SWAHILI_KENYA: return N("sw_KE"); } - return "sw"; + return N("sw"); case LANG_SWEDISH: switch (sub) { - case SUBLANG_SWEDISH_SWEDEN: return "sv_SE"; - case SUBLANG_SWEDISH_FINLAND: return "sv_FI"; + case SUBLANG_SWEDISH_SWEDEN: return N("sv_SE"); + case SUBLANG_SWEDISH_FINLAND: return N("sv_FI"); } - return "sv"; + return N("sv"); case LANG_SYRIAC: switch (sub) { - case SUBLANG_SYRIAC_SYRIA: return "syr_SY"; /* An extinct language. */ + case SUBLANG_SYRIAC_SYRIA: return N("syr_SY"); /* An extinct language. */ } - return "syr"; + return N("syr"); case LANG_TAGALOG: switch (sub) { - case SUBLANG_TAGALOG_PHILIPPINES: return "tl_PH"; /* or "fil_PH"? */ + case SUBLANG_TAGALOG_PHILIPPINES: return N("tl_PH"); /* or "fil_PH"? */ } - return "tl"; /* or "fil"? */ + return N("tl"); /* or "fil"? */ case LANG_TAJIK: switch (sub) { - case 0x1f: return "tg"; - case SUBLANG_TAJIK_TAJIKISTAN: return "tg_TJ"; + case 0x1f: return N("tg"); + case SUBLANG_TAJIK_TAJIKISTAN: return N("tg_TJ"); } - return "tg"; + return N("tg"); case LANG_TAMAZIGHT: /* Note: Microsoft uses the non-ISO language code "tmz". 
*/ switch (sub) { - case SUBLANG_TAMAZIGHT_ARABIC: return "ber_MA"; - case 0x1f: return "ber@latin"; - case SUBLANG_TAMAZIGHT_ALGERIA_LATIN: return "ber_DZ"; + case SUBLANG_TAMAZIGHT_ARABIC: return N("ber_MA"); + case 0x1f: return N("ber@latin"); + case SUBLANG_TAMAZIGHT_ALGERIA_LATIN: return N("ber_DZ"); } - return "ber"; + return N("ber"); case LANG_TAMIL: switch (sub) { - case SUBLANG_TAMIL_INDIA: return "ta_IN"; + case SUBLANG_TAMIL_INDIA: return N("ta_IN"); } - return "ta"; /* Ambiguous: could be "ta_IN" or "ta_LK" or "ta_SG". */ + return N("ta"); /* Ambiguous: could be "ta_IN" or "ta_LK" or "ta_SG". */ case LANG_TATAR: switch (sub) { - case SUBLANG_TATAR_RUSSIA: return "tt_RU"; + case SUBLANG_TATAR_RUSSIA: return N("tt_RU"); } - return "tt"; + return N("tt"); case LANG_TELUGU: switch (sub) { - case SUBLANG_TELUGU_INDIA: return "te_IN"; + case SUBLANG_TELUGU_INDIA: return N("te_IN"); } - return "te"; + return N("te"); case LANG_THAI: switch (sub) { - case SUBLANG_THAI_THAILAND: return "th_TH"; + case SUBLANG_THAI_THAILAND: return N("th_TH"); } - return "th"; + return N("th"); case LANG_TIBETAN: switch (sub) { case SUBLANG_TIBETAN_PRC: /* Most Tibetans would not like "bo_CN". But Tibet does not yet have a country code of its own. */ - return "bo"; - case SUBLANG_TIBETAN_BHUTAN: return "bo_BT"; + return N("bo"); + case SUBLANG_TIBETAN_BHUTAN: return N("bo_BT"); } - return "bo"; + return N("bo"); case LANG_TIGRINYA: switch (sub) { - case SUBLANG_TIGRINYA_ETHIOPIA: return "ti_ET"; - case SUBLANG_TIGRINYA_ERITREA: return "ti_ER"; + case SUBLANG_TIGRINYA_ETHIOPIA: return N("ti_ET"); + case SUBLANG_TIGRINYA_ERITREA: return N("ti_ER"); } - return "ti"; + return N("ti"); case LANG_TSONGA: switch (sub) { - case SUBLANG_DEFAULT: return "ts_ZA"; + case SUBLANG_DEFAULT: return N("ts_ZA"); } - return "ts"; + return N("ts"); case LANG_TSWANA: /* Spoken in South Africa, Botswana. 
*/ switch (sub) { - case SUBLANG_TSWANA_SOUTH_AFRICA: return "tn_ZA"; + case SUBLANG_TSWANA_SOUTH_AFRICA: return N("tn_ZA"); } - return "tn"; + return N("tn"); case LANG_TURKISH: switch (sub) { - case SUBLANG_TURKISH_TURKEY: return "tr_TR"; + case SUBLANG_TURKISH_TURKEY: return N("tr_TR"); } - return "tr"; + return N("tr"); case LANG_TURKMEN: switch (sub) { - case SUBLANG_TURKMEN_TURKMENISTAN: return "tk_TM"; + case SUBLANG_TURKMEN_TURKMENISTAN: return N("tk_TM"); } - return "tk"; + return N("tk"); case LANG_UIGHUR: switch (sub) { - case SUBLANG_UIGHUR_PRC: return "ug_CN"; + case SUBLANG_UIGHUR_PRC: return N("ug_CN"); } - return "ug"; + return N("ug"); case LANG_UKRAINIAN: switch (sub) { - case SUBLANG_UKRAINIAN_UKRAINE: return "uk_UA"; + case SUBLANG_UKRAINIAN_UKRAINE: return N("uk_UA"); } - return "uk"; + return N("uk"); case LANG_URDU: switch (sub) { - case SUBLANG_URDU_PAKISTAN: return "ur_PK"; - case SUBLANG_URDU_INDIA: return "ur_IN"; + case SUBLANG_URDU_PAKISTAN: return N("ur_PK"); + case SUBLANG_URDU_INDIA: return N("ur_IN"); } - return "ur"; + return N("ur"); case LANG_UZBEK: switch (sub) { - case 0x1f: return "uz"; - case SUBLANG_UZBEK_LATIN: return "uz_UZ"; - case 0x1e: return "uz@cyrillic"; - case SUBLANG_UZBEK_CYRILLIC: return "uz_UZ@cyrillic"; + case 0x1f: return N("uz"); + case SUBLANG_UZBEK_LATIN: return N("uz_UZ"); + case 0x1e: return N("uz@cyrillic"); + case SUBLANG_UZBEK_CYRILLIC: return N("uz_UZ@cyrillic"); } - return "uz"; + return N("uz"); case LANG_VENDA: switch (sub) { - case SUBLANG_DEFAULT: return "ve_ZA"; + case SUBLANG_DEFAULT: return N("ve_ZA"); } - return "ve"; + return N("ve"); case LANG_VIETNAMESE: switch (sub) { - case SUBLANG_VIETNAMESE_VIETNAM: return "vi_VN"; + case SUBLANG_VIETNAMESE_VIETNAM: return N("vi_VN"); } - return "vi"; + return N("vi"); case LANG_WELSH: switch (sub) { - case SUBLANG_WELSH_UNITED_KINGDOM: return "cy_GB"; + case SUBLANG_WELSH_UNITED_KINGDOM: return N("cy_GB"); } - return "cy"; + return N("cy"); case 
LANG_WOLOF: switch (sub) { - case SUBLANG_WOLOF_SENEGAL: return "wo_SN"; + case SUBLANG_WOLOF_SENEGAL: return N("wo_SN"); } - return "wo"; + return N("wo"); case LANG_XHOSA: switch (sub) { - case SUBLANG_XHOSA_SOUTH_AFRICA: return "xh_ZA"; + case SUBLANG_XHOSA_SOUTH_AFRICA: return N("xh_ZA"); } - return "xh"; + return N("xh"); case LANG_YAKUT: switch (sub) { - case SUBLANG_YAKUT_RUSSIA: return "sah_RU"; + case SUBLANG_YAKUT_RUSSIA: return N("sah_RU"); } - return "sah"; + return N("sah"); case LANG_YI: switch (sub) { - case SUBLANG_YI_PRC: return "ii_CN"; + case SUBLANG_YI_PRC: return N("ii_CN"); } - return "ii"; + return N("ii"); case LANG_YIDDISH: switch (sub) { - case SUBLANG_DEFAULT: return "yi_IL"; + case SUBLANG_DEFAULT: return N("yi_IL"); } - return "yi"; + return N("yi"); case LANG_YORUBA: switch (sub) { - case SUBLANG_YORUBA_NIGERIA: return "yo_NG"; + case SUBLANG_YORUBA_NIGERIA: return N("yo_NG"); } - return "yo"; + return N("yo"); case LANG_ZULU: switch (sub) { - case SUBLANG_ZULU_SOUTH_AFRICA: return "zu_ZA"; + case SUBLANG_ZULU_SOUTH_AFRICA: return N("zu_ZA"); } - return "zu"; - default: return "C"; + return N("zu"); + default: return N("C"); } } + #undef N } # if !defined IN_LIBINTL -- 2.43.0
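A note for readers skimming the diff above: every return value in gl_locale_name_from_win32_LANGID is now wrapped in N(...), and the macro is undefined again at the end of the function. The definition of N lies outside this excerpt. The sketch below assumes it chooses between a string literal and the same literal with ".UTF-8" appended, relying on compile-time concatenation of adjacent string literals. The names NAME_FOR and demo_locale_name and the is_utf8 parameter are hypothetical; the real code presumably derives the flag from GetACP().

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the patch's N() macro.  Adjacent string
   literals are concatenated at compile time, so both alternatives are
   static strings and nothing needs to be allocated or freed.  */
#define NAME_FOR(is_utf8, name) ((is_utf8) ? name ".UTF-8" : name)

/* Hypothetical miniature of the LANGID table: two sublanguages of German.
   In the real code, is_utf8 would come from (GetACP () == 65001).  */
static const char *
demo_locale_name (int swiss, int is_utf8)
{
  const char *result =
    swiss ? NAME_FOR (is_utf8, "de_CH") : NAME_FOR (is_utf8, "de_DE");
  assert (result != NULL);
  return result;
}
```

Because the macro argument must be a string literal, this approach only works for tables of constant names, which is exactly the shape of the switch statement in the patch.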
From e63eea8ea358041610c3f9a9ed4d5a1e44be5cc4 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:57:02 +0100
Subject: [PATCH 4/7] localename tests: Test in the UTF-8 environment on native Windows.

* tests/test-localename-w32utf8.sh: New file.
* tests/test-localename-w32utf8.c: New file.
* modules/localename-tests (Files): Add these files and
m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-localename-w32utf8 and run
test-localename-w32utf8.sh.
---
 ChangeLog                        | 10 +++++++
 modules/localename-tests         | 15 ++++++++++
 tests/test-localename-w32utf8.c  | 47 ++++++++++++++++++++++++++++++++
 tests/test-localename-w32utf8.sh |  7 +++++
 4 files changed, 79 insertions(+)
 create mode 100644 tests/test-localename-w32utf8.c
 create mode 100755 tests/test-localename-w32utf8.sh

diff --git a/ChangeLog b/ChangeLog
index d9f282c21e..fd3cf9f7ca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
+	localename tests: Test in the UTF-8 environment on native Windows.
+	* tests/test-localename-w32utf8.sh: New file.
+	* tests/test-localename-w32utf8.c: New file.
+	* modules/localename-tests (Files): Add these files and
+	m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-localename-w32utf8 and run
+	test-localename-w32utf8.sh.
+
 	localename-unsafe: Support the UTF-8 environment on native Windows.
 	* lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a
 	suffix ".UTF-8" to the result if GetACP() is UTF-8.
diff --git a/modules/localename-tests b/modules/localename-tests
index 0c24d5b4b6..cf4d586806 100644
--- a/modules/localename-tests
+++ b/modules/localename-tests
@@ -1,7 +1,12 @@
 Files:
 tests/test-localename.c
+tests/test-localename-w32utf8.sh
+tests/test-localename-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
 tests/macros.h
 m4/musl.m4
+m4/windows-rc.m4
 
 Depends-on:
 locale
@@ -9,13 +14,23 @@ setenv
 unsetenv
 setlocale
 strdup
+test-xfail
 
 configure.ac:
 gl_CHECK_FUNCS_ANDROID([newlocale], [[#include <locale.h>]])
 gl_MUSL_LIBC
+gl_WINDOWS_RC
 
 Makefile.am:
 TESTS += test-localename
 check_PROGRAMS += test-localename
 test_localename_LDADD = $(LDADD) $(SETLOCALE_LIB) @INTL_MACOSX_LIBS@ $(LIBTHREAD)
 
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-localename-w32utf8.sh
+noinst_PROGRAMS += test-localename-w32utf8
+test_localename_w32utf8_LDADD = $(LDADD) test-localename-windows-utf8.res $(SETLOCALE_LIB)
+test-localename-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-localename-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-localename-windows-utf8.res
+endif
diff --git a/tests/test-localename-w32utf8.c b/tests/test-localename-w32utf8.c
new file mode 100644
index 0000000000..72a01c0749
--- /dev/null
+++ b/tests/test-localename-w32utf8.c
@@ -0,0 +1,47 @@
+/* Test of gl_locale_name function and its variants
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include "localename.h"
+
+#include <stdio.h>
+#include <string.h>
+
+#include "macros.h"
+
+int
+main (void)
+{
+#ifdef _UCRT
+  const char *name = gl_locale_name_default ();
+
+  ASSERT (name != NULL);
+
+  /* With the legacy system settings, expect "C.UTF-8", not "C", because "C" is
+     a single-byte locale.
+     With the modern system settings, expect some "ll_CC.UTF-8" name.  */
+  ASSERT (strlen (name) > 6 && strcmp (name + strlen (name)- 6, ".UTF-8") == 0);
+
+  return test_exit_status;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-localename-w32utf8.sh b/tests/test-localename-w32utf8.sh
new file mode 100755
index 0000000000..de7629c3a7
--- /dev/null
+++ b/tests/test-localename-w32utf8.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LANG
+${CHECKER} ./test-localename-w32utf8${EXEEXT}
-- 
2.43.0

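The assertion in test-localename-w32utf8.c accepts any name that ends in ".UTF-8" and has at least one character before that suffix. As a minimal sketch, the same condition can be written as a small standalone helper; has_utf8_suffix is a hypothetical name used only for this illustration and is not part of gnulib.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: return 1 if NAME ends in
   ".UTF-8" and has at least one character before that suffix, 0 otherwise.
   This mirrors the condition asserted in the test.  */
static int
has_utf8_suffix (const char *name)
{
  static const char suffix[] = ".UTF-8";
  size_t suffix_len = sizeof (suffix) - 1;   /* 6, as in the test */
  size_t len;

  assert (name != NULL);
  len = strlen (name);
  return len > suffix_len && strcmp (name + len - suffix_len, suffix) == 0;
}
```

With this helper, "C.UTF-8" and "en_US.UTF-8" are accepted, while "C", a bare ".UTF-8", and "English_United States.65001" are rejected. The last case matters because the localename functions report Unix-style names, whereas ".65001" names only appear in what the native setlocale() returns.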
From 00211fc69c926d6c8f6e3f3cf1d8802623db2af9 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:57:15 +0100
Subject: [PATCH 5/7] setlocale: Support the UTF-8 environment on native Windows.

* lib/setlocale.c: Include <windows.h>.
(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
to the locale names passed to the native setlocale().
---
 ChangeLog       |  7 +++++++
 lib/setlocale.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index fd3cf9f7ca..9f89cb8718 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	setlocale: Support the UTF-8 environment on native Windows.
+	* lib/setlocale.c: Include <windows.h>.
+	(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
+	to the locale names passed to the native setlocale().
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	localename tests: Test in the UTF-8 environment on native Windows.
diff --git a/lib/setlocale.c b/lib/setlocale.c
index 62dce81de3..3cb711d8e1 100644
--- a/lib/setlocale.c
+++ b/lib/setlocale.c
@@ -47,6 +47,11 @@ extern void gl_locale_name_canonicalize (char *name);
 
 #endif
 
+#if defined _WIN32 && !defined __CYGWIN__
+# define WIN32_LEAN_AND_MEAN
+# include <windows.h>
+#endif
+
 #if 1
 
 # undef setlocale
@@ -672,6 +677,7 @@ search (const struct table_entry *table, size_t table_size, const char *string,
 static char *
 setlocale_unixlike (int category, const char *locale)
 {
+  int is_utf8 = (GetACP () == 65001);
   char *result;
   char llCC_buf[64];
   char ll_buf[64];
@@ -682,6 +688,15 @@ setlocale_unixlike (int category, const char *locale)
   if (locale != NULL && strcmp (locale, "POSIX") == 0)
     locale = "C";
 
+  /* The native Windows implementation of setlocale, in the UTF-8 environment,
+     does not understand the locale names "C.UTF-8" or "C.utf8" or "C.65001",
+     but it understands "English_United States.65001", which is functionally
+     equivalent.  */
+  if (locale != NULL
+      && ((is_utf8 && strcmp (locale, "C") == 0)
+          || strcmp (locale, "C.UTF-8") == 0))
+    locale = "English_United States.65001";
+
   /* First, try setlocale with the original argument unchanged.  */
   result = setlocale_mtsafe (category, locale);
   if (result != NULL)
@@ -714,7 +729,15 @@ setlocale_unixlike (int category, const char *locale)
        */
       if (strcmp (llCC_buf, locale) != 0)
         {
-          result = setlocale (category, llCC_buf);
+          if (is_utf8)
+            {
+              char buf[64+6];
+              strcpy (buf, llCC_buf);
+              strcat (buf, ".65001");
+              result = setlocale (category, buf);
+            }
+          else
+            result = setlocale (category, llCC_buf);
           if (result != NULL)
             return result;
         }
@@ -731,7 +754,15 @@ setlocale_unixlike (int category, const char *locale)
         for (i = range.lo; i < range.hi; i++)
           {
             /* Try the replacement in language_table[i].  */
-            result = setlocale (category, language_table[i].english);
+            if (is_utf8)
+              {
+                char buf[64+6];
+                strcpy (buf, language_table[i].english);
+                strcat (buf, ".65001");
+                result = setlocale (category, buf);
+              }
+            else
+              result = setlocale (category, language_table[i].english);
             if (result != NULL)
               return result;
           }
@@ -785,13 +816,15 @@ setlocale_unixlike (int category, const char *locale)
                         size_t part1_len = strlen (part1);
                         const char *part2 = country_table[j].english;
                         size_t part2_len = strlen (part2) + 1;
-                        char buf[64+64];
+                        char buf[64+64+6];
 
                         if (!(part1_len + 1 + part2_len <= sizeof (buf)))
                           abort ();
                         memcpy (buf, part1, part1_len);
                         buf[part1_len] = '_';
                         memcpy (buf + part1_len + 1, part2, part2_len);
+                        if (is_utf8)
+                          strcat (buf, ".65001");
 
                         /* Try the concatenated replacements.  */
                         result = setlocale (category, buf);
@@ -809,8 +842,16 @@ setlocale_unixlike (int category, const char *locale)
                 for (i = language_range.lo; i < language_range.hi; i++)
                   {
                     /* Try only the language replacement.  */
-                    result =
-                      setlocale (category, language_table[i].english);
+                    if (is_utf8)
+                      {
+                        char buf[64+6];
+                        strcpy (buf, language_table[i].english);
+                        strcat (buf, ".65001");
+                        result = setlocale (category, buf);
+                      }
+                    else
+                      result =
+                        setlocale (category, language_table[i].english);
                     if (result != NULL)
                       return result;
                   }
-- 
2.43.0

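The setlocale.c patch repeats one pattern several times: copy a candidate locale name into a local buffer and, in the UTF-8 environment, append ".65001" before passing it to the native setlocale(). The following is only a sketch of that pattern as a single bounds-checked helper. It is not the approach taken in the patch, which uses strcpy/strcat on fixed-size buffers. The name with_codepage_suffix and its parameters are hypothetical, and is_utf8 stands for the patch's (GetACP () == 65001) so that the example also compiles outside Windows.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: write NAME into BUF, followed
   by ".65001" when IS_UTF8 is nonzero.  Return 0 on success, or -1 if the
   result (including the terminating NUL) does not fit in BUFSIZE bytes.  */
static int
with_codepage_suffix (char *buf, size_t bufsize, const char *name, int is_utf8)
{
  int n;

  assert (buf != NULL && name != NULL);
  n = snprintf (buf, bufsize, "%s%s", name, is_utf8 ? ".65001" : "");
  return (n >= 0 && (size_t) n < bufsize) ? 0 : -1;
}
```

A caller would then try setlocale (category, buf) only when the helper returns 0. The explicit size check is the design point of the sketch: the suffix adds 6 characters, which is why the patch enlarges each buffer by 6.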
From 2f4391fde8620749fb3859c568f952a958e2ca2c Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:58:53 +0100
Subject: [PATCH 6/7] setlocale tests: Test in the UTF-8 environment on native Windows.

* tests/test-setlocale-w32utf8.sh: New file.
* tests/test-setlocale-w32utf8.c: New file.
* modules/setlocale-tests (Files): Add these files and
m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-setlocale-w32utf8 and run
test-setlocale-w32utf8.sh.
---
 ChangeLog                       | 10 +++++
 modules/setlocale-tests         | 16 ++++++++
 tests/test-setlocale-w32utf8.c  | 69 +++++++++++++++++++++++++++++++++
 tests/test-setlocale-w32utf8.sh | 12 ++++++
 4 files changed, 107 insertions(+)
 create mode 100644 tests/test-setlocale-w32utf8.c
 create mode 100755 tests/test-setlocale-w32utf8.sh

diff --git a/ChangeLog b/ChangeLog
index 9f89cb8718..c5e2e8b1b2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
+	setlocale tests: Test in the UTF-8 environment on native Windows.
+	* tests/test-setlocale-w32utf8.sh: New file.
+	* tests/test-setlocale-w32utf8.c: New file.
+	* modules/setlocale-tests (Files): Add these files and
+	m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-setlocale-w32utf8 and run
+	test-setlocale-w32utf8.sh.
+
 	setlocale: Support the UTF-8 environment on native Windows.
 	* lib/setlocale.c: Include <windows.h>.
 	(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
diff --git a/modules/setlocale-tests b/modules/setlocale-tests
index ad0a536bc6..23cc6ddd17 100644
--- a/modules/setlocale-tests
+++ b/modules/setlocale-tests
@@ -4,21 +4,28 @@ tests/test-setlocale1.c
 tests/test-setlocale2.sh
 tests/test-setlocale2.c
 tests/test-setlocale-w32.c
+tests/test-setlocale-w32utf8.sh
+tests/test-setlocale-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
 tests/signature.h
 tests/macros.h
 m4/locale-fr.m4
 m4/locale-ja.m4
 m4/locale-zh.m4
 m4/codeset.m4
+m4/windows-rc.m4
 
 Depends-on:
 strdup
+test-xfail
 
 configure.ac:
 gt_LOCALE_FR
 gt_LOCALE_FR_UTF8
 gt_LOCALE_JA
 gt_LOCALE_ZH_CN
+gl_WINDOWS_RC
 
 Makefile.am:
 TESTS += test-setlocale1.sh test-setlocale2.sh test-setlocale-w32
@@ -31,3 +38,12 @@ check_PROGRAMS += test-setlocale1 test-setlocale2 test-setlocale-w32
 test_setlocale1_LDADD = $(LDADD) @SETLOCALE_LIB@
 test_setlocale2_LDADD = $(LDADD) @SETLOCALE_LIB@
 test_setlocale_w32_LDADD = $(LDADD) @SETLOCALE_LIB@
+
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-setlocale-w32utf8.sh
+noinst_PROGRAMS += test-setlocale-w32utf8
+test_setlocale_w32utf8_LDADD = $(LDADD) test-setlocale-windows-utf8.res $(SETLOCALE_LIB)
+test-setlocale-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-setlocale-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-setlocale-windows-utf8.res
+endif
diff --git a/tests/test-setlocale-w32utf8.c b/tests/test-setlocale-w32utf8.c
new file mode 100644
index 0000000000..f0bbce05b7
--- /dev/null
+++ b/tests/test-setlocale-w32utf8.c
@@ -0,0 +1,69 @@
+/* Test of setting the current locale
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include <locale.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int
+main (void)
+{
+#ifdef _UCRT
+  /* Test that setlocale() works as expected in a UTF-8 locale.  */
+  char *name;
+
+  /* This looks at all LC_*, LANG environment variables, which are all unset
+     at this point.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  name = setlocale (LC_ALL, NULL);
+  /* With the legacy system settings, expect some mixed locale, due to the
+     limitations of the native setlocale().
+     With the modern system settings, expect some "ll_CC.UTF-8" name.  */
+  if (!((strlen (name) > 6 && strcmp (name + strlen (name) - 6, ".UTF-8") == 0)
+        || strcmp (name, "LC_COLLATE=English_United States.65001;"
+                         "LC_CTYPE=English_United States.65001;"
+                         "LC_MONETARY=English_United States.65001;"
+                         "LC_NUMERIC=English_United States.65001;"
+                         "LC_TIME=English_United States.65001;"
+                         "LC_MESSAGES=C.UTF-8")
+           == 0
+        || strcmp (name, "LC_COLLATE=English_United States.utf8;"
+                         "LC_CTYPE=English_United States.utf8;"
+                         "LC_MONETARY=English_United States.utf8;"
+                         "LC_NUMERIC=English_United States.utf8;"
+                         "LC_TIME=English_United States.utf8;"
+                         "LC_MESSAGES=C.UTF-8")
+           == 0))
+    {
+      fprintf (stderr, "setlocale() returned \"%s\".\n", name);
+      exit (1);
+    }
+
+  return 0;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-setlocale-w32utf8.sh b/tests/test-setlocale-w32utf8.sh
new file mode 100755
index 0000000000..e8f7484cf0
--- /dev/null
+++ b/tests/test-setlocale-w32utf8.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LC_MESSAGES
+unset LC_NUMERIC
+unset LC_COLLATE
+unset LC_MONETARY
+unset LC_TIME
+unset LANG
+${CHECKER} ./test-setlocale-w32utf8${EXEEXT}
-- 
2.43.0

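A side note on the acceptance condition in the setlocale test above: its first branch is a plain suffix check on the name returned by setlocale (LC_ALL, NULL), and the other two branches compare against the exact mixed-locale strings. The sketch below shows only the suffix idea, generalized to the three codeset spellings mentioned in this thread (".UTF-8", ".utf8", ".65001"). The helper names ends_with and locale_name_is_utf8 are hypothetical and are not part of gnulib or of this patch. The sketch is also looser than the test, which insists on the full six-category string for the legacy settings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not in gnulib): return true if NAME is strictly
   longer than SUFFIX and ends with it.  The strict comparison mirrors the
   "strlen (name) > 6" condition in the test.  */
static bool
ends_with (const char *name, const char *suffix)
{
  size_t name_len = strlen (name);
  size_t suffix_len = strlen (suffix);

  return name_len > suffix_len
         && strcmp (name + name_len - suffix_len, suffix) == 0;
}

/* Hypothetical helper (not in gnulib): return true if a single-locale NAME
   ends in one of the UTF-8 codeset spellings seen on native Windows.
   This is an illustration only; it does not parse mixed
   "LC_COLLATE=...;LC_CTYPE=..." names category by category.  */
static bool
locale_name_is_utf8 (const char *name)
{
  return ends_with (name, ".UTF-8")
         || ends_with (name, ".utf8")
         || ends_with (name, ".65001");
}
```

Under these assumptions, locale_name_is_utf8 accepts "en_US.UTF-8" and "English_United States.65001" and rejects "C" as well as the bare string ".UTF-8".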
From c11a2e675ccc8637e6322b98d878b0315a8bb7e6 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:59:20 +0100
Subject: [PATCH 7/7] mbrtowc tests: Test in the UTF-8 environment on native Windows.

* tests/test-mbrtowc-w32utf8.sh: New file.
* tests/test-mbrtowc-w32utf8.c: New file.
* modules/mbrtowc-tests (Files): Add these files and
m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-mbrtowc-w32utf8 and run
test-mbrtowc-w32utf8.sh.
---
 ChangeLog                     |  12 +++
 modules/mbrtowc-tests         |  16 ++++
 tests/test-mbrtowc-w32utf8.c  | 166 ++++++++++++++++++++++++++++++++++
 tests/test-mbrtowc-w32utf8.sh |  12 +++
 4 files changed, 206 insertions(+)
 create mode 100644 tests/test-mbrtowc-w32utf8.c
 create mode 100755 tests/test-mbrtowc-w32utf8.sh

diff --git a/ChangeLog b/ChangeLog
index c5e2e8b1b2..e6d2e1d592 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	mbrtowc tests: Test in the UTF-8 environment on native Windows.
+	* tests/test-mbrtowc-w32utf8.sh: New file.
+	* tests/test-mbrtowc-w32utf8.c: New file.
+	* modules/mbrtowc-tests (Files): Add these files and
+	m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-mbrtowc-w32utf8 and run
+	test-mbrtowc-w32utf8.sh.
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	setlocale tests: Test in the UTF-8 environment on native Windows.
diff --git a/modules/mbrtowc-tests b/modules/mbrtowc-tests
index d152e2e472..d9add89fee 100644
--- a/modules/mbrtowc-tests
+++ b/modules/mbrtowc-tests
@@ -13,6 +13,10 @@ tests/test-mbrtowc-w32-6.sh
 tests/test-mbrtowc-w32-7.sh
 tests/test-mbrtowc-w32-8.sh
 tests/test-mbrtowc-w32.c
+tests/test-mbrtowc-w32utf8.sh
+tests/test-mbrtowc-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
 tests/signature.h
 tests/macros.h
 m4/locale-en.m4
@@ -20,12 +24,14 @@ m4/locale-fr.m4
 m4/locale-ja.m4
 m4/locale-zh.m4
 m4/codeset.m4
+m4/windows-rc.m4
 
 Depends-on:
 mbsinit
 wctob
 setlocale
 localcharset
+test-xfail
 
 configure.ac:
 gt_LOCALE_EN_UTF8
@@ -33,6 +39,7 @@ gt_LOCALE_FR
 gt_LOCALE_FR_UTF8
 gt_LOCALE_JA
 gt_LOCALE_ZH_CN
+gl_WINDOWS_RC
 
 Makefile.am:
 TESTS += \
@@ -49,3 +56,12 @@ TESTS_ENVIRONMENT += \
 	LOCALE_ZH_CN='@LOCALE_ZH_CN@'
 check_PROGRAMS += test-mbrtowc test-mbrtowc-w32
 test_mbrtowc_LDADD = $(LDADD) $(SETLOCALE_LIB) $(MBRTOWC_LIB)
+
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-mbrtowc-w32utf8.sh
+noinst_PROGRAMS += test-mbrtowc-w32utf8
+test_mbrtowc_w32utf8_LDADD = $(LDADD) test-mbrtowc-windows-utf8.res $(SETLOCALE_LIB)
+test-mbrtowc-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-mbrtowc-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-mbrtowc-windows-utf8.res
+endif
diff --git a/tests/test-mbrtowc-w32utf8.c b/tests/test-mbrtowc-w32utf8.c
new file mode 100644
index 0000000000..803c1638c0
--- /dev/null
+++ b/tests/test-mbrtowc-w32utf8.c
@@ -0,0 +1,166 @@
+/* Test of conversion of multibyte character to wide character
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include <wchar.h>
+
+#include <errno.h>
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "macros.h"
+
+int
+main (void)
+{
+#ifdef _UCRT
+  /* Test that MB_CUR_MAX and mbrtowc() work as expected in a UTF-8 locale.  */
+  mbstate_t state;
+  wchar_t wc;
+  size_t ret;
+
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  ASSERT (MB_CUR_MAX >= 4);
+
+  {
+    char input[] = "B\303\274\303\237er"; /* "Büßer" */
+    memset (&state, '\0', sizeof (mbstate_t));
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input, 1, &state);
+    ASSERT (ret == 1);
+    ASSERT (wc == 'B');
+    ASSERT (mbsinit (&state));
+    input[0] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 1, 1, &state);
+    ASSERT (ret == (size_t)(-2));
+    ASSERT (wc == (wchar_t) 0xBADFACE);
+    ASSERT (!mbsinit (&state));
+    input[1] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 2, 5, &state);
+    ASSERT (ret == 1);
+    ASSERT (wctob (wc) == EOF);
+    ASSERT (wc == 0x00FC);
+    ASSERT (mbsinit (&state));
+    input[2] = '\0';
+
+    /* Test support of NULL first argument.  */
+    ret = mbrtowc (NULL, input + 3, 4, &state);
+    ASSERT (ret == 2);
+    ASSERT (mbsinit (&state));
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 3, 4, &state);
+    ASSERT (ret == 2);
+    ASSERT (wctob (wc) == EOF);
+    ASSERT (wc == 0x00DF);
+    ASSERT (mbsinit (&state));
+    input[3] = '\0';
+    input[4] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 5, 2, &state);
+    ASSERT (ret == 1);
+    ASSERT (wc == 'e');
+    ASSERT (mbsinit (&state));
+    input[5] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 6, 1, &state);
+    ASSERT (ret == 1);
+    ASSERT (wc == 'r');
+    ASSERT (mbsinit (&state));
+
+    /* Test some invalid input.  */
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\377", 1, &state); /* 0xFF */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\303\300", 2, &state); /* 0xC3 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\343\300", 2, &state); /* 0xE3 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\343\300\200", 3, &state); /* 0xE3 0xC0 0x80 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\343\200\300", 3, &state); /* 0xE3 0x80 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\300", 2, &state); /* 0xF3 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\300\200\200", 4, &state); /* 0xF3 0xC0 0x80 0x80 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\200\300", 3, &state); /* 0xF3 0x80 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\200\300\200", 4, &state); /* 0xF3 0x80 0xC0 0x80 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\200\200\300", 4, &state); /* 0xF3 0x80 0x80 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+  }
+
+  return test_exit_status;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-mbrtowc-w32utf8.sh b/tests/test-mbrtowc-w32utf8.sh
new file mode 100755
index 0000000000..d0a953486c
--- /dev/null
+++ b/tests/test-mbrtowc-w32utf8.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LC_MESSAGES
+unset LC_NUMERIC
+unset LC_COLLATE
+unset LC_MONETARY
+unset LC_TIME
+unset LANG
+${CHECKER} ./test-mbrtowc-w32utf8${EXEEXT}
-- 
2.43.0
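
For anyone wondering why the "invalid input" cases in the mbrtowc test are expected to yield (size_t)-1 with EILSEQ: each of them either starts with a byte that can never begin a UTF-8 character (0xFF) or puts 0xC0 where a continuation byte in the range 0x80..0xBF is required. The sketch below is a simplified classification under the usual UTF-8 rules. The function names are hypothetical and belong neither to gnulib nor to the Windows runtime, and the sketch deliberately ignores the extra second-byte restrictions after 0xE0, 0xED, 0xF0 and 0xF4.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: a UTF-8 continuation byte has the bit pattern
   10xxxxxx, that is, a value in the range 0x80..0xBF.  */
static bool
is_utf8_continuation (unsigned char b)
{
  return (b & 0xC0) == 0x80;
}

/* Hypothetical helper: return the total length in bytes of the UTF-8
   sequence announced by lead byte B, or 0 if B cannot start a character.
   0xC0, 0xC1 and 0xF5..0xFF are never valid lead bytes; 0x80..0xBF are
   continuation bytes and cannot start a character either.  */
static size_t
utf8_length_from_lead (unsigned char b)
{
  if (b < 0x80)
    return 1;                   /* ASCII */
  if (b >= 0xC2 && b <= 0xDF)
    return 2;
  if (b >= 0xE0 && b <= 0xEF)
    return 3;
  if (b >= 0xF0 && b <= 0xF4)
    return 4;
  return 0;
}
```

With these definitions, 0xC3, 0xE3 and 0xF3 announce sequences of 2, 3 and 4 bytes respectively, 0xFF is rejected as a lead byte, and 0xC0 fails the continuation check, which matches the shape of the invalid sequences exercised by the test.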