Supporting the UTF-8 environment on native Windows

Bruno Haible via Gnulib discussion list Mon, 23 Dec 2024 08:15:38 -0800

Lasse Collin reported in
<https://lists.gnu.org/archive/html/bug-gettext/2024-12/msg00111.html>
that the setlocale() override from GNU libintl does not support the
UTF-8 environment of native Windows correctly. That setlocale() override
is based on the setlocale() override from gnulib. So let me add that
support here.


What I call the "UTF-8 environment of native Windows" is a way of
packaging an application (details are in [1]) such that GetACP()
returns 65001, the codepage number for UTF-8.

In fact, there are apparently two variants of this mode:
  - the legacy Windows settings variant: when you haven't ever
    (or recently?) changed the system default locale of Windows 10,
  - the modern Windows settings variant: when you have changed
    the system default locale of Windows 10.
With the legacy Windows settings, the setlocale() function produces
locale names such as "English_United States.65001" or
"English_United States.utf8". With the modern Windows settings, it
produces "en_US.UTF-8" instead. (This is with both mingw and MSVC,
according to my testing.)
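
As an illustration of the three name shapes listed above, here is a minimal
sketch of a check for a UTF-8 codeset suffix. The helper name
locale_name_is_utf8 is hypothetical and is not part of gnulib. The actual
change in lib/localcharset.c works on an internal buffer instead, and the
sketch assumes that the codeset is whatever follows the last '.' in the name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not part of gnulib: report whether a locale name,
   as returned by setlocale(), carries a codeset suffix that denotes UTF-8.  */
static bool
locale_name_is_utf8 (const char *name)
{
  assert (name != NULL);

  /* Assumption: the codeset, if present, follows the last '.' in the name.  */
  const char *dot = strrchr (name, '.');
  if (dot == NULL)
    return false;

  const char *codeset = dot + 1;
  return strcmp (codeset, "65001") == 0      /* legacy settings, numeric form */
         || strcmp (codeset, "utf8") == 0    /* legacy settings, "utf8" form */
         || strcmp (codeset, "UTF-8") == 0;  /* modern settings */
}
```

With this sketch, "English_United States.65001", "English_United States.utf8"
and "en_US.UTF-8" are all recognized, while a name such as
"French_France.1252" or "C" is not.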

The various locale-related modules of gnulib were never tested in
the UTF-8 environment. This series of patches adds support for it,
with unit tests.

[1] <https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page>


2024-12-23  Bruno Haible  <br...@clisp.org>

        mbrtowc tests: Test in the UTF-8 environment on native Windows.
        * tests/test-mbrtowc-w32utf8.sh: New file.
        * tests/test-mbrtowc-w32utf8.c: New file.
        * modules/mbrtowc-tests (Files): Add these files and
        m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
        (Depends-on): Add test-xfail.
        (configure.ac): Invoke gl_WINDOWS_RC.
        (Makefile.am): Arrange to compile test-mbrtowc-w32utf8 and run
        test-mbrtowc-w32utf8.sh.

2024-12-23  Bruno Haible  <br...@clisp.org>

        setlocale tests: Test in the UTF-8 environment on native Windows.
        * tests/test-setlocale-w32utf8.sh: New file.
        * tests/test-setlocale-w32utf8.c: New file.
        * modules/setlocale-tests (Files): Add these files and
        m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
        (Depends-on): Add test-xfail.
        (configure.ac): Invoke gl_WINDOWS_RC.
        (Makefile.am): Arrange to compile test-setlocale-w32utf8 and run
        test-setlocale-w32utf8.sh.

        setlocale: Support the UTF-8 environment on native Windows.
        * lib/setlocale.c: Include <windows.h>.
        (setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
        to the locale names passed to the native setlocale().

2024-12-23  Bruno Haible  <br...@clisp.org>

        localename tests: Test in the UTF-8 environment on native Windows.
        * tests/test-localename-w32utf8.sh: New file.
        * tests/test-localename-w32utf8.c: New file.
        * modules/localename-tests (Files): Add these files and
        m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
        (Depends-on): Add test-xfail.
        (configure.ac): Invoke gl_WINDOWS_RC.
        (Makefile.am): Arrange to compile test-localename-w32utf8 and run
        test-localename-w32utf8.sh.

        localename-unsafe: Support the UTF-8 environment on native Windows.
        * lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a
        suffix ".UTF-8" to the result if GetACP() is UTF-8.

2024-12-23  Bruno Haible  <br...@clisp.org>

        localcharset tests: Test in the UTF-8 environment on native Windows.
        * m4/windows-rc.m4: New file.
        * tests/test-localcharset-w32utf8.sh: New file.
        * tests/test-localcharset-w32utf8.c: New file.
        * tests/windows-utf8.rc: New file.
        * tests/windows-utf8.manifest: New file.
        * modules/localcharset-tests (Files): Add these files.
        (Depends-on): Add test-xfail.
        (configure.ac): Invoke gl_WINDOWS_RC.
        (Makefile.am): Arrange to compile test-localcharset-w32utf8 and run
        test-localcharset-w32utf8.sh.

        localcharset: Support the UTF-8 environment on native Windows.
        * lib/localcharset.c (locale_charset): Recognize also the special case
        of a setlocale() result that ends in ".UTF-8".
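
The setlocale and localename-unsafe entries above both append a codeset
suffix (".65001" and ".UTF-8" respectively) to a locale name. The following
is a minimal sketch of doing that with an explicit bounds check. The helper
name append_codeset_suffix and its signature are assumptions made for
illustration only and do not reflect the code in lib/setlocale.c or
lib/localename-unsafe.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the gnulib code: store NAME followed by SUFFIX
   in OUT, a buffer of OUTSIZE bytes.  Return 0 on success, or -1 if the
   result including its terminating NUL would not fit.  */
static int
append_codeset_suffix (char *out, size_t outsize,
                       const char *name, const char *suffix)
{
  assert (out != NULL && name != NULL && suffix != NULL);

  size_t name_len = strlen (name);
  size_t suffix_len = strlen (suffix);

  /* Room is needed for name_len + suffix_len + 1 bytes; the comparison is
     written so that the arithmetic cannot overflow.  */
  if (name_len >= outsize || suffix_len >= outsize - name_len)
    return -1;

  memcpy (out, name, name_len);
  memcpy (out + name_len, suffix, suffix_len + 1);  /* copies the NUL too */
  return 0;
}
```

For example, "English_United States" plus ".65001" gives
"English_United States.65001", and "en_US" plus ".UTF-8" gives "en_US.UTF-8".
A buffer that is too small is rejected instead of being overrun.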

From 927a70e0853345315570f051fd6996cfeb7b4d96 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:56:15 +0100
Subject: [PATCH 1/7] localcharset: Support the UTF-8 environment on native
 Windows.

* lib/localcharset.c (locale_charset): Recognize also the special case
of a setlocale() result that ends in ".UTF-8".
---
 ChangeLog          | 6 ++++++
 lib/localcharset.c | 6 ++++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index c294898828..1ac323da3e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	localcharset: Support the UTF-8 environment on native Windows.
+	* lib/localcharset.c (locale_charset): Recognize also the special case
+	of a setlocale() result that ends in ".UTF-8".
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	setlocale tests: Add unit test for LC_MESSAGES handling.
diff --git a/lib/localcharset.c b/lib/localcharset.c
index bd3367477d..755645763d 100644
--- a/lib/localcharset.c
+++ b/lib/localcharset.c
@@ -939,8 +939,10 @@ locale_charset (void)
       sprintf (buf, "CP%u", GetACP ());
     }
   /* For a locale name such as "French_France.65001", in Windows 10,
-     setlocale now returns "French_France.utf8" instead.  */
-  if (strcmp (buf + 2, "65001") == 0 || strcmp (buf + 2, "utf8") == 0)
+     setlocale now returns "French_France.utf8" instead, or in the UTF-8
+     environment (with modern system settings) "fr_FR.UTF-8".  */
+  if (strcmp (buf + 2, "65001") == 0 || strcmp (buf + 2, "utf8") == 0
+      || strcmp (buf + 2, "UTF-8") == 0)
     codeset = "UTF-8";
   else
     {
-- 
2.43.0

From a5c87eca2b85c624582eabeb6b409dc6fb50bfbd Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:56:37 +0100
Subject: [PATCH 2/7] localcharset tests: Test in the UTF-8 environment on
 native Windows.

* m4/windows-rc.m4: New file.
* tests/test-localcharset-w32utf8.sh: New file.
* tests/test-localcharset-w32utf8.c: New file.
* tests/windows-utf8.rc: New file.
* tests/windows-utf8.manifest: New file.
* modules/localcharset-tests (Files): Add these files.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-localcharset-w32utf8 and run
test-localcharset-w32utf8.sh.
---
 ChangeLog                          | 12 ++++++
 m4/windows-rc.m4                   | 21 ++++++++++
 modules/localcharset-tests         | 16 ++++++++
 tests/test-localcharset-w32utf8.c  | 61 ++++++++++++++++++++++++++++++
 tests/test-localcharset-w32utf8.sh |  7 ++++
 tests/windows-utf8.manifest        | 20 ++++++++++
 tests/windows-utf8.rc              |  9 +++++
 7 files changed, 146 insertions(+)
 create mode 100644 m4/windows-rc.m4
 create mode 100644 tests/test-localcharset-w32utf8.c
 create mode 100755 tests/test-localcharset-w32utf8.sh
 create mode 100644 tests/windows-utf8.manifest
 create mode 100644 tests/windows-utf8.rc

diff --git a/ChangeLog b/ChangeLog
index 1ac323da3e..bb9f076353 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,17 @@
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
+	localcharset tests: Test in the UTF-8 environment on native Windows.
+	* m4/windows-rc.m4: New file.
+	* tests/test-localcharset-w32utf8.sh: New file.
+	* tests/test-localcharset-w32utf8.c: New file.
+	* tests/windows-utf8.rc: New file.
+	* tests/windows-utf8.manifest: New file.
+	* modules/localcharset-tests (Files): Add these files.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-localcharset-w32utf8 and run
+	test-localcharset-w32utf8.sh.
+
 	localcharset: Support the UTF-8 environment on native Windows.
 	* lib/localcharset.c (locale_charset): Recognize also the special case
 	of a setlocale() result that ends in ".UTF-8".
diff --git a/m4/windows-rc.m4 b/m4/windows-rc.m4
new file mode 100644
index 0000000000..8a4deb14b8
--- /dev/null
+++ b/m4/windows-rc.m4
@@ -0,0 +1,21 @@
+# windows-rc.m4
+# serial 1
+dnl Copyright (C) 2024 Free Software Foundation, Inc.
+dnl This file is free software; the Free Software Foundation
+dnl gives unlimited permission to copy and/or distribute it,
+dnl with or without modifications, as long as this notice is preserved.
+dnl This file is offered as-is, without any warranty.
+
+dnl Find the tool that "compiles" a Windows resource file (.rc) to an
+dnl object file.
+
+AC_DEFUN_ONCE([gl_WINDOWS_RC],
+[
+  AC_REQUIRE([AC_CANONICAL_HOST])
+  case "$host_os" in
+    mingw* | windows*)
+      dnl Check for a program that compiles Windows resource files.
+      AC_CHECK_TOOL([WINDRES], [windres])
+      ;;
+  esac
+])
diff --git a/modules/localcharset-tests b/modules/localcharset-tests
index 3f2dde6dfd..a171c0cfbf 100644
--- a/modules/localcharset-tests
+++ b/modules/localcharset-tests
@@ -1,11 +1,27 @@
 Files:
 tests/test-localcharset.c
+tests/test-localcharset-w32utf8.sh
+tests/test-localcharset-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
+m4/windows-rc.m4
 
 Depends-on:
 setlocale
+test-xfail
 
 configure.ac:
+gl_WINDOWS_RC
 
 Makefile.am:
 noinst_PROGRAMS += test-localcharset
 test_localcharset_LDADD = $(LDADD) $(SETLOCALE_LIB)
+
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-localcharset-w32utf8.sh
+noinst_PROGRAMS += test-localcharset-w32utf8
+test_localcharset_w32utf8_LDADD = $(LDADD) test-localcharset-windows-utf8.res $(SETLOCALE_LIB)
+test-localcharset-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-localcharset-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-localcharset-windows-utf8.res
+endif
diff --git a/tests/test-localcharset-w32utf8.c b/tests/test-localcharset-w32utf8.c
new file mode 100644
index 0000000000..f40db9c397
--- /dev/null
+++ b/tests/test-localcharset-w32utf8.c
@@ -0,0 +1,61 @@
+/* Test of localcharset() function
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include "localcharset.h"
+
+#include <locale.h>
+#include <stdio.h>
+#include <string.h>
+#include <stdlib.h>
+
+#define WIN32_LEAN_AND_MEAN
+#include <windows.h>
+
+int
+main (void)
+{
+#ifdef _UCRT
+  unsigned int active_codepage = GetACP ();
+  if (!(active_codepage == 65001))
+    {
+      fprintf (stderr,
+               "The active codepage is %u, not 65001 as expected.\n"
+               "(This is normal on Windows older than Windows 10.)\n",
+               active_codepage);
+      exit (1);
+    }
+
+  setlocale (LC_ALL, "");
+  const char *lc = locale_charset ();
+  if (!(strcmp (lc, "UTF-8") == 0))
+    {
+      fprintf (stderr,
+               "locale_charset () is \"%s\", not \"UTF-8\" as expected.\n",
+               lc);
+      exit (1);
+    }
+
+  return 0;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-localcharset-w32utf8.sh b/tests/test-localcharset-w32utf8.sh
new file mode 100755
index 0000000000..1e6a95b545
--- /dev/null
+++ b/tests/test-localcharset-w32utf8.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LANG
+${CHECKER} ./test-localcharset-w32utf8${EXEEXT}
diff --git a/tests/windows-utf8.manifest b/tests/windows-utf8.manifest
new file mode 100644
index 0000000000..3a43a70c6d
--- /dev/null
+++ b/tests/windows-utf8.manifest
@@ -0,0 +1,20 @@
+<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
+<!-- This file is in the public domain. -->
+
+<!-- This file is an application manifest that has the effect that in the
+     application, GetACP () == 65001 instead of e.g. 1252.
+     Documentation:
+     https://learn.microsoft.com/en-us/windows/win32/sbscs/application-manifests#activeCodePage
+     https://learn.microsoft.com/en-us/windows/apps/design/globalizing/use-utf8-code-page
+     XML schema that this file needs to obey:
+     https://learn.microsoft.com/en-us/windows/win32/sbscs/manifest-file-schema
+     It is supposed to work in Windows 10 version 1903 or newer,
+     when the UCRT runtime is in use (as opposed to old MSVCRT).
+-->
+<assembly xmlns="urn:schemas-microsoft-com:asm.v1" manifestVersion="1.0">
+  <application xmlns="urn:schemas-microsoft-com:asm.v3">
+    <windowsSettings>
+      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
+    </windowsSettings>
+  </application>
+</assembly>
diff --git a/tests/windows-utf8.rc b/tests/windows-utf8.rc
new file mode 100644
index 0000000000..110241aa16
--- /dev/null
+++ b/tests/windows-utf8.rc
@@ -0,0 +1,9 @@
+/* This file is in the public domain. */
+
+/* This file is a resource definition file.
+   When compiled to an object file, it embeds the windows-utf8.manifest file,
+   that has the effect that in the application, GetACP () == 65001 instead
+   of e.g. 1252. */
+
+#include <winresrc.h>  /* includes <winuser.h>, <winver.h> */
+CREATEPROCESS_MANIFEST_RESOURCE_ID RT_MANIFEST "windows-utf8.manifest"
-- 
2.43.0

From 9f7ff4f423cd805866cd4edef806c32393621df0 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:56:57 +0100
Subject: [PATCH 3/7] localename-unsafe: Support the UTF-8 environment on
 native Windows.

* lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a
suffix ".UTF-8" to the result if GetACP() is UTF-8.
---
 ChangeLog               |   6 +
 lib/localename-unsafe.c | 848 ++++++++++++++++++++--------------------
 2 files changed, 433 insertions(+), 421 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index bb9f076353..d9f282c21e 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,9 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	localename-unsafe: Support the UTF-8 environment on native Windows.
+	* lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a
+	suffix ".UTF-8" to the result if GetACP() is UTF-8.
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	localcharset tests: Test in the UTF-8 environment on native Windows.
diff --git a/lib/localename-unsafe.c b/lib/localename-unsafe.c
index 0a2654d8a3..7088616892 100644
--- a/lib/localename-unsafe.c
+++ b/lib/localename-unsafe.c
@@ -1502,6 +1502,8 @@ static
 const char *
 gl_locale_name_from_win32_LANGID (LANGID langid)
 {
+  int is_utf8 = (GetACP () == 65001);
+
   /* Activate the new code only when the GETTEXT_MUI environment variable is
      set, for the time being, since the new code is not well tested.  */
   if (getenv ("GETTEXT_MUI") != NULL)
@@ -1512,10 +1514,12 @@ gl_locale_name_from_win32_LANGID (LANGID langid)
          On Windows95/98/ME, GetLocaleInfoA returns some incorrect results.
          But we don't need to support systems that are so old.  */
       if (GetLocaleInfoA (MAKELCID (langid, SORT_DEFAULT), LOCALE_SNAME,
-                          namebuf, sizeof (namebuf) - 1))
+                          namebuf, sizeof (namebuf) - 1 - 6))
         {
           /* Convert it to a Unix locale name.  */
           gl_locale_name_canonicalize (namebuf);
+          if (is_utf8)
+            strcat (namebuf, ".UTF-8");
           return namebuf;
         }
     }
@@ -1525,6 +1529,7 @@ gl_locale_name_from_win32_LANGID (LANGID langid)
      Windows base (e.g. they have different character conversion facilities
      that produce different results).  */
   /* Use our own table.  */
+  #define N(name) (is_utf8 ? name ".UTF-8" : name)
   {
     int primary, sub;
 
@@ -1540,146 +1545,146 @@ gl_locale_name_from_win32_LANGID (LANGID langid)
       case LANG_AFRIKAANS:
         switch (sub)
           {
-          case SUBLANG_AFRIKAANS_SOUTH_AFRICA: return "af_ZA";
+          case SUBLANG_AFRIKAANS_SOUTH_AFRICA: return N("af_ZA");
           }
-        return "af";
+        return N("af");
       case LANG_ALBANIAN:
         switch (sub)
           {
-          case SUBLANG_ALBANIAN_ALBANIA: return "sq_AL";
+          case SUBLANG_ALBANIAN_ALBANIA: return N("sq_AL");
           }
-        return "sq";
+        return N("sq");
       case LANG_ALSATIAN:
         switch (sub)
           {
-          case SUBLANG_ALSATIAN_FRANCE: return "gsw_FR";
+          case SUBLANG_ALSATIAN_FRANCE: return N("gsw_FR");
           }
-        return "gsw";
+        return N("gsw");
       case LANG_AMHARIC:
         switch (sub)
           {
-          case SUBLANG_AMHARIC_ETHIOPIA: return "am_ET";
+          case SUBLANG_AMHARIC_ETHIOPIA: return N("am_ET");
           }
-        return "am";
+        return N("am");
       case LANG_ARABIC:
         switch (sub)
           {
-          case SUBLANG_ARABIC_SAUDI_ARABIA: return "ar_SA";
-          case SUBLANG_ARABIC_IRAQ: return "ar_IQ";
-          case SUBLANG_ARABIC_EGYPT: return "ar_EG";
-          case SUBLANG_ARABIC_LIBYA: return "ar_LY";
-          case SUBLANG_ARABIC_ALGERIA: return "ar_DZ";
-          case SUBLANG_ARABIC_MOROCCO: return "ar_MA";
-          case SUBLANG_ARABIC_TUNISIA: return "ar_TN";
-          case SUBLANG_ARABIC_OMAN: return "ar_OM";
-          case SUBLANG_ARABIC_YEMEN: return "ar_YE";
-          case SUBLANG_ARABIC_SYRIA: return "ar_SY";
-          case SUBLANG_ARABIC_JORDAN: return "ar_JO";
-          case SUBLANG_ARABIC_LEBANON: return "ar_LB";
-          case SUBLANG_ARABIC_KUWAIT: return "ar_KW";
-          case SUBLANG_ARABIC_UAE: return "ar_AE";
-          case SUBLANG_ARABIC_BAHRAIN: return "ar_BH";
-          case SUBLANG_ARABIC_QATAR: return "ar_QA";
-          }
-        return "ar";
+          case SUBLANG_ARABIC_SAUDI_ARABIA: return N("ar_SA");
+          case SUBLANG_ARABIC_IRAQ: return N("ar_IQ");
+          case SUBLANG_ARABIC_EGYPT: return N("ar_EG");
+          case SUBLANG_ARABIC_LIBYA: return N("ar_LY");
+          case SUBLANG_ARABIC_ALGERIA: return N("ar_DZ");
+          case SUBLANG_ARABIC_MOROCCO: return N("ar_MA");
+          case SUBLANG_ARABIC_TUNISIA: return N("ar_TN");
+          case SUBLANG_ARABIC_OMAN: return N("ar_OM");
+          case SUBLANG_ARABIC_YEMEN: return N("ar_YE");
+          case SUBLANG_ARABIC_SYRIA: return N("ar_SY");
+          case SUBLANG_ARABIC_JORDAN: return N("ar_JO");
+          case SUBLANG_ARABIC_LEBANON: return N("ar_LB");
+          case SUBLANG_ARABIC_KUWAIT: return N("ar_KW");
+          case SUBLANG_ARABIC_UAE: return N("ar_AE");
+          case SUBLANG_ARABIC_BAHRAIN: return N("ar_BH");
+          case SUBLANG_ARABIC_QATAR: return N("ar_QA");
+          }
+        return N("ar");
       case LANG_ARMENIAN:
         switch (sub)
           {
-          case SUBLANG_ARMENIAN_ARMENIA: return "hy_AM";
+          case SUBLANG_ARMENIAN_ARMENIA: return N("hy_AM");
           }
-        return "hy";
+        return N("hy");
       case LANG_ASSAMESE:
         switch (sub)
           {
-          case SUBLANG_ASSAMESE_INDIA: return "as_IN";
+          case SUBLANG_ASSAMESE_INDIA: return N("as_IN");
           }
-        return "as";
+        return N("as");
       case LANG_AZERI:
         switch (sub)
           {
-          case 0x1e: return "az";
-          case SUBLANG_AZERI_LATIN: return "az_AZ";
-          case 0x1d: return "az@cyrillic";
-          case SUBLANG_AZERI_CYRILLIC: return "az_AZ@cyrillic";
+          case 0x1e: return N("az");
+          case SUBLANG_AZERI_LATIN: return N("az_AZ");
+          case 0x1d: return N("az@cyrillic");
+          case SUBLANG_AZERI_CYRILLIC: return N("az_AZ@cyrillic");
           }
-        return "az";
+        return N("az");
       case LANG_BASHKIR:
         switch (sub)
           {
-          case SUBLANG_BASHKIR_RUSSIA: return "ba_RU";
+          case SUBLANG_BASHKIR_RUSSIA: return N("ba_RU");
           }
-        return "ba";
+        return N("ba");
       case LANG_BASQUE:
         switch (sub)
           {
-          case SUBLANG_BASQUE_BASQUE: return "eu_ES";
+          case SUBLANG_BASQUE_BASQUE: return N("eu_ES");
           }
-        return "eu"; /* Ambiguous: could be "eu_ES" or "eu_FR".  */
+        return N("eu"); /* Ambiguous: could be "eu_ES" or "eu_FR".  */
       case LANG_BELARUSIAN:
         switch (sub)
           {
-          case SUBLANG_BELARUSIAN_BELARUS: return "be_BY";
+          case SUBLANG_BELARUSIAN_BELARUS: return N("be_BY");
           }
-        return "be";
+        return N("be");
       case LANG_BENGALI:
         switch (sub)
           {
-          case SUBLANG_BENGALI_INDIA: return "bn_IN";
-          case SUBLANG_BENGALI_BANGLADESH: return "bn_BD";
+          case SUBLANG_BENGALI_INDIA: return N("bn_IN");
+          case SUBLANG_BENGALI_BANGLADESH: return N("bn_BD");
           }
-        return "bn";
+        return N("bn");
       case LANG_BRETON:
         switch (sub)
           {
-          case SUBLANG_BRETON_FRANCE: return "br_FR";
+          case SUBLANG_BRETON_FRANCE: return N("br_FR");
           }
-        return "br";
+        return N("br");
       case LANG_BULGARIAN:
         switch (sub)
           {
-          case SUBLANG_BULGARIAN_BULGARIA: return "bg_BG";
+          case SUBLANG_BULGARIAN_BULGARIA: return N("bg_BG");
           }
-        return "bg";
+        return N("bg");
       case LANG_BURMESE:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "my_MM";
+          case SUBLANG_DEFAULT: return N("my_MM");
           }
-        return "my";
+        return N("my");
       case LANG_CAMBODIAN:
         switch (sub)
           {
-          case SUBLANG_CAMBODIAN_CAMBODIA: return "km_KH";
+          case SUBLANG_CAMBODIAN_CAMBODIA: return N("km_KH");
           }
-        return "km";
+        return N("km");
       case LANG_CATALAN:
         switch (sub)
           {
-          case SUBLANG_CATALAN_SPAIN: return "ca_ES";
+          case SUBLANG_CATALAN_SPAIN: return N("ca_ES");
           }
-        return "ca";
+        return N("ca");
       case LANG_CHEROKEE:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "chr_US";
+          case SUBLANG_DEFAULT: return N("chr_US");
           }
-        return "chr";
+        return N("chr");
       case LANG_CHINESE:
         switch (sub)
           {
-          case SUBLANG_CHINESE_TRADITIONAL: case 0x1f: return "zh_TW";
-          case SUBLANG_CHINESE_SIMPLIFIED: case 0x00: return "zh_CN";
-          case SUBLANG_CHINESE_HONGKONG: return "zh_HK"; /* traditional */
-          case SUBLANG_CHINESE_SINGAPORE: return "zh_SG"; /* simplified */
-          case SUBLANG_CHINESE_MACAU: return "zh_MO"; /* traditional */
+          case SUBLANG_CHINESE_TRADITIONAL: case 0x1f: return N("zh_TW");
+          case SUBLANG_CHINESE_SIMPLIFIED: case 0x00: return N("zh_CN");
+          case SUBLANG_CHINESE_HONGKONG: return N("zh_HK"); /* traditional */
+          case SUBLANG_CHINESE_SINGAPORE: return N("zh_SG"); /* simplified */
+          case SUBLANG_CHINESE_MACAU: return N("zh_MO"); /* traditional */
           }
-        return "zh";
+        return N("zh");
       case LANG_CORSICAN:
         switch (sub)
           {
-          case SUBLANG_CORSICAN_FRANCE: return "co_FR";
+          case SUBLANG_CORSICAN_FRANCE: return N("co_FR");
           }
-        return "co";
+        return N("co");
       case LANG_CROATIAN:      /* LANG_CROATIAN == LANG_SERBIAN == LANG_BOSNIAN
                                 * What used to be called Serbo-Croatian
                                 * should really now be two separate
@@ -1691,68 +1696,68 @@ gl_locale_name_from_win32_LANGID (LANGID langid)
         switch (sub)
           {
           /* Croatian */
-          case 0x00: return "hr";
-          case SUBLANG_CROATIAN_CROATIA: return "hr_HR";
-          case SUBLANG_CROATIAN_BOSNIA_HERZEGOVINA_LATIN: return "hr_BA";
+          case 0x00: return N("hr");
+          case SUBLANG_CROATIAN_CROATIA: return N("hr_HR");
+          case SUBLANG_CROATIAN_BOSNIA_HERZEGOVINA_LATIN: return N("hr_BA");
           /* Serbian */
-          case 0x1f: return "sr";
-          case 0x1c: return "sr"; /* latin */
-          case SUBLANG_SERBIAN_LATIN: return "sr_CS"; /* latin */
-          case 0x09: return "sr_RS"; /* latin */
-          case 0x0b: return "sr_ME"; /* latin */
-          case 0x06: return "sr_BA"; /* latin */
-          case 0x1b: return "sr@cyrillic";
-          case SUBLANG_SERBIAN_CYRILLIC: return "sr_CS@cyrillic";
-          case 0x0a: return "sr_RS@cyrillic";
-          case 0x0c: return "sr_ME@cyrillic";
-          case 0x07: return "sr_BA@cyrillic";
+          case 0x1f: return N("sr");
+          case 0x1c: return N("sr"); /* latin */
+          case SUBLANG_SERBIAN_LATIN: return N("sr_CS"); /* latin */
+          case 0x09: return N("sr_RS"); /* latin */
+          case 0x0b: return N("sr_ME"); /* latin */
+          case 0x06: return N("sr_BA"); /* latin */
+          case 0x1b: return N("sr@cyrillic");
+          case SUBLANG_SERBIAN_CYRILLIC: return N("sr_CS@cyrillic");
+          case 0x0a: return N("sr_RS@cyrillic");
+          case 0x0c: return N("sr_ME@cyrillic");
+          case 0x07: return N("sr_BA@cyrillic");
           /* Bosnian */
-          case 0x1e: return "bs";
-          case 0x1a: return "bs"; /* latin */
-          case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_LATIN: return "bs_BA"; /* latin */
-          case 0x19: return "bs@cyrillic";
-          case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_CYRILLIC: return "bs_BA@cyrillic";
+          case 0x1e: return N("bs");
+          case 0x1a: return N("bs"); /* latin */
+          case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_LATIN: return N("bs_BA"); /* latin */
+          case 0x19: return N("bs@cyrillic");
+          case SUBLANG_BOSNIAN_BOSNIA_HERZEGOVINA_CYRILLIC: return N("bs_BA@cyrillic");
           }
-        return "hr";
+        return N("hr");
       case LANG_CZECH:
         switch (sub)
           {
-          case SUBLANG_CZECH_CZECH_REPUBLIC: return "cs_CZ";
+          case SUBLANG_CZECH_CZECH_REPUBLIC: return N("cs_CZ");
           }
-        return "cs";
+        return N("cs");
       case LANG_DANISH:
         switch (sub)
           {
-          case SUBLANG_DANISH_DENMARK: return "da_DK";
+          case SUBLANG_DANISH_DENMARK: return N("da_DK");
           }
-        return "da";
+        return N("da");
       case LANG_DARI:
         /* FIXME: Adjust this when such locales appear on Unix.  */
         switch (sub)
           {
-          case SUBLANG_DARI_AFGHANISTAN: return "prs_AF";
+          case SUBLANG_DARI_AFGHANISTAN: return N("prs_AF");
           }
-        return "prs";
+        return N("prs");
       case LANG_DIVEHI:
         switch (sub)
           {
-          case SUBLANG_DIVEHI_MALDIVES: return "dv_MV";
+          case SUBLANG_DIVEHI_MALDIVES: return N("dv_MV");
           }
-        return "dv";
+        return N("dv");
       case LANG_DUTCH:
         switch (sub)
           {
-          case SUBLANG_DUTCH: return "nl_NL";
-          case SUBLANG_DUTCH_BELGIAN: /* FLEMISH, VLAAMS */ return "nl_BE";
-          case SUBLANG_DUTCH_SURINAM: return "nl_SR";
+          case SUBLANG_DUTCH: return N("nl_NL");
+          case SUBLANG_DUTCH_BELGIAN: /* FLEMISH, VLAAMS */ return N("nl_BE");
+          case SUBLANG_DUTCH_SURINAM: return N("nl_SR");
           }
-        return "nl";
+        return N("nl");
       case LANG_EDO:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "bin_NG";
+          case SUBLANG_DEFAULT: return N("bin_NG");
           }
-        return "bin";
+        return N("bin");
       case LANG_ENGLISH:
         switch (sub)
           {
@@ -1760,541 +1765,541 @@ gl_locale_name_from_win32_LANGID (LANGID langid)
            * English was the language spoken in England.
            * Oh well.
            */
-          case SUBLANG_ENGLISH_US: return "en_US";
-          case SUBLANG_ENGLISH_UK: return "en_GB";
-          case SUBLANG_ENGLISH_AUS: return "en_AU";
-          case SUBLANG_ENGLISH_CAN: return "en_CA";
-          case SUBLANG_ENGLISH_NZ: return "en_NZ";
-          case SUBLANG_ENGLISH_EIRE: return "en_IE";
-          case SUBLANG_ENGLISH_SOUTH_AFRICA: return "en_ZA";
-          case SUBLANG_ENGLISH_JAMAICA: return "en_JM";
-          case SUBLANG_ENGLISH_CARIBBEAN: return "en_GD"; /* Grenada? */
-          case SUBLANG_ENGLISH_BELIZE: return "en_BZ";
-          case SUBLANG_ENGLISH_TRINIDAD: return "en_TT";
-          case SUBLANG_ENGLISH_ZIMBABWE: return "en_ZW";
-          case SUBLANG_ENGLISH_PHILIPPINES: return "en_PH";
-          case SUBLANG_ENGLISH_INDONESIA: return "en_ID";
-          case SUBLANG_ENGLISH_HONGKONG: return "en_HK";
-          case SUBLANG_ENGLISH_INDIA: return "en_IN";
-          case SUBLANG_ENGLISH_MALAYSIA: return "en_MY";
-          case SUBLANG_ENGLISH_SINGAPORE: return "en_SG";
-          }
-        return "en";
+          case SUBLANG_ENGLISH_US: return N("en_US");
+          case SUBLANG_ENGLISH_UK: return N("en_GB");
+          case SUBLANG_ENGLISH_AUS: return N("en_AU");
+          case SUBLANG_ENGLISH_CAN: return N("en_CA");
+          case SUBLANG_ENGLISH_NZ: return N("en_NZ");
+          case SUBLANG_ENGLISH_EIRE: return N("en_IE");
+          case SUBLANG_ENGLISH_SOUTH_AFRICA: return N("en_ZA");
+          case SUBLANG_ENGLISH_JAMAICA: return N("en_JM");
+          case SUBLANG_ENGLISH_CARIBBEAN: return N("en_GD"); /* Grenada? */
+          case SUBLANG_ENGLISH_BELIZE: return N("en_BZ");
+          case SUBLANG_ENGLISH_TRINIDAD: return N("en_TT");
+          case SUBLANG_ENGLISH_ZIMBABWE: return N("en_ZW");
+          case SUBLANG_ENGLISH_PHILIPPINES: return N("en_PH");
+          case SUBLANG_ENGLISH_INDONESIA: return N("en_ID");
+          case SUBLANG_ENGLISH_HONGKONG: return N("en_HK");
+          case SUBLANG_ENGLISH_INDIA: return N("en_IN");
+          case SUBLANG_ENGLISH_MALAYSIA: return N("en_MY");
+          case SUBLANG_ENGLISH_SINGAPORE: return N("en_SG");
+          }
+        return N("en");
       case LANG_ESTONIAN:
         switch (sub)
           {
-          case SUBLANG_ESTONIAN_ESTONIA: return "et_EE";
+          case SUBLANG_ESTONIAN_ESTONIA: return N("et_EE");
           }
-        return "et";
+        return N("et");
       case LANG_FAEROESE:
         switch (sub)
           {
-          case SUBLANG_FAEROESE_FAROE_ISLANDS: return "fo_FO";
+          case SUBLANG_FAEROESE_FAROE_ISLANDS: return N("fo_FO");
           }
-        return "fo";
+        return N("fo");
       case LANG_FARSI:
         switch (sub)
           {
-          case SUBLANG_FARSI_IRAN: return "fa_IR";
+          case SUBLANG_FARSI_IRAN: return N("fa_IR");
           }
-        return "fa";
+        return N("fa");
       case LANG_FINNISH:
         switch (sub)
           {
-          case SUBLANG_FINNISH_FINLAND: return "fi_FI";
+          case SUBLANG_FINNISH_FINLAND: return N("fi_FI");
           }
-        return "fi";
+        return N("fi");
       case LANG_FRENCH:
         switch (sub)
           {
-          case SUBLANG_FRENCH: return "fr_FR";
-          case SUBLANG_FRENCH_BELGIAN: /* WALLOON */ return "fr_BE";
-          case SUBLANG_FRENCH_CANADIAN: return "fr_CA";
-          case SUBLANG_FRENCH_SWISS: return "fr_CH";
-          case SUBLANG_FRENCH_LUXEMBOURG: return "fr_LU";
-          case SUBLANG_FRENCH_MONACO: return "fr_MC";
-          case SUBLANG_FRENCH_WESTINDIES: return "fr"; /* Caribbean? */
-          case SUBLANG_FRENCH_REUNION: return "fr_RE";
-          case SUBLANG_FRENCH_CONGO: return "fr_CG";
-          case SUBLANG_FRENCH_SENEGAL: return "fr_SN";
-          case SUBLANG_FRENCH_CAMEROON: return "fr_CM";
-          case SUBLANG_FRENCH_COTEDIVOIRE: return "fr_CI";
-          case SUBLANG_FRENCH_MALI: return "fr_ML";
-          case SUBLANG_FRENCH_MOROCCO: return "fr_MA";
-          case SUBLANG_FRENCH_HAITI: return "fr_HT";
-          }
-        return "fr";
+          case SUBLANG_FRENCH: return N("fr_FR");
+          case SUBLANG_FRENCH_BELGIAN: /* WALLOON */ return N("fr_BE");
+          case SUBLANG_FRENCH_CANADIAN: return N("fr_CA");
+          case SUBLANG_FRENCH_SWISS: return N("fr_CH");
+          case SUBLANG_FRENCH_LUXEMBOURG: return N("fr_LU");
+          case SUBLANG_FRENCH_MONACO: return N("fr_MC");
+          case SUBLANG_FRENCH_WESTINDIES: return N("fr"); /* Caribbean? */
+          case SUBLANG_FRENCH_REUNION: return N("fr_RE");
+          case SUBLANG_FRENCH_CONGO: return N("fr_CG");
+          case SUBLANG_FRENCH_SENEGAL: return N("fr_SN");
+          case SUBLANG_FRENCH_CAMEROON: return N("fr_CM");
+          case SUBLANG_FRENCH_COTEDIVOIRE: return N("fr_CI");
+          case SUBLANG_FRENCH_MALI: return N("fr_ML");
+          case SUBLANG_FRENCH_MOROCCO: return N("fr_MA");
+          case SUBLANG_FRENCH_HAITI: return N("fr_HT");
+          }
+        return N("fr");
       case LANG_FRISIAN:
         switch (sub)
           {
-          case SUBLANG_FRISIAN_NETHERLANDS: return "fy_NL";
+          case SUBLANG_FRISIAN_NETHERLANDS: return N("fy_NL");
           }
-        return "fy";
+        return N("fy");
       case LANG_FULFULDE:
         /* Spoken in Nigeria, Guinea, Senegal, Mali, Niger, Cameroon, Benin.  */
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "ff_NG";
+          case SUBLANG_DEFAULT: return N("ff_NG");
           }
-        return "ff";
+        return N("ff");
       case LANG_GAELIC:
         switch (sub)
           {
           case 0x01: /* SCOTTISH */
             /* old, superseded by LANG_SCOTTISH_GAELIC */
-            return "gd_GB";
-          case SUBLANG_IRISH_IRELAND: return "ga_IE";
+            return N("gd_GB");
+          case SUBLANG_IRISH_IRELAND: return N("ga_IE");
           }
-        return "ga";
+        return N("ga");
       case LANG_GALICIAN:
         switch (sub)
           {
-          case SUBLANG_GALICIAN_SPAIN: return "gl_ES";
+          case SUBLANG_GALICIAN_SPAIN: return N("gl_ES");
           }
-        return "gl";
+        return N("gl");
       case LANG_GEORGIAN:
         switch (sub)
           {
-          case SUBLANG_GEORGIAN_GEORGIA: return "ka_GE";
+          case SUBLANG_GEORGIAN_GEORGIA: return N("ka_GE");
           }
-        return "ka";
+        return N("ka");
       case LANG_GERMAN:
         switch (sub)
           {
-          case SUBLANG_GERMAN: return "de_DE";
-          case SUBLANG_GERMAN_SWISS: return "de_CH";
-          case SUBLANG_GERMAN_AUSTRIAN: return "de_AT";
-          case SUBLANG_GERMAN_LUXEMBOURG: return "de_LU";
-          case SUBLANG_GERMAN_LIECHTENSTEIN: return "de_LI";
+          case SUBLANG_GERMAN: return N("de_DE");
+          case SUBLANG_GERMAN_SWISS: return N("de_CH");
+          case SUBLANG_GERMAN_AUSTRIAN: return N("de_AT");
+          case SUBLANG_GERMAN_LUXEMBOURG: return N("de_LU");
+          case SUBLANG_GERMAN_LIECHTENSTEIN: return N("de_LI");
           }
-        return "de";
+        return N("de");
       case LANG_GREEK:
         switch (sub)
           {
-          case SUBLANG_GREEK_GREECE: return "el_GR";
+          case SUBLANG_GREEK_GREECE: return N("el_GR");
           }
-        return "el";
+        return N("el");
       case LANG_GREENLANDIC:
         switch (sub)
           {
-          case SUBLANG_GREENLANDIC_GREENLAND: return "kl_GL";
+          case SUBLANG_GREENLANDIC_GREENLAND: return N("kl_GL");
           }
-        return "kl";
+        return N("kl");
       case LANG_GUARANI:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "gn_PY";
+          case SUBLANG_DEFAULT: return N("gn_PY");
           }
-        return "gn";
+        return N("gn");
       case LANG_GUJARATI:
         switch (sub)
           {
-          case SUBLANG_GUJARATI_INDIA: return "gu_IN";
+          case SUBLANG_GUJARATI_INDIA: return N("gu_IN");
           }
-        return "gu";
+        return N("gu");
       case LANG_HAUSA:
         switch (sub)
           {
-          case 0x1f: return "ha";
-          case SUBLANG_HAUSA_NIGERIA_LATIN: return "ha_NG";
+          case 0x1f: return N("ha");
+          case SUBLANG_HAUSA_NIGERIA_LATIN: return N("ha_NG");
           }
-        return "ha";
+        return N("ha");
       case LANG_HAWAIIAN:
         /* FIXME: Do they mean Hawaiian ("haw_US", 1000 speakers)
            or Hawaii Creole English ("cpe_US", 600000 speakers)?  */
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "cpe_US";
+          case SUBLANG_DEFAULT: return N("cpe_US");
           }
-        return "cpe";
+        return N("cpe");
       case LANG_HEBREW:
         switch (sub)
           {
-          case SUBLANG_HEBREW_ISRAEL: return "he_IL";
+          case SUBLANG_HEBREW_ISRAEL: return N("he_IL");
           }
-        return "he";
+        return N("he");
       case LANG_HINDI:
         switch (sub)
           {
-          case SUBLANG_HINDI_INDIA: return "hi_IN";
+          case SUBLANG_HINDI_INDIA: return N("hi_IN");
           }
-        return "hi";
+        return N("hi");
       case LANG_HUNGARIAN:
         switch (sub)
           {
-          case SUBLANG_HUNGARIAN_HUNGARY: return "hu_HU";
+          case SUBLANG_HUNGARIAN_HUNGARY: return N("hu_HU");
           }
-        return "hu";
+        return N("hu");
       case LANG_IBIBIO:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "nic_NG";
+          case SUBLANG_DEFAULT: return N("nic_NG");
           }
-        return "nic";
+        return N("nic");
       case LANG_ICELANDIC:
         switch (sub)
           {
-          case SUBLANG_ICELANDIC_ICELAND: return "is_IS";
+          case SUBLANG_ICELANDIC_ICELAND: return N("is_IS");
           }
-        return "is";
+        return N("is");
       case LANG_IGBO:
         switch (sub)
           {
-          case SUBLANG_IGBO_NIGERIA: return "ig_NG";
+          case SUBLANG_IGBO_NIGERIA: return N("ig_NG");
           }
-        return "ig";
+        return N("ig");
       case LANG_INDONESIAN:
         switch (sub)
           {
-          case SUBLANG_INDONESIAN_INDONESIA: return "id_ID";
+          case SUBLANG_INDONESIAN_INDONESIA: return N("id_ID");
           }
-        return "id";
+        return N("id");
       case LANG_INUKTITUT:
         switch (sub)
           {
-          case 0x1e: return "iu"; /* syllabic */
-          case SUBLANG_INUKTITUT_CANADA: return "iu_CA"; /* syllabic */
-          case 0x1f: return "iu@latin";
-          case SUBLANG_INUKTITUT_CANADA_LATIN: return "iu_CA@latin";
+          case 0x1e: return N("iu"); /* syllabic */
+          case SUBLANG_INUKTITUT_CANADA: return N("iu_CA"); /* syllabic */
+          case 0x1f: return N("iu@latin");
+          case SUBLANG_INUKTITUT_CANADA_LATIN: return N("iu_CA@latin");
           }
-        return "iu";
+        return N("iu");
       case LANG_ITALIAN:
         switch (sub)
           {
-          case SUBLANG_ITALIAN: return "it_IT";
-          case SUBLANG_ITALIAN_SWISS: return "it_CH";
+          case SUBLANG_ITALIAN: return N("it_IT");
+          case SUBLANG_ITALIAN_SWISS: return N("it_CH");
           }
-        return "it";
+        return N("it");
       case LANG_JAPANESE:
         switch (sub)
           {
-          case SUBLANG_JAPANESE_JAPAN: return "ja_JP";
+          case SUBLANG_JAPANESE_JAPAN: return N("ja_JP");
           }
-        return "ja";
+        return N("ja");
       case LANG_KANNADA:
         switch (sub)
           {
-          case SUBLANG_KANNADA_INDIA: return "kn_IN";
+          case SUBLANG_KANNADA_INDIA: return N("kn_IN");
           }
-        return "kn";
+        return N("kn");
       case LANG_KANURI:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "kr_NG";
+          case SUBLANG_DEFAULT: return N("kr_NG");
           }
-        return "kr";
+        return N("kr");
       case LANG_KASHMIRI:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "ks_PK";
-          case SUBLANG_KASHMIRI_INDIA: return "ks_IN";
+          case SUBLANG_DEFAULT: return N("ks_PK");
+          case SUBLANG_KASHMIRI_INDIA: return N("ks_IN");
           }
-        return "ks";
+        return N("ks");
       case LANG_KAZAK:
         switch (sub)
           {
-          case SUBLANG_KAZAK_KAZAKHSTAN: return "kk_KZ";
+          case SUBLANG_KAZAK_KAZAKHSTAN: return N("kk_KZ");
           }
-        return "kk";
+        return N("kk");
       case LANG_KICHE:
         /* FIXME: Adjust this when such locales appear on Unix.  */
         switch (sub)
           {
-          case SUBLANG_KICHE_GUATEMALA: return "qut_GT";
+          case SUBLANG_KICHE_GUATEMALA: return N("qut_GT");
           }
-        return "qut";
+        return N("qut");
       case LANG_KINYARWANDA:
         switch (sub)
           {
-          case SUBLANG_KINYARWANDA_RWANDA: return "rw_RW";
+          case SUBLANG_KINYARWANDA_RWANDA: return N("rw_RW");
           }
-        return "rw";
+        return N("rw");
       case LANG_KONKANI:
         switch (sub)
           {
-          case SUBLANG_KONKANI_INDIA: return "kok_IN";
+          case SUBLANG_KONKANI_INDIA: return N("kok_IN");
           }
-        return "kok";
+        return N("kok");
       case LANG_KOREAN:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "ko_KR";
+          case SUBLANG_DEFAULT: return N("ko_KR");
           }
-        return "ko";
+        return N("ko");
       case LANG_KYRGYZ:
         switch (sub)
           {
-          case SUBLANG_KYRGYZ_KYRGYZSTAN: return "ky_KG";
+          case SUBLANG_KYRGYZ_KYRGYZSTAN: return N("ky_KG");
           }
-        return "ky";
+        return N("ky");
       case LANG_LAO:
         switch (sub)
           {
-          case SUBLANG_LAO_LAOS: return "lo_LA";
+          case SUBLANG_LAO_LAOS: return N("lo_LA");
           }
-        return "lo";
+        return N("lo");
       case LANG_LATIN:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "la_VA";
+          case SUBLANG_DEFAULT: return N("la_VA");
           }
-        return "la";
+        return N("la");
       case LANG_LATVIAN:
         switch (sub)
           {
-          case SUBLANG_LATVIAN_LATVIA: return "lv_LV";
+          case SUBLANG_LATVIAN_LATVIA: return N("lv_LV");
           }
-        return "lv";
+        return N("lv");
       case LANG_LITHUANIAN:
         switch (sub)
           {
-          case SUBLANG_LITHUANIAN_LITHUANIA: return "lt_LT";
+          case SUBLANG_LITHUANIAN_LITHUANIA: return N("lt_LT");
           }
-        return "lt";
+        return N("lt");
       case LANG_LUXEMBOURGISH:
         switch (sub)
           {
-          case SUBLANG_LUXEMBOURGISH_LUXEMBOURG: return "lb_LU";
+          case SUBLANG_LUXEMBOURGISH_LUXEMBOURG: return N("lb_LU");
           }
-        return "lb";
+        return N("lb");
       case LANG_MACEDONIAN:
         switch (sub)
           {
-          case SUBLANG_MACEDONIAN_MACEDONIA: return "mk_MK";
+          case SUBLANG_MACEDONIAN_MACEDONIA: return N("mk_MK");
           }
-        return "mk";
+        return N("mk");
       case LANG_MALAY:
         switch (sub)
           {
-          case SUBLANG_MALAY_MALAYSIA: return "ms_MY";
-          case SUBLANG_MALAY_BRUNEI_DARUSSALAM: return "ms_BN";
+          case SUBLANG_MALAY_MALAYSIA: return N("ms_MY");
+          case SUBLANG_MALAY_BRUNEI_DARUSSALAM: return N("ms_BN");
           }
-        return "ms";
+        return N("ms");
       case LANG_MALAYALAM:
         switch (sub)
           {
-          case SUBLANG_MALAYALAM_INDIA: return "ml_IN";
+          case SUBLANG_MALAYALAM_INDIA: return N("ml_IN");
           }
-        return "ml";
+        return N("ml");
       case LANG_MALTESE:
         switch (sub)
           {
-          case SUBLANG_MALTESE_MALTA: return "mt_MT";
+          case SUBLANG_MALTESE_MALTA: return N("mt_MT");
           }
-        return "mt";
+        return N("mt");
       case LANG_MANIPURI:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "mni_IN";
+          case SUBLANG_DEFAULT: return N("mni_IN");
           }
-        return "mni";
+        return N("mni");
       case LANG_MAORI:
         switch (sub)
           {
-          case SUBLANG_MAORI_NEW_ZEALAND: return "mi_NZ";
+          case SUBLANG_MAORI_NEW_ZEALAND: return N("mi_NZ");
           }
-        return "mi";
+        return N("mi");
       case LANG_MAPUDUNGUN:
         switch (sub)
           {
-          case SUBLANG_MAPUDUNGUN_CHILE: return "arn_CL";
+          case SUBLANG_MAPUDUNGUN_CHILE: return N("arn_CL");
           }
-        return "arn";
+        return N("arn");
       case LANG_MARATHI:
         switch (sub)
           {
-          case SUBLANG_MARATHI_INDIA: return "mr_IN";
+          case SUBLANG_MARATHI_INDIA: return N("mr_IN");
           }
-        return "mr";
+        return N("mr");
       case LANG_MOHAWK:
         switch (sub)
           {
-          case SUBLANG_MOHAWK_CANADA: return "moh_CA";
+          case SUBLANG_MOHAWK_CANADA: return N("moh_CA");
           }
-        return "moh";
+        return N("moh");
       case LANG_MONGOLIAN:
         switch (sub)
           {
-          case SUBLANG_MONGOLIAN_CYRILLIC_MONGOLIA: case 0x1e: return "mn_MN";
-          case SUBLANG_MONGOLIAN_PRC: case 0x1f: return "mn_CN";
+          case SUBLANG_MONGOLIAN_CYRILLIC_MONGOLIA: case 0x1e: return N("mn_MN");
+          case SUBLANG_MONGOLIAN_PRC: case 0x1f: return N("mn_CN");
           }
-        return "mn"; /* Ambiguous: could be "mn_CN" or "mn_MN".  */
+        return N("mn"); /* Ambiguous: could be "mn_CN" or "mn_MN".  */
       case LANG_NEPALI:
         switch (sub)
           {
-          case SUBLANG_NEPALI_NEPAL: return "ne_NP";
-          case SUBLANG_NEPALI_INDIA: return "ne_IN";
+          case SUBLANG_NEPALI_NEPAL: return N("ne_NP");
+          case SUBLANG_NEPALI_INDIA: return N("ne_IN");
           }
-        return "ne";
+        return N("ne");
       case LANG_NORWEGIAN:
         switch (sub)
           {
-          case 0x1f: return "nb";
-          case SUBLANG_NORWEGIAN_BOKMAL: return "nb_NO";
-          case 0x1e: return "nn";
-          case SUBLANG_NORWEGIAN_NYNORSK: return "nn_NO";
+          case 0x1f: return N("nb");
+          case SUBLANG_NORWEGIAN_BOKMAL: return N("nb_NO");
+          case 0x1e: return N("nn");
+          case SUBLANG_NORWEGIAN_NYNORSK: return N("nn_NO");
           }
-        return "no";
+        return N("no");
       case LANG_OCCITAN:
         switch (sub)
           {
-          case SUBLANG_OCCITAN_FRANCE: return "oc_FR";
+          case SUBLANG_OCCITAN_FRANCE: return N("oc_FR");
           }
-        return "oc";
+        return N("oc");
       case LANG_ORIYA:
         switch (sub)
           {
-          case SUBLANG_ORIYA_INDIA: return "or_IN";
+          case SUBLANG_ORIYA_INDIA: return N("or_IN");
           }
-        return "or";
+        return N("or");
       case LANG_OROMO:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "om_ET";
+          case SUBLANG_DEFAULT: return N("om_ET");
           }
-        return "om";
+        return N("om");
       case LANG_PAPIAMENTU:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "pap_AN";
+          case SUBLANG_DEFAULT: return N("pap_AN");
           }
-        return "pap";
+        return N("pap");
       case LANG_PASHTO:
         switch (sub)
           {
-          case SUBLANG_PASHTO_AFGHANISTAN: return "ps_AF";
+          case SUBLANG_PASHTO_AFGHANISTAN: return N("ps_AF");
           }
-        return "ps"; /* Ambiguous: could be "ps_PK" or "ps_AF".  */
+        return N("ps"); /* Ambiguous: could be "ps_PK" or "ps_AF".  */
       case LANG_POLISH:
         switch (sub)
           {
-          case SUBLANG_POLISH_POLAND: return "pl_PL";
+          case SUBLANG_POLISH_POLAND: return N("pl_PL");
           }
-        return "pl";
+        return N("pl");
       case LANG_PORTUGUESE:
         switch (sub)
           {
           /* Hmm. SUBLANG_PORTUGUESE_BRAZILIAN == SUBLANG_DEFAULT.
              Same phenomenon as SUBLANG_ENGLISH_US == SUBLANG_DEFAULT. */
-          case SUBLANG_PORTUGUESE_BRAZILIAN: return "pt_BR";
-          case SUBLANG_PORTUGUESE: return "pt_PT";
+          case SUBLANG_PORTUGUESE_BRAZILIAN: return N("pt_BR");
+          case SUBLANG_PORTUGUESE: return N("pt_PT");
           }
-        return "pt";
+        return N("pt");
       case LANG_PUNJABI:
         switch (sub)
           {
-          case SUBLANG_PUNJABI_INDIA: return "pa_IN"; /* Gurmukhi script */
-          case SUBLANG_PUNJABI_PAKISTAN: return "pa_PK"; /* Arabic script */
+          case SUBLANG_PUNJABI_INDIA: return N("pa_IN"); /* Gurmukhi script */
+          case SUBLANG_PUNJABI_PAKISTAN: return N("pa_PK"); /* Arabic script */
           }
-        return "pa";
+        return N("pa");
       case LANG_QUECHUA:
         /* Note: Microsoft uses the non-ISO language code "quz".  */
         switch (sub)
           {
-          case SUBLANG_QUECHUA_BOLIVIA: return "qu_BO";
-          case SUBLANG_QUECHUA_ECUADOR: return "qu_EC";
-          case SUBLANG_QUECHUA_PERU: return "qu_PE";
+          case SUBLANG_QUECHUA_BOLIVIA: return N("qu_BO");
+          case SUBLANG_QUECHUA_ECUADOR: return N("qu_EC");
+          case SUBLANG_QUECHUA_PERU: return N("qu_PE");
           }
-        return "qu";
+        return N("qu");
       case LANG_ROMANIAN:
         switch (sub)
           {
-          case SUBLANG_ROMANIAN_ROMANIA: return "ro_RO";
-          case SUBLANG_ROMANIAN_MOLDOVA: return "ro_MD";
+          case SUBLANG_ROMANIAN_ROMANIA: return N("ro_RO");
+          case SUBLANG_ROMANIAN_MOLDOVA: return N("ro_MD");
           }
-        return "ro";
+        return N("ro");
       case LANG_ROMANSH:
         switch (sub)
           {
-          case SUBLANG_ROMANSH_SWITZERLAND: return "rm_CH";
+          case SUBLANG_ROMANSH_SWITZERLAND: return N("rm_CH");
           }
-        return "rm";
+        return N("rm");
       case LANG_RUSSIAN:
         switch (sub)
           {
-          case SUBLANG_RUSSIAN_RUSSIA: return "ru_RU";
-          case SUBLANG_RUSSIAN_MOLDAVIA: return "ru_MD";
+          case SUBLANG_RUSSIAN_RUSSIA: return N("ru_RU");
+          case SUBLANG_RUSSIAN_MOLDAVIA: return N("ru_MD");
           }
-        return "ru"; /* Ambiguous: could be "ru_RU" or "ru_UA" or "ru_MD".  */
+        return N("ru"); /* Ambiguous: could be "ru_RU" or "ru_UA" or "ru_MD".  */
       case LANG_SAMI:
         switch (sub)
           {
           /* Northern Sami */
-          case 0x00: return "se";
-          case SUBLANG_SAMI_NORTHERN_NORWAY: return "se_NO";
-          case SUBLANG_SAMI_NORTHERN_SWEDEN: return "se_SE";
-          case SUBLANG_SAMI_NORTHERN_FINLAND: return "se_FI";
+          case 0x00: return N("se");
+          case SUBLANG_SAMI_NORTHERN_NORWAY: return N("se_NO");
+          case SUBLANG_SAMI_NORTHERN_SWEDEN: return N("se_SE");
+          case SUBLANG_SAMI_NORTHERN_FINLAND: return N("se_FI");
           /* Lule Sami */
-          case 0x1f: return "smj";
-          case SUBLANG_SAMI_LULE_NORWAY: return "smj_NO";
-          case SUBLANG_SAMI_LULE_SWEDEN: return "smj_SE";
+          case 0x1f: return N("smj");
+          case SUBLANG_SAMI_LULE_NORWAY: return N("smj_NO");
+          case SUBLANG_SAMI_LULE_SWEDEN: return N("smj_SE");
           /* Southern Sami */
-          case 0x1e: return "sma";
-          case SUBLANG_SAMI_SOUTHERN_NORWAY: return "sma_NO";
-          case SUBLANG_SAMI_SOUTHERN_SWEDEN: return "sma_SE";
+          case 0x1e: return N("sma");
+          case SUBLANG_SAMI_SOUTHERN_NORWAY: return N("sma_NO");
+          case SUBLANG_SAMI_SOUTHERN_SWEDEN: return N("sma_SE");
           /* Skolt Sami */
-          case 0x1d: return "sms";
-          case SUBLANG_SAMI_SKOLT_FINLAND: return "sms_FI";
+          case 0x1d: return N("sms");
+          case SUBLANG_SAMI_SKOLT_FINLAND: return N("sms_FI");
           /* Inari Sami */
-          case 0x1c: return "smn";
-          case SUBLANG_SAMI_INARI_FINLAND: return "smn_FI";
+          case 0x1c: return N("smn");
+          case SUBLANG_SAMI_INARI_FINLAND: return N("smn_FI");
           }
-        return "se"; /* or "smi"? */
+        return N("se"); /* or "smi"? */
       case LANG_SANSKRIT:
         switch (sub)
           {
-          case SUBLANG_SANSKRIT_INDIA: return "sa_IN";
+          case SUBLANG_SANSKRIT_INDIA: return N("sa_IN");
           }
-        return "sa";
+        return N("sa");
       case LANG_SCOTTISH_GAELIC:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "gd_GB";
+          case SUBLANG_DEFAULT: return N("gd_GB");
           }
-        return "gd";
+        return N("gd");
       case LANG_SINDHI:
         switch (sub)
           {
-          case SUBLANG_SINDHI_INDIA: return "sd_IN";
-          case SUBLANG_SINDHI_PAKISTAN: return "sd_PK";
-          /*case SUBLANG_SINDHI_AFGHANISTAN: return "sd_AF";*/
+          case SUBLANG_SINDHI_INDIA: return N("sd_IN");
+          case SUBLANG_SINDHI_PAKISTAN: return N("sd_PK");
+          /*case SUBLANG_SINDHI_AFGHANISTAN: return N("sd_AF");*/
           }
-        return "sd";
+        return N("sd");
       case LANG_SINHALESE:
         switch (sub)
           {
-          case SUBLANG_SINHALESE_SRI_LANKA: return "si_LK";
+          case SUBLANG_SINHALESE_SRI_LANKA: return N("si_LK");
           }
-        return "si";
+        return N("si");
       case LANG_SLOVAK:
         switch (sub)
           {
-          case SUBLANG_SLOVAK_SLOVAKIA: return "sk_SK";
+          case SUBLANG_SLOVAK_SLOVAKIA: return N("sk_SK");
           }
-        return "sk";
+        return N("sk");
       case LANG_SLOVENIAN:
         switch (sub)
           {
-          case SUBLANG_SLOVENIAN_SLOVENIA: return "sl_SI";
+          case SUBLANG_SLOVENIAN_SLOVENIA: return N("sl_SI");
           }
-        return "sl";
+        return N("sl");
       case LANG_SOMALI:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "so_SO";
+          case SUBLANG_DEFAULT: return N("so_SO");
           }
-        return "so";
+        return N("so");
       case LANG_SORBIAN:
         switch (sub)
           {
           /* Upper Sorbian */
-          case 0x00: return "hsb";
-          case SUBLANG_UPPER_SORBIAN_GERMANY: return "hsb_DE";
+          case 0x00: return N("hsb");
+          case SUBLANG_UPPER_SORBIAN_GERMANY: return N("hsb_DE");
           /* Lower Sorbian */
-          case 0x1f: return "dsb";
-          case SUBLANG_LOWER_SORBIAN_GERMANY: return "dsb_DE";
+          case 0x1f: return N("dsb");
+          case SUBLANG_LOWER_SORBIAN_GERMANY: return N("dsb_DE");
           }
-        return "wen";
+        return N("wen");
       case LANG_SOTHO:
         /* <https://docs.microsoft.com/en-us/windows/desktop/Intl/language-identifier-constants-and-strings>
            calls it "Sesotho sa Leboa"; according to
@@ -2303,240 +2308,241 @@ gl_locale_name_from_win32_LANGID (LANGID langid)
            it's the same as Northern Sotho.  */
         switch (sub)
           {
-          case SUBLANG_SOTHO_SOUTH_AFRICA: return "nso_ZA";
+          case SUBLANG_SOTHO_SOUTH_AFRICA: return N("nso_ZA");
           }
-        return "nso";
+        return N("nso");
       case LANG_SPANISH:
         switch (sub)
           {
-          case SUBLANG_SPANISH: return "es_ES";
-          case SUBLANG_SPANISH_MEXICAN: return "es_MX";
+          case SUBLANG_SPANISH: return N("es_ES");
+          case SUBLANG_SPANISH_MEXICAN: return N("es_MX");
           case SUBLANG_SPANISH_MODERN:
-            return "es_ES@modern";      /* not seen on Unix */
-          case SUBLANG_SPANISH_GUATEMALA: return "es_GT";
-          case SUBLANG_SPANISH_COSTA_RICA: return "es_CR";
-          case SUBLANG_SPANISH_PANAMA: return "es_PA";
-          case SUBLANG_SPANISH_DOMINICAN_REPUBLIC: return "es_DO";
-          case SUBLANG_SPANISH_VENEZUELA: return "es_VE";
-          case SUBLANG_SPANISH_COLOMBIA: return "es_CO";
-          case SUBLANG_SPANISH_PERU: return "es_PE";
-          case SUBLANG_SPANISH_ARGENTINA: return "es_AR";
-          case SUBLANG_SPANISH_ECUADOR: return "es_EC";
-          case SUBLANG_SPANISH_CHILE: return "es_CL";
-          case SUBLANG_SPANISH_URUGUAY: return "es_UY";
-          case SUBLANG_SPANISH_PARAGUAY: return "es_PY";
-          case SUBLANG_SPANISH_BOLIVIA: return "es_BO";
-          case SUBLANG_SPANISH_EL_SALVADOR: return "es_SV";
-          case SUBLANG_SPANISH_HONDURAS: return "es_HN";
-          case SUBLANG_SPANISH_NICARAGUA: return "es_NI";
-          case SUBLANG_SPANISH_PUERTO_RICO: return "es_PR";
-          case SUBLANG_SPANISH_US: return "es_US";
-          }
-        return "es";
+            return N("es_ES@modern");      /* not seen on Unix */
+          case SUBLANG_SPANISH_GUATEMALA: return N("es_GT");
+          case SUBLANG_SPANISH_COSTA_RICA: return N("es_CR");
+          case SUBLANG_SPANISH_PANAMA: return N("es_PA");
+          case SUBLANG_SPANISH_DOMINICAN_REPUBLIC: return N("es_DO");
+          case SUBLANG_SPANISH_VENEZUELA: return N("es_VE");
+          case SUBLANG_SPANISH_COLOMBIA: return N("es_CO");
+          case SUBLANG_SPANISH_PERU: return N("es_PE");
+          case SUBLANG_SPANISH_ARGENTINA: return N("es_AR");
+          case SUBLANG_SPANISH_ECUADOR: return N("es_EC");
+          case SUBLANG_SPANISH_CHILE: return N("es_CL");
+          case SUBLANG_SPANISH_URUGUAY: return N("es_UY");
+          case SUBLANG_SPANISH_PARAGUAY: return N("es_PY");
+          case SUBLANG_SPANISH_BOLIVIA: return N("es_BO");
+          case SUBLANG_SPANISH_EL_SALVADOR: return N("es_SV");
+          case SUBLANG_SPANISH_HONDURAS: return N("es_HN");
+          case SUBLANG_SPANISH_NICARAGUA: return N("es_NI");
+          case SUBLANG_SPANISH_PUERTO_RICO: return N("es_PR");
+          case SUBLANG_SPANISH_US: return N("es_US");
+          }
+        return N("es");
       case LANG_SUTU:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "bnt_TZ"; /* or "st_LS" or "nso_ZA"? */
+          case SUBLANG_DEFAULT: return N("bnt_TZ"); /* or "st_LS" or "nso_ZA"? */
           }
-        return "bnt";
+        return N("bnt");
       case LANG_SWAHILI:
         switch (sub)
           {
-          case SUBLANG_SWAHILI_KENYA: return "sw_KE";
+          case SUBLANG_SWAHILI_KENYA: return N("sw_KE");
           }
-        return "sw";
+        return N("sw");
       case LANG_SWEDISH:
         switch (sub)
           {
-          case SUBLANG_SWEDISH_SWEDEN: return "sv_SE";
-          case SUBLANG_SWEDISH_FINLAND: return "sv_FI";
+          case SUBLANG_SWEDISH_SWEDEN: return N("sv_SE");
+          case SUBLANG_SWEDISH_FINLAND: return N("sv_FI");
           }
-        return "sv";
+        return N("sv");
       case LANG_SYRIAC:
         switch (sub)
           {
-          case SUBLANG_SYRIAC_SYRIA: return "syr_SY"; /* An extinct language.  */
+          case SUBLANG_SYRIAC_SYRIA: return N("syr_SY"); /* An extinct language.  */
           }
-        return "syr";
+        return N("syr");
       case LANG_TAGALOG:
         switch (sub)
           {
-          case SUBLANG_TAGALOG_PHILIPPINES: return "tl_PH"; /* or "fil_PH"? */
+          case SUBLANG_TAGALOG_PHILIPPINES: return N("tl_PH"); /* or "fil_PH"? */
           }
-        return "tl"; /* or "fil"? */
+        return N("tl"); /* or "fil"? */
       case LANG_TAJIK:
         switch (sub)
           {
-          case 0x1f: return "tg";
-          case SUBLANG_TAJIK_TAJIKISTAN: return "tg_TJ";
+          case 0x1f: return N("tg");
+          case SUBLANG_TAJIK_TAJIKISTAN: return N("tg_TJ");
           }
-        return "tg";
+        return N("tg");
       case LANG_TAMAZIGHT:
         /* Note: Microsoft uses the non-ISO language code "tmz".  */
         switch (sub)
           {
-          case SUBLANG_TAMAZIGHT_ARABIC: return "ber_MA";
-          case 0x1f: return "ber@latin";
-          case SUBLANG_TAMAZIGHT_ALGERIA_LATIN: return "ber_DZ";
+          case SUBLANG_TAMAZIGHT_ARABIC: return N("ber_MA");
+          case 0x1f: return N("ber@latin");
+          case SUBLANG_TAMAZIGHT_ALGERIA_LATIN: return N("ber_DZ");
           }
-        return "ber";
+        return N("ber");
       case LANG_TAMIL:
         switch (sub)
           {
-          case SUBLANG_TAMIL_INDIA: return "ta_IN";
+          case SUBLANG_TAMIL_INDIA: return N("ta_IN");
           }
-        return "ta"; /* Ambiguous: could be "ta_IN" or "ta_LK" or "ta_SG".  */
+        return N("ta"); /* Ambiguous: could be "ta_IN" or "ta_LK" or "ta_SG".  */
       case LANG_TATAR:
         switch (sub)
           {
-          case SUBLANG_TATAR_RUSSIA: return "tt_RU";
+          case SUBLANG_TATAR_RUSSIA: return N("tt_RU");
           }
-        return "tt";
+        return N("tt");
       case LANG_TELUGU:
         switch (sub)
           {
-          case SUBLANG_TELUGU_INDIA: return "te_IN";
+          case SUBLANG_TELUGU_INDIA: return N("te_IN");
           }
-        return "te";
+        return N("te");
       case LANG_THAI:
         switch (sub)
           {
-          case SUBLANG_THAI_THAILAND: return "th_TH";
+          case SUBLANG_THAI_THAILAND: return N("th_TH");
           }
-        return "th";
+        return N("th");
       case LANG_TIBETAN:
         switch (sub)
           {
           case SUBLANG_TIBETAN_PRC:
             /* Most Tibetans would not like "bo_CN".  But Tibet does not yet
                have a country code of its own.  */
-            return "bo";
-          case SUBLANG_TIBETAN_BHUTAN: return "bo_BT";
+            return N("bo");
+          case SUBLANG_TIBETAN_BHUTAN: return N("bo_BT");
           }
-        return "bo";
+        return N("bo");
       case LANG_TIGRINYA:
         switch (sub)
           {
-          case SUBLANG_TIGRINYA_ETHIOPIA: return "ti_ET";
-          case SUBLANG_TIGRINYA_ERITREA: return "ti_ER";
+          case SUBLANG_TIGRINYA_ETHIOPIA: return N("ti_ET");
+          case SUBLANG_TIGRINYA_ERITREA: return N("ti_ER");
           }
-        return "ti";
+        return N("ti");
       case LANG_TSONGA:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "ts_ZA";
+          case SUBLANG_DEFAULT: return N("ts_ZA");
           }
-        return "ts";
+        return N("ts");
       case LANG_TSWANA:
         /* Spoken in South Africa, Botswana.  */
         switch (sub)
           {
-          case SUBLANG_TSWANA_SOUTH_AFRICA: return "tn_ZA";
+          case SUBLANG_TSWANA_SOUTH_AFRICA: return N("tn_ZA");
           }
-        return "tn";
+        return N("tn");
       case LANG_TURKISH:
         switch (sub)
           {
-          case SUBLANG_TURKISH_TURKEY: return "tr_TR";
+          case SUBLANG_TURKISH_TURKEY: return N("tr_TR");
           }
-        return "tr";
+        return N("tr");
       case LANG_TURKMEN:
         switch (sub)
           {
-          case SUBLANG_TURKMEN_TURKMENISTAN: return "tk_TM";
+          case SUBLANG_TURKMEN_TURKMENISTAN: return N("tk_TM");
           }
-        return "tk";
+        return N("tk");
       case LANG_UIGHUR:
         switch (sub)
           {
-          case SUBLANG_UIGHUR_PRC: return "ug_CN";
+          case SUBLANG_UIGHUR_PRC: return N("ug_CN");
           }
-        return "ug";
+        return N("ug");
       case LANG_UKRAINIAN:
         switch (sub)
           {
-          case SUBLANG_UKRAINIAN_UKRAINE: return "uk_UA";
+          case SUBLANG_UKRAINIAN_UKRAINE: return N("uk_UA");
           }
-        return "uk";
+        return N("uk");
       case LANG_URDU:
         switch (sub)
           {
-          case SUBLANG_URDU_PAKISTAN: return "ur_PK";
-          case SUBLANG_URDU_INDIA: return "ur_IN";
+          case SUBLANG_URDU_PAKISTAN: return N("ur_PK");
+          case SUBLANG_URDU_INDIA: return N("ur_IN");
           }
-        return "ur";
+        return N("ur");
       case LANG_UZBEK:
         switch (sub)
           {
-          case 0x1f: return "uz";
-          case SUBLANG_UZBEK_LATIN: return "uz_UZ";
-          case 0x1e: return "uz@cyrillic";
-          case SUBLANG_UZBEK_CYRILLIC: return "uz_UZ@cyrillic";
+          case 0x1f: return N("uz");
+          case SUBLANG_UZBEK_LATIN: return N("uz_UZ");
+          case 0x1e: return N("uz@cyrillic");
+          case SUBLANG_UZBEK_CYRILLIC: return N("uz_UZ@cyrillic");
           }
-        return "uz";
+        return N("uz");
       case LANG_VENDA:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "ve_ZA";
+          case SUBLANG_DEFAULT: return N("ve_ZA");
           }
-        return "ve";
+        return N("ve");
       case LANG_VIETNAMESE:
         switch (sub)
           {
-          case SUBLANG_VIETNAMESE_VIETNAM: return "vi_VN";
+          case SUBLANG_VIETNAMESE_VIETNAM: return N("vi_VN");
           }
-        return "vi";
+        return N("vi");
       case LANG_WELSH:
         switch (sub)
           {
-          case SUBLANG_WELSH_UNITED_KINGDOM: return "cy_GB";
+          case SUBLANG_WELSH_UNITED_KINGDOM: return N("cy_GB");
           }
-        return "cy";
+        return N("cy");
       case LANG_WOLOF:
         switch (sub)
           {
-          case SUBLANG_WOLOF_SENEGAL: return "wo_SN";
+          case SUBLANG_WOLOF_SENEGAL: return N("wo_SN");
           }
-        return "wo";
+        return N("wo");
       case LANG_XHOSA:
         switch (sub)
           {
-          case SUBLANG_XHOSA_SOUTH_AFRICA: return "xh_ZA";
+          case SUBLANG_XHOSA_SOUTH_AFRICA: return N("xh_ZA");
           }
-        return "xh";
+        return N("xh");
       case LANG_YAKUT:
         switch (sub)
           {
-          case SUBLANG_YAKUT_RUSSIA: return "sah_RU";
+          case SUBLANG_YAKUT_RUSSIA: return N("sah_RU");
           }
-        return "sah";
+        return N("sah");
       case LANG_YI:
         switch (sub)
           {
-          case SUBLANG_YI_PRC: return "ii_CN";
+          case SUBLANG_YI_PRC: return N("ii_CN");
           }
-        return "ii";
+        return N("ii");
       case LANG_YIDDISH:
         switch (sub)
           {
-          case SUBLANG_DEFAULT: return "yi_IL";
+          case SUBLANG_DEFAULT: return N("yi_IL");
           }
-        return "yi";
+        return N("yi");
       case LANG_YORUBA:
         switch (sub)
           {
-          case SUBLANG_YORUBA_NIGERIA: return "yo_NG";
+          case SUBLANG_YORUBA_NIGERIA: return N("yo_NG");
           }
-        return "yo";
+        return N("yo");
       case LANG_ZULU:
         switch (sub)
           {
-          case SUBLANG_ZULU_SOUTH_AFRICA: return "zu_ZA";
+          case SUBLANG_ZULU_SOUTH_AFRICA: return N("zu_ZA");
           }
-        return "zu";
-      default: return "C";
+        return N("zu");
+      default: return N("C");
       }
   }
+  #undef N
 }
 
 # if !defined IN_LIBINTL
-- 
2.43.0
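
For readers who do not have the earlier hunks of this patch at hand: the
N(...) wrapper around every return value is what adds the ".UTF-8" suffix
when GetACP() reports the UTF-8 code page. Its definition is not visible in
this part of the diff, so the following is only a sketch of how such a macro
could work, assuming a local flag named is_utf8. The function
telugu_locale_name and its parameters are made up for illustration and are
not part of gnulib.

```c
#include <stdbool.h>

/* Assumed shape of the wrapper: adjacent string literals are concatenated
   at compile time, so both variants of each name are static strings.  */
#define N(name) (is_utf8 ? name ".UTF-8" : name)

/* Hypothetical stand-in for one 'case' of the big switch.  On native
   Windows, is_utf8 would be computed as (GetACP () == 65001).  */
const char *
telugu_locale_name (bool sub_is_india, bool is_utf8)
{
  if (sub_is_india)
    return N ("te_IN");
  return N ("te");
}

#undef N
```

Because the macro only accepts string literals, no allocation or buffer
handling is needed at the call sites.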

From e63eea8ea358041610c3f9a9ed4d5a1e44be5cc4 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:57:02 +0100
Subject: [PATCH 4/7] localename tests: Test in the UTF-8 environment on native
 Windows.

* tests/test-localename-w32utf8.sh: New file.
* tests/test-localename-w32utf8.c: New file.
* modules/localename-tests (Files): Add these files and
m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-localename-w32utf8 and run
test-localename-w32utf8.sh.
---
 ChangeLog                        | 10 +++++++
 modules/localename-tests         | 15 ++++++++++
 tests/test-localename-w32utf8.c  | 47 ++++++++++++++++++++++++++++++++
 tests/test-localename-w32utf8.sh |  7 +++++
 4 files changed, 79 insertions(+)
 create mode 100644 tests/test-localename-w32utf8.c
 create mode 100755 tests/test-localename-w32utf8.sh

diff --git a/ChangeLog b/ChangeLog
index d9f282c21e..fd3cf9f7ca 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
+	localename tests: Test in the UTF-8 environment on native Windows.
+	* tests/test-localename-w32utf8.sh: New file.
+	* tests/test-localename-w32utf8.c: New file.
+	* modules/localename-tests (Files): Add these files and
+	m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-localename-w32utf8 and run
+	test-localename-w32utf8.sh.
+
 	localename-unsafe: Support the UTF-8 environment on native Windows.
 	* lib/localename-unsafe.c (gl_locale_name_from_win32_LANGID): Append a
 	suffix ".UTF-8" to the result if GetACP() is UTF-8.
diff --git a/modules/localename-tests b/modules/localename-tests
index 0c24d5b4b6..cf4d586806 100644
--- a/modules/localename-tests
+++ b/modules/localename-tests
@@ -1,7 +1,12 @@
 Files:
 tests/test-localename.c
+tests/test-localename-w32utf8.sh
+tests/test-localename-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
 tests/macros.h
 m4/musl.m4
+m4/windows-rc.m4
 
 Depends-on:
 locale
@@ -9,13 +14,23 @@ setenv
 unsetenv
 setlocale
 strdup
+test-xfail
 
 configure.ac:
 gl_CHECK_FUNCS_ANDROID([newlocale], [[#include <locale.h>]])
 gl_MUSL_LIBC
+gl_WINDOWS_RC
 
 Makefile.am:
 TESTS += test-localename
 check_PROGRAMS += test-localename
 test_localename_LDADD = $(LDADD) $(SETLOCALE_LIB) @INTL_MACOSX_LIBS@ $(LIBTHREAD)
 
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-localename-w32utf8.sh
+noinst_PROGRAMS += test-localename-w32utf8
+test_localename_w32utf8_LDADD = $(LDADD) test-localename-windows-utf8.res $(SETLOCALE_LIB)
+test-localename-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-localename-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-localename-windows-utf8.res
+endif
diff --git a/tests/test-localename-w32utf8.c b/tests/test-localename-w32utf8.c
new file mode 100644
index 0000000000..72a01c0749
--- /dev/null
+++ b/tests/test-localename-w32utf8.c
@@ -0,0 +1,47 @@
+/* Test of gl_locale_name function and its variants
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include "localename.h"
+
+#include <stdio.h>
+#include <string.h>
+
+#include "macros.h"
+
+int
+main (void)
+{
+#ifdef _UCRT
+  const char *name = gl_locale_name_default ();
+
+  ASSERT (name != NULL);
+
+  /* With the legacy system settings, expect "C.UTF-8", not "C", because "C" is
+     a single-byte locale.
+     With the modern system settings, expect some "ll_CC.UTF-8" name.  */
+  ASSERT (strlen (name) > 6 && strcmp (name + strlen (name) - 6, ".UTF-8") == 0);
+
+  return test_exit_status;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-localename-w32utf8.sh b/tests/test-localename-w32utf8.sh
new file mode 100755
index 0000000000..de7629c3a7
--- /dev/null
+++ b/tests/test-localename-w32utf8.sh
@@ -0,0 +1,7 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LANG
+${CHECKER} ./test-localename-w32utf8${EXEEXT}
-- 
2.43.0
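
The assertion in test-localename-w32utf8.c boils down to "the name has a
non-empty stem followed by .UTF-8". As a rough sketch, the same check could
be written as a small helper; the name has_stem_and_suffix is hypothetical
and does not exist in gnulib.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: return true if S ends with SUFFIX and has at least
   one character before it.  This mirrors the test's condition
   strlen (name) > 6 for the 6-character suffix ".UTF-8".  */
bool
has_stem_and_suffix (const char *s, const char *suffix)
{
  size_t s_len = strlen (s);
  size_t suffix_len = strlen (suffix);

  return s_len > suffix_len
         && strcmp (s + (s_len - suffix_len), suffix) == 0;
}
```

With this helper, "C.UTF-8" and "en_US.UTF-8" pass, while "C" and a bare
".UTF-8" do not.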

From 00211fc69c926d6c8f6e3f3cf1d8802623db2af9 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:57:15 +0100
Subject: [PATCH 5/7] setlocale: Support the UTF-8 environment on native
 Windows.

* lib/setlocale.c: Include <windows.h>.
(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
to the locale names passed to the native setlocale().
---
 ChangeLog       |  7 +++++++
 lib/setlocale.c | 51 ++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 53 insertions(+), 5 deletions(-)

diff --git a/ChangeLog b/ChangeLog
index fd3cf9f7ca..9f89cb8718 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,10 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	setlocale: Support the UTF-8 environment on native Windows.
+	* lib/setlocale.c: Include <windows.h>.
+	(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
+	to the locale names passed to the native setlocale().
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	localename tests: Test in the UTF-8 environment on native Windows.
diff --git a/lib/setlocale.c b/lib/setlocale.c
index 62dce81de3..3cb711d8e1 100644
--- a/lib/setlocale.c
+++ b/lib/setlocale.c
@@ -47,6 +47,11 @@
 extern void gl_locale_name_canonicalize (char *name);
 #endif
 
+#if defined _WIN32 && !defined __CYGWIN__
+# define WIN32_LEAN_AND_MEAN
+# include <windows.h>
+#endif
+
 #if 1
 
 # undef setlocale
@@ -672,6 +677,7 @@ search (const struct table_entry *table, size_t table_size, const char *string,
 static char *
 setlocale_unixlike (int category, const char *locale)
 {
+  int is_utf8 = (GetACP () == 65001);
   char *result;
   char llCC_buf[64];
   char ll_buf[64];
@@ -682,6 +688,15 @@ setlocale_unixlike (int category, const char *locale)
   if (locale != NULL && strcmp (locale, "POSIX") == 0)
     locale = "C";
 
+  /* The native Windows implementation of setlocale, in the UTF-8 environment,
+     does not understand the locale names "C.UTF-8" or "C.utf8" or "C.65001",
+     but it understands "English_United States.65001", which is functionally
+     equivalent.  */
+  if (locale != NULL
+      && ((is_utf8 && strcmp (locale, "C") == 0)
+          || strcmp (locale, "C.UTF-8") == 0))
+    locale = "English_United States.65001";
+
   /* First, try setlocale with the original argument unchanged.  */
   result = setlocale_mtsafe (category, locale);
   if (result != NULL)
@@ -714,7 +729,15 @@ setlocale_unixlike (int category, const char *locale)
        */
       if (strcmp (llCC_buf, locale) != 0)
         {
-          result = setlocale (category, llCC_buf);
+          if (is_utf8)
+            {
+              char buf[64+6];
+              strcpy (buf, llCC_buf);
+              strcat (buf, ".65001");
+              result = setlocale (category, buf);
+            }
+          else
+            result = setlocale (category, llCC_buf);
           if (result != NULL)
             return result;
         }
@@ -731,7 +754,15 @@ setlocale_unixlike (int category, const char *locale)
         for (i = range.lo; i < range.hi; i++)
           {
             /* Try the replacement in language_table[i].  */
-            result = setlocale (category, language_table[i].english);
+            if (is_utf8)
+              {
+                char buf[64+6];
+                strcpy (buf, language_table[i].english);
+                strcat (buf, ".65001");
+                result = setlocale (category, buf);
+              }
+            else
+              result = setlocale (category, language_table[i].english);
             if (result != NULL)
               return result;
           }
@@ -785,13 +816,15 @@ setlocale_unixlike (int category, const char *locale)
                             size_t part1_len = strlen (part1);
                             const char *part2 = country_table[j].english;
                             size_t part2_len = strlen (part2) + 1;
-                            char buf[64+64];
+                            char buf[64+64+6];
 
                             if (!(part1_len + 1 + part2_len <= sizeof (buf)))
                               abort ();
                             memcpy (buf, part1, part1_len);
                             buf[part1_len] = '_';
                             memcpy (buf + part1_len + 1, part2, part2_len);
+                            if (is_utf8)
+                              strcat (buf, ".65001");
 
                             /* Try the concatenated replacements.  */
                             result = setlocale (category, buf);
@@ -809,8 +842,16 @@ setlocale_unixlike (int category, const char *locale)
                     for (i = language_range.lo; i < language_range.hi; i++)
                       {
                         /* Try only the language replacement.  */
-                        result =
-                          setlocale (category, language_table[i].english);
+                        if (is_utf8)
+                          {
+                            char buf[64+6];
+                            strcpy (buf, language_table[i].english);
+                            strcat (buf, ".65001");
+                            result = setlocale (category, buf);
+                          }
+                        else
+                          result =
+                            setlocale (category, language_table[i].english);
                         if (result != NULL)
                           return result;
                       }
-- 
2.43.0
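
The patch above repeats the same strcpy/strcat sequence at each place where
".65001" is appended. Purely as an illustration, and not what the patch
does, the step could be expressed as one bounds-checked helper. The name
with_codepage_suffix and its signature are assumptions made for this sketch;
is_utf8 is passed in as a parameter so that the code also compiles and runs
outside of Windows.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UTF8_CODEPAGE_SUFFIX ".65001"

/* Hypothetical helper: copy NAME into BUF, followed by ".65001" if IS_UTF8
   is true.  Return 0 on success, or -1 if BUF (of size BUFSIZE) is too
   small, in which case BUF is left untouched.  */
int
with_codepage_suffix (char *buf, size_t bufsize, const char *name,
                      bool is_utf8)
{
  size_t name_len = strlen (name);
  size_t suffix_len = is_utf8 ? strlen (UTF8_CODEPAGE_SUFFIX) : 0;

  if (name_len + suffix_len + 1 > bufsize)
    return -1;
  memcpy (buf, name, name_len);
  if (is_utf8)
    memcpy (buf + name_len, UTF8_CODEPAGE_SUFFIX, suffix_len);
  buf[name_len + suffix_len] = '\0';
  return 0;
}
```

A caller would then pass buf to the native setlocale() only when the helper
returns 0. The explicit size check is the main difference from the
strcpy/strcat form, which relies on the buffers being declared large enough.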

From 2f4391fde8620749fb3859c568f952a958e2ca2c Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:58:53 +0100
Subject: [PATCH 6/7] setlocale tests: Test in the UTF-8 environment on native
 Windows.

* tests/test-setlocale-w32utf8.sh: New file.
* tests/test-setlocale-w32utf8.c: New file.
* modules/setlocale-tests (Files): Add these files and
m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-setlocale-w32utf8 and run
test-setlocale-w32utf8.sh.
---
 ChangeLog                       | 10 +++++
 modules/setlocale-tests         | 16 ++++++++
 tests/test-setlocale-w32utf8.c  | 69 +++++++++++++++++++++++++++++++++
 tests/test-setlocale-w32utf8.sh | 12 ++++++
 4 files changed, 107 insertions(+)
 create mode 100644 tests/test-setlocale-w32utf8.c
 create mode 100755 tests/test-setlocale-w32utf8.sh

diff --git a/ChangeLog b/ChangeLog
index 9f89cb8718..c5e2e8b1b2 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,5 +1,15 @@
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
+	setlocale tests: Test in the UTF-8 environment on native Windows.
+	* tests/test-setlocale-w32utf8.sh: New file.
+	* tests/test-setlocale-w32utf8.c: New file.
+	* modules/setlocale-tests (Files): Add these files and
+	m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-setlocale-w32utf8 and run
+	test-setlocale-w32utf8.sh.
+
 	setlocale: Support the UTF-8 environment on native Windows.
 	* lib/setlocale.c: Include <windows.h>.
 	(setlocale_unixlike): In the UTF-8 environment, append a suffix ".65001"
diff --git a/modules/setlocale-tests b/modules/setlocale-tests
index ad0a536bc6..23cc6ddd17 100644
--- a/modules/setlocale-tests
+++ b/modules/setlocale-tests
@@ -4,21 +4,28 @@ tests/test-setlocale1.c
 tests/test-setlocale2.sh
 tests/test-setlocale2.c
 tests/test-setlocale-w32.c
+tests/test-setlocale-w32utf8.sh
+tests/test-setlocale-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
 tests/signature.h
 tests/macros.h
 m4/locale-fr.m4
 m4/locale-ja.m4
 m4/locale-zh.m4
 m4/codeset.m4
+m4/windows-rc.m4
 
 Depends-on:
 strdup
+test-xfail
 
 configure.ac:
 gt_LOCALE_FR
 gt_LOCALE_FR_UTF8
 gt_LOCALE_JA
 gt_LOCALE_ZH_CN
+gl_WINDOWS_RC
 
 Makefile.am:
 TESTS += test-setlocale1.sh test-setlocale2.sh test-setlocale-w32
@@ -31,3 +38,12 @@ check_PROGRAMS += test-setlocale1 test-setlocale2 test-setlocale-w32
 test_setlocale1_LDADD = $(LDADD) @SETLOCALE_LIB@
 test_setlocale2_LDADD = $(LDADD) @SETLOCALE_LIB@
 test_setlocale_w32_LDADD = $(LDADD) @SETLOCALE_LIB@
+
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-setlocale-w32utf8.sh
+noinst_PROGRAMS += test-setlocale-w32utf8
+test_setlocale_w32utf8_LDADD = $(LDADD) test-setlocale-windows-utf8.res $(SETLOCALE_LIB)
+test-setlocale-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-setlocale-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-setlocale-windows-utf8.res
+endif
diff --git a/tests/test-setlocale-w32utf8.c b/tests/test-setlocale-w32utf8.c
new file mode 100644
index 0000000000..f0bbce05b7
--- /dev/null
+++ b/tests/test-setlocale-w32utf8.c
@@ -0,0 +1,69 @@
+/* Test of setting the current locale
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include <locale.h>
+
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+int
+main (void)
+{
+#ifdef _UCRT
+  /* Test that setlocale() works as expected in a UTF-8 locale.  */
+  char *name;
+
+  /* This looks at all LC_*, LANG environment variables, which are all unset
+     at this point.  */
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  name = setlocale (LC_ALL, NULL);
+  /* With the legacy system settings, expect some mixed locale, due to the
+     limitations of the native setlocale().
+     With the modern system settings, expect some "ll_CC.UTF-8" name.  */
+  if (!((strlen (name) > 6 && strcmp (name + strlen (name) - 6, ".UTF-8") == 0)
+        || strcmp (name, "LC_COLLATE=English_United States.65001;"
+                         "LC_CTYPE=English_United States.65001;"
+                         "LC_MONETARY=English_United States.65001;"
+                         "LC_NUMERIC=English_United States.65001;"
+                         "LC_TIME=English_United States.65001;"
+                         "LC_MESSAGES=C.UTF-8")
+           == 0
+        || strcmp (name, "LC_COLLATE=English_United States.utf8;"
+                         "LC_CTYPE=English_United States.utf8;"
+                         "LC_MONETARY=English_United States.utf8;"
+                         "LC_NUMERIC=English_United States.utf8;"
+                         "LC_TIME=English_United States.utf8;"
+                         "LC_MESSAGES=C.UTF-8")
+           == 0))
+    {
+      fprintf (stderr, "setlocale() returned \"%s\".\n", name);
+      exit (1);
+    }
+
+  return 0;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-setlocale-w32utf8.sh b/tests/test-setlocale-w32utf8.sh
new file mode 100755
index 0000000000..e8f7484cf0
--- /dev/null
+++ b/tests/test-setlocale-w32utf8.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LC_MESSAGES
+unset LC_NUMERIC
+unset LC_COLLATE
+unset LC_MONETARY
+unset LC_TIME
+unset LANG
+${CHECKER} ./test-setlocale-w32utf8${EXEEXT}
-- 
2.43.0
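
The test above compares the whole mixed locale string that
setlocale (LC_ALL, NULL) returns with the legacy settings. When
experimenting, it can be handy to pull a single category out of such a
"LC_xxx=value;LC_yyy=value" string. The following is a sketch only:
mixed_locale_lookup is a hypothetical helper, and the string format is
assumed to be exactly the one shown in the test.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: in a mixed locale name such as
   "LC_COLLATE=a;LC_CTYPE=b", find the entry for CATEGORY and copy its value
   into OUT.  Return 0 on success, -1 if the category is not present or OUT
   (of size OUTSIZE) is too small.  */
int
mixed_locale_lookup (const char *name, const char *category,
                     char *out, size_t outsize)
{
  size_t category_len = strlen (category);
  const char *p = name;

  while (*p != '\0')
    {
      const char *end = strchr (p, ';');
      size_t seg_len = (end != NULL ? (size_t) (end - p) : strlen (p));

      /* A matching segment is CATEGORY immediately followed by '='.  */
      if (seg_len > category_len
          && strncmp (p, category, category_len) == 0
          && p[category_len] == '=')
        {
          size_t value_len = seg_len - category_len - 1;

          if (value_len + 1 > outsize)
            return -1;
          memcpy (out, p + category_len + 1, value_len);
          out[value_len] = '\0';
          return 0;
        }
      if (end == NULL)
        break;
      p = end + 1;
    }
  return -1;
}
```

Applied to the first string accepted by the test, a lookup of "LC_MESSAGES"
yields "C.UTF-8", while the other categories yield
"English_United States.65001".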

From c11a2e675ccc8637e6322b98d878b0315a8bb7e6 Mon Sep 17 00:00:00 2001
From: Bruno Haible <br...@clisp.org>
Date: Mon, 23 Dec 2024 16:59:20 +0100
Subject: [PATCH 7/7] mbrtowc tests: Test in the UTF-8 environment on native
 Windows.

* tests/test-mbrtowc-w32utf8.sh: New file.
* tests/test-mbrtowc-w32utf8.c: New file.
* modules/mbrtowc-tests (Files): Add these files and
m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
(Depends-on): Add test-xfail.
(configure.ac): Invoke gl_WINDOWS_RC.
(Makefile.am): Arrange to compile test-mbrtowc-w32utf8 and run
test-mbrtowc-w32utf8.sh.
---
 ChangeLog                     |  12 +++
 modules/mbrtowc-tests         |  16 ++++
 tests/test-mbrtowc-w32utf8.c  | 166 ++++++++++++++++++++++++++++++++++
 tests/test-mbrtowc-w32utf8.sh |  12 +++
 4 files changed, 206 insertions(+)
 create mode 100644 tests/test-mbrtowc-w32utf8.c
 create mode 100755 tests/test-mbrtowc-w32utf8.sh

diff --git a/ChangeLog b/ChangeLog
index c5e2e8b1b2..e6d2e1d592 100644
--- a/ChangeLog
+++ b/ChangeLog
@@ -1,3 +1,15 @@
+2024-12-23  Bruno Haible  <br...@clisp.org>
+
+	mbrtowc tests: Test in the UTF-8 environment on native Windows.
+	* tests/test-mbrtowc-w32utf8.sh: New file.
+	* tests/test-mbrtowc-w32utf8.c: New file.
+	* modules/mbrtowc-tests (Files): Add these files and
+	m4/windows-rc.m4, tests/windows-utf8.rc, tests/windows-utf8.manifest.
+	(Depends-on): Add test-xfail.
+	(configure.ac): Invoke gl_WINDOWS_RC.
+	(Makefile.am): Arrange to compile test-mbrtowc-w32utf8 and run
+	test-mbrtowc-w32utf8.sh.
+
 2024-12-23  Bruno Haible  <br...@clisp.org>
 
 	setlocale tests: Test in the UTF-8 environment on native Windows.
diff --git a/modules/mbrtowc-tests b/modules/mbrtowc-tests
index d152e2e472..d9add89fee 100644
--- a/modules/mbrtowc-tests
+++ b/modules/mbrtowc-tests
@@ -13,6 +13,10 @@ tests/test-mbrtowc-w32-6.sh
 tests/test-mbrtowc-w32-7.sh
 tests/test-mbrtowc-w32-8.sh
 tests/test-mbrtowc-w32.c
+tests/test-mbrtowc-w32utf8.sh
+tests/test-mbrtowc-w32utf8.c
+tests/windows-utf8.rc
+tests/windows-utf8.manifest
 tests/signature.h
 tests/macros.h
 m4/locale-en.m4
@@ -20,12 +24,14 @@ m4/locale-fr.m4
 m4/locale-ja.m4
 m4/locale-zh.m4
 m4/codeset.m4
+m4/windows-rc.m4
 
 Depends-on:
 mbsinit
 wctob
 setlocale
 localcharset
+test-xfail
 
 configure.ac:
 gt_LOCALE_EN_UTF8
@@ -33,6 +39,7 @@ gt_LOCALE_FR
 gt_LOCALE_FR_UTF8
 gt_LOCALE_JA
 gt_LOCALE_ZH_CN
+gl_WINDOWS_RC
 
 Makefile.am:
 TESTS += \
@@ -49,3 +56,12 @@ TESTS_ENVIRONMENT += \
   LOCALE_ZH_CN='@LOCALE_ZH_CN@'
 check_PROGRAMS += test-mbrtowc test-mbrtowc-w32
 test_mbrtowc_LDADD = $(LDADD) $(SETLOCALE_LIB) $(MBRTOWC_LIB)
+
+if OS_IS_NATIVE_WINDOWS
+TESTS += test-mbrtowc-w32utf8.sh
+noinst_PROGRAMS += test-mbrtowc-w32utf8
+test_mbrtowc_w32utf8_LDADD = $(LDADD) test-mbrtowc-windows-utf8.res $(SETLOCALE_LIB)
+test-mbrtowc-windows-utf8.res : $(srcdir)/windows-utf8.rc
+	$(WINDRES) -i $(srcdir)/windows-utf8.rc -o test-mbrtowc-windows-utf8.res --output-format=coff
+MOSTLYCLEANFILES += test-mbrtowc-windows-utf8.res
+endif
diff --git a/tests/test-mbrtowc-w32utf8.c b/tests/test-mbrtowc-w32utf8.c
new file mode 100644
index 0000000000..803c1638c0
--- /dev/null
+++ b/tests/test-mbrtowc-w32utf8.c
@@ -0,0 +1,166 @@
+/* Test of conversion of multibyte character to wide character
+   on native Windows in the UTF-8 environment.
+   Copyright (C) 2024 Free Software Foundation, Inc.
+
+   This program is free software: you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation, either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <https://www.gnu.org/licenses/>.  */
+
+/* Written by Bruno Haible <br...@clisp.org>, 2024.  */
+
+#include <config.h>
+
+#include <wchar.h>
+
+#include <errno.h>
+#include <locale.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <string.h>
+
+#include "macros.h"
+
+int
+main (void)
+{
+#ifdef _UCRT
+  /* Test that MB_CUR_MAX and mbrtowc() work as expected in a UTF-8 locale.  */
+  mbstate_t state;
+  wchar_t wc;
+  size_t ret;
+
+  if (setlocale (LC_ALL, "") == NULL)
+    return 1;
+
+  ASSERT (MB_CUR_MAX >= 4);
+
+  {
+    char input[] = "B\303\274\303\237er"; /* "Büßer" */
+    memset (&state, '\0', sizeof (mbstate_t));
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input, 1, &state);
+    ASSERT (ret == 1);
+    ASSERT (wc == 'B');
+    ASSERT (mbsinit (&state));
+    input[0] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 1, 1, &state);
+    ASSERT (ret == (size_t)(-2));
+    ASSERT (wc == (wchar_t) 0xBADFACE);
+    ASSERT (!mbsinit (&state));
+    input[1] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 2, 5, &state);
+    ASSERT (ret == 1);
+    ASSERT (wctob (wc) == EOF);
+    ASSERT (wc == 0x00FC);
+    ASSERT (mbsinit (&state));
+    input[2] = '\0';
+
+    /* Test support of NULL first argument.  */
+    ret = mbrtowc (NULL, input + 3, 4, &state);
+    ASSERT (ret == 2);
+    ASSERT (mbsinit (&state));
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 3, 4, &state);
+    ASSERT (ret == 2);
+    ASSERT (wctob (wc) == EOF);
+    ASSERT (wc == 0x00DF);
+    ASSERT (mbsinit (&state));
+    input[3] = '\0';
+    input[4] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 5, 2, &state);
+    ASSERT (ret == 1);
+    ASSERT (wc == 'e');
+    ASSERT (mbsinit (&state));
+    input[5] = '\0';
+
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, input + 6, 1, &state);
+    ASSERT (ret == 1);
+    ASSERT (wc == 'r');
+    ASSERT (mbsinit (&state));
+
+    /* Test some invalid input.  */
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\377", 1, &state); /* 0xFF */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\303\300", 2, &state); /* 0xC3 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\343\300", 2, &state); /* 0xE3 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\343\300\200", 3, &state); /* 0xE3 0xC0 0x80 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\343\200\300", 3, &state); /* 0xE3 0x80 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\300", 2, &state); /* 0xF3 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\300\200\200", 4, &state); /* 0xF3 0xC0 0x80 0x80 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\200\300", 3, &state); /* 0xF3 0x80 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\200\300\200", 4, &state); /* 0xF3 0x80 0xC0 0x80 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+
+    memset (&state, '\0', sizeof (mbstate_t));
+    wc = (wchar_t) 0xBADFACE;
+    ret = mbrtowc (&wc, "\363\200\200\300", 4, &state); /* 0xF3 0x80 0x80 0xC0 */
+    ASSERT (ret == (size_t)-1);
+    ASSERT (errno == EILSEQ);
+  }
+
+  return test_exit_status;
+#else
+  fputs ("Skipping test: not using the UCRT runtime\n", stderr);
+  return 77;
+#endif
+}
diff --git a/tests/test-mbrtowc-w32utf8.sh b/tests/test-mbrtowc-w32utf8.sh
new file mode 100755
index 0000000000..d0a953486c
--- /dev/null
+++ b/tests/test-mbrtowc-w32utf8.sh
@@ -0,0 +1,12 @@
+#!/bin/sh
+
+# Test the UTF-8 environment on native Windows.
+unset LC_ALL
+unset LC_CTYPE
+unset LC_MESSAGES
+unset LC_NUMERIC
+unset LC_COLLATE
+unset LC_MONETARY
+unset LC_TIME
+unset LANG
+${CHECKER} ./test-mbrtowc-w32utf8${EXEEXT}
-- 
2.43.0
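
The "invalid input" cases in test-mbrtowc-w32utf8.c all break the basic
shape of a UTF-8 sequence: either the lead byte is not a legal lead byte
(0xFF), or a byte that should be a continuation byte (0x80..0xBF) is 0xC0.
The sketch below checks that shape independently of any locale. The function
names are hypothetical, and the check is structural only: it does not reject
overlong encodings, surrogates, or code points beyond U+10FFFF that start
with 0xF4.

```c
#include <stdbool.h>
#include <stddef.h>

/* Number of bytes announced by a UTF-8 lead byte, or 0 if LEAD cannot start
   a sequence (continuation bytes, 0xC0, 0xC1, and 0xF5..0xFF).  */
int
utf8_sequence_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}

/* Return true if the N bytes at S begin with one complete sequence whose
   lead byte is legal and whose remaining bytes are all in 0x80..0xBF.
   A truncated sequence yields false here, whereas mbrtowc() would report
   (size_t)(-2) for a prefix that is valid so far.  */
bool
utf8_has_valid_shape (const char *s, size_t n)
{
  int len;
  int i;

  if (n == 0)
    return false;
  len = utf8_sequence_length ((unsigned char) s[0]);
  if (len == 0 || n < (size_t) len)
    return false;
  for (i = 1; i < len; i++)
    if (((unsigned char) s[i] & 0xC0) != 0x80)
      return false;
  return true;
}
```

For example, "\303\274" (the two bytes of U+00FC) has a valid shape, while
"\303\300" and "\343\200\300" do not.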
