Ertan Küçükoglu offered to try to review and test this, so here's a rebase.

Some notes:

* it turned out that the Turkish i/I test problem I mentioned earlier
in this thread[1] was always broken on Windows; we just never tested
with UTF-8 before Meson took over.  It's skipped now, see commit
cff4e5a3[2]

* it seems that you can't actually put encodings like .1252 on the end
of a BCP 47 name (.UTF-8 must be a special case); I don't know if we
should look into a better UTF-8 mode for modern Windows, but that'd be
a separate project

* this patch only benefits people who run initdb.exe without
explicitly specifying a locale; a good number of real systems in the
wild probably use EDB's graphical installer, which initialises a
cluster and has its own way of choosing the locale, as discussed in
Ertan's thread[3]

[1] https://www.postgresql.org/message-id/flat/CA%2BhUKGJZskvCh%3DQm75UkHrY6c1QZUuC92Po9rponj1BbLmcMEA%40mail.gmail.com#3a00c08214a4285d2f3c4297b0ac2be2
[2] https://github.com/postgres/postgres/commit/cff4e5a3
[3] https://www.postgresql.org/message-id/flat/CAH2i4ydECHZPxEBB7gtRG3vROv7a0d3tqAFXzcJWQ9hRsc1znQ%40mail.gmail.com
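
To make the naming distinction in the notes above concrete, here is a
small portable C sketch.  It is not part of the patch: the helper names
are hypothetical, and the checks are loose heuristics, not a real BCP 47
parser.  It only shows the three properties discussed in this thread:
whether a name is pure ASCII, whether it carries a ".encoding" suffix,
and whether it is shaped like a tag such as "en-US".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if the name contains only 7-bit ASCII bytes. */
static bool
locale_name_is_ascii(const char *name)
{
	assert(name != NULL);
	for (const unsigned char *p = (const unsigned char *) name; *p != '\0'; p++)
	{
		if (*p > 127)
			return false;
	}
	return true;
}

/*
 * Hypothetical helper: true if the name ends in ".something", such as
 * ".1252" or ".UTF-8".  Only the last dot is considered.
 */
static bool
locale_name_has_encoding_suffix(const char *name)
{
	const char *dot;

	assert(name != NULL);
	dot = strrchr(name, '.');
	return dot != NULL && dot[1] != '\0';
}

/*
 * Hypothetical helper: loose check that the name is shaped like a BCP 47
 * tag, i.e. non-empty and made only of ASCII letters, digits and hyphens.
 * It does not validate the individual subtags.
 */
static bool
locale_name_looks_like_bcp47(const char *name)
{
	assert(name != NULL);
	if (name[0] == '\0')
		return false;
	for (const char *p = name; *p != '\0'; p++)
	{
		char		c = *p;
		bool		ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') ||
			(c >= '0' && c <= '9') || c == '-';

		if (!ok)
			return false;
	}
	return true;
}
```

With these, "sv-SE" is ASCII, has no suffix and passes the tag check,
while "Swedish_Sweden.1252" has a suffix and fails the tag check.
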
From fb33b7eb5482bae31b70bb54dbe77325b543a89c Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Mon, 20 Nov 2023 14:24:35 +1300
Subject: [PATCH v5 1/2] MinGW has GetLocaleInfoEx().

To use BCP 47 locale names like "en-US" without a suffix ".encoding", we
need to be able to call GetLocaleInfoEx() to look up the encoding.  That
was previously gated for MSVC only, but MinGW has had the function for
many years.  Remove that gating, because otherwise our MinGW build farm
animals would fail when a later commit switches to using the new names by
default.

There are probably other places where _MSC_VER is still used to exclude
MinGW, based on an out-of-date idea about which functions it lacks.

Discussion: https://postgr.es/m/CA%2BhUKGLsV3vTjPp7bOZBr3JTKp3Brkr9V0Qfmc7UvpWcmAQL4A%40mail.gmail.com
---
 src/port/chklocale.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/src/port/chklocale.c b/src/port/chklocale.c
index 8cb81c8640e..a15b0d5349b 100644
--- a/src/port/chklocale.c
+++ b/src/port/chklocale.c
@@ -204,7 +204,6 @@ win32_langinfo(const char *ctype)
 	char	   *r = NULL;
 	char	   *codepage;
 
-#if defined(_MSC_VER)
 	uint32		cp;
 	WCHAR		wctype[LOCALE_NAME_MAX_LENGTH];
 
@@ -229,7 +228,6 @@ win32_langinfo(const char *ctype)
 		}
 	}
 	else
-#endif
 	{
 		/*
 		 * Locale format on Win32 is <Language>_<Country>.<CodePage>.  For
-- 
2.45.2

From dc726a61aace86bda62687e3aa1411753ba3f1a4 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Tue, 19 Jul 2022 06:31:17 +1200
Subject: [PATCH v5 2/2] Default to IETF BCP 47 locale names in initdb on
 Windows.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Avoid selecting traditional Windows locale names written with English
words, because (1) they are unstable and explicitly not recommended for
use in databases and (2) they may contain non-ASCII characters, which we
can't put in our shared catalogs.  Since setlocale() returns such names,
on Windows use GetUserDefaultLocaleName() if the user didn't provide an
explicit locale.  It returns BCP 47 strings like "en-US".

Also update the documentation to recommend BCP 47 over the traditional
names when providing explicit values to initdb.

Reviewed-by: Juan José Santamaría Flecha <juanjo.santama...@gmail.com>
Reviewed-by:
Discussion: https://postgr.es/m/CA%2BhUKGJ%3DXThErgAQRoqfCy1bKPxXVuF0%3D2zDbB%2BSxDs59pv7Fw%40mail.gmail.com
---
 doc/src/sgml/charset.sgml | 13 +++++++++++--
 src/bin/initdb/initdb.c   | 31 +++++++++++++++++++++++++++++--
 2 files changed, 40 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..adb21eb0799 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -83,8 +83,17 @@ initdb --locale=sv_SE
     system under what names depends on what was provided by the operating
     system vendor and what was installed.  On most Unix systems, the command
     <literal>locale -a</literal> will provide a list of available locales.
-    Windows uses more verbose locale names, such as <literal>German_Germany</literal>
-    or <literal>Swedish_Sweden.1252</literal>, but the principles are the same.
+   </para>
+
+   <para>
+    Windows uses BCP 47 language tags, like ICU.
+    For example, <literal>sv-SE</literal> represents Swedish as spoken in Sweden.
+    Windows also supports more verbose locale names based on full names
+    such as <literal>German_Germany</literal> or <literal>Swedish_Sweden.1252</literal>,
+    but these are not recommended because they are not stable across operating
+    system updates due to changes in geographical names, and may contain
+    non-ASCII characters which are not supported in PostgreSQL's shared
+    catalogs.
    </para>
 
    <para>
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f00718a0150..393232b6cec 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -64,6 +64,10 @@
 #include "sys/mman.h"
 #endif
 
+#ifdef WIN32
+#include <winnls.h>
+#endif
+
 #include "access/xlog_internal.h"
 #include "catalog/pg_authid_d.h"
 #include "catalog/pg_class_d.h" /* pgrminclude ignore */
@@ -2132,6 +2136,7 @@ locale_date_order(const char *locale)
 static void
 check_locale_name(int category, const char *locale, char **canonname)
 {
+	char	   *locale_copy;
 	char	   *save;
 	char	   *res;
 
@@ -2147,10 +2152,30 @@ check_locale_name(int category, const char *locale, char **canonname)
 
 	/* for setlocale() call */
 	if (!locale)
-		locale = "";
+	{
+#ifdef WIN32
+		wchar_t		wide_name[LOCALE_NAME_MAX_LENGTH];
+		char		name[LOCALE_NAME_MAX_LENGTH];
+
+		/* use Windows API to find the default in BCP47 format */
+		if (GetUserDefaultLocaleName(wide_name, LOCALE_NAME_MAX_LENGTH) == 0)
+			pg_fatal("failed to get default locale name: error code %lu",
+					 GetLastError());
+		if (WideCharToMultiByte(CP_ACP, 0, wide_name, -1, name,
+								LOCALE_NAME_MAX_LENGTH, NULL, NULL) == 0)
+			pg_fatal("failed to convert locale name: error code %lu",
+					 GetLastError());
+		locale_copy = pg_strdup(name);
+#else
+		/* use environment to find the default */
+		locale_copy = pg_strdup("");
+#endif
+	}
+	else
+		locale_copy = pg_strdup(locale);
 
 	/* set the locale with setlocale, to see if it accepts it. */
-	res = setlocale(category, locale);
+	res = setlocale(category, locale_copy);
 
 	/* save canonical name if requested. */
 	if (res && canonname)
@@ -2183,6 +2208,8 @@ check_locale_name(int category, const char *locale, char **canonname)
 			pg_fatal("invalid locale settings; check LANG and LC_* environment variables");
 		}
 	}
+
+	free(locale_copy);
 }
 
 /*
-- 
2.45.2
