On Mon, Mar 15, 2021 at 2:25 PM Thomas Munro <thomas.mu...@gmail.com> wrote:
> FYI I have added this as an open item for PostgreSQL 14.  My default
> action will be to document this limitation, if we can't come up with
> something better in time.

Here is a short doc update to explain the situation on Windows and
close that open item.

PS: While trying to find official names for the "en-US" and
"English_United States.1252" forms, I came across these sentences in
the Windows documentation[1].  They support the idea, already
discussed, of preventing the latter form from ever entering our
catalogs in some future release:

"The locale-name form is a short, IETF-standardized string; for
example, en-US for English (United States) or bs-Cyrl-BA for Bosnian
(Cyrillic, Bosnia and Herzegovina).  These forms are preferred. [...]"

"The language[_country-region[.code-page]] form is stored in the
locale setting for a category when a language string, or language
string and country or region string, is used to create the locale.
[...] We do not recommend this form for locale strings embedded in
code or serialized to storage, because these strings are more likely
to be changed by an operating system update than the locale name
form."
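
To make the difference between the two forms concrete, here is a rough
sketch of how one might tell them apart by shape alone.  It is only a
heuristic for illustration: the function name and the regular expression
are assumptions of this sketch, not anything PostgreSQL or Windows
actually uses, and it does not validate names against BCP 47 or ask the
operating system whether the locale exists.

```python
import re

# Heuristic only (an assumption of this sketch): a 2-3 letter primary
# language subtag followed by optional hyphen-separated subtags, which
# covers names shaped like "en-US" or "bs-Cyrl-BA".
_LOCALE_NAME_RE = re.compile(r"[A-Za-z]{2,3}(-[A-Za-z0-9]{2,8})*")


def looks_like_locale_name(name: str) -> bool:
    """Return True if name has the shape of the short locale-name form."""
    return _LOCALE_NAME_RE.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("en-US", "bs-Cyrl-BA", "English_United States.1252"):
        if looks_like_locale_name(candidate):
            form = "locale-name form"
        else:
            form = "language[_country-region[.code-page]] form"
        print(f"{candidate!r}: {form}")
```

Anything containing an underscore, a space, or a code-page suffix falls
through to the second branch, which is the form the quoted text advises
against serializing to storage.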

[1] https://docs.microsoft.com/en-us/cpp/c-runtime-library/locale-names-languages-and-country-region-strings?view=msvc-160
From fd6c376dba21fdb0020d9b9de08fb878bb66f23d Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Fri, 16 Apr 2021 10:21:48 +1200
Subject: [PATCH] Doc: Document known problem with Windows collation versions.

Warn users that locales with traditional Windows NLS names like
"English_United States.1252" won't provide version information, and that
something like initdb --lc-collate=en-US would be needed to fix that
problem for the database default collation.

Discussion: https://postgr.es/m/CA%2BhUKGJ_hk3rU%3D%3Dg2FpAMChb_4i%2BTJacpjjqFsinY-tRM3FBmA%40mail.gmail.com
---
 doc/src/sgml/charset.sgml | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 1b00e543a6..9630b18988 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -985,6 +985,15 @@ CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-tr
      approach is imperfect as maintainers are free to back-port newer
      collation definitions to older C library releases.
     </para>
+    <para>
+     When using Windows collations, version information is only available for
+     collations defined with IETF BCP 47 locale names such as
+     <literal>en-US</literal>.  Currently, <command>initdb</command> selects
+     a default locale using a traditional Windows language and country
+     string such as <literal>English_United States.1252</literal>.  The
+     <literal>--lc-collate</literal> option can be used to provide an explicit
+     locale name in IETF-standardized form.
+    </para>
    </note>
   </sect2>
  </sect1>
-- 
2.30.1
