On Thu, 2023-05-18 at 20:11 +0200, Matthias van de Meent wrote:
> As I complain about in [0], since 5cd1a5af --no-locale has been broken
> / behaving outside its description: Instead of being equivalent to
> `--locale=C` it now also overrides `--locale-provider=libc`, resulting
> in undocumented behaviour.

I agree that 5cd1a5af is incomplete.

Posting updated patches. Feedback on the approaches below would be
appreciated.

For context, in version 15:

  $ initdb -D data --locale-provider=icu --icu-locale=en
  => create database clocale template template0 locale='C';
  => select datname, datlocprovider, daticulocale
     from pg_database where datname='clocale';
   datname | datlocprovider | daticulocale 
  ---------+----------------+--------------
   clocale | i              | en
  (1 row)

That behavior is confusing, and when I made ICU the default provider in
v16, the confusion was extended into more cases.

If we leave the CREATE DATABASE (and createdb and initdb) syntax in
place, such that LOCALE (and --locale) do not apply to ICU at all, then
I don't see a path to a good ICU user experience.

Therefore I conclude that we need LOCALE (and --locale) to apply to ICU
somehow. (The LOCALE option already applies to ICU during CREATE
COLLATION, just not CREATE DATABASE or initdb.)

Patch 0003 does this. It's fairly straightforward and I believe we need
this patch.

But to actually fix your complaint we also need --no-locale to be
equivalent to --locale=C and for those options to both use memcmp()
semantics. There are several approaches to accomplish this, and I think
this is the part where I most need some feedback. There are only so
many approaches, and each one has some potential downsides, but I
believe we need to select one:


(1) Give up and leave the existing CREATE DATABASE (and createdb, and
initdb) semantics in place, along with the confusing behavior in v15.

This is a last resort, in my opinion. It gives us no path toward a good
user experience with ICU, and leaves us with all of the problems of the
OS as a collation provider.

(2) Automatically change the provider to libc when locale=C.

Almost works, but it's not clear how we handle the case "provider=icu
lc_collate='fr_FR.utf8' locale=C".

If we change it to "provider=libc lc_collate=C", we've overridden the
specified lc_collate. If we ignore the locale=C, that would be
surprising to users. If we throw an error, that would be a backwards
compatibility issue.

One possible solution would be to change the catalog representation to
allow setting the default collation locale separately from datcollate
even for the libc provider. For instance, rename daticulocale to
datdeflocale, and store the default collation locale there for both
libc and ICU. Then, "provider=icu lc_collate='fr_FR.utf8' locale=C"
could be changed into "provider=libc lc_collate='fr_FR.utf8'
deflocale=C". It may be confusing that datcollate is a different
concept from datdeflocale; but then again they are different concepts
and it's confusing that they are currently combined into one.

(3) Support iculocale=C in the ICU provider using the memcmp() path.

In other words, if provider=icu and iculocale=C, lc_collate_is_c() and
lc_ctype_is_c() would both return true.

There's a potential problem for users who've misused ICU in the past
(15 or earlier) by using provider=icu and iculocale=C. ICU would accept
such a locale name, but not recognize it and fall back to the root
locale, so it never worked as the user intended it. But if we redefine
C to be memcmp(), then such users will have broken indexes if they
upgrade.

We could add a check at pg_upgrade time for iculocale=C in versions 15
and earlier, and cause the check (and therefore the upgrade) to fail.
That may be reasonable considering that it never really worked in the
past, and perhaps very few users actually ever created such a
collation. But if some user runs into that problem, we'd have to resort
to a hack like telling them to "update pg_collation set
colliculocale='und' where colliculocale='C'" and then try the upgrade
again, which is not a great answer (as far as I can tell it would be a
correct answer and should not break their indexes, but it feels pretty
dangerous).

There may be some other resolutions to this problem, such as catalog
hacks that allow for different representations of iculocale=C pre-16
and post-16. That doesn't sound great though, and we'd have to figure
out what to do with pg_dump.

(4) Create a new "none" provider (which has no locale and always uses
memcmp() semantics), and automatically change the provider to "none" if
provider=icu and iculocale=C.

This solves the problem case in #2 and the potential upgrade problem in
#3. It also makes the documentation a bit more natural, in my opinion,
even if we retain the special case for provider=libc collate=C.


#4 is the approach I chose (patches 0001 and 0002), but I'd like to
hear what others think.


For historical reasons, users may assume that LC_COLLATE controls the
default collation order because that's true in libc. And if their
provider is ICU, they may be surprised that it doesn't. I believe we
could extend each of the above approaches to use LC_COLLATE as the
default for ICU_LOCALE if the former is specified and the latter is
not, and that may make things smoother.


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From de37bfb02dcc41c2e932a788ba10a05e5a539870 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v6 1/5] Introduce collation provider "none".

Provides locale-unaware semantics that are implemented as fast byte
operations in Postgres, independent of the operating system or any
provider libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "none" for built-in collation "ucs_basic" instead of
libc.
---
 doc/src/sgml/charset.sgml              | 87 +++++++++++++++++++++-----
 doc/src/sgml/ref/create_collation.sgml |  2 +-
 doc/src/sgml/ref/create_database.sgml  |  2 +-
 doc/src/sgml/ref/createdb.sgml         |  2 +-
 doc/src/sgml/ref/initdb.sgml           |  2 +-
 src/backend/catalog/pg_collation.c     |  7 ++-
 src/backend/commands/collationcmds.c   | 84 +++++++++++++++++++++----
 src/backend/commands/dbcommands.c      | 69 +++++++++++++++++---
 src/backend/utils/adt/pg_locale.c      | 27 +++++++-
 src/backend/utils/init/postinit.c      | 10 ++-
 src/bin/initdb/initdb.c                | 33 +++++++++-
 src/bin/initdb/t/001_initdb.pl         | 29 +++++++++
 src/bin/pg_dump/pg_dump.c              |  8 ++-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 18 +++++-
 src/bin/psql/describe.c                |  2 +-
 src/bin/scripts/createdb.c             |  2 +-
 src/bin/scripts/t/020_createdb.pl      | 29 +++++++++
 src/include/catalog/pg_collation.dat   |  3 +-
 src/include/catalog/pg_collation.h     |  3 +
 src/test/regress/expected/collate.out  | 10 ++-
 src/test/regress/sql/collate.sql       |  6 ++
 21 files changed, 373 insertions(+), 62 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 9db14649aa..7a791a2b7c 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
    <title>Locale Providers</title>
 
    <para>
-    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
-    providers</firstterm>.  This specifies which library supplies the locale
-    data.  One standard provider name is <literal>libc</literal>, which uses
-    the locales provided by the operating system C library.  These are the
-    locales used by most tools provided by the operating system.  Another
-    provider is <literal>icu</literal>, which uses the external
-    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can
-    only be used if support for ICU was configured when PostgreSQL was built.
+    A locale provider specifies which library defines the locale behavior for
+    collations and character classifications.
    </para>
 
    <para>
     The commands and tools that select the locale settings, as described
-    above, each have an option to select the locale provider.  The examples
-    shown earlier all use the <literal>libc</literal> provider, which is the
-    default.  Here is an example to initialize a database cluster using the
-    ICU provider:
+    above, each have an option to select the locale provider. Here is an
+    example to initialize a database cluster using the ICU provider:
 <programlisting>
 initdb --locale-provider=icu --icu-locale=en
 </programlisting>
@@ -370,12 +362,73 @@ initdb --locale-provider=icu --icu-locale=en
    </para>
 
    <para>
-    Which locale provider to use depends on individual requirements.  For most
-    basic uses, either provider will give adequate results.  For the libc
-    provider, it depends on what the operating system offers; some operating
-    systems are better than others.  For advanced uses, ICU offers more locale
-    variants and customization options.
+    Regardless of the locale provider, the operating system is still used to
+    provide some locale-aware behavior, such as messages (see <xref
+    linkend="guc-lc-messages"/>).
    </para>
+
+   <para>
+    The available locale providers are listed below.
+   </para>
+
+   <sect3 id="locale-provider-none">
+    <title>None</title>
+    <para>
+     The <literal>none</literal> provider uses simple built-in operations
+     which are not locale-aware.
+    </para>
+    <para>
+     The collation and character classification behavior is equivalent to
+     using the <literal>libc</literal> provider with locale
+     <literal>C</literal>, except that <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently.
+    </para>
+    <note>
+     <para>
+      When using the <literal>none</literal> locale provider, behavior may
+      depend on the database encoding.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-icu">
+    <title>ICU</title>
+    <para>
+     The <literal>icu</literal> provider uses the external
+     ICU<indexterm><primary>ICU</primary></indexterm>
+     library. <productname>PostgreSQL</productname> must have been configured
+     with support.
+    </para>
+    <para>
+     ICU provides collation and character classification behavior that is
+     independent of the operating system and database encoding, which is
+     preferable if you expect to transition to other platforms without any
+     change in results. <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
+    </para>
+    <note>
+     <para>
+      For the ICU provider, results may depend on the version of the ICU
+      library used, as it is updated to reflect changes in natural language
+      over time.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-libc">
+    <title>libc</title>
+    <para>
+     The <literal>libc</literal> provider uses the operating system's C
+     library. The collation and character classification behavior is
+     controlled by the settings <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal>, so they cannot be set independently.
+    </para>
+    <note>
+     <para>
+      The same locale name may have different behavior on different platforms
+      when using the libc provider.
+     </para>
+    </note>
+   </sect3>
+
   </sect2>
   <sect2 id="icu-locales">
    <title>ICU Locales</title>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..5489ae7413 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -120,7 +120,7 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      <listitem>
       <para>
        Specifies the provider to use for locale services associated with this
-       collation.  Possible values are
+       collation.  Possible values are <literal>none</literal>,
        <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
        (if the server was built with ICU support) or <literal>libc</literal>.
        <literal>libc</literal> is the default.  See <xref
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..60b9da0952 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -212,7 +212,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <listitem>
        <para>
         Specifies the provider to use for the default collation in this
-        database.  Possible values are
+        database.  Possible values are <literal>none</literal>,
         <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
         (if the server was built with ICU support) or <literal>libc</literal>.
         By default, the provider is the same as that of the <xref
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..326a371d34 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -168,7 +168,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         Specifies the locale provider for the database's default collation.
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..e604ab48b7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -323,7 +323,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry id="app-initdb-option-locale-provider">
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         This option sets the locale provider for databases created in the new
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index fd022e6fc2..86b6ba2375 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -68,7 +68,12 @@ CollationCreate(const char *collname, Oid collnamespace,
 	Assert(collname);
 	Assert(collnamespace);
 	Assert(collowner);
-	Assert((collcollate && collctype) || colliculocale);
+	Assert((collprovider == COLLPROVIDER_NONE &&
+			!collcollate && !collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_LIBC &&
+			 collcollate &&  collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_ICU &&
+			!collcollate && !collctype &&  colliculocale));
 
 	/*
 	 * Make sure there is no existing collation of same name & encoding.
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..aeaf6c419e 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -215,7 +215,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 
 		if (collproviderstr)
 		{
-			if (pg_strcasecmp(collproviderstr, "icu") == 0)
+			if (pg_strcasecmp(collproviderstr, "none") == 0)
+				collprovider = COLLPROVIDER_NONE;
+			else if (pg_strcasecmp(collproviderstr, "icu") == 0)
 				collprovider = COLLPROVIDER_ICU;
 			else if (pg_strcasecmp(collproviderstr, "libc") == 0)
 				collprovider = COLLPROVIDER_LIBC;
@@ -228,6 +230,13 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collprovider = COLLPROVIDER_LIBC;
 
+		if (collprovider == COLLPROVIDER_NONE
+			&& (localeEl || lccollateEl || lcctypeEl))
+		{
+			ereport(ERROR,
+					(errmsg("collation provider \"none\" does not support LOCALE, LC_COLLATE, or LC_CTYPE")));
+		}
+
 		if (localeEl)
 		{
 			if (collprovider == COLLPROVIDER_LIBC)
@@ -302,6 +311,16 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
 
+		if (collprovider == COLLPROVIDER_NONE)
+		{
+			/*
+			 * Behavior may be different in different encodings, so set
+			 * collencoding to the current database encoding. No validation is
+			 * required, because the "none" provider is compatible with any
+			 * encoding.
+			 */
+			collencoding = GetDatabaseEncoding();
+		}
 		if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
@@ -331,7 +350,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	}
 
 	if (!collversion)
-		collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate);
+	{
+		char *locale;
+
+		if (collprovider == COLLPROVIDER_ICU)
+			locale = colliculocale;
+		else if (collprovider == COLLPROVIDER_LIBC)
+			locale = collcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		collversion = get_collation_actual_version(collprovider, locale);
+	}
 
 	newoid = CollationCreate(collName,
 							 collNamespace,
@@ -406,6 +436,7 @@ AlterCollation(AlterCollationStmt *stmt)
 	Form_pg_collation collForm;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 	ObjectAddress address;
@@ -430,8 +461,20 @@ AlterCollation(AlterCollationStmt *stmt)
 	datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
-	newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
+	if (collForm->collprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colliculocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (collForm->collprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(collForm->collprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -494,11 +537,18 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
 
-		datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_daticulocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(dbtup);
 	}
@@ -514,11 +564,19 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
 		Assert(provider != COLLPROVIDER_DEFAULT);
-		datum = SysCacheGetAttrNotNull(COLLOID, colltp,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colliculocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(colltp);
 	}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..9e73f54803 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -909,7 +909,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
 
-		if (pg_strcasecmp(locproviderstr, "icu") == 0)
+		if (pg_strcasecmp(locproviderstr, "none") == 0)
+			dblocprovider = COLLPROVIDER_NONE;
+		else if (pg_strcasecmp(locproviderstr, "icu") == 0)
 			dblocprovider = COLLPROVIDER_ICU;
 		else if (pg_strcasecmp(locproviderstr, "libc") == 0)
 			dblocprovider = COLLPROVIDER_LIBC;
@@ -1177,9 +1179,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 */
 	if (src_collversion && !dcollversion)
 	{
-		char	   *actual_versionstr;
+		char	*actual_versionstr;
+		char	*locale;
 
-		actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dblocprovider, locale);
 		if (!actual_versionstr)
 			ereport(ERROR,
 					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
@@ -1207,7 +1217,18 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * collation version, which is normally only the case for template0.
 	 */
 	if (dbcollversion == NULL)
-		dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+	{
+		char *locale;
+
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		dbcollversion = get_collation_actual_version(dblocprovider, locale);
+	}
 
 	/* Resolve default tablespace for new database */
 	if (dtablespacename && dtablespacename->arg)
@@ -2403,6 +2424,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	ObjectAddress address;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 
@@ -2429,10 +2451,24 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
-	if (isnull)
-		elog(ERROR, "unexpected null in pg_database");
-	newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum));
+	if (datForm->datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_daticulocale, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datForm->datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(datForm->datlocprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -2617,6 +2653,7 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 	HeapTuple	tp;
 	char		datlocprovider;
 	Datum		datum;
+	char	   *locale;
 	char	   *version;
 
 	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
@@ -2627,8 +2664,20 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 
 	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;
 
-	datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-	version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum));
+	if (datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_daticulocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	version = get_collation_actual_version(datlocprovider, locale);
 
 	ReleaseSysCache(tp);
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index eea1d1ae0f..95eb5cf464 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1228,7 +1228,12 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			elog(ERROR, "cache lookup failed for collation %u", collation);
 		collform = (Form_pg_collation) GETSTRUCT(tp);
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			cache_entry->collate_is_c = true;
+			cache_entry->ctype_is_c = true;
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 			Datum		datum;
 			const char *collcollate;
@@ -1281,6 +1286,9 @@ lc_collate_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1334,6 +1342,9 @@ lc_ctype_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1487,8 +1498,10 @@ pg_newlocale_from_collation(Oid collid)
 	{
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return &default_locale;
-		else
+		else if (default_locale.provider == COLLPROVIDER_LIBC)
 			return (pg_locale_t) 0;
+		else
+			elog(ERROR, "cannot open collation with provider \"none\"");
 	}
 
 	cache_entry = lookup_collation_cache(collid, false);
@@ -1513,7 +1526,11 @@ pg_newlocale_from_collation(Oid collid)
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			elog(ERROR, "cannot open collation with provider \"none\"");
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 #ifdef HAVE_LOCALE_T
 			const char *collcollate;
@@ -1599,6 +1616,7 @@ pg_newlocale_from_collation(Oid collid)
 
 			collversionstr = TextDatumGetCString(datum);
 
+			Assert(collform->collprovider != COLLPROVIDER_NONE);
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, collform->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
 			actual_versionstr = get_collation_actual_version(collform->collprovider,
@@ -1650,6 +1668,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collprovider == COLLPROVIDER_NONE)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 53420f4974..8053642fd3 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -461,10 +461,18 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	{
 		char	   *actual_versionstr;
 		char	   *collversionstr;
+		char	   *locale;
 
 		collversionstr = TextDatumGetCString(datum);
 
-		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, dbform->datlocprovider == COLLPROVIDER_ICU ? iculocale : collate);
+		if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			locale = iculocale;
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			locale = collate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, locale);
 		if (!actual_versionstr)
 			/* should not happen */
 			elog(WARNING,
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 30b576932f..4a6cad3cb9 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2408,6 +2408,22 @@ setlocales(void)
 
 	/* set empty lc_* values to locale config if set */
 
+	if (locale_provider == COLLPROVIDER_NONE)
+	{
+		if (!lc_ctype)
+			lc_ctype = "C";
+		if (!lc_collate)
+			lc_collate = "C";
+		if (!lc_numeric)
+			lc_numeric = "C";
+		if (!lc_time)
+			lc_time = "C";
+		if (!lc_monetary)
+			lc_monetary = "C";
+		if (!lc_messages)
+			lc_messages = "C";
+	}
+
 	if (locale)
 	{
 		if (!lc_ctype)
@@ -2502,7 +2518,7 @@ usage(const char *progname)
 			 "                            set default locale in the respective category for\n"
 			 "                            new databases (default taken from environment)\n"));
 	printf(_("      --no-locale           equivalent to --locale=C\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                            set default locale provider for new databases\n"));
 	printf(_("      --pwfile=FILE         read password for the new superuser from file\n"));
 	printf(_("  -T, --text-search-config=CFG\n"
@@ -2652,7 +2668,15 @@ setup_locale_encoding(void)
 {
 	setlocales();
 
-	if (locale_provider == COLLPROVIDER_LIBC &&
+	if (locale_provider == COLLPROVIDER_NONE &&
+		strcmp(lc_ctype, "C") == 0 &&
+		strcmp(lc_collate, "C") == 0 &&
+		strcmp(lc_time, "C") == 0 &&
+		strcmp(lc_numeric, "C") == 0 &&
+		strcmp(lc_monetary, "C") == 0 &&
+		strcmp(lc_messages, "C") == 0)
+		printf(_("The database cluster will be initialized with no locale.\n"));
+	else if (locale_provider == COLLPROVIDER_LIBC &&
 		strcmp(lc_ctype, lc_collate) == 0 &&
 		strcmp(lc_ctype, lc_time) == 0 &&
 		strcmp(lc_ctype, lc_numeric) == 0 &&
@@ -3326,7 +3350,9 @@ main(int argc, char *argv[])
 										 "-c debug_discard_caches=1");
 				break;
 			case 15:
-				if (strcmp(optarg, "icu") == 0)
+				if (strcmp(optarg, "none") == 0)
+					locale_provider = COLLPROVIDER_NONE;
+				else if (strcmp(optarg, "icu") == 0)
 					locale_provider = COLLPROVIDER_ICU;
 				else if (strcmp(optarg, "libc") == 0)
 					locale_provider = COLLPROVIDER_LIBC;
@@ -3365,6 +3391,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+
 	if (icu_locale && locale_provider != COLLPROVIDER_ICU)
 		pg_fatal("%s cannot be specified unless locale provider \"%s\" is chosen",
 				 "--icu-locale", "icu");
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 17a444d80c..fe6d224e5b 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -154,6 +154,35 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', "$tempdir/data6" ],
+	'locale provider none');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--locale=C',
+	  "$tempdir/data7" ],
+	'locale provider none with --locale');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-collate=C',
+	  "$tempdir/data8" ],
+	'locale provider none with --lc-collate');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-ctype=C',
+	  "$tempdir/data9" ],
+	'locale provider none with --lc-ctype');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-locale=en',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU locale');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-rules=""',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU rules');
+
 command_fails(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
 	'fails for invalid locale provider');
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index f9cbeb65ab..ddc8a5f71f 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3070,7 +3070,9 @@ dumpDatabase(Archive *fout)
 	}
 
 	appendPQExpBufferStr(creaQry, " LOCALE_PROVIDER = ");
-	if (datlocprovider[0] == 'c')
+	if (datlocprovider[0] == 'n')
+		appendPQExpBufferStr(creaQry, "none");
+	else if (datlocprovider[0] == 'c')
 		appendPQExpBufferStr(creaQry, "libc");
 	else if (datlocprovider[0] == 'i')
 		appendPQExpBufferStr(creaQry, "icu");
@@ -13429,7 +13431,9 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 					  fmtQualifiedDumpable(collinfo));
 
 	appendPQExpBufferStr(q, "provider = ");
-	if (collprovider[0] == 'c')
+	if (collprovider[0] == 'n')
+		appendPQExpBufferStr(q, "none");
+	else if (collprovider[0] == 'c')
 		appendPQExpBufferStr(q, "libc");
 	else if (collprovider[0] == 'i')
 		appendPQExpBufferStr(q, "icu");
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 4a7895a756..6d58f6103e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -114,12 +114,20 @@ my $original_locale = "C";
 my $original_iculocale = "";
 my $provider_field = "'c' AS datlocprovider";
 my $iculocale_field = "NULL AS daticulocale";
-if ($oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes')
+if ($oldnode->pg_version >= 15)
 {
 	$provider_field = "datlocprovider";
 	$iculocale_field = "daticulocale";
-	$original_provider = "i";
-	$original_iculocale = "fr-CA";
+
+	if ($ENV{with_icu} eq 'yes')
+	{
+		$original_provider = "i";
+		$original_iculocale = "fr-CA";
+	}
+	else
+	{
+		$original_provider = "n";
+	}
 }
 
 my @initdb_params = @custom_opts;
@@ -131,6 +139,10 @@ if ($original_provider eq "i")
 	push @initdb_params, ('--locale-provider', 'icu');
 	push @initdb_params, ('--icu-locale', 'fr-CA');
 }
+elsif ($original_provider eq "n")
+{
+	push @initdb_params, ('--locale-provider', 'none');
+}
 
 $node_params{extra} = \@initdb_params;
 $oldnode->init(%node_params);
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index ab4279ed58..c842a62ae9 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -932,7 +932,7 @@ listAllDbs(const char *pattern, bool verbose)
 					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  "  CASE d.datlocprovider WHEN 'n' THEN 'none' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..79367d933b 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -299,7 +299,7 @@ help(const char *progname)
 	printf(_("      --lc-ctype=LOCALE        LC_CTYPE setting for the database\n"));
 	printf(_("      --icu-locale=LOCALE      ICU locale setting for the database\n"));
 	printf(_("      --icu-rules=RULES        ICU rules setting for the database\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                               locale provider for the database's default collation\n"));
 	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
 	printf(_("  -S, --strategy=STRATEGY      database creation strategy wal_log or file_copy\n"));
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..5aa658b671 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -83,6 +83,35 @@ else
 		'create database with ICU fails since no ICU support');
 }
 
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', 'testnone1' ],
+	'create database with provider "none"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--locale=C',
+	  'testnone2' ],
+	'create database with provider "none" and locale "C"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-collate=C',
+	  'testnone3' ],
+	'create database with provider "none" and LC_COLLATE=C');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-ctype=C',
+	  'testnone4' ],
+	'create database with provider "none" and LC_CTYPE=C');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-locale=en',
+	  'testnone5' ],
+	'create database with provider "none" and ICU_LOCALE="en"');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-rules=""',
+	  'testnone6' ],
+	'create database with provider "none" and ICU_RULES=""');
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index b6a69d1d42..40d62416ea 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -24,8 +24,7 @@
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point',
-  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
-  collcollate => 'C', collctype => 'C' },
+  collname => 'ucs_basic', collprovider => 'n', collencoding => '6' },
 { oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index bfa3568451..29be3f8d94 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -64,6 +64,7 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_collation_oid_index, 3085, CollationOidIndexId, on
 
 #ifdef EXPOSE_TO_CLIENT_CODE
 
+#define COLLPROVIDER_NONE		'n'
 #define COLLPROVIDER_DEFAULT	'd'
 #define COLLPROVIDER_ICU		'i'
 #define COLLPROVIDER_LIBC		'c'
@@ -73,6 +74,8 @@ collprovider_name(char c)
 {
 	switch (c)
 	{
+		case COLLPROVIDER_NONE:
+			return "none";
 		case COLLPROVIDER_ICU:
 			return "icu";
 		case COLLPROVIDER_LIBC:
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0649564485..b7603c9f6c 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -650,6 +650,13 @@ EXPLAIN (COSTS OFF)
 (3 rows)
 
 -- CREATE/DROP COLLATION
+CREATE COLLATION none ( PROVIDER = none );
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
@@ -754,7 +761,7 @@ DETAIL:  FROM cannot be specified together with any other options.
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 19 other objects
+NOTICE:  drop cascades to 20 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -771,6 +778,7 @@ drop cascades to function dup(anyelement)
 drop cascades to table collate_test20
 drop cascades to table collate_test21
 drop cascades to table collate_test22
+drop cascades to collation "none"
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
 drop cascades to view collate_on_int
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index c3d40fc195..e2dceb8dff 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -244,6 +244,12 @@ EXPLAIN (COSTS OFF)
 
 -- CREATE/DROP COLLATION
 
+CREATE COLLATION none ( PROVIDER = none );
+
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
-- 
2.34.1

From dc7200153a9ac65c2518b32b789d1a9dc4454850 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Mon, 8 May 2023 13:48:01 -0700
Subject: [PATCH v6 2/5] ICU: for locale "C", automatically use "none" provider
 instead.

Postgres expects locale C to be optimizable to simple locale-unaware
byte operations, whereas ICU does not recognize the locale "C" at all
and falls back to the root locale.

If the user specifies locale "C" (or, equivalently, "POSIX") when
creating a new collation or a new database with the ICU provider,
automatically switch it to the "none" provider.

If provider is libc, behavior is unchanged.
---
 doc/src/sgml/charset.sgml                     |  6 +++
 doc/src/sgml/ref/create_collation.sgml        |  6 +++
 doc/src/sgml/ref/create_database.sgml         |  5 +++
 doc/src/sgml/ref/createdb.sgml                |  5 +++
 doc/src/sgml/ref/initdb.sgml                  |  5 +++
 src/backend/commands/collationcmds.c          | 17 ++++++++
 src/backend/commands/dbcommands.c             | 21 ++++++++++
 src/bin/initdb/initdb.c                       | 10 +++++
 src/bin/initdb/t/001_initdb.pl                | 39 +++++++++++++++++++
 src/bin/scripts/createdb.c                    | 11 ++++++
 src/bin/scripts/t/020_createdb.pl             | 12 ++++++
 .../regress/expected/collate.icu.utf8.out     | 14 +++++--
 src/test/regress/sql/collate.icu.utf8.sql     |  6 +++
 13 files changed, 154 insertions(+), 3 deletions(-)
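
Note for reviewers: the rule this patch applies at each entry point
(CREATE COLLATION, CREATE DATABASE, initdb, createdb) can be sketched in
isolation. This is a minimal standalone illustration, not code from the
patch: is_c_locale() and resolve_provider() are hypothetical names, and
POSIX strcasecmp() stands in for pg_strcasecmp().

```c
#include <stddef.h>
#include <strings.h>			/* strcasecmp(); stand-in for pg_strcasecmp() */

#define COLLPROVIDER_NONE	'n'
#define COLLPROVIDER_ICU	'i'
#define COLLPROVIDER_LIBC	'c'

/* Hypothetical helper: true if name is "C" or "POSIX", ignoring case. */
static int
is_c_locale(const char *name)
{
	return name != NULL &&
		(strcasecmp(name, "C") == 0 || strcasecmp(name, "POSIX") == 0);
}

/*
 * Hypothetical helper: return the provider that will actually be used.
 * If the provider is ICU and the ICU locale is C/POSIX, clear the ICU
 * locale and switch to "none"; otherwise leave both unchanged.
 */
static char
resolve_provider(char provider, const char **icu_locale)
{
	if (provider == COLLPROVIDER_ICU && is_c_locale(*icu_locale))
	{
		*icu_locale = NULL;
		return COLLPROVIDER_NONE;
	}
	return provider;
}
```

The libc provider is deliberately left alone here, matching "If provider
is libc, behavior is unchanged" above. The binary-upgrade and
unchanged-from-template exceptions in dbcommands.c are omitted from the
sketch.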

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 7a791a2b7c..68bad646e9 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -405,6 +405,12 @@ initdb --locale-provider=icu --icu-locale=en
      change in results. <literal>LC_COLLATE</literal> and
      <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
     </para>
+    <para>
+     The ICU provider does not accept the <literal>C</literal>
+     locale. Commands that create collations or databases with the
+     <literal>icu</literal> provider and ICU locale <literal>C</literal> use
+     the provider <literal>none</literal> instead.
+    </para>
     <note>
      <para>
       For the ICU provider, results may depend on the version of the ICU
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 5489ae7413..1ac41831d8 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -126,6 +126,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        <literal>libc</literal> is the default.  See <xref
        linkend="locale-providers"/> for details.
       </para>
+      <para>
+       If the provider is <literal>icu</literal> and the locale is
+       <literal>C</literal> or <literal>POSIX</literal>, the provider is
+       automatically set to <literal>none</literal>, as the ICU provider
+       doesn't support an ICU locale of <literal>C</literal>.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 60b9da0952..c730d02e15 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -190,6 +190,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
        <para>
         Specifies the ICU locale ID if the ICU locale provider is used.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 326a371d34..7c573e848a 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -154,6 +154,11 @@ PostgreSQL documentation
         Specifies the ICU locale ID to be used in this database, if the
         ICU locale provider is selected.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index e604ab48b7..76993acdfe 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -250,6 +250,11 @@ PostgreSQL documentation
         Specifies the ICU locale when the ICU provider is used. Locale support
         is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
        <para>
         If this option is not specified, the locale is inherited from the
         environment in which <command>initdb</command> runs. The environment's
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index aeaf6c419e..8bc6f8347d 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -254,6 +254,23 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (lcctypeEl)
 			collctype = defGetString(lcctypeEl);
 
+		/*
+		 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+		 * optimizable to byte operations (memcmp(), pg_ascii_tolower(),
+		 * etc.); transform into the "none" provider. Don't transform during
+		 * binary upgrade.
+		 */
+		if (!IsBinaryUpgrade && collprovider == COLLPROVIDER_ICU &&
+			colliculocale && (pg_strcasecmp(colliculocale, "C") == 0 ||
+							  pg_strcasecmp(colliculocale, "POSIX") == 0))
+		{
+			ereport(NOTICE,
+					(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+							colliculocale)));
+			colliculocale = NULL;
+			collprovider = COLLPROVIDER_NONE;
+		}
+
 		if (collprovider == COLLPROVIDER_LIBC)
 		{
 			if (!collcollate)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 9e73f54803..6dc737aebb 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1043,6 +1043,27 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
 
+	/*
+	 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+	 * optimizable to byte operations (memcmp(), pg_ascii_tolower(), etc.);
+	 * transform into the "none" provider.
+	 *
+	 * Don't transform during binary upgrade or when both the provider and ICU
+	 * locale are unchanged from the template.
+	 */
+	if (!IsBinaryUpgrade && dblocprovider == COLLPROVIDER_ICU &&
+		(src_locprovider != COLLPROVIDER_ICU ||
+		 strcmp(dbiculocale, src_iculocale) != 0) &&
+		dbiculocale && (pg_strcasecmp(dbiculocale, "C") == 0 ||
+						pg_strcasecmp(dbiculocale, "POSIX") == 0))
+	{
+		ereport(NOTICE,
+				(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+						dbiculocale)));
+		dbiculocale = NULL;
+		dblocprovider = COLLPROVIDER_NONE;
+	}
+
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
 		if (!(is_encoding_supported_by_icu(encoding)))
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4a6cad3cb9..e5ec2a243e 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2440,6 +2440,16 @@ setlocales(void)
 			lc_messages = locale;
 	}
 
+	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = COLLPROVIDER_NONE;
+	}
+
 	/*
 	 * canonicalize locale names, and obtain any missing values from our
 	 * current environment
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index fe6d224e5b..ea92b08511 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -111,6 +111,45 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'option --icu-locale');
 
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			"$tempdir/data4a"
+		],
+		'option --icu-locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--locale=C',
+			"$tempdir/data4b"
+		],
+		'option --icu-locale=C --locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-collate=C',
+			"$tempdir/data4c"
+		],
+		'option --icu-locale=C --lc-collate=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-ctype=C',
+			"$tempdir/data4d"
+		],
+		'option --icu-locale=C --lc-ctype=C');
+
 	command_fails_like(
 		[
 			'initdb',                '--no-sync',
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 79367d933b..9caf9190cf 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -172,6 +172,17 @@ main(int argc, char *argv[])
 			lc_collate = locale;
 	}
 
+	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
+		icu_locale &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = "none";
+	}
+
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 5aa658b671..eb3682f0fd 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -75,6 +75,18 @@ if ($ENV{with_icu} eq 'yes')
 	$node2->command_ok(
 		[ 'createdb', '-T', 'template0', '--icu-locale', 'en-US', 'foobar56' ],
 		'create database with icu locale from template database with icu provider');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  'test_none_icu1' ],
+		'create database with provider "icu" and ICU_LOCALE="C"');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  '--lc-ctype=C', 'test_none_icu_2' ],
+		'create database with provider "icu" and ICU_LOCALE="C" and LC_CTYPE=C');
 }
 else
 {
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c658ee1404..7c186e9f69 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1035,6 +1035,9 @@ BEGIN
 END
 $$;
 RESET client_min_messages;
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+NOTICE:  using locale provider "none" for ICU locale "C"
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
@@ -1058,8 +1061,11 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
  test0
  test1
  test5
-(3 rows)
+ testc
+(4 rows)
 
+DROP COLLATION test1;
+CREATE COLLATION test1 (provider = icu, locale = 'und');
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
 ERROR:  collation "test11" already exists in schema "collate_tests"
@@ -1079,7 +1085,8 @@ SELECT collname, nspname, obj_description(pg_collation.oid, 'pg_collation')
  test0    | collate_tests | US English
  test11   | test_schema   | 
  test5    | collate_tests | 
-(3 rows)
+ testc    | collate_tests | 
+(4 rows)
 
 DROP COLLATION test0, test_schema.test11, test5;
 DROP COLLATION test0; -- fail
@@ -1089,7 +1096,8 @@ NOTICE:  collation "test0" does not exist, skipping
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%';
  collname 
 ----------
-(0 rows)
+ testc
+(1 row)
 
 DROP SCHEMA test_schema;
 DROP ROLE regress_test_role;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 7bd0901281..e59200df9a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -375,6 +375,9 @@ $$;
 
 RESET client_min_messages;
 
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
@@ -388,6 +391,9 @@ CREATE COLLATION test5 FROM test0;
 
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
 
+DROP COLLATION test1;
+CREATE COLLATION test1 (provider = icu, locale = 'und');
+
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
 ALTER COLLATION test1 RENAME TO test22; -- fail
-- 
2.34.1

From c04053021eaa6db480143393a7de83525a8f4f7e Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v6 3/5] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107...@sss.pgh.pa.us
---
 doc/src/sgml/ref/create_database.sgml         |  6 +++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++---
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 ++++++++----
 src/bin/initdb/initdb.c                       | 11 ++++++---
 src/bin/scripts/createdb.c                    | 13 ++++-------
 src/bin/scripts/t/020_createdb.pl             |  4 ++--
 src/test/icu/t/010_database.pl                | 23 ++++++++++++-------
 .../regress/expected/collate.icu.utf8.out     | 22 +++++++++---------
 10 files changed, 65 insertions(+), 43 deletions(-)
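
Note for reviewers: the precedence CREATE DATABASE uses for the ICU
locale after this patch can be sketched as a small standalone function.
This is only an illustration under assumptions: effective_icu_locale()
is a hypothetical name, and the real logic lives inline in createdb()
in dbcommands.c.

```c
#include <stddef.h>

#define COLLPROVIDER_ICU	'i'

/*
 * Hypothetical helper showing the fallback order for the ICU locale:
 *   1. an explicit ICU_LOCALE always wins;
 *   2. otherwise, for the ICU provider, LOCALE is used if given;
 *   3. otherwise, for the ICU provider, the template's ICU locale is used.
 * For non-ICU providers nothing is filled in.
 */
static const char *
effective_icu_locale(char provider,
					 const char *icu_locale,
					 const char *locale,
					 const char *template_icu_locale)
{
	if (icu_locale != NULL)
		return icu_locale;
	if (provider != COLLPROVIDER_ICU)
		return NULL;
	if (locale != NULL)
		return locale;
	return template_icu_locale;
}
```

Before this patch, step 2 did not exist, which is why LOCALE='C' with an
ICU template silently kept the template's ICU locale.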

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index c730d02e15..dc57ba0c8b 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 7c573e848a..7991153ecc 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to setting <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 76993acdfe..d9ef21c422 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 8bc6f8347d..21615746f9 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -302,7 +302,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				if (langtag && strcmp(colliculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, colliculocale)));
 
 					colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6dc737aebb..154f20573c 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1019,7 +1019,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1033,12 +1038,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1094,7 +1101,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			if (langtag && strcmp(dbiculocale, langtag) != 0)
 			{
 				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
+						(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 								langtag, dbiculocale)));
 
 				dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index e5ec2a243e..f0827154cd 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2164,7 +2164,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2406,7 +2410,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale_provider == COLLPROVIDER_NONE)
 	{
@@ -2438,6 +2442,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
@@ -3331,7 +3337,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 9caf9190cf..51c4bb3592 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
 		icu_locale &&
 		(pg_strcasecmp(icu_locale, "C") == 0 ||
@@ -230,6 +222,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index eb3682f0fd..81a9931c09 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -167,7 +167,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -175,7 +175,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 7c186e9f69..cf1852c89d 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1202,9 +1202,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1212,7 +1212,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1229,12 +1229,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1243,7 +1243,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1271,13 +1271,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1327,9 +1327,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

From 2a857a2cb080dbc015c59b89acbb195ae7991a99 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Thu, 11 May 2023 12:54:31 -0700
Subject: [PATCH v6 4/5] Add default_collation_provider GUC.

Controls default collation provider for CREATE COLLATION. Does not
affect CREATE DATABASE, which gets its default from the template
database.
---
 doc/src/sgml/config.sgml                      | 17 +++++++++++++
 doc/src/sgml/ref/create_collation.sgml        | 15 ++++++++---
 src/backend/commands/collationcmds.c          |  8 +++++-
 src/backend/utils/misc/guc_tables.c           | 18 +++++++++++++
 src/backend/utils/misc/postgresql.conf.sample |  4 +++
 src/include/commands/collationcmds.h          |  2 ++
 .../regress/expected/collate.icu.utf8.out     | 25 +++++++++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql     | 13 ++++++++++
 8 files changed, 97 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 18ce06729b..58a1046340 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9820,6 +9820,23 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-collation-provider" xreflabel="default_collation_provider">
+      <term><varname>default_collation_provider</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>default_collation_provider</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Default collation provider for <command>CREATE
+        COLLATION</command>. Does not affect <command>CREATE
+        DATABASE</command>, which gets the default collation provider from the
+        template database. Valid values are <literal>icu</literal> and
+        <literal>libc</literal>. The default is <literal>libc</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-icu-validation-level" xreflabel="icu_validation_level">
       <term><varname>icu_validation_level</varname> (<type>enum</type>)
       <indexterm>
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 1ac41831d8..c9b3e6e218 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -121,10 +121,17 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
       <para>
        Specifies the provider to use for locale services associated with this
        collation.  Possible values are <literal>none</literal>,
-       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
-       (if the server was built with ICU support) or <literal>libc</literal>.
-       <literal>libc</literal> is the default.  See <xref
-       linkend="locale-providers"/> for details.
+       <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if
+       the server was built with ICU support) or <literal>libc</literal>.  See
+       <xref linkend="locale-providers"/> for details.
+      </para>
+      <para>
+       If <replaceable>provider</replaceable> is not specified, and
+       <replaceable>lc_collate</replaceable> or
+       <replaceable>lc_ctype</replaceable> is specified, the
+       <literal>libc</literal> provider is used. Otherwise, the default
+       provider is controlled by <xref
+       linkend="guc-default-collation-provider"/>.
       </para>
       <para>
        If the provider is <literal>icu</literal> and the locale is
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 21615746f9..25e8d32fd9 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -47,6 +47,7 @@ typedef struct
 	int			enc;			/* encoding */
 } CollAliasData;
 
+int		default_collation_provider = (int) COLLPROVIDER_LIBC;
 
 /*
  * CREATE COLLATION
@@ -228,7 +229,12 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+		{
+			if (lccollateEl || lcctypeEl)
+				collprovider = COLLPROVIDER_LIBC;
+			else
+				collprovider = (char) default_collation_provider;
+		}
 
 		if (collprovider == COLLPROVIDER_NONE
 			&& (localeEl || lccollateEl || lcctypeEl))
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 844781a7f5..901cfda819 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -35,8 +35,10 @@
 #include "access/xlogrecovery.h"
 #include "archive/archive_module.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_collation.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/collationcmds.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/user.h"
@@ -166,6 +168,12 @@ static const struct config_enum_entry intervalstyle_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry collation_provider_options[] = {
+	{"icu", (int) 'i', false},
+	{"libc", (int) 'c', false},
+	{NULL, 0, false}
+};
+
 static const struct config_enum_entry icu_validation_level_options[] = {
 	{"disabled", -1, false},
 	{"debug5", DEBUG5, false},
@@ -4683,6 +4691,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"default_collation_provider", PGC_USERSET, CLIENT_CONN_LOCALE,
+		 gettext_noop("Default collation provider for CREATE COLLATION."),
+		 NULL
+		},
+		&default_collation_provider,
+		(int) COLLPROVIDER_LIBC, collation_provider_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"icu_validation_level", PGC_USERSET, CLIENT_CONN_LOCALE,
 		 gettext_noop("Log level for reporting invalid ICU locale strings."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index c8018da04a..c1f247378d 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -734,6 +734,10 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
+#default_collation_provider = 'libc'	# default collation provider
+					# for CREATE COLLATION
+					# (icu, libc)
+
 #icu_validation_level = WARNING		# report ICU locale validation
 					# errors at the given level
 
diff --git a/src/include/commands/collationcmds.h b/src/include/commands/collationcmds.h
index b76c7b3dc3..f54389525d 100644
--- a/src/include/commands/collationcmds.h
+++ b/src/include/commands/collationcmds.h
@@ -18,6 +18,8 @@
 #include "catalog/objectaddress.h"
 #include "parser/parse_node.h"
 
+extern PGDLLIMPORT int default_collation_provider;
+
 extern ObjectAddress DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_exists);
 extern void IsThereCollationInNamespace(const char *collname, Oid nspOid);
 extern ObjectAddress AlterCollation(AlterCollationStmt *stmt);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index cf1852c89d..ea96e27f45 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1038,6 +1038,31 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 NOTICE:  using locale provider "none" for ICU locale "C"
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+ collname | collprovider 
+----------+--------------
+ def_libc | c
+(1 row)
+
+DROP COLLATION def_libc;
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+ collname | collprovider 
+----------+--------------
+ def_icu  | i
+(1 row)
+
+CREATE COLLATION def_libc (LC_COLLATE = 'C', LC_CTYPE='C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+ collname | collprovider 
+----------+--------------
+ def_libc | c
+(1 row)
+
+RESET default_collation_provider;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index e59200df9a..ee607ca3a5 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -378,6 +378,19 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+DROP COLLATION def_libc;
+
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+CREATE COLLATION def_libc (LC_COLLATE = 'C', LC_CTYPE='C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+
+RESET default_collation_provider;
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-- 
2.34.1
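
The provider-selection rule that patch 4/5 adds to DefineCollation() can be
sketched outside the backend as below. This is only an illustration, not
the patch's code: choose_provider_sketch(), default_provider_sketch and the
PROVIDER_* macros are hypothetical names. The values 'i' and 'c' mirror the
entries in collation_provider_options.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for COLLPROVIDER_ICU / COLLPROVIDER_LIBC. */
#define PROVIDER_ICU	'i'
#define PROVIDER_LIBC	'c'

/* Hypothetical stand-in for the default_collation_provider GUC. */
static int	default_provider_sketch = PROVIDER_LIBC;

/*
 * Pick the provider for CREATE COLLATION.
 *
 * explicit_provider is '\0' when no PROVIDER option was given. An
 * explicit provider always wins; otherwise LC_COLLATE or LC_CTYPE
 * forces libc, and only then does the GUC-style default apply.
 */
static char
choose_provider_sketch(char explicit_provider,
					   bool has_lc_collate, bool has_lc_ctype)
{
	if (explicit_provider != '\0')
		return explicit_provider;

	if (has_lc_collate || has_lc_ctype)
		return PROVIDER_LIBC;

	/* The enum GUC only admits these two values. */
	assert(default_provider_sketch == PROVIDER_ICU ||
		   default_provider_sketch == PROVIDER_LIBC);

	return (char) default_provider_sketch;
}
```

With default_provider_sketch set to PROVIDER_ICU, a call with no explicit
provider and no LC_* options returns 'i', while the same call with either
LC_* flag set returns 'c'. That matches the def_icu and def_libc cases in
the regression test above.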

From 274a887f8970647b2c932ee55c4783095719985d Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v6 5/5] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'fr_FR@euro'. Older ICU versions did
this automatically, but ICU version 64 removed that support.

Discussion: https://postgr.es/m/654a49f7ff7461bcf47be4181430678d45f93858.camel%40j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 57 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 61 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 134 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 95eb5cf464..2ee81e9804 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2787,6 +2787,58 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 
 	pfree(lower_str);
 }
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < lengthof(icu_variant_map); i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2803,6 +2855,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2833,7 +2886,7 @@ icu_language_tag(const char *loc_str, int elevel)
 	while (true)
 	{
 		status = U_ZERO_ERROR;
-		uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/* try again if the buffer is not large enough */
 		if ((status == U_BUFFER_OVERFLOW_ERROR ||
@@ -2848,6 +2901,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index f0827154cd..1304a235ce 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2240,6 +2240,61 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < lengthof(icu_variant_map); i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2249,6 +2304,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2277,7 +2333,8 @@ icu_language_tag(const char *loc_str)
 	while (true)
 	{
 		status = U_ZERO_ERROR;
-		uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+
+		uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/* try again if the buffer is not large enough */
 		if (status == U_BUFFER_OVERFLOW_ERROR ||
@@ -2291,6 +2348,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index ea96e27f45..692e8cdf18 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1071,12 +1071,23 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index ee607ca3a5..0b90e2a5b9 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -395,9 +395,16 @@ CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, nee
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
 
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
-- 
2.34.1
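
For trying the variant fix-up from patch 5/5 without a PostgreSQL build, a
minimal standalone sketch follows. It assumes plain malloc() in place of
palloc()/pg_malloc() and a small ASCII comparison helper in place of
pg_strcasecmp(). The names fix_variants_sketch(), ascii_ci_equal() and
variant_map are hypothetical; the three mappings are copied from
icu_variant_map above.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Same three mappings as icu_variant_map in the patch. */
static const char *variant_map[][2] = {
	{"@EURO", "@currency=EUR"},
	{"@PINYIN", "@collation=pinyin"},
	{"@STROKE", "@collation=stroke"},
};

/* Case-insensitive equality; stand-in for pg_strcasecmp() == 0. */
static int
ascii_ci_equal(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return 0;
		a++;
		b++;
	}
	return *a == *b;			/* equal only if both strings ended */
}

/*
 * Return a malloc'd copy of loc_str in which a trailing old-style
 * '@VARIANT' is rewritten to '@keyword=value' when it appears in the
 * map. Unknown variants are copied unchanged. The caller frees the
 * result; NULL is returned only if malloc() fails.
 */
static char *
fix_variants_sketch(const char *loc_str)
{
	const char *at;
	char	   *result;
	size_t		nmap = sizeof(variant_map) / sizeof(variant_map[0]);

	assert(loc_str != NULL);
	at = strrchr(loc_str, '@');

	if (at != NULL)
	{
		size_t		prefix_len = (size_t) (at - loc_str);

		for (size_t i = 0; i < nmap; i++)
		{
			if (ascii_ci_equal(at, variant_map[i][0]))
			{
				size_t		repl_len = strlen(variant_map[i][1]);

				result = malloc(prefix_len + repl_len + 1);
				if (result == NULL)
					return NULL;
				memcpy(result, loc_str, prefix_len);
				/* copies the replacement plus its terminating NUL */
				memcpy(result + prefix_len, variant_map[i][1], repl_len + 1);
				return result;
			}
		}
	}

	result = malloc(strlen(loc_str) + 1);
	if (result != NULL)
		strcpy(result, loc_str);
	return result;
}
```

Under these assumptions, fix_variants_sketch("fr_FR@euro") yields
"fr_FR@currency=EUR", and an unmapped string such as "@ASDF" comes back
unchanged. In the patch, such a string is then left for
uloc_toLanguageTag() to reject, as the '@ASDF' regression test expects.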
