New patch series attached.

=== 0001: fix bug that allows creating hidden collations

Bug:
https://www.postgresql.org/message-id/051c9395cf880307865ee8b17acdbf7f838c1e39.ca...@j-davis.com

=== 0002: handle some kinds of libc-style locale strings

ICU used to handle libc locale strings like 'fr_FR@euro', but ICU
version 64 removed that support. Handle them in postgres so that the
behavior is consistent across ICU versions.
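
For illustration, here is a standalone sketch of the rewrite that 0002
performs. It is not the patch code: it uses malloc() and a small local
case-insensitive compare in place of palloc() and pg_strcasecmp(), and
the names fix_variants_sketch and ascii_casecmp are made up for this
example.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Old libc-style variants and the ICU keyword form they map to. */
static const char *const variant_map[][2] = {
	{ "@EURO",   "@currency=EUR" },
	{ "@PINYIN", "@collation=pinyin" },
	{ "@STROKE", "@collation=stroke" },
};

#define VARIANT_MAP_SIZE (sizeof(variant_map) / sizeof(variant_map[0]))

/* Case-insensitive string compare (hypothetical stand-in for pg_strcasecmp). */
static int
ascii_casecmp(const char *a, const char *b)
{
	while (*a != '\0' && *b != '\0')
	{
		int ca = tolower((unsigned char) *a);
		int cb = tolower((unsigned char) *b);

		if (ca != cb)
			return ca - cb;
		a++;
		b++;
	}
	return tolower((unsigned char) *a) - tolower((unsigned char) *b);
}

/*
 * Return a malloc'd copy of loc_str in which a trailing '@VARIANT' found in
 * the map is replaced by its '@keyword=value' form. Unknown variants and
 * strings without '@' are copied unchanged. Returns NULL if malloc fails.
 */
static char *
fix_variants_sketch(const char *loc_str)
{
	const char *at;
	char	   *result;

	assert(loc_str != NULL);
	at = strrchr(loc_str, '@');

	if (at != NULL)
	{
		size_t		prefix_len = (size_t) (at - loc_str);	/* bytes before '@' */

		for (size_t i = 0; i < VARIANT_MAP_SIZE; i++)
		{
			if (ascii_casecmp(at, variant_map[i][0]) == 0)
			{
				const char *repl = variant_map[i][1];
				size_t		repl_len = strlen(repl);

				result = malloc(prefix_len + repl_len + 1);
				if (result == NULL)
					return NULL;
				memcpy(result, loc_str, prefix_len);
				/* repl_len + 1 also copies the terminating NUL */
				memcpy(result + prefix_len, repl, repl_len + 1);
				return result;
			}
		}
	}

	result = malloc(strlen(loc_str) + 1);
	if (result != NULL)
		strcpy(result, loc_str);
	return result;
}
```

With this sketch, fix_variants_sketch("fr_FR@euro") yields
"fr_FR@currency=EUR", while a string such as "@ASDF" is passed through
unchanged and left for the language tag conversion to reject.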

=== 0003: reduce icu_validation_level to WARNING

Given that we've seen some inconsistency in which locale names are
accepted in different ICU versions, it seems best not to be too strict.
Peter Eisentraut suggested that it be set to ERROR originally, but a
WARNING should be sufficient to see problems without introducing risks
when migrating to version 16.

I don't expect objections to 0003, so I may commit this soon, but I'll
give it a little time in case someone has an opinion.

=== 0004-0006: 

To solve the issues that have come up in this thread, we need CREATE
DATABASE (and createdb and initdb) to use LOCALE to mean the collation
locale regardless of which provider is in use (which is what 0006
does).

0006 depends on ICU handling libc locale names. It already does a good
job for most libc locale names (though patch 0002 fixes a few cases
where it doesn't). There may be more cases, but for the most part libc
names are interpreted in a reasonable way. But one important case is
missing: ICU does not handle the "C" locale as we expect (that is,
using memcmp()).
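
To make "C-as-in-memcmp" concrete, here is a minimal sketch of that
comparison: compare the bytes of the common prefix, then let the
shorter string sort first. The function name memcmp_collate_sketch is
made up for this example, and this is only an illustration of the
semantics, not the code path Postgres actually uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Byte-wise comparison of two strings given as pointer + length.
 * Returns -1, 0, or 1. No locale data is consulted, so the result depends
 * only on the bytes (and therefore on the encoding of the strings).
 */
static int
memcmp_collate_sketch(const char *a, size_t alen,
					  const char *b, size_t blen)
{
	size_t		minlen = (alen < blen) ? alen : blen;
	int			cmp;

	assert(a != NULL && b != NULL);

	cmp = memcmp(a, b, minlen);
	if (cmp != 0)
		return (cmp < 0) ? -1 : 1;

	/* Common prefix is equal: the shorter string sorts first. */
	if (alen != blen)
		return (alen < blen) ? -1 : 1;

	return 0;
}
```

Under this ordering "B" sorts before "a", because byte 0x42 is less
than byte 0x61. A linguistic collation such as the ICU root collation
would normally place "a" before "B", which is the difference that
matters here.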

We've already allowed users to create ICU collations with the C locale
in the past, which uses the root collation (not memcmp()), and we need
to keep supporting that for upgraded clusters. So that leaves us with a
catalog representation problem. I mentioned upthread that we can solve
that by:

  1. Using iculocale=NULL to mean "C-as-in-memcmp", or having some
other catalog hack (like another field). That's not desirable because
the catalog representation is already complex and it may be hard for
users to tell what's happening.

  2. When provider=icu and locale=C, switch to provider=libc locale=C.
This is very messy, because currently the syntax allows specifying a
database with LOCALE_PROVIDER='icu' ICU_LOCALE='C' LC_COLLATE='en_US'
-- if the provider gets changed to libc, what would we set datcollate
to? I don't think this is workable without some breakage. We can't
simply override datcollate to be C in that case, because there are some
things other than the default collation that might need it set to en_US
as the user specified.

  3. Introduce collation provider "none", which is always memcmp-based
(patch 0004). It's equivalent to the libc locale=C, but it allows
specifying the LC_COLLATE and LC_CTYPE independently. A command like
CREATE DATABASE ... LOCALE_PROVIDER='icu' ICU_LOCALE='C'
LC_COLLATE='en_US' would get changed (with a NOTICE) to provider "none"
(patch 0005), so you'd have datlocprovider=none, datcollate=en_US. For
the database default collation, that would always use memcmp(), but the
server environment LC_COLLATE would be set to en_US as the user
specified.
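
A rough sketch of the rewrite described in approach 3, under stated
assumptions: the enum, struct, and function names are hypothetical and
do not come from the patches, only the exact string "C" is matched, and
clearing the ICU locale after the switch is a guess based on the
CollationCreate() assertion in 0004. The real logic lives in patch
0005.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the provider codes. */
typedef enum
{
	SKETCH_PROVIDER_NONE,
	SKETCH_PROVIDER_LIBC,
	SKETCH_PROVIDER_ICU
} SketchProvider;

/* Hypothetical subset of the per-database locale settings. */
typedef struct
{
	SketchProvider provider;
	const char *icu_locale;		/* NULL when not applicable */
	const char *lc_collate;		/* kept as the user specified */
} SketchDbLocale;

/*
 * If the request is provider=icu with ICU locale "C", switch to provider
 * "none" and leave lc_collate alone. Returns true when the provider was
 * changed, in which case the caller would emit a NOTICE.
 */
static bool
resolve_provider_sketch(SketchDbLocale *db)
{
	assert(db != NULL);

	if (db->provider == SKETCH_PROVIDER_ICU &&
		db->icu_locale != NULL &&
		strcmp(db->icu_locale, "C") == 0)
	{
		db->provider = SKETCH_PROVIDER_NONE;
		db->icu_locale = NULL;	/* assumption: "none" carries no ICU locale */
		return true;
	}

	return false;
}
```

For the example command above, the sketch ends with provider "none" and
lc_collate still 'en_US', matching the datlocprovider=none,
datcollate=en_US outcome described in approach 3.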

For this patch series, I chose approach #3. I think it works out nicely
-- it provides a better place to document the "no locale" behavior
(including a warning that it depends on the database encoding), and I
think it's clearer to the user that locale=C is not actually using a
provider at all. It's more invasive, but feels like a better solution.
If others don't like it I can implement approach #1 instead.

=== 0007: Add a GUC to control the default collation provider

Having a GUC would make it easier to migrate to ICU without surprises.
This only affects the default for CREATE COLLATION, not CREATE DATABASE
(and obviously not initdb).


-- 
Jeff Davis
PostgreSQL Contributor Team - AWS


From fc66f02976bb11b629bcf71346c2858eccbcf1a3 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Thu, 11 May 2023 10:36:04 -0700
Subject: [PATCH v5 1/7] For user-defined collations, never set
 collencoding=-1.

For new user-defined collations, always set collencoding to the
current database encoding so that it is never shadowed by a built-in
collation.

Built-in collations that work with any encoding may have
collencoding=-1, and if a user defines a collation with the same name,
it will shadow the built-in collation.

Previously it was possible to create an ICU collation (which was
assigned collencoding=-1) that was shadowed by a built-in collation
and completely inaccessible.
---
 src/backend/commands/collationcmds.c          | 28 +++++++++++++------
 .../regress/expected/collate.icu.utf8.out     |  2 +-
 2 files changed, 21 insertions(+), 9 deletions(-)

diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index c91fe66d9b..a53700256b 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -302,16 +302,29 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 					(errcode(ERRCODE_INVALID_OBJECT_DEFINITION),
 					 errmsg("ICU rules cannot be specified unless locale provider is ICU")));
 
+		/*
+		 * The collencoding is used to hide built-in collations that are
+		 * incompatible with the current database encoding, allowing users to
+		 * define a compatible collation with the same name if
+		 * desired. Built-in collations that work with any encoding have
+		 * collencoding=-1.
+		 *
+		 * A collation that's a match to the current database encoding will
+		 * shadow a collation with the same name and collencoding=-1. We never
+		 * want a user-created collation to be shadowed by a built-in
+		 * collation, so for user-created collations, always set collencoding
+		 * to the current database encoding.
+		 */
+		collencoding = GetDatabaseEncoding();
+
 		if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
 			/*
-			 * We could create ICU collations with collencoding == database
-			 * encoding, but it seems better to use -1 so that it matches the
-			 * way initdb would create ICU collations.  However, only allow
-			 * one to be created when the current database's encoding is
-			 * supported.  Otherwise the collation is useless, plus we get
-			 * surprising behaviors like not being able to drop the collation.
+			 * Only allow an ICU collation to be created when the current
+			 * database's encoding is supported.  Otherwise the collation is
+			 * useless, plus we get surprising behaviors like not being able
+			 * to drop the collation.
 			 *
 			 * Skip this test when !USE_ICU, because the error we want to
 			 * throw for that isn't thrown till later.
@@ -321,11 +334,10 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 						(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 						 errmsg("current database's encoding is not supported with this provider")));
 #endif
-			collencoding = -1;
 		}
 		else
 		{
-			collencoding = GetDatabaseEncoding();
+			Assert(collprovider == COLLPROVIDER_LIBC);
 			check_encoding_locale_matches(collencoding, collcollate, collctype);
 		}
 	}
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index b5a221b030..9c9e1e4f48 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1062,7 +1062,7 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
 
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
-ERROR:  collation "test11" already exists in schema "collate_tests"
+ERROR:  collation "test11" for encoding "UTF8" already exists in schema "collate_tests"
 ALTER COLLATION test1 RENAME TO test22; -- fail
 ERROR:  collation "test1" for encoding "UTF8" does not exist
 ALTER COLLATION test11 OWNER TO regress_test_role;
-- 
2.34.1

From 25824dc213272c739eecd16b17a3458fc5f81339 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Fri, 28 Apr 2023 12:22:41 -0700
Subject: [PATCH v5 2/7] ICU: fix up old libc-style locale strings.

Before transforming a locale string into a language tag, fix up old
libc-style locale strings such as 'fr_FR@euro'. Older ICU versions did
this automatically, but ICU version 64 removed that support.

Discussion: https://postgr.es/m/654a49f7ff7461bcf47be4181430678d45f93858.camel%40j-davis.com
---
 src/backend/utils/adt/pg_locale.c             | 59 ++++++++++++++++-
 src/bin/initdb/initdb.c                       | 63 ++++++++++++++++++-
 .../regress/expected/collate.icu.utf8.out     | 11 ++++
 src/test/regress/sql/collate.icu.utf8.sql     |  7 +++
 4 files changed, 138 insertions(+), 2 deletions(-)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index f0b6567da1..e7b166461b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -2766,6 +2766,60 @@ icu_set_collation_attributes(UCollator *collator, const char *loc,
 	pfree(lower_str);
 }
 
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = palloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pstrdup(loc_str);
+}
+
 #endif
 
 /*
@@ -2782,6 +2836,7 @@ icu_language_tag(const char *loc_str, int elevel)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2814,7 +2869,7 @@ icu_language_tag(const char *loc_str, int elevel)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2834,6 +2889,8 @@ icu_language_tag(const char *loc_str, int elevel)
 		break;
 	}
 
+	pfree(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pfree(langtag);
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2c208ead01..2b5cc30955 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2229,6 +2229,64 @@ check_icu_locale_encoding(int user_enc)
 	return true;
 }
 
+#ifdef USE_ICU
+
+static const char *icu_variant_map[][2] = {
+	{ "@EURO",   "@currency=EUR" },
+	{ "@PINYIN", "@collation=pinyin" },
+	{ "@STROKE", "@collation=stroke" },
+};
+
+#define ICU_VARIANT_MAP_SIZE \
+	(sizeof(icu_variant_map)/sizeof(icu_variant_map[0]))
+
+/*
+ * ICU version 64 removed the ability to transform locale strings of the form
+ * '...@VARIANT' into proper language tags. Perform the transformation from
+ * within Postgres so that ICU supports any libc locale name consistently,
+ * regardless of the ICU version.
+ */
+static char *
+icu_fix_variants(const char *loc_str)
+{
+	const char *old_variant = strrchr(loc_str, '@');
+
+	/*
+	 * Extract a variant of the form '...@VARIANT', and replace with
+	 * the appropriate '...@keyword=value' if found in the map.
+	 */
+	if (old_variant)
+	{
+		size_t prefix_len = old_variant - loc_str; /* bytes before the '@' */
+
+		for (int i = 0; i < ICU_VARIANT_MAP_SIZE; i++)
+		{
+			const char *map_variant = icu_variant_map[i][0];
+			const char *map_replacement = icu_variant_map[i][1];
+
+			if (pg_strcasecmp(old_variant, map_variant) == 0)
+			{
+				size_t	 replacement_len = strlen(map_replacement);
+				size_t	 result_len;
+				char	*result;
+
+				result_len = prefix_len + replacement_len + 1;
+				result = pg_malloc(result_len);
+
+				memcpy(result, loc_str, prefix_len);
+				memcpy(result + prefix_len, map_replacement, replacement_len);
+				result[prefix_len + replacement_len] = '\0';
+
+				return result;
+			}
+		}
+	}
+
+	return pg_strdup(loc_str);
+}
+
+#endif
+
 /*
  * Convert to canonical BCP47 language tag. Must be consistent with
  * icu_language_tag().
@@ -2238,6 +2296,7 @@ icu_language_tag(const char *loc_str)
 {
 #ifdef USE_ICU
 	UErrorCode	 status;
+	char		*fixed_loc_str = icu_fix_variants(loc_str);
 	char		 lang[ULOC_LANG_CAPACITY];
 	char		*langtag;
 	size_t		 buflen = 32;	/* arbitrary starting buffer size */
@@ -2268,7 +2327,7 @@ icu_language_tag(const char *loc_str)
 		int32_t		len;
 
 		status = U_ZERO_ERROR;
-		len = uloc_toLanguageTag(loc_str, langtag, buflen, strict, &status);
+		len = uloc_toLanguageTag(fixed_loc_str, langtag, buflen, strict, &status);
 
 		/*
 		 * If the result fits in the buffer exactly (len == buflen),
@@ -2287,6 +2346,8 @@ icu_language_tag(const char *loc_str)
 		break;
 	}
 
+	pg_free(fixed_loc_str);
+
 	if (U_FAILURE(status))
 	{
 		pg_free(langtag);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 9c9e1e4f48..e0f11e3cd4 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1042,13 +1042,24 @@ ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
+ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
+WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 RESET icu_validation_level;
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 85e26951b6..8d5423bc17 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -378,11 +378,18 @@ RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 SET icu_validation_level = WARNING;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 RESET icu_validation_level;
 
+-- test special variants
+CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
+CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
+
 CREATE COLLATION test4 FROM nonsense;
 CREATE COLLATION test5 FROM test0;
 
-- 
2.34.1

From cd839f069cc09a71788bafa28730e4caf8f9d768 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Wed, 10 May 2023 10:47:16 -0700
Subject: [PATCH v5 3/7] Reduce icu_validation_level default to WARNING.

---
 doc/src/sgml/config.sgml                       | 2 +-
 src/backend/utils/adt/pg_locale.c              | 2 +-
 src/backend/utils/misc/guc_tables.c            | 2 +-
 src/backend/utils/misc/postgresql.conf.sample  | 2 +-
 src/test/regress/expected/collate.icu.utf8.out | 4 ++--
 src/test/regress/sql/collate.icu.utf8.sql      | 4 ++--
 6 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index b56f073a91..c4a9dcb9ae 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9840,7 +9840,7 @@ SET XML OPTION { DOCUMENT | CONTENT };
        <para>
         If set to <literal>DISABLED</literal>, does not report validation
         problems at all. Otherwise reports problems at the given message
-        level. The default is <literal>ERROR</literal>.
+        level. The default is <literal>WARNING</literal>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index e7b166461b..bb4a8d84f6 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -96,7 +96,7 @@ char	   *locale_monetary;
 char	   *locale_numeric;
 char	   *locale_time;
 
-int			icu_validation_level = ERROR;
+int			icu_validation_level = WARNING;
 
 /*
  * lc_time localization cache.
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 2f42cebaf6..8c843f4ab6 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -4689,7 +4689,7 @@ struct config_enum ConfigureNamesEnum[] =
 		 NULL
 		},
 		&icu_validation_level,
-		ERROR, icu_validation_level_options,
+		WARNING, icu_validation_level_options,
 		NULL, NULL, NULL
 	},
 
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index b70c66ca87..87bad8ecbf 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -734,7 +734,7 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
-#icu_validation_level = ERROR		# report ICU locale validation
+#icu_validation_level = WARNING		# report ICU locale validation
 					# errors at the given level
 
 # default configuration for text search
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index e0f11e3cd4..12afc3b65a 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1037,6 +1037,7 @@ $$;
 RESET client_min_messages;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
+SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 ERROR:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
@@ -1044,7 +1045,7 @@ CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=
 ERROR:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
 ERROR:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-SET icu_validation_level = WARNING;
+RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@colStrength=primary;nonsense=yes" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
@@ -1052,7 +1053,6 @@ WARNING:  ICU locale "nonsense-nowhere" has unknown language "nonsense"
 HINT:  To disable ICU locale validation, set parameter icu_validation_level to DISABLED.
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
-RESET icu_validation_level;
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
 NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 8d5423bc17..655c965f46 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -376,14 +376,14 @@ $$;
 RESET client_min_messages;
 
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
+SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); -- fails
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); -- fails
-SET icu_validation_level = WARNING;
+RESET icu_validation_level;
 CREATE COLLATION testx (provider = icu, locale = '@colStrength=primary;nonsense=yes'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); DROP COLLATION testx;
 CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
-RESET icu_validation_level;
 
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
-- 
2.34.1

From a13a15988ab2e991e42569b8b1e0cd1d6e940baf Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Mon, 1 May 2023 15:38:29 -0700
Subject: [PATCH v5 4/7] Introduce collation provider "none".

Provides locale-unaware semantics that are implemented as fast byte
operations in Postgres, independent of the operating system or any
provider libraries.

Equivalent (in semantics and implementation) to the libc provider with
locale "C", except that LC_COLLATE and LC_CTYPE can be set
independently.

Use provider "none" for built-in collation "ucs_basic" instead of
libc.
---
 doc/src/sgml/charset.sgml              | 87 +++++++++++++++++++++-----
 doc/src/sgml/ref/create_collation.sgml |  2 +-
 doc/src/sgml/ref/create_database.sgml  |  2 +-
 doc/src/sgml/ref/createdb.sgml         |  2 +-
 doc/src/sgml/ref/initdb.sgml           |  2 +-
 src/backend/catalog/pg_collation.c     |  7 ++-
 src/backend/commands/collationcmds.c   | 84 ++++++++++++++++++++-----
 src/backend/commands/dbcommands.c      | 69 +++++++++++++++++---
 src/backend/utils/adt/pg_locale.c      | 27 +++++++-
 src/backend/utils/init/postinit.c      | 10 ++-
 src/bin/initdb/initdb.c                | 33 +++++++++-
 src/bin/initdb/t/001_initdb.pl         | 29 +++++++++
 src/bin/pg_dump/pg_dump.c              |  8 ++-
 src/bin/pg_upgrade/t/002_pg_upgrade.pl | 18 +++++-
 src/bin/psql/describe.c                |  2 +-
 src/bin/scripts/createdb.c             |  2 +-
 src/bin/scripts/t/020_createdb.pl      | 29 +++++++++
 src/include/catalog/pg_collation.dat   |  3 +-
 src/include/catalog/pg_collation.h     |  3 +
 src/test/regress/expected/collate.out  | 10 ++-
 src/test/regress/sql/collate.sql       |  6 ++
 21 files changed, 372 insertions(+), 63 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 6dd95b8966..de7c65ae35 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -342,22 +342,14 @@ initdb --locale=sv_SE
    <title>Locale Providers</title>
 
    <para>
-    <productname>PostgreSQL</productname> supports multiple <firstterm>locale
-    providers</firstterm>.  This specifies which library supplies the locale
-    data.  One standard provider name is <literal>libc</literal>, which uses
-    the locales provided by the operating system C library.  These are the
-    locales used by most tools provided by the operating system.  Another
-    provider is <literal>icu</literal>, which uses the external
-    ICU<indexterm><primary>ICU</primary></indexterm> library.  ICU locales can
-    only be used if support for ICU was configured when PostgreSQL was built.
+    A locale provider specifies which library defines the locale behavior for
+    collations and character classifications.
    </para>
 
    <para>
     The commands and tools that select the locale settings, as described
-    above, each have an option to select the locale provider.  The examples
-    shown earlier all use the <literal>libc</literal> provider, which is the
-    default.  Here is an example to initialize a database cluster using the
-    ICU provider:
+    above, each have an option to select the locale provider. Here is an
+    example to initialize a database cluster using the ICU provider:
 <programlisting>
 initdb --locale-provider=icu --icu-locale=en
 </programlisting>
@@ -370,12 +362,73 @@ initdb --locale-provider=icu --icu-locale=en
    </para>
 
    <para>
-    Which locale provider to use depends on individual requirements.  For most
-    basic uses, either provider will give adequate results.  For the libc
-    provider, it depends on what the operating system offers; some operating
-    systems are better than others.  For advanced uses, ICU offers more locale
-    variants and customization options.
+    Regardless of the locale provider, the operating system is still used to
+    provide some locale-aware behavior, such as messages (see <xref
+    linkend="guc-lc-messages"/>).
    </para>
+
+   <para>
+    The available locale providers are listed below.
+   </para>
+
+   <sect3 id="locale-provider-none">
+    <title>None</title>
+    <para>
+     The <literal>none</literal> provider uses simple built-in operations
+     which are not locale-aware.
+    </para>
+    <para>
+     The collation and character classification behavior is equivalent to
+     using the <literal>libc</literal> provider with locale
+     <literal>C</literal>, except that <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently.
+    </para>
+    <note>
+     <para>
+      When using the <literal>none</literal> locale provider, behavior may
+      depend on the database encoding.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-icu">
+    <title>ICU</title>
+    <para>
+     The <literal>icu</literal> provider uses the external
+     ICU<indexterm><primary>ICU</primary></indexterm>
+     library. <productname>PostgreSQL</productname> must have been configured
+     with support.
+    </para>
+    <para>
+     ICU provides collation and character classification behavior that is
+     independent of the operating system and database encoding, which is
+     preferable if you expect to transition to other platforms without any
+     change in results. <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
+    </para>
+    <note>
+     <para>
+      For the ICU provider, results may depend on the version of the ICU
+      library used, as it is updated to reflect changes in natural language
+      over time.
+     </para>
+    </note>
+   </sect3>
+   <sect3 id="locale-provider-libc">
+    <title>libc</title>
+    <para>
+     The <literal>libc</literal> provider uses the operating system's C
+     library. The collation and character classification behavior is
+     controlled by the settings <literal>LC_COLLATE</literal> and
+     <literal>LC_CTYPE</literal>, so they cannot be set independently.
+    </para>
+    <note>
+     <para>
+      The same locale name may have different behavior on different platforms
+      when using the libc provider.
+     </para>
+    </note>
+   </sect3>
+
   </sect2>
 
   <sect2 id="locale-problems">
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index f6353da5c1..5489ae7413 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -120,7 +120,7 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
      <listitem>
       <para>
        Specifies the provider to use for locale services associated with this
-       collation.  Possible values are
+       collation.  Possible values are <literal>none</literal>,
        <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
        (if the server was built with ICU support) or <literal>libc</literal>.
        <literal>libc</literal> is the default.  See <xref
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 13793bb6b7..60b9da0952 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -212,7 +212,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <listitem>
        <para>
         Specifies the provider to use for the default collation in this
-        database.  Possible values are
+        database.  Possible values are <literal>none</literal>,
         <literal>icu</literal><indexterm><primary>ICU</primary></indexterm>
         (if the server was built with ICU support) or <literal>libc</literal>.
         By default, the provider is the same as that of the <xref
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index e23419ba6c..326a371d34 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -168,7 +168,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry>
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         Specifies the locale provider for the database's default collation.
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 87945b4b62..e604ab48b7 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -323,7 +323,7 @@ PostgreSQL documentation
      </varlistentry>
 
      <varlistentry id="app-initdb-option-locale-provider">
-      <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term>
+      <term><option>--locale-provider={<literal>none</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term>
       <listitem>
        <para>
         This option sets the locale provider for databases created in the new
diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c
index fd022e6fc2..86b6ba2375 100644
--- a/src/backend/catalog/pg_collation.c
+++ b/src/backend/catalog/pg_collation.c
@@ -68,7 +68,12 @@ CollationCreate(const char *collname, Oid collnamespace,
 	Assert(collname);
 	Assert(collnamespace);
 	Assert(collowner);
-	Assert((collcollate && collctype) || colliculocale);
+	Assert((collprovider == COLLPROVIDER_NONE &&
+			!collcollate && !collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_LIBC &&
+			 collcollate &&  collctype && !colliculocale) ||
+		   (collprovider == COLLPROVIDER_ICU &&
+			!collcollate && !collctype &&  colliculocale));
 
 	/*
 	 * Make sure there is no existing collation of same name & encoding.
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index a53700256b..267a551818 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -215,7 +215,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 
 		if (collproviderstr)
 		{
-			if (pg_strcasecmp(collproviderstr, "icu") == 0)
+			if (pg_strcasecmp(collproviderstr, "none") == 0)
+				collprovider = COLLPROVIDER_NONE;
+			else if (pg_strcasecmp(collproviderstr, "icu") == 0)
 				collprovider = COLLPROVIDER_ICU;
 			else if (pg_strcasecmp(collproviderstr, "libc") == 0)
 				collprovider = COLLPROVIDER_LIBC;
@@ -228,6 +230,13 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		else
 			collprovider = COLLPROVIDER_LIBC;
 
+		if (collprovider == COLLPROVIDER_NONE
+			&& (localeEl || lccollateEl || lcctypeEl))
+		{
+			ereport(ERROR,
+					(errmsg("collation provider \"none\" does not support LOCALE, LC_COLLATE, or LC_CTYPE")));
+		}
+
 		if (localeEl)
 		{
 			if (collprovider == COLLPROVIDER_LIBC)
@@ -317,7 +326,15 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		 */
 		collencoding = GetDatabaseEncoding();
 
-		if (collprovider == COLLPROVIDER_ICU)
+		if (collprovider == COLLPROVIDER_NONE)
+		{
+			/*
+			 * The "none" provider works with all encodings, so no checking is
+			 * required. NB: the behavior may be different for different
+			 * encodings, though.
+			 */
+		}
+		else if (collprovider == COLLPROVIDER_ICU)
 		{
 #ifdef USE_ICU
 			/*
@@ -343,7 +360,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	}
 
 	if (!collversion)
-		collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate);
+	{
+		char *locale;
+
+		if (collprovider == COLLPROVIDER_ICU)
+			locale = colliculocale;
+		else if (collprovider == COLLPROVIDER_LIBC)
+			locale = collcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		collversion = get_collation_actual_version(collprovider, locale);
+	}
 
 	newoid = CollationCreate(collName,
 							 collNamespace,
@@ -418,6 +446,7 @@ AlterCollation(AlterCollationStmt *stmt)
 	Form_pg_collation collForm;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 	ObjectAddress address;
@@ -442,8 +471,20 @@ AlterCollation(AlterCollationStmt *stmt)
 	datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
-	newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum));
+	if (collForm->collprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colliculocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (collForm->collprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(collForm->collprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -506,11 +547,18 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider;
 
-		datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_daticulocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(dbtup);
 	}
@@ -526,11 +574,19 @@ pg_collation_actual_version(PG_FUNCTION_ARGS)
 
 		provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider;
 		Assert(provider != COLLPROVIDER_DEFAULT);
-		datum = SysCacheGetAttrNotNull(COLLOID, colltp,
-									   provider == COLLPROVIDER_ICU ?
-									   Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
-		locale = TextDatumGetCString(datum);
+		if (provider == COLLPROVIDER_ICU)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colliculocale);
+			locale = TextDatumGetCString(datum);
+		}
+		else if (provider == COLLPROVIDER_LIBC)
+		{
+			datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate);
+			locale = TextDatumGetCString(datum);
+		}
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
 
 		ReleaseSysCache(colltp);
 	}
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 2e242eeff2..9e73f54803 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -909,7 +909,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	{
 		char	   *locproviderstr = defGetString(dlocprovider);
 
-		if (pg_strcasecmp(locproviderstr, "icu") == 0)
+		if (pg_strcasecmp(locproviderstr, "none") == 0)
+			dblocprovider = COLLPROVIDER_NONE;
+		else if (pg_strcasecmp(locproviderstr, "icu") == 0)
 			dblocprovider = COLLPROVIDER_ICU;
 		else if (pg_strcasecmp(locproviderstr, "libc") == 0)
 			dblocprovider = COLLPROVIDER_LIBC;
@@ -1177,9 +1179,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 */
 	if (src_collversion && !dcollversion)
 	{
-		char	   *actual_versionstr;
+		char	*actual_versionstr;
+		char	*locale;
 
-		actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dblocprovider, locale);
 		if (!actual_versionstr)
 			ereport(ERROR,
 					(errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined",
@@ -1207,7 +1217,18 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	 * collation version, which is normally only the case for template0.
 	 */
 	if (dbcollversion == NULL)
-		dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate);
+	{
+		char *locale;
+
+		if (dblocprovider == COLLPROVIDER_ICU)
+			locale = dbiculocale;
+		else if (dblocprovider == COLLPROVIDER_LIBC)
+			locale = dbcollate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		dbcollversion = get_collation_actual_version(dblocprovider, locale);
+	}
 
 	/* Resolve default tablespace for new database */
 	if (dtablespacename && dtablespacename->arg)
@@ -2403,6 +2424,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	ObjectAddress address;
 	Datum		datum;
 	bool		isnull;
+	char	   *locale;
 	char	   *oldversion;
 	char	   *newversion;
 
@@ -2429,10 +2451,24 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt)
 	datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull);
 	oldversion = isnull ? NULL : TextDatumGetCString(datum);
 
-	datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
-	if (isnull)
-		elog(ERROR, "unexpected null in pg_database");
-	newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum));
+	if (datForm->datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_daticulocale, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datForm->datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = heap_getattr(tuple, Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull);
+		if (isnull)
+			elog(ERROR, "unexpected null in pg_database");
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	newversion = get_collation_actual_version(datForm->datlocprovider, locale);
 
 	/* cannot change from NULL to non-NULL or vice versa */
 	if ((!oldversion && newversion) || (oldversion && !newversion))
@@ -2617,6 +2653,7 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 	HeapTuple	tp;
 	char		datlocprovider;
 	Datum		datum;
+	char	   *locale;
 	char	   *version;
 
 	tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid));
@@ -2627,8 +2664,20 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS)
 
 	datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider;
 
-	datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? Anum_pg_database_daticulocale : Anum_pg_database_datcollate);
-	version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum));
+	if (datlocprovider == COLLPROVIDER_ICU)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_daticulocale);
+		locale = TextDatumGetCString(datum);
+	}
+	else if (datlocprovider == COLLPROVIDER_LIBC)
+	{
+		datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate);
+		locale = TextDatumGetCString(datum);
+	}
+	else
+		locale = NULL; /* COLLPROVIDER_NONE */
+
+	version = get_collation_actual_version(datlocprovider, locale);
 
 	ReleaseSysCache(tp);
 
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index bb4a8d84f6..5ac5036f05 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1228,7 +1228,12 @@ lookup_collation_cache(Oid collation, bool set_flags)
 			elog(ERROR, "cache lookup failed for collation %u", collation);
 		collform = (Form_pg_collation) GETSTRUCT(tp);
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			cache_entry->collate_is_c = true;
+			cache_entry->ctype_is_c = true;
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 			Datum		datum;
 			const char *collcollate;
@@ -1281,6 +1286,9 @@ lc_collate_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1334,6 +1342,9 @@ lc_ctype_is_c(Oid collation)
 		static int	result = -1;
 		char	   *localeptr;
 
+		if (default_locale.provider == COLLPROVIDER_NONE)
+			return true;
+
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return false;
 
@@ -1487,8 +1498,10 @@ pg_newlocale_from_collation(Oid collid)
 	{
 		if (default_locale.provider == COLLPROVIDER_ICU)
 			return &default_locale;
-		else
+		else if (default_locale.provider == COLLPROVIDER_LIBC)
 			return (pg_locale_t) 0;
+		else
+			elog(ERROR, "cannot open collation with provider \"none\"");
 	}
 
 	cache_entry = lookup_collation_cache(collid, false);
@@ -1513,7 +1526,11 @@ pg_newlocale_from_collation(Oid collid)
 		result.provider = collform->collprovider;
 		result.deterministic = collform->collisdeterministic;
 
-		if (collform->collprovider == COLLPROVIDER_LIBC)
+		if (collform->collprovider == COLLPROVIDER_NONE)
+		{
+			elog(ERROR, "cannot open collation with provider \"none\"");
+		}
+		else if (collform->collprovider == COLLPROVIDER_LIBC)
 		{
 #ifdef HAVE_LOCALE_T
 			const char *collcollate;
@@ -1599,6 +1616,7 @@ pg_newlocale_from_collation(Oid collid)
 
 			collversionstr = TextDatumGetCString(datum);
 
+			Assert(collform->collprovider != COLLPROVIDER_NONE);
 			datum = SysCacheGetAttrNotNull(COLLOID, tp, collform->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate);
 
 			actual_versionstr = get_collation_actual_version(collform->collprovider,
@@ -1650,6 +1668,9 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 {
 	char	   *collversion = NULL;
 
+	if (collprovider == COLLPROVIDER_NONE)
+		return NULL;
+
 #ifdef USE_ICU
 	if (collprovider == COLLPROVIDER_ICU)
 	{
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 53420f4974..8053642fd3 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -461,10 +461,18 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	{
 		char	   *actual_versionstr;
 		char	   *collversionstr;
+		char	   *locale;
 
 		collversionstr = TextDatumGetCString(datum);
 
-		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, dbform->datlocprovider == COLLPROVIDER_ICU ? iculocale : collate);
+		if (dbform->datlocprovider == COLLPROVIDER_ICU)
+			locale = iculocale;
+		else if (dbform->datlocprovider == COLLPROVIDER_LIBC)
+			locale = collate;
+		else
+			locale = NULL; /* COLLPROVIDER_NONE */
+
+		actual_versionstr = get_collation_actual_version(dbform->datlocprovider, locale);
 		if (!actual_versionstr)
 			/* should not happen */
 			elog(WARNING,
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 2b5cc30955..4cf6892bee 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2469,6 +2469,22 @@ setlocales(void)
 
 	/* set empty lc_* values to locale config if set */
 
+	if (locale_provider == COLLPROVIDER_NONE)
+	{
+		if (!lc_ctype)
+			lc_ctype = "C";
+		if (!lc_collate)
+			lc_collate = "C";
+		if (!lc_numeric)
+			lc_numeric = "C";
+		if (!lc_time)
+			lc_time = "C";
+		if (!lc_monetary)
+			lc_monetary = "C";
+		if (!lc_messages)
+			lc_messages = "C";
+	}
+
 	if (locale)
 	{
 		if (!lc_ctype)
@@ -2563,7 +2579,7 @@ usage(const char *progname)
 			 "                            set default locale in the respective category for\n"
 			 "                            new databases (default taken from environment)\n"));
 	printf(_("      --no-locale           equivalent to --locale=C\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                            set default locale provider for new databases\n"));
 	printf(_("      --pwfile=FILE         read password for the new superuser from file\n"));
 	printf(_("  -T, --text-search-config=CFG\n"
@@ -2713,7 +2729,15 @@ setup_locale_encoding(void)
 {
 	setlocales();
 
-	if (locale_provider == COLLPROVIDER_LIBC &&
+	if (locale_provider == COLLPROVIDER_NONE &&
+		strcmp(lc_ctype, "C") == 0 &&
+		strcmp(lc_collate, "C") == 0 &&
+		strcmp(lc_time, "C") == 0 &&
+		strcmp(lc_numeric, "C") == 0 &&
+		strcmp(lc_monetary, "C") == 0 &&
+		strcmp(lc_messages, "C") == 0)
+		printf(_("The database cluster will be initialized with no locale.\n"));
+	else if (locale_provider == COLLPROVIDER_LIBC &&
 		strcmp(lc_ctype, lc_collate) == 0 &&
 		strcmp(lc_ctype, lc_time) == 0 &&
 		strcmp(lc_ctype, lc_numeric) == 0 &&
@@ -3387,7 +3411,9 @@ main(int argc, char *argv[])
 										 "-c debug_discard_caches=1");
 				break;
 			case 15:
-				if (strcmp(optarg, "icu") == 0)
+				if (strcmp(optarg, "none") == 0)
+					locale_provider = COLLPROVIDER_NONE;
+				else if (strcmp(optarg, "icu") == 0)
 					locale_provider = COLLPROVIDER_ICU;
 				else if (strcmp(optarg, "libc") == 0)
 					locale_provider = COLLPROVIDER_LIBC;
@@ -3426,6 +3452,7 @@ main(int argc, char *argv[])
 		exit(1);
 	}
 
+
 	if (icu_locale && locale_provider != COLLPROVIDER_ICU)
 		pg_fatal("%s cannot be specified unless locale provider \"%s\" is chosen",
 				 "--icu-locale", "icu");
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 17a444d80c..fe6d224e5b 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -154,6 +154,35 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', "$tempdir/data6" ],
+	'locale provider none');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--locale=C',
+	  "$tempdir/data7" ],
+	'locale provider none with --locale');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-collate=C',
+	  "$tempdir/data8" ],
+	'locale provider none with --lc-collate');
+
+command_ok(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--lc-ctype=C',
+	  "$tempdir/data9" ],
+	'locale provider none with --lc-ctype');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-locale=en',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU locale');
+
+command_fails(
+	[ 'initdb', '--no-sync', '--locale-provider=none', '--icu-rules=""',
+	  "$tempdir/dataX" ],
+	'fails for locale provider none with ICU rules');
+
 command_fails(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
 	'fails for invalid locale provider');
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index 41a51ec5cd..be6580ab3c 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3070,7 +3070,9 @@ dumpDatabase(Archive *fout)
 	}
 
 	appendPQExpBufferStr(creaQry, " LOCALE_PROVIDER = ");
-	if (datlocprovider[0] == 'c')
+	if (datlocprovider[0] == 'n')
+		appendPQExpBufferStr(creaQry, "none");
+	else if (datlocprovider[0] == 'c')
 		appendPQExpBufferStr(creaQry, "libc");
 	else if (datlocprovider[0] == 'i')
 		appendPQExpBufferStr(creaQry, "icu");
@@ -13446,7 +13448,9 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 					  fmtQualifiedDumpable(collinfo));
 
 	appendPQExpBufferStr(q, "provider = ");
-	if (collprovider[0] == 'c')
+	if (collprovider[0] == 'n')
+		appendPQExpBufferStr(q, "none");
+	else if (collprovider[0] == 'c')
 		appendPQExpBufferStr(q, "libc");
 	else if (collprovider[0] == 'i')
 		appendPQExpBufferStr(q, "icu");
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index 4a7895a756..6d58f6103e 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -114,12 +114,20 @@ my $original_locale = "C";
 my $original_iculocale = "";
 my $provider_field = "'c' AS datlocprovider";
 my $iculocale_field = "NULL AS daticulocale";
-if ($oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes')
+if ($oldnode->pg_version >= 15)
 {
 	$provider_field = "datlocprovider";
 	$iculocale_field = "daticulocale";
-	$original_provider = "i";
-	$original_iculocale = "fr-CA";
+
+	if ($ENV{with_icu} eq 'yes')
+	{
+		$original_provider = "i";
+		$original_iculocale = "fr-CA";
+	}
+	else
+	{
+		$original_provider = "n";
+	}
 }
 
 my @initdb_params = @custom_opts;
@@ -131,6 +139,10 @@ if ($original_provider eq "i")
 	push @initdb_params, ('--locale-provider', 'icu');
 	push @initdb_params, ('--icu-locale', 'fr-CA');
 }
+elsif ($original_provider eq "n")
+{
+	push @initdb_params, ('--locale-provider', 'none');
+}
 
 $node_params{extra} = \@initdb_params;
 $oldnode->init(%node_params);
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 058e41e749..16e726b784 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -932,7 +932,7 @@ listAllDbs(const char *pattern, bool verbose)
 					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  "  CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  "  CASE d.datlocprovider WHEN 'n' THEN 'none' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index b4205c4fa5..79367d933b 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -299,7 +299,7 @@ help(const char *progname)
 	printf(_("      --lc-ctype=LOCALE        LC_CTYPE setting for the database\n"));
 	printf(_("      --icu-locale=LOCALE      ICU locale setting for the database\n"));
 	printf(_("      --icu-rules=RULES        ICU rules setting for the database\n"));
-	printf(_("      --locale-provider={libc|icu}\n"
+	printf(_("      --locale-provider={none|libc|icu}\n"
 			 "                               locale provider for the database's default collation\n"));
 	printf(_("  -O, --owner=OWNER            database user to own the new database\n"));
 	printf(_("  -S, --strategy=STRATEGY      database creation strategy wal_log or file_copy\n"));
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index af3b1492e3..5aa658b671 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -83,6 +83,35 @@ else
 		'create database with ICU fails since no ICU support');
 }
 
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', 'testnone1' ],
+	'create database with provider "none"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--locale=C',
+	  'testnone2' ],
+	'create database with provider "none" and locale "C"');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-collate=C',
+	  'testnone3' ],
+	'create database with provider "none" and LC_COLLATE=C');
+
+$node->command_ok(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--lc-ctype=C',
+	  'testnone4' ],
+	'create database with provider "none" and LC_CTYPE=C');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-locale=en',
+	  'testnone5' ],
+	'fails for provider "none" with ICU_LOCALE="en"');
+
+$node->command_fails(
+	[ 'createdb', '-T', 'template0', '--locale-provider=none', '--icu-rules=""',
+	  'testnone6' ],
+	'fails for provider "none" with ICU_RULES=""');
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index b6a69d1d42..40d62416ea 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -24,8 +24,7 @@
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point',
-  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
-  collcollate => 'C', collctype => 'C' },
+  collname => 'ucs_basic', collprovider => 'n', collencoding => '6' },
 { oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index bfa3568451..29be3f8d94 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -64,6 +64,7 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_collation_oid_index, 3085, CollationOidIndexId, on
 
 #ifdef EXPOSE_TO_CLIENT_CODE
 
+#define COLLPROVIDER_NONE		'n'
 #define COLLPROVIDER_DEFAULT	'd'
 #define COLLPROVIDER_ICU		'i'
 #define COLLPROVIDER_LIBC		'c'
@@ -73,6 +74,8 @@ collprovider_name(char c)
 {
 	switch (c)
 	{
+		case COLLPROVIDER_NONE:
+			return "none";
 		case COLLPROVIDER_ICU:
 			return "icu";
 		case COLLPROVIDER_LIBC:
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0649564485..b7603c9f6c 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -650,6 +650,13 @@ EXPLAIN (COSTS OFF)
 (3 rows)
 
 -- CREATE/DROP COLLATION
+CREATE COLLATION none ( PROVIDER = none );
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+ERROR:  collation provider "none" does not support LOCALE, LC_COLLATE, or LC_CTYPE
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
@@ -754,7 +761,7 @@ DETAIL:  FROM cannot be specified together with any other options.
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 19 other objects
+NOTICE:  drop cascades to 20 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -771,6 +778,7 @@ drop cascades to function dup(anyelement)
 drop cascades to table collate_test20
 drop cascades to table collate_test21
 drop cascades to table collate_test22
+drop cascades to collation "none"
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
 drop cascades to view collate_on_int
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index c3d40fc195..e2dceb8dff 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -244,6 +244,12 @@ EXPLAIN (COSTS OFF)
 
 -- CREATE/DROP COLLATION
 
+CREATE COLLATION none ( PROVIDER = none );
+
+CREATE COLLATION none2 ( PROVIDER = none, LOCALE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_CTYPE="POSIX" ); -- fails
+CREATE COLLATION none2 ( PROVIDER = none, LC_COLLATE="POSIX" ); -- fails
+
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default";  -- intentionally unsupported
-- 
2.34.1
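
Note on the patch above: the same three-way choice of locale string is
repeated at every call site of get_collation_actual_version(). A minimal
sketch of that choice as one function follows. The helper name
version_locale_for_provider() is hypothetical and is not part of the
patch; the provider codes are copied from pg_collation.h as modified by
this series.

```c
#include <stddef.h>

/* Provider codes as defined in pg_collation.h by this series. */
#define COLLPROVIDER_NONE	'n'
#define COLLPROVIDER_ICU	'i'
#define COLLPROVIDER_LIBC	'c'

/*
 * Hypothetical helper (not in the patch): return the locale string that
 * the call sites pass to get_collation_actual_version().
 */
static const char *
version_locale_for_provider(char provider,
							const char *iculocale,
							const char *collate)
{
	if (provider == COLLPROVIDER_ICU)
		return iculocale;
	else if (provider == COLLPROVIDER_LIBC)
		return collate;
	else
		return NULL;			/* COLLPROVIDER_NONE has no locale */
}
```

With provider "none" the result is NULL, which matches the early
"return NULL" added to get_collation_actual_version() for that provider.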

From 23e85920dbcfd1d3e71041f92c4adea589acd4f2 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Mon, 8 May 2023 13:48:01 -0700
Subject: [PATCH v5 5/7] ICU: for locale "C", automatically use "none" provider
 instead.

Postgres expects locale C to be optimizable to simple locale-unaware
byte operations, whereas ICU does not recognize the locale "C" at all
and falls back to the root locale.

If the user specifies locale "C" (or "POSIX") when creating a new
collation or a new database with the ICU provider, automatically
switch it to the "none" provider.

If the provider is libc, behavior is unchanged.
---
 doc/src/sgml/charset.sgml                     |  6 +++
 doc/src/sgml/ref/create_collation.sgml        |  6 +++
 doc/src/sgml/ref/create_database.sgml         |  5 +++
 doc/src/sgml/ref/createdb.sgml                |  5 +++
 doc/src/sgml/ref/initdb.sgml                  |  5 +++
 src/backend/commands/collationcmds.c          | 17 ++++++++
 src/backend/commands/dbcommands.c             | 21 ++++++++++
 src/bin/initdb/initdb.c                       | 10 +++++
 src/bin/initdb/t/001_initdb.pl                | 39 +++++++++++++++++++
 src/bin/scripts/createdb.c                    | 11 ++++++
 src/bin/scripts/t/020_createdb.pl             | 12 ++++++
 .../regress/expected/collate.icu.utf8.out     | 12 ++++--
 src/test/regress/sql/collate.icu.utf8.sql     |  3 ++
 13 files changed, 149 insertions(+), 3 deletions(-)
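
Note: a minimal standalone sketch of the rule this patch adds to
DefineCollation(), for illustration only. The helper names
locale_is_c_or_posix() and effective_provider() are hypothetical, and
strcasecmp() from <strings.h> stands in for pg_strcasecmp(). The extra
check in createdb() that skips the transformation when the provider and
ICU locale are unchanged from the template is not modeled here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>			/* strcasecmp(), stand-in for pg_strcasecmp() */

/* Provider codes as defined in pg_collation.h by this series. */
#define COLLPROVIDER_NONE	'n'
#define COLLPROVIDER_ICU	'i'
#define COLLPROVIDER_LIBC	'c'

/* Hypothetical helper: true if the name is "C" or "POSIX", ignoring case. */
static bool
locale_is_c_or_posix(const char *locale)
{
	return locale != NULL &&
		(strcasecmp(locale, "C") == 0 ||
		 strcasecmp(locale, "POSIX") == 0);
}

/*
 * Hypothetical helper: return the provider that ends up in the catalog.
 * Only ICU with locale C/POSIX outside of binary upgrade is redirected;
 * in that case *iculocale is reset to NULL, as in DefineCollation().
 */
static char
effective_provider(char provider, const char **iculocale, bool binary_upgrade)
{
	if (!binary_upgrade && provider == COLLPROVIDER_ICU &&
		locale_is_c_or_posix(*iculocale))
	{
		*iculocale = NULL;
		return COLLPROVIDER_NONE;
	}
	return provider;
}
```

Under these assumptions, ICU with locale "C" or "posix" maps to provider
"none" with a NULL ICU locale, while libc with locale "C", ICU with any
other locale, and anything during binary upgrade are left untouched.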

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index de7c65ae35..5c4f713e8b 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -405,6 +405,12 @@ initdb --locale-provider=icu --icu-locale=en
      change in results. <literal>LC_COLLATE</literal> and
      <literal>LC_CTYPE</literal> can be set independently of the ICU locale.
     </para>
+    <para>
+     The ICU provider does not accept the <literal>C</literal>
+     locale. Commands that create collations or databases with the
+     <literal>icu</literal> provider and ICU locale <literal>C</literal> use
+     the provider <literal>none</literal> instead.
+    </para>
     <note>
      <para>
       For the ICU provider, results may depend on the version of the ICU
diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml
index 5489ae7413..1ac41831d8 100644
--- a/doc/src/sgml/ref/create_collation.sgml
+++ b/doc/src/sgml/ref/create_collation.sgml
@@ -126,6 +126,12 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace
        <literal>libc</literal> is the default.  See <xref
        linkend="locale-providers"/> for details.
       </para>
+      <para>
+       If the provider is <literal>icu</literal> and the locale is
+       <literal>C</literal> or <literal>POSIX</literal>, the provider is
+       automatically set to <literal>none</literal>, as the ICU provider
+       doesn't support an ICU locale of <literal>C</literal>.
+      </para>
      </listitem>
     </varlistentry>
 
diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index 60b9da0952..c730d02e15 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -190,6 +190,11 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
        <para>
         Specifies the ICU locale ID if the ICU locale provider is used.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 326a371d34..7c573e848a 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -154,6 +154,11 @@ PostgreSQL documentation
         Specifies the ICU locale ID to be used in this database, if the
         ICU locale provider is selected.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
       </listitem>
      </varlistentry>
 
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index e604ab48b7..76993acdfe 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -250,6 +250,11 @@ PostgreSQL documentation
         Specifies the ICU locale when the ICU provider is used. Locale support
         is described in <xref linkend="locale"/>.
        </para>
+       <para>
+        If specified as <literal>C</literal> or <literal>POSIX</literal>, the
+        provider is automatically set to <literal>none</literal>, as the ICU
+        provider doesn't support an ICU locale of <literal>C</literal>.
+       </para>
        <para>
         If this option is not specified, the locale is inherited from the
         environment in which <command>initdb</command> runs. The environment's
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 267a551818..ed64e17504 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -254,6 +254,23 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 		if (lcctypeEl)
 			collctype = defGetString(lcctypeEl);
 
+		/*
+		 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+		 * optimizable to byte operations (memcmp(), pg_ascii_tolower(),
+		 * etc.); transform into the "none" provider. Don't transform during
+		 * binary upgrade.
+		 */
+		if (!IsBinaryUpgrade && collprovider == COLLPROVIDER_ICU &&
+			colliculocale && (pg_strcasecmp(colliculocale, "C") == 0 ||
+							  pg_strcasecmp(colliculocale, "POSIX") == 0))
+		{
+			ereport(NOTICE,
+					(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+							colliculocale)));
+			colliculocale = NULL;
+			collprovider = COLLPROVIDER_NONE;
+		}
+
 		if (collprovider == COLLPROVIDER_LIBC)
 		{
 			if (!collcollate)
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 9e73f54803..6dc737aebb 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1043,6 +1043,27 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
 
+	/*
+	 * Postgres defines the "C" (and equivalently, "POSIX") locales to be
+	 * optimizable to byte operations (memcmp(), pg_ascii_tolower(), etc.);
+	 * transform into the "none" provider.
+	 *
+	 * Don't transform during binary upgrade or when both the provider and ICU
+	 * locale are unchanged from the template.
+	 */
+	if (!IsBinaryUpgrade && dblocprovider == COLLPROVIDER_ICU &&
+		(src_locprovider != COLLPROVIDER_ICU ||
+		 strcmp(dbiculocale, src_iculocale) != 0) &&
+		dbiculocale && (pg_strcasecmp(dbiculocale, "C") == 0 ||
+						pg_strcasecmp(dbiculocale, "POSIX") == 0))
+	{
+		ereport(NOTICE,
+				(errmsg("using locale provider \"none\" for ICU locale \"%s\"",
+						dbiculocale)));
+		dbiculocale = NULL;
+		dblocprovider = COLLPROVIDER_NONE;
+	}
+
 	if (dblocprovider == COLLPROVIDER_ICU)
 	{
 		if (!(is_encoding_supported_by_icu(encoding)))
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 4cf6892bee..ea26bf8361 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2501,6 +2501,16 @@ setlocales(void)
 			lc_messages = locale;
 	}
 
+	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = COLLPROVIDER_NONE;
+	}
+
 	/*
 	 * canonicalize locale names, and obtain any missing values from our
 	 * current environment
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index fe6d224e5b..ea92b08511 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -111,6 +111,45 @@ if ($ENV{with_icu} eq 'yes')
 		],
 		'option --icu-locale');
 
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			"$tempdir/data4a"
+		],
+		'option --icu-locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--locale=C',
+			"$tempdir/data4b"
+		],
+		'option --icu-locale=C --locale=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-collate=C',
+			"$tempdir/data4c"
+		],
+		'option --icu-locale=C --lc-collate=C');
+
+	# transformed to provider=none
+	command_ok(
+		[
+			'initdb',                '--no-sync',
+			'--locale-provider=icu', '--icu-locale=C',
+			'--lc-ctype=C',
+			"$tempdir/data4d"
+		],
+		'option --icu-locale=C --lc-ctype=C');
+
 	command_fails_like(
 		[
 			'initdb',                '--no-sync',
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 79367d933b..9caf9190cf 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -172,6 +172,17 @@ main(int argc, char *argv[])
 			lc_collate = locale;
 	}
 
+	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
+		icu_locale &&
+		(pg_strcasecmp(icu_locale, "C") == 0 ||
+		 pg_strcasecmp(icu_locale, "POSIX") == 0))
+	{
+		pg_log_info("using locale provider \"none\" for ICU locale \"%s\"",
+					 icu_locale);
+		icu_locale = NULL;
+		locale_provider = "none";
+	}
+
 	if (encoding)
 	{
 		if (pg_char_to_encoding(encoding) < 0)
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 5aa658b671..eb3682f0fd 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -75,6 +75,18 @@ if ($ENV{with_icu} eq 'yes')
 	$node2->command_ok(
 		[ 'createdb', '-T', 'template0', '--icu-locale', 'en-US', 'foobar56' ],
 		'create database with icu locale from template database with icu provider');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  'test_none_icu1' ],
+		'create database with provider "icu" and ICU_LOCALE="C"');
+
+	# transformed into provider "none"
+	$node->command_ok(
+		[ 'createdb', '-T', 'template0', '--locale-provider=icu', '--icu-locale=C',
+		  '--lc-ctype=C', 'test_none_icu2' ],
+		'create database with provider "icu" and ICU_LOCALE="C" and LC_CTYPE=C');
 }
 else
 {
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 12afc3b65a..c0437231ad 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1035,6 +1035,9 @@ BEGIN
 END
 $$;
 RESET client_min_messages;
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+NOTICE:  using locale provider "none" for ICU locale "C"
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
@@ -1069,7 +1072,8 @@ SELECT collname FROM pg_collation WHERE collname LIKE 'test%' ORDER BY 1;
  test0
  test1
  test5
-(3 rows)
+ testc
+(4 rows)
 
 ALTER COLLATION test1 RENAME TO test11;
 ALTER COLLATION test0 RENAME TO test11; -- fail
@@ -1090,7 +1094,8 @@ SELECT collname, nspname, obj_description(pg_collation.oid, 'pg_collation')
  test0    | collate_tests | US English
  test11   | test_schema   | 
  test5    | collate_tests | 
-(3 rows)
+ testc    | collate_tests | 
+(4 rows)
 
 DROP COLLATION test0, test_schema.test11, test5;
 DROP COLLATION test0; -- fail
@@ -1100,7 +1105,8 @@ NOTICE:  collation "test0" does not exist, skipping
 SELECT collname FROM pg_collation WHERE collname LIKE 'test%';
  collname 
 ----------
-(0 rows)
+ testc
+(1 row)
 
 DROP SCHEMA test_schema;
 DROP ROLE regress_test_role;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 655c965f46..63c29dfe2a 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -375,6 +375,9 @@ $$;
 
 RESET client_min_messages;
 
+-- uses "none" provider instead
+CREATE COLLATION testc (provider = icu, locale='C');
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-- 
2.34.1
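
Reviewer note (not part of the patch): the same "C"/"POSIX" check is
repeated in DefineCollation(), createdb() and the two client programs.
A minimal standalone sketch of that check follows, assuming nothing
beyond what the hunks above show. All sketch_* names and the enum are
hypothetical stand-ins (the real code uses pg_strcasecmp() and the
COLLPROVIDER_* codes), and the binary-upgrade and template exceptions
are deliberately left out.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the COLLPROVIDER_* codes used in the patch. */
enum sketch_provider
{
	SKETCH_NONE,
	SKETCH_ICU,
	SKETCH_LIBC
};

/* Simple case-insensitive comparison, standing in for pg_strcasecmp(). */
static bool
sketch_equals_nocase(const char *a, const char *b)
{
	while (*a && *b)
	{
		if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
			return false;
		a++;
		b++;
	}
	/* equal only if both strings ended at the same position */
	return *a == *b;
}

/* True for "C" or "POSIX" in any letter case; false for NULL. */
static bool
sketch_is_c_locale(const char *name)
{
	return name != NULL &&
		(sketch_equals_nocase(name, "C") ||
		 sketch_equals_nocase(name, "POSIX"));
}

/*
 * If the provider is ICU and the ICU locale is C/POSIX, switch to the
 * "none" provider and clear the ICU locale.  Returns true if a
 * transformation happened (the patch emits a NOTICE at that point).
 */
static bool
sketch_transform(enum sketch_provider *provider, const char **icu_locale)
{
	assert(provider != NULL && icu_locale != NULL);

	if (*provider == SKETCH_ICU && sketch_is_c_locale(*icu_locale))
	{
		*icu_locale = NULL;
		*provider = SKETCH_NONE;
		return true;
	}
	return false;
}
```

Returning a flag rather than reporting inside the helper would let
each caller keep its own reporting style (ereport() in the backend,
pg_log_info() in the client programs), if the check were ever shared.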

From 79732b2f94d5097b5ceebd2a22fdbb692c780156 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Tue, 25 Apr 2023 15:01:55 -0700
Subject: [PATCH v5 6/7] Make LOCALE apply to ICU_LOCALE for CREATE DATABASE.

LOCALE is now an alias for LC_COLLATE, LC_CTYPE, and (if the provider
is ICU) ICU_LOCALE. The ICU provider accepts more locale names than
libc (e.g. language tags and locale names containing collation
attributes), so in some cases LC_COLLATE, LC_CTYPE, and ICU_LOCALE
will still need to be specified separately.

Previously, LOCALE applied only to LC_COLLATE and LC_CTYPE (and
similarly for --locale in initdb and createdb). That could lead to
confusion when the provider is implicit, such as when it is inherited
from the template database, or when ICU was made default at initdb
time in commit 27b62377b4.

Reverts incomplete fix 5cd1a5af4d.

Discussion: https://postgr.es/m/3391932.1682107...@sss.pgh.pa.us
---
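
Reviewer note (ignored by git am): with this patch, the ICU locale for
CREATE DATABASE is taken from ICU_LOCALE if given, else from LOCALE,
else from the template database. A minimal sketch of that precedence
follows, assuming the provider has already been resolved to ICU. The
sketch_* names are hypothetical; the real logic is in the createdb()
hunk in dbcommands.c.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Pick the ICU locale for a new database, assuming the provider is ICU.
 * Precedence: explicit ICU_LOCALE, then LOCALE, then the template's value.
 * Any argument may be NULL, meaning "not specified".
 */
static const char *
sketch_resolve_icu_locale(const char *icu_locale,
						  const char *locale,
						  const char *template_icu_locale)
{
	if (icu_locale != NULL)
		return icu_locale;
	if (locale != NULL)
		return locale;
	return template_icu_locale;
}

/* Usage example covering the three cases. */
static void
sketch_demo(void)
{
	const char *explicit_icu = "de-u-co-phonebk";
	const char *locale = "C";
	const char *tmpl = "en-US";

	/* ICU_LOCALE wins over LOCALE and the template */
	assert(sketch_resolve_icu_locale(explicit_icu, locale, tmpl) == explicit_icu);
	/* LOCALE is used when ICU_LOCALE is absent */
	assert(sketch_resolve_icu_locale(NULL, locale, tmpl) == locale);
	/* otherwise the template's ICU locale is inherited */
	assert(sketch_resolve_icu_locale(NULL, NULL, tmpl) == tmpl);
}
```

Note that LOCALE still feeds LC_COLLATE and LC_CTYPE as before, which
is why an ICU-only string such as '@colStrength=primary' has to go
through ICU_LOCALE instead.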
 doc/src/sgml/ref/create_database.sgml         |  6 ++--
 doc/src/sgml/ref/createdb.sgml                |  5 +++-
 doc/src/sgml/ref/initdb.sgml                  |  7 +++--
 src/backend/commands/collationcmds.c          |  2 +-
 src/backend/commands/dbcommands.c             | 15 +++++++---
 src/bin/initdb/initdb.c                       | 11 ++++++--
 src/bin/scripts/createdb.c                    | 13 ++++-----
 src/bin/scripts/t/020_createdb.pl             |  4 +--
 src/test/icu/t/010_database.pl                | 23 +++++++++------
 .../regress/expected/collate.icu.utf8.out     | 28 +++++++++----------
 10 files changed, 68 insertions(+), 46 deletions(-)

diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml
index c730d02e15..dc57ba0c8b 100644
--- a/doc/src/sgml/ref/create_database.sgml
+++ b/doc/src/sgml/ref/create_database.sgml
@@ -145,8 +145,10 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable>
       <term><replaceable class="parameter">locale</replaceable></term>
       <listitem>
        <para>
-        This is a shortcut for setting <symbol>LC_COLLATE</symbol>
-        and <symbol>LC_CTYPE</symbol> at once.
+        This is a shortcut for setting <symbol>LC_COLLATE</symbol>,
+        <symbol>LC_CTYPE</symbol> and <symbol>ICU_LOCALE</symbol> at
+        once. Some locales are only valid for ICU, and must be set separately
+        with <symbol>ICU_LOCALE</symbol>.
        </para>
        <tip>
         <para>
diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml
index 7c573e848a..7991153ecc 100644
--- a/doc/src/sgml/ref/createdb.sgml
+++ b/doc/src/sgml/ref/createdb.sgml
@@ -124,7 +124,10 @@ PostgreSQL documentation
       <listitem>
        <para>
         Specifies the locale to be used in this database.  This is equivalent
-        to specifying both <option>--lc-collate</option> and <option>--lc-ctype</option>.
+        to specifying <option>--lc-collate</option>,
+        <option>--lc-ctype</option>, and <option>--icu-locale</option> to the
+        same value. Some locales are only valid for ICU and must be set with
+        <option>--icu-locale</option>.
        </para>
       </listitem>
      </varlistentry>
diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml
index 76993acdfe..d9ef21c422 100644
--- a/doc/src/sgml/ref/initdb.sgml
+++ b/doc/src/sgml/ref/initdb.sgml
@@ -116,9 +116,10 @@ PostgreSQL documentation
   <para>
    To choose a different locale for the cluster, use the option
    <option>--locale</option>.  There are also individual options
-   <option>--lc-*</option> (see below) to set values for the individual locale
-   categories.  Note that inconsistent settings for different locale
-   categories can give nonsensical results, so this should be used with care.
+   <option>--lc-*</option> and <option>--icu-locale</option> (see below) to
+   set values for the individual locale categories.  Note that inconsistent
+   settings for different locale categories can give nonsensical results, so
+   this should be used with care.
   </para>
 
   <para>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index ed64e17504..9a83f9f303 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -302,7 +302,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 				if (langtag && strcmp(colliculocale, langtag) != 0)
 				{
 					ereport(NOTICE,
-							(errmsg("using standard form \"%s\" for locale \"%s\"",
+							(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 									langtag, colliculocale)));
 
 					colliculocale = langtag;
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index 6dc737aebb..154f20573c 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -1019,7 +1019,12 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (dblocprovider == '\0')
 		dblocprovider = src_locprovider;
 	if (dbiculocale == NULL && dblocprovider == COLLPROVIDER_ICU)
-		dbiculocale = src_iculocale;
+	{
+		if (dlocale && dlocale->arg)
+			dbiculocale = defGetString(dlocale);
+		else
+			dbiculocale = src_iculocale;
+	}
 	if (dbicurules == NULL && dblocprovider == COLLPROVIDER_ICU)
 		dbicurules = src_icurules;
 
@@ -1033,12 +1038,14 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 	if (!check_locale(LC_COLLATE, dbcollate, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbcollate)));
+				 errmsg("invalid LC_COLLATE locale name: \"%s\"", dbcollate),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbcollate = canonname;
 	if (!check_locale(LC_CTYPE, dbctype, &canonname))
 		ereport(ERROR,
 				(errcode(ERRCODE_WRONG_OBJECT_TYPE),
-				 errmsg("invalid locale name: \"%s\"", dbctype)));
+				 errmsg("invalid LC_CTYPE locale name: \"%s\"", dbctype),
+				 errhint("If the locale name is specific to ICU, use ICU_LOCALE.")));
 	dbctype = canonname;
 
 	check_encoding_locale_matches(encoding, dbcollate, dbctype);
@@ -1094,7 +1101,7 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt)
 			if (langtag && strcmp(dbiculocale, langtag) != 0)
 			{
 				ereport(NOTICE,
-						(errmsg("using standard form \"%s\" for locale \"%s\"",
+						(errmsg("using standard form \"%s\" for ICU locale \"%s\"",
 								langtag, dbiculocale)));
 
 				dbiculocale = langtag;
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index ea26bf8361..ccb2414fed 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -2157,7 +2157,11 @@ check_locale_name(int category, const char *locale, char **canonname)
 	if (res == NULL)
 	{
 		if (*locale)
-			pg_fatal("invalid locale name \"%s\"", locale);
+		{
+			pg_log_error("invalid locale name \"%s\"", locale);
+			pg_log_error_hint("If the locale name is specific to ICU, use --icu-locale.");
+			exit(1);
+		}
 		else
 		{
 			/*
@@ -2467,7 +2471,7 @@ setlocales(void)
 {
 	char	   *canonname;
 
-	/* set empty lc_* values to locale config if set */
+	/* set empty lc_* and iculocale values to locale config if set */
 
 	if (locale_provider == COLLPROVIDER_NONE)
 	{
@@ -2499,6 +2503,8 @@ setlocales(void)
 			lc_monetary = locale;
 		if (!lc_messages)
 			lc_messages = locale;
+		if (!icu_locale && locale_provider == COLLPROVIDER_ICU)
+			icu_locale = locale;
 	}
 
 	if (icu_locale && locale_provider == COLLPROVIDER_ICU &&
@@ -3392,7 +3398,6 @@ main(int argc, char *argv[])
 				break;
 			case 8:
 				locale = "C";
-				locale_provider = COLLPROVIDER_LIBC;
 				break;
 			case 9:
 				pwfilename = pg_strdup(optarg);
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 9caf9190cf..51c4bb3592 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -164,14 +164,6 @@ main(int argc, char *argv[])
 			exit(1);
 	}
 
-	if (locale)
-	{
-		if (!lc_ctype)
-			lc_ctype = locale;
-		if (!lc_collate)
-			lc_collate = locale;
-	}
-
 	if (locale_provider && pg_strcasecmp(locale_provider, "icu") == 0 &&
 		icu_locale &&
 		(pg_strcasecmp(icu_locale, "C") == 0 ||
@@ -230,6 +222,11 @@ main(int argc, char *argv[])
 		appendPQExpBuffer(&sql, " STRATEGY %s", fmtId(strategy));
 	if (template)
 		appendPQExpBuffer(&sql, " TEMPLATE %s", fmtId(template));
+	if (locale)
+	{
+		appendPQExpBufferStr(&sql, " LOCALE ");
+		appendStringLiteralConn(&sql, locale, conn);
+	}
 	if (lc_collate)
 	{
 		appendPQExpBufferStr(&sql, " LC_COLLATE ");
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index eb3682f0fd..81a9931c09 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -167,7 +167,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_COLLATE locale name|^createdb: error: database creation failed: ERROR:  new collation \(foo'; SELECT '1\) is incompatible with the collation of the template database/s
 	],
 	'createdb with incorrect --lc-collate');
 $node->command_checks_all(
@@ -175,7 +175,7 @@ $node->command_checks_all(
 	1,
 	[qr/^$/],
 	[
-		qr/^createdb: error: database creation failed: ERROR:  invalid locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
+		qr/^createdb: error: database creation failed: ERROR:  invalid LC_CTYPE locale name|^createdb: error: database creation failed: ERROR:  new LC_CTYPE \(foo'; SELECT '1\) is incompatible with the LC_CTYPE of the template database/s
 	],
 	'createdb with incorrect --lc-ctype');
 
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 715b1bffd6..df4af00afe 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -51,16 +51,23 @@ b),
 	'sort by explicit collation upper first');
 
 
-# Test error cases in CREATE DATABASE involving locale-related options
+# Test that LOCALE='C' works for ICU
 
-my ($ret, $stdout, $stderr) = $node1->psql('postgres',
-	q{CREATE DATABASE dbicu LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
-isnt($ret, 0,
-	"ICU locale must be specified for ICU provider: exit code not 0");
+my $ret1 = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'C' TEMPLATE template0 ENCODING UTF8});
+is($ret1, 0,
+	"C locale works for ICU");
+
+# Test that ICU-specific locale string must be specified with ICU_LOCALE,
+# not LOCALE
+
+my ($ret2, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu3 LOCALE_PROVIDER icu LOCALE '@colStrength=primary' TEMPLATE template0 ENCODING UTF8});
+isnt($ret2, 0,
+	"ICU-specific locale must be specified with ICU_LOCALE: exit code not 0");
 like(
 	$stderr,
-	qr/ERROR:  ICU locale must be specified/,
-	"ICU locale must be specified for ICU provider: error message");
-
+	qr/ERROR:  invalid LC_COLLATE locale name/,
+	"ICU-specific locale must be specified with ICU_LOCALE: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index c0437231ad..39f61ca281 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1058,11 +1058,11 @@ CREATE COLLATION testx (provider = icu, locale = '@ASDF'); DROP COLLATION testx;
 WARNING:  could not convert locale name "@ASDF" to language tag: U_ILLEGAL_ARGUMENT_ERROR
 -- test special variants
 CREATE COLLATION testx (provider = icu, locale = '@EURO'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-cu-eur" for locale "@EURO"
+NOTICE:  using standard form "und-u-cu-eur" for ICU locale "@EURO"
 CREATE COLLATION testx (provider = icu, locale = '@pinyin'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-pinyin" for locale "@pinyin"
+NOTICE:  using standard form "und-u-co-pinyin" for ICU locale "@pinyin"
 CREATE COLLATION testx (provider = icu, locale = '@stroke'); DROP COLLATION testx;
-NOTICE:  using standard form "und-u-co-stroke" for locale "@stroke"
+NOTICE:  using standard form "und-u-co-stroke" for ICU locale "@stroke"
 CREATE COLLATION test4 FROM nonsense;
 ERROR:  collation "nonsense" for encoding "UTF8" does not exist
 CREATE COLLATION test5 FROM test0;
@@ -1211,9 +1211,9 @@ SELECT 'coté' < 'côte' COLLATE "und-x-icu", 'coté' > 'côte' COLLATE testcoll
 (1 row)
 
 CREATE COLLATION testcoll_lower_first (provider = icu, locale = '@colCaseFirst=lower');
-NOTICE:  using standard form "und-u-kf-lower" for locale "@colCaseFirst=lower"
+NOTICE:  using standard form "und-u-kf-lower" for ICU locale "@colCaseFirst=lower"
 CREATE COLLATION testcoll_upper_first (provider = icu, locale = '@colCaseFirst=upper');
-NOTICE:  using standard form "und-u-kf-upper" for locale "@colCaseFirst=upper"
+NOTICE:  using standard form "und-u-kf-upper" for ICU locale "@colCaseFirst=upper"
 SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcoll_upper_first;
  ?column? | ?column? 
 ----------+----------
@@ -1221,7 +1221,7 @@ SELECT 'aaa' < 'AAA' COLLATE testcoll_lower_first, 'aaa' > 'AAA' COLLATE testcol
 (1 row)
 
 CREATE COLLATION testcoll_shifted (provider = icu, locale = '@colAlternate=shifted');
-NOTICE:  using standard form "und-u-ka-shifted" for locale "@colAlternate=shifted"
+NOTICE:  using standard form "und-u-ka-shifted" for ICU locale "@colAlternate=shifted"
 SELECT 'de-luge' < 'deanza' COLLATE "und-x-icu", 'de-luge' > 'deanza' COLLATE testcoll_shifted;
  ?column? | ?column? 
 ----------+----------
@@ -1238,12 +1238,12 @@ SELECT 'A-21' > 'A-123' COLLATE "und-x-icu", 'A-21' < 'A-123' COLLATE testcoll_n
 (1 row)
 
 CREATE COLLATION testcoll_error1 (provider = icu, locale = '@colNumeric=lower');
-NOTICE:  using standard form "und-u-kn-lower" for locale "@colNumeric=lower"
+NOTICE:  using standard form "und-u-kn-lower" for ICU locale "@colNumeric=lower"
 ERROR:  could not open collator for locale "und-u-kn-lower": U_ILLEGAL_ARGUMENT_ERROR
 -- test that attributes not handled by icu_set_collation_attributes()
 -- (handled by ucol_open() directly) also work
 CREATE COLLATION testcoll_de_phonebook (provider = icu, locale = 'de@collation=phonebook');
-NOTICE:  using standard form "de-u-co-phonebk" for locale "de@collation=phonebook"
+NOTICE:  using standard form "de-u-co-phonebk" for ICU locale "de@collation=phonebook"
 SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE testcoll_de_phonebook;
  ?column? | ?column? 
 ----------+----------
@@ -1252,7 +1252,7 @@ SELECT 'Goldmann' < 'Götz' COLLATE "de-x-icu", 'Goldmann' > 'Götz' COLLATE tes
 
 -- rules
 CREATE COLLATION testcoll_rules1 (provider = icu, locale = '', rules = '&a < g');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test7 (a text);
 -- example from https://unicode-org.github.io/icu/userguide/collation/customization/#syntax
 INSERT INTO test7 VALUES ('Abernathy'), ('apple'), ('bird'), ('Boston'), ('Graham'), ('green');
@@ -1280,13 +1280,13 @@ SELECT * FROM test7 ORDER BY a COLLATE testcoll_rules1;
 
 DROP TABLE test7;
 CREATE COLLATION testcoll_rulesx (provider = icu, locale = '', rules = '!!wrong!!');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 ERROR:  could not open collator for locale "und" with rules "!!wrong!!": U_INVALID_FORMAT_ERROR
 -- nondeterministic collations
 CREATE COLLATION ctest_det (provider = icu, locale = '', deterministic = true);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION ctest_nondet (provider = icu, locale = '', deterministic = false);
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE TABLE test6 (a int, b text);
 -- same string in different normal forms
 INSERT INTO test6 VALUES (1, U&'\00E4bc');
@@ -1336,9 +1336,9 @@ SELECT * FROM test6a WHERE b = ARRAY['äbc'] COLLATE ctest_nondet;
 (2 rows)
 
 CREATE COLLATION case_sensitive (provider = icu, locale = '');
-NOTICE:  using standard form "und" for locale ""
+NOTICE:  using standard form "und" for ICU locale ""
 CREATE COLLATION case_insensitive (provider = icu, locale = '@colStrength=secondary', deterministic = false);
-NOTICE:  using standard form "und-u-ks-level2" for locale "@colStrength=secondary"
+NOTICE:  using standard form "und-u-ks-level2" for ICU locale "@colStrength=secondary"
 SELECT 'abc' <= 'ABC' COLLATE case_sensitive, 'abc' >= 'ABC' COLLATE case_sensitive;
  ?column? | ?column? 
 ----------+----------
-- 
2.34.1

From 3ca8e0a84f6593ffff9a409bd31dc1c9ed253d3a Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Thu, 11 May 2023 12:54:31 -0700
Subject: [PATCH v5 7/7] Add default_collation_provider GUC.

Controls default collation provider for CREATE COLLATION. Does not
affect CREATE DATABASE, which gets its default from the template
database.
---
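
Reviewer note (ignored by git am): the GUC stores the provider as an
int holding the provider's character code, using a name/value table
terminated by a NULL name. A minimal sketch of such a lookup follows.
The struct and the sketch_* names are hypothetical simplifications of
config_enum_entry, and the sketch matches names exactly, which may be
stricter than the real GUC machinery.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for struct config_enum_entry. */
struct sketch_enum_entry
{
	const char *name;
	int			val;
};

/* Same name/value pairs as collation_provider_options in the patch. */
static const struct sketch_enum_entry sketch_provider_options[] = {
	{"icu", (int) 'i'},
	{"libc", (int) 'c'},
	{NULL, 0}
};

/*
 * Look up a provider name; on success store its code in *val and
 * return true.  Names not in the table are rejected.
 */
static bool
sketch_lookup_provider(const char *name, int *val)
{
	const struct sketch_enum_entry *entry;

	assert(name != NULL && val != NULL);

	for (entry = sketch_provider_options; entry->name != NULL; entry++)
	{
		if (strcmp(entry->name, name) == 0)
		{
			*val = entry->val;
			return true;
		}
	}
	return false;
}
```

With only "icu" and "libc" in the table, a value of "none" is not
accepted, which matches the valid values listed in the config.sgml
hunk below.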
 doc/src/sgml/config.sgml                       | 17 +++++++++++++++++
 src/backend/commands/collationcmds.c           |  3 ++-
 src/backend/utils/misc/guc_tables.c            | 18 ++++++++++++++++++
 src/backend/utils/misc/postgresql.conf.sample  |  4 ++++
 src/include/commands/collationcmds.h           |  2 ++
 src/test/regress/expected/collate.icu.utf8.out | 17 +++++++++++++++++
 src/test/regress/sql/collate.icu.utf8.sql      | 10 ++++++++++
 7 files changed, 70 insertions(+), 1 deletion(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index c4a9dcb9ae..038ecf9811 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -9819,6 +9819,23 @@ SET XML OPTION { DOCUMENT | CONTENT };
       </listitem>
      </varlistentry>
 
+     <varlistentry id="guc-default-collation-provider" xreflabel="default_collation_provider">
+      <term><varname>default_collation_provider</varname> (<type>enum</type>)
+      <indexterm>
+       <primary><varname>default_collation_provider</varname> configuration parameter</primary>
+      </indexterm>
+      </term>
+      <listitem>
+       <para>
+        Sets the default collation provider for <command>CREATE
+        COLLATION</command>. It does not affect <command>CREATE
+        DATABASE</command>, which gets the default collation provider from the
+        template database. Valid values are <literal>icu</literal> and
+        <literal>libc</literal>. The default is <literal>libc</literal>.
+       </para>
+      </listitem>
+     </varlistentry>
+
      <varlistentry id="guc-icu-validation-level" xreflabel="icu_validation_level">
       <term><varname>icu_validation_level</varname> (<type>enum</type>)
       <indexterm>
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 9a83f9f303..b42a660386 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -47,6 +47,7 @@ typedef struct
 	int			enc;			/* encoding */
 } CollAliasData;
 
+int		default_collation_provider = (int) COLLPROVIDER_LIBC;
 
 /*
  * CREATE COLLATION
@@ -228,7 +229,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 								collproviderstr)));
 		}
 		else
-			collprovider = COLLPROVIDER_LIBC;
+			collprovider = (char) default_collation_provider;
 
 		if (collprovider == COLLPROVIDER_NONE
 			&& (localeEl || lccollateEl || lcctypeEl))
diff --git a/src/backend/utils/misc/guc_tables.c b/src/backend/utils/misc/guc_tables.c
index 8c843f4ab6..d64b3a9a6f 100644
--- a/src/backend/utils/misc/guc_tables.c
+++ b/src/backend/utils/misc/guc_tables.c
@@ -35,8 +35,10 @@
 #include "access/xlogrecovery.h"
 #include "archive/archive_module.h"
 #include "catalog/namespace.h"
+#include "catalog/pg_collation.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/collationcmds.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
 #include "commands/user.h"
@@ -166,6 +168,12 @@ static const struct config_enum_entry intervalstyle_options[] = {
 	{NULL, 0, false}
 };
 
+static const struct config_enum_entry collation_provider_options[] = {
+	{"icu", (int) 'i', false},
+	{"libc", (int) 'c', false},
+	{NULL, 0, false}
+};
+
 static const struct config_enum_entry icu_validation_level_options[] = {
 	{"disabled", -1, false},
 	{"debug5", DEBUG5, false},
@@ -4683,6 +4691,16 @@ struct config_enum ConfigureNamesEnum[] =
 		NULL, NULL, NULL
 	},
 
+	{
+		{"default_collation_provider", PGC_USERSET, CLIENT_CONN_LOCALE,
+		 gettext_noop("Default collation provider for CREATE COLLATION."),
+		 NULL
+		},
+		&default_collation_provider,
+		(int) COLLPROVIDER_LIBC, collation_provider_options,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"icu_validation_level", PGC_USERSET, CLIENT_CONN_LOCALE,
 		 gettext_noop("Log level for reporting invalid ICU locale strings."),
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index 87bad8ecbf..b2b015b31f 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -734,6 +734,10 @@
 #lc_numeric = 'C'			# locale for number formatting
 #lc_time = 'C'				# locale for time formatting
 
+#default_collation_provider = 'libc'	# default collation provider
+					# for CREATE COLLATION
+					# (icu, libc)
+
 #icu_validation_level = WARNING		# report ICU locale validation
 					# errors at the given level
 
diff --git a/src/include/commands/collationcmds.h b/src/include/commands/collationcmds.h
index b76c7b3dc3..f54389525d 100644
--- a/src/include/commands/collationcmds.h
+++ b/src/include/commands/collationcmds.h
@@ -18,6 +18,8 @@
 #include "catalog/objectaddress.h"
 #include "parser/parse_node.h"
 
+extern PGDLLIMPORT int default_collation_provider;
+
 extern ObjectAddress DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_exists);
 extern void IsThereCollationInNamespace(const char *collname, Oid nspOid);
 extern ObjectAddress AlterCollation(AlterCollationStmt *stmt);
diff --git a/src/test/regress/expected/collate.icu.utf8.out b/src/test/regress/expected/collate.icu.utf8.out
index 39f61ca281..d9da8d1310 100644
--- a/src/test/regress/expected/collate.icu.utf8.out
+++ b/src/test/regress/expected/collate.icu.utf8.out
@@ -1038,6 +1038,23 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 NOTICE:  using locale provider "none" for ICU locale "C"
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+ collname | collprovider 
+----------+--------------
+ def_libc | c
+(1 row)
+
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+ collname | collprovider 
+----------+--------------
+ def_icu  | i
+(1 row)
+
+RESET default_collation_provider;
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 ERROR:  parameter "locale" must be specified
 SET icu_validation_level = ERROR;
diff --git a/src/test/regress/sql/collate.icu.utf8.sql b/src/test/regress/sql/collate.icu.utf8.sql
index 63c29dfe2a..13089c7f8e 100644
--- a/src/test/regress/sql/collate.icu.utf8.sql
+++ b/src/test/regress/sql/collate.icu.utf8.sql
@@ -378,6 +378,16 @@ RESET client_min_messages;
 -- uses "none" provider instead
 CREATE COLLATION testc (provider = icu, locale='C');
 
+SET default_collation_provider = 'libc';
+CREATE COLLATION def_libc (LOCALE = 'C');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_libc';
+
+SET default_collation_provider = 'icu';
+CREATE COLLATION def_icu (LOCALE = 'und');
+SELECT collname, collprovider FROM pg_collation WHERE collname='def_icu';
+
+RESET default_collation_provider;
+
 CREATE COLLATION test3 (provider = icu, lc_collate = 'en_US.utf8'); -- fail, needs "locale"
 SET icu_validation_level = ERROR;
 CREATE COLLATION testx (provider = icu, locale = 'nonsense-nowhere'); -- fails
-- 
2.34.1
