On 8/13/24 7:56 PM, Jeff Davis wrote:
But I don't think that's a major problem -- we can just move the
hardcoded test into pg_newlocale_from_collation() and return a
predefined struct with collate_is_c/ctype_is_c already set.

I tried that out but thought it felt cleaner to do the hardcoding in pg_set_regex_collation(). What do you think?

I have attached patches removing lc_collate_is_c() and lc_ctype_is_c(). I have not checked if there are any performance regressions when using the C and POSIX locales but remove these special cases makes the code a lot cleaner in my book.

I also attach some other clean up patches I did while touching this code.

0001-Remove-lc_collate_is_c.patch

Removes lc_collate_is_c().

0002-Remove-lc_ctype_is_c.patch

Removes lc_ctype_is_c() and POSIX_COLLATION_OID which is no longer necessary.

0003-Remove-dubious-check-against-default-locale.patch

This patch removes a check against DEFAULT_COLLATION_OID which I thought looked really dubious. Shouldn't this just be a simple check for if the locale is deterministic? Since we know we have a valid locale that should be enough, right?

0004-Do-not-check-both-for-collate_is_c-and-deterministic.patch

It is redundant to check both for "collation_is_c && deterministic", right?

0005-Remove-pg_collate_deterministic-and-check-field-dire.patch

Since after my patches we look a lot directly at the collation_is_c and ctype_is_c fields I think the thin wrapper around the deterministic field makes it seem like there is more to it so I suggest that we should just remove it.

0006-Slightly-refactor-varstr_sortsupport-to-improve-read.patch

Small refactor to make a hard to read function a bit easier to read.

Andreas
From 99ffa0b76ce810846af09f4430903dd59ed06366 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andr...@proxel.se>
Date: Wed, 14 Aug 2024 00:37:23 +0200
Subject: [PATCH 1/6] Remove lc_collate_is_c()

Instead always look up the collation and check to collate_is_c field.

TODO: Check for performance regression?
---
 src/backend/access/spgist/spgtextproc.c |  2 +-
 src/backend/commands/collationcmds.c    |  8 ++-----
 src/backend/utils/adt/like_support.c    |  4 ++--
 src/backend/utils/adt/pg_locale.c       | 27 ---------------------
 src/backend/utils/adt/selfuncs.c        |  6 +++--
 src/backend/utils/adt/varchar.c         | 20 +++++-----------
 src/backend/utils/adt/varlena.c         | 32 ++++++++++++-------------
 src/include/utils/pg_locale.h           |  1 -
 8 files changed, 30 insertions(+), 70 deletions(-)

diff --git a/src/backend/access/spgist/spgtextproc.c b/src/backend/access/spgist/spgtextproc.c
index 3f08d330b6..d5237a68b5 100644
--- a/src/backend/access/spgist/spgtextproc.c
+++ b/src/backend/access/spgist/spgtextproc.c
@@ -427,7 +427,7 @@ spg_text_inner_consistent(PG_FUNCTION_ARGS)
 {
 	spgInnerConsistentIn *in = (spgInnerConsistentIn *) PG_GETARG_POINTER(0);
 	spgInnerConsistentOut *out = (spgInnerConsistentOut *) PG_GETARG_POINTER(1);
-	bool		collate_is_c = lc_collate_is_c(PG_GET_COLLATION());
+	bool		collate_is_c = pg_newlocale_from_collation(PG_GET_COLLATION())->collate_is_c;
 	text	   *reconstructedValue;
 	text	   *reconstrText;
 	int			maxReconstrLen;
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c
index 63ef9a0841..53b6a479aa 100644
--- a/src/backend/commands/collationcmds.c
+++ b/src/backend/commands/collationcmds.c
@@ -377,13 +377,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e
 	if (!OidIsValid(newoid))
 		return InvalidObjectAddress;
 
-	/*
-	 * Check that the locales can be loaded.  NB: pg_newlocale_from_collation
-	 * is only supposed to be called on non-C-equivalent locales.
-	 */
+	/* Check that the locales can be loaded. */
 	CommandCounterIncrement();
-	if (!lc_collate_is_c(newoid) || !lc_ctype_is_c(newoid))
-		(void) pg_newlocale_from_collation(newoid);
+	(void) pg_newlocale_from_collation(newoid);
 
 	ObjectAddressSet(address, CollationRelationId, newoid);
 
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 2635050861..a579fa0157 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -433,7 +433,7 @@ match_pattern_prefix(Node *leftop,
 	 * collation.
 	 */
 	if (collation_aware &&
-		!lc_collate_is_c(indexcollation))
+		!pg_newlocale_from_collation(indexcollation)->collate_is_c)
 		return NIL;
 
 	/*
@@ -1603,7 +1603,7 @@ make_greater_string(const Const *str_const, FmgrInfo *ltproc, Oid collation)
 		else
 			workstr = TextDatumGetCString(str_const->constvalue);
 		len = strlen(workstr);
-		if (lc_collate_is_c(collation) || len == 0)
+		if (pg_newlocale_from_collation(collation)->collate_is_c || len == 0)
 			cmpstr = str_const->constvalue;
 		else
 		{
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index cd3661e727..7512ac055e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1258,33 +1258,6 @@ lookup_collation_cache(Oid collation)
 	return cache_entry;
 }
 
-
-/*
- * Detect whether collation's LC_COLLATE property is C
- */
-bool
-lc_collate_is_c(Oid collation)
-{
-	/*
-	 * If we're asked about "collation 0", return false, so that the code will
-	 * go into the non-C path and report that the collation is bogus.
-	 */
-	if (!OidIsValid(collation))
-		return false;
-
-	/*
-	 * If we're asked about the built-in C/POSIX collations, we know that.
-	 */
-	if (collation == C_COLLATION_OID ||
-		collation == POSIX_COLLATION_OID)
-		return true;
-
-	/*
-	 * Otherwise, we have to consult pg_collation, but we cache that.
-	 */
-	return pg_newlocale_from_collation(collation)->collate_is_c;
-}
-
 /*
  * Detect whether collation's LC_CTYPE property is C
  */
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index bf42393bec..03d7fb5f48 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -4646,6 +4646,7 @@ static char *
 convert_string_datum(Datum value, Oid typid, Oid collid, bool *failure)
 {
 	char	   *val;
+	pg_locale_t mylocale;
 
 	switch (typid)
 	{
@@ -4671,9 +4672,10 @@ convert_string_datum(Datum value, Oid typid, Oid collid, bool *failure)
 			return NULL;
 	}
 
-	if (!lc_collate_is_c(collid))
+	mylocale = pg_newlocale_from_collation(collid);
+
+	if (!mylocale->collate_is_c)
 	{
-		pg_locale_t mylocale = pg_newlocale_from_collation(collid);
 		char	   *xfrmstr;
 		size_t		xfrmlen;
 		size_t		xfrmlen2 PG_USED_FOR_ASSERTS_ONLY;
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 0c219dcc77..d2d1c5709d 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -748,20 +748,16 @@ bpchareq(PG_FUNCTION_ARGS)
 				len2;
 	bool		result;
 	Oid			collid = PG_GET_COLLATION();
-	bool		locale_is_c = false;
-	pg_locale_t mylocale = 0;
+	pg_locale_t mylocale;
 
 	check_collation_set(collid);
 
 	len1 = bcTruelen(arg1);
 	len2 = bcTruelen(arg2);
 
-	if (lc_collate_is_c(collid))
-		locale_is_c = true;
-	else
-		mylocale = pg_newlocale_from_collation(collid);
+	mylocale = pg_newlocale_from_collation(collid);
 
-	if (locale_is_c || pg_locale_deterministic(mylocale))
+	if (mylocale->collate_is_c || pg_locale_deterministic(mylocale))
 	{
 		/*
 		 * Since we only care about equality or not-equality, we can avoid all
@@ -793,20 +789,16 @@ bpcharne(PG_FUNCTION_ARGS)
 				len2;
 	bool		result;
 	Oid			collid = PG_GET_COLLATION();
-	bool		locale_is_c = false;
-	pg_locale_t mylocale = 0;
+	pg_locale_t mylocale;
 
 	check_collation_set(collid);
 
 	len1 = bcTruelen(arg1);
 	len2 = bcTruelen(arg2);
 
-	if (lc_collate_is_c(collid))
-		locale_is_c = true;
-	else
-		mylocale = pg_newlocale_from_collation(collid);
+	mylocale = pg_newlocale_from_collation(collid);
 
-	if (locale_is_c || pg_locale_deterministic(mylocale))
+	if (mylocale->collate_is_c || pg_locale_deterministic(mylocale))
 	{
 		/*
 		 * Since we only care about equality or not-equality, we can avoid all
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 52ab8c43c6..81343bd0e1 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1538,16 +1538,19 @@ int
 varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 {
 	int			result;
+	pg_locale_t mylocale;
 
 	check_collation_set(collid);
 
+	mylocale = pg_newlocale_from_collation(collid);
+
 	/*
 	 * Unfortunately, there is no strncoll(), so in the non-C locale case we
 	 * have to do some memory copying.  This turns out to be significantly
 	 * slower, so we optimize the case where LC_COLLATE is C.  We also try to
 	 * optimize relatively-short strings by avoiding palloc/pfree overhead.
 	 */
-	if (lc_collate_is_c(collid))
+	if (mylocale->collate_is_c)
 	{
 		result = memcmp(arg1, arg2, Min(len1, len2));
 		if ((result == 0) && (len1 != len2))
@@ -1555,10 +1558,6 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 	}
 	else
 	{
-		pg_locale_t mylocale;
-
-		mylocale = pg_newlocale_from_collation(collid);
-
 		/*
 		 * memcmp() can't tell us which of two unequal strings sorts first,
 		 * but it's a cheap way to tell if they're equal.  Testing shows that
@@ -1865,10 +1864,12 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	bool		abbreviate = ssup->abbreviate;
 	bool		collate_c = false;
 	VarStringSortSupport *sss;
-	pg_locale_t locale = 0;
+	pg_locale_t locale;
 
 	check_collation_set(collid);
 
+	locale = pg_newlocale_from_collation(collid);
+
 	/*
 	 * If possible, set ssup->comparator to a function which can be used to
 	 * directly compare two datums.  If we can do this, we'll avoid the
@@ -1882,7 +1883,7 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	 * varstrfastcmp_c, bpcharfastcmp_c, or namefastcmp_c, all of which use
 	 * memcmp() rather than strcoll().
 	 */
-	if (lc_collate_is_c(collid))
+	if (locale->collate_is_c)
 	{
 		if (typid == BPCHAROID)
 			ssup->comparator = bpcharfastcmp_c;
@@ -1899,13 +1900,6 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 	}
 	else
 	{
-		/*
-		 * We need a collation-sensitive comparison.  To make things faster,
-		 * we'll figure out the collation based on the locale id and cache the
-		 * result.
-		 */
-		locale = pg_newlocale_from_collation(collid);
-
 		/*
 		 * We use varlenafastcmp_locale except for type NAME.
 		 */
@@ -1956,7 +1950,8 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 		sss->last_len2 = -1;
 		/* Initialize */
 		sss->last_returned = 0;
-		sss->locale = locale;
+		if (!collate_c)
+			sss->locale = locale;
 
 		/*
 		 * To avoid somehow confusing a strxfrm() blob and an original string,
@@ -2546,12 +2541,15 @@ btvarstrequalimage(PG_FUNCTION_ARGS)
 {
 	/* Oid		opcintype = PG_GETARG_OID(0); */
 	Oid			collid = PG_GET_COLLATION();
+	pg_locale_t locale;
 
 	check_collation_set(collid);
 
-	if (lc_collate_is_c(collid) ||
+	locale = pg_newlocale_from_collation(collid);
+
+	if (locale->collate_is_c ||
 		collid == DEFAULT_COLLATION_OID ||
-		get_collation_isdeterministic(collid))
+		pg_locale_deterministic(locale))
 		PG_RETURN_BOOL(true);
 	else
 		PG_RETURN_BOOL(false);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index f41d33975b..8ec24437f4 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -54,7 +54,6 @@ extern PGDLLIMPORT bool database_ctype_is_c;
 extern bool check_locale(int category, const char *locale, char **canonname);
 extern char *pg_perm_setlocale(int category, const char *locale);
 
-extern bool lc_collate_is_c(Oid collation);
 extern bool lc_ctype_is_c(Oid collation);
 
 /*
-- 
2.43.0

From c7486013aca89bab8178cb7ca0944c52d5805820 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andr...@proxel.se>
Date: Wed, 14 Aug 2024 00:37:48 +0200
Subject: [PATCH 2/6] Remove lc_ctype_is_c()

Instead always fetch the locale and look at the ctype_is_c field.

Since regular expressions are used by pg_hba we need to hardcode
the logic for regular expression using C_COLLATE_OID.
---
 src/backend/regex/regc_pg_locale.c   | 20 ++++++++++++++------
 src/backend/utils/adt/formatting.c   | 27 ++++++++++++---------------
 src/backend/utils/adt/like.c         |  2 +-
 src/backend/utils/adt/like_support.c | 15 +++++----------
 src/backend/utils/adt/pg_locale.c    | 26 --------------------------
 src/include/catalog/pg_collation.dat |  3 +--
 src/include/utils/pg_locale.h        |  2 --
 7 files changed, 33 insertions(+), 62 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 947d73f3e0..21f640a7e1 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -243,29 +243,36 @@ pg_set_regex_collation(Oid collation)
 				 errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
-	if (lc_ctype_is_c(collation))
+	if (collation == C_COLLATION_OID)
 	{
-		/* C/POSIX collations use this path regardless of database encoding */
+		/* Allow using regex in C locale without catalog */
 		pg_regex_strategy = PG_REGEX_LOCALE_C;
 		pg_regex_locale = 0;
 		pg_regex_collation = C_COLLATION_OID;
 	}
 	else
 	{
-		pg_regex_locale = pg_newlocale_from_collation(collation);
+		pg_locale_t locale = pg_newlocale_from_collation(collation);
 
-		if (!pg_locale_deterministic(pg_regex_locale))
+		if (!pg_locale_deterministic(locale))
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("nondeterministic collations are not supported for regular expressions")));
 
-		if (pg_regex_locale->provider == COLLPROVIDER_BUILTIN)
+		if (locale->ctype_is_c)
+		{
+			/* C/POSIX collations use this path regardless of database encoding */
+			pg_regex_strategy = PG_REGEX_LOCALE_C;
+			locale = 0;
+			collation = C_COLLATION_OID;
+		}
+		else if (locale->provider == COLLPROVIDER_BUILTIN)
 		{
 			Assert(GetDatabaseEncoding() == PG_UTF8);
 			pg_regex_strategy = PG_REGEX_BUILTIN;
 		}
 #ifdef USE_ICU
-		else if (pg_regex_locale->provider == COLLPROVIDER_ICU)
+		else if (locale->provider == COLLPROVIDER_ICU)
 		{
 			pg_regex_strategy = PG_REGEX_LOCALE_ICU;
 		}
@@ -278,6 +285,7 @@ pg_set_regex_collation(Oid collation)
 				pg_regex_strategy = PG_REGEX_LOCALE_1BYTE_L;
 		}
 
+		pg_regex_locale = locale;
 		pg_regex_collation = collation;
 	}
 }
diff --git a/src/backend/utils/adt/formatting.c b/src/backend/utils/adt/formatting.c
index 68069fcfd3..dc5178f56b 100644
--- a/src/backend/utils/adt/formatting.c
+++ b/src/backend/utils/adt/formatting.c
@@ -1636,6 +1636,7 @@ char *
 str_tolower(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
+	pg_locale_t mylocale;
 
 	if (!buff)
 		return NULL;
@@ -1653,17 +1654,15 @@ str_tolower(const char *buff, size_t nbytes, Oid collid)
 				 errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
+	mylocale = pg_newlocale_from_collation(collid);
+
 	/* C/POSIX collations use this path regardless of database encoding */
-	if (lc_ctype_is_c(collid))
+	if (mylocale->ctype_is_c)
 	{
 		result = asc_tolower(buff, nbytes);
 	}
 	else
 	{
-		pg_locale_t mylocale;
-
-		mylocale = pg_newlocale_from_collation(collid);
-
 #ifdef USE_ICU
 		if (mylocale->provider == COLLPROVIDER_ICU)
 		{
@@ -1774,6 +1773,7 @@ char *
 str_toupper(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
+	pg_locale_t mylocale;
 
 	if (!buff)
 		return NULL;
@@ -1791,17 +1791,15 @@ str_toupper(const char *buff, size_t nbytes, Oid collid)
 				 errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
+	mylocale = pg_newlocale_from_collation(collid);
+
 	/* C/POSIX collations use this path regardless of database encoding */
-	if (lc_ctype_is_c(collid))
+	if (mylocale->ctype_is_c)
 	{
 		result = asc_toupper(buff, nbytes);
 	}
 	else
 	{
-		pg_locale_t mylocale;
-
-		mylocale = pg_newlocale_from_collation(collid);
-
 #ifdef USE_ICU
 		if (mylocale->provider == COLLPROVIDER_ICU)
 		{
@@ -1954,6 +1952,7 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 {
 	char	   *result;
 	int			wasalnum = false;
+	pg_locale_t mylocale;
 
 	if (!buff)
 		return NULL;
@@ -1971,17 +1970,15 @@ str_initcap(const char *buff, size_t nbytes, Oid collid)
 				 errhint("Use the COLLATE clause to set the collation explicitly.")));
 	}
 
+	mylocale = pg_newlocale_from_collation(collid);
+
 	/* C/POSIX collations use this path regardless of database encoding */
-	if (lc_ctype_is_c(collid))
+	if (mylocale->ctype_is_c)
 	{
 		result = asc_initcap(buff, nbytes);
 	}
 	else
 	{
-		pg_locale_t mylocale;
-
-		mylocale = pg_newlocale_from_collation(collid);
-
 #ifdef USE_ICU
 		if (mylocale->provider == COLLPROVIDER_ICU)
 		{
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 131616fa6b..f87675d755 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -147,7 +147,7 @@ SB_lower_char(unsigned char c, pg_locale_t locale, bool locale_is_c)
 static inline int
 GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation)
 {
-	if (collation && !lc_ctype_is_c(collation))
+	if (collation)
 	{
 		pg_locale_t locale = pg_newlocale_from_collation(collation);
 
diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index a579fa0157..fcfd2576b2 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -100,7 +100,7 @@ static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
 static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale, bool locale_is_c);
+								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -1000,7 +1000,6 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				match_pos;
 	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
 	pg_locale_t locale = 0;
-	bool		locale_is_c = false;
 
 	/* the right-hand const is type text or bytea */
 	Assert(typeid == BYTEAOID || typeid == TEXTOID);
@@ -1024,11 +1023,7 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 					 errhint("Use the COLLATE clause to set the collation explicitly.")));
 		}
 
-		/* If case-insensitive, we need locale info */
-		if (lc_ctype_is_c(collation))
-			locale_is_c = true;
-		else
-			locale = pg_newlocale_from_collation(collation);
+		locale = pg_newlocale_from_collation(collation);
 	}
 
 	if (typeid != BYTEAOID)
@@ -1065,7 +1060,7 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 
 		/* Stop if case-varying character (it's sort of a wildcard) */
 		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale, locale_is_c))
+			pattern_char_isalpha(patt[pos], is_multibyte, locale))
 			break;
 
 		match[match_pos++] = patt[pos];
@@ -1499,9 +1494,9 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
  */
 static int
 pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale, bool locale_is_c)
+					 pg_locale_t locale)
 {
-	if (locale_is_c)
+	if (locale->ctype_is_c)
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else if (is_multibyte && IS_HIGHBIT_SET(c))
 		return true;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 7512ac055e..b1663e2b6e 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1258,32 +1258,6 @@ lookup_collation_cache(Oid collation)
 	return cache_entry;
 }
 
-/*
- * Detect whether collation's LC_CTYPE property is C
- */
-bool
-lc_ctype_is_c(Oid collation)
-{
-	/*
-	 * If we're asked about "collation 0", return false, so that the code will
-	 * go into the non-C path and report that the collation is bogus.
-	 */
-	if (!OidIsValid(collation))
-		return false;
-
-	/*
-	 * If we're asked about the built-in C/POSIX collations, we know that.
-	 */
-	if (collation == C_COLLATION_OID ||
-		collation == POSIX_COLLATION_OID)
-		return true;
-
-	/*
-	 * Otherwise, we have to consult pg_collation, but we cache that.
-	 */
-	return pg_newlocale_from_collation(collation)->ctype_is_c;
-}
-
 /* simple subroutine for reporting errors from newlocale() */
 static void
 report_newlocale_failure(const char *localename)
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index f126201276..af5c9aa582 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -19,8 +19,7 @@
   descr => 'standard C collation',
   collname => 'C', collprovider => 'c', collencoding => '-1',
   collcollate => 'C', collctype => 'C' },
-{ oid => '951', oid_symbol => 'POSIX_COLLATION_OID',
-  descr => 'standard POSIX collation',
+{ oid => '951', descr => 'standard POSIX collation',
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point, C character semantics',
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 8ec24437f4..ab1c37a44b 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -54,8 +54,6 @@ extern PGDLLIMPORT bool database_ctype_is_c;
 extern bool check_locale(int category, const char *locale, char **canonname);
 extern char *pg_perm_setlocale(int category, const char *locale);
 
-extern bool lc_ctype_is_c(Oid collation);
-
 /*
  * Return the POSIX lconv struct (contains number/money formatting
  * information) with locale information for all categories.
-- 
2.43.0

From 6a1c264ff11a1d745a789c06ea7d13a8c5e28363 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andr...@proxel.se>
Date: Wed, 14 Aug 2024 00:47:08 +0200
Subject: [PATCH 3/6] Remove dubious check against default locale

---
 src/backend/utils/adt/varlena.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 81343bd0e1..1c02a67d72 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2548,7 +2548,6 @@ btvarstrequalimage(PG_FUNCTION_ARGS)
 	locale = pg_newlocale_from_collation(collid);
 
 	if (locale->collate_is_c ||
-		collid == DEFAULT_COLLATION_OID ||
 		pg_locale_deterministic(locale))
 		PG_RETURN_BOOL(true);
 	else
-- 
2.43.0

From 4e3dd6fcf2f33d709699216a7c7fa7f7dca54d44 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andr...@proxel.se>
Date: Wed, 14 Aug 2024 00:49:46 +0200
Subject: [PATCH 4/6] Do not check both for collate_is_c and deterministic

Since deterimistic will always be true if collate_is_c we can just
check for deterimistic and make the code cleaner to read.
---
 src/backend/utils/adt/varchar.c | 4 ++--
 src/backend/utils/adt/varlena.c | 3 +--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index d2d1c5709d..03e91c0333 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -757,7 +757,7 @@ bpchareq(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (mylocale->collate_is_c || pg_locale_deterministic(mylocale))
+	if (pg_locale_deterministic(mylocale))
 	{
 		/*
 		 * Since we only care about equality or not-equality, we can avoid all
@@ -798,7 +798,7 @@ bpcharne(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (mylocale->collate_is_c || pg_locale_deterministic(mylocale))
+	if (pg_locale_deterministic(mylocale))
 	{
 		/*
 		 * Since we only care about equality or not-equality, we can avoid all
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 1c02a67d72..2086c47e0f 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -2547,8 +2547,7 @@ btvarstrequalimage(PG_FUNCTION_ARGS)
 
 	locale = pg_newlocale_from_collation(collid);
 
-	if (locale->collate_is_c ||
-		pg_locale_deterministic(locale))
+	if (pg_locale_deterministic(locale))
 		PG_RETURN_BOOL(true);
 	else
 		PG_RETURN_BOOL(false);
-- 
2.43.0

From 86d66751699a232beaf49e469ff6029a8b9caa21 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andr...@proxel.se>
Date: Wed, 14 Aug 2024 00:52:53 +0200
Subject: [PATCH 5/6] Remove pg_collate_deterministic() and check field
 directly

There is no clear benefit from using this thin accessor function,
espeically since other fields like collate_is_c and ctype_is_c are
accessed directly.
---
 src/backend/access/hash/hashfunc.c |  4 ++--
 src/backend/regex/regc_pg_locale.c |  2 +-
 src/backend/utils/adt/like.c       |  4 ++--
 src/backend/utils/adt/pg_locale.c  |  7 -------
 src/backend/utils/adt/varchar.c    |  8 ++++----
 src/backend/utils/adt/varlena.c    | 14 +++++++-------
 src/include/utils/pg_locale.h      |  1 -
 7 files changed, 16 insertions(+), 24 deletions(-)

diff --git a/src/backend/access/hash/hashfunc.c b/src/backend/access/hash/hashfunc.c
index 86f1adecb7..01bdf997e3 100644
--- a/src/backend/access/hash/hashfunc.c
+++ b/src/backend/access/hash/hashfunc.c
@@ -279,7 +279,7 @@ hashtext(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		result = hash_any((unsigned char *) VARDATA_ANY(key),
 						  VARSIZE_ANY_EXHDR(key));
@@ -334,7 +334,7 @@ hashtextextended(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		result = hash_any_extended((unsigned char *) VARDATA_ANY(key),
 								   VARSIZE_ANY_EXHDR(key),
diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 21f640a7e1..1683d706b9 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -254,7 +254,7 @@ pg_set_regex_collation(Oid collation)
 	{
 		pg_locale_t locale = pg_newlocale_from_collation(collation);
 
-		if (!pg_locale_deterministic(locale))
+		if (!locale->deterministic)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("nondeterministic collations are not supported for regular expressions")));
diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index f87675d755..286b737fd7 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -151,7 +151,7 @@ GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation
 	{
 		pg_locale_t locale = pg_newlocale_from_collation(collation);
 
-		if (!pg_locale_deterministic(locale))
+		if (!locale->deterministic)
 			ereport(ERROR,
 					(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 					 errmsg("nondeterministic collations are not supported for LIKE")));
@@ -188,7 +188,7 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 
 	locale = pg_newlocale_from_collation(collation);
 
-	if (!pg_locale_deterministic(locale))
+	if (!locale->deterministic)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b1663e2b6e..8e952c7f16 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1407,13 +1407,6 @@ make_icu_collator(const char *iculocstr,
 #endif							/* not USE_ICU */
 }
 
-
-bool
-pg_locale_deterministic(pg_locale_t locale)
-{
-	return locale->deterministic;
-}
-
 /*
  * Initialize default_locale with database locale settings.
  */
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
index 03e91c0333..8f86aca297 100644
--- a/src/backend/utils/adt/varchar.c
+++ b/src/backend/utils/adt/varchar.c
@@ -757,7 +757,7 @@ bpchareq(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		/*
 		 * Since we only care about equality or not-equality, we can avoid all
@@ -798,7 +798,7 @@ bpcharne(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		/*
 		 * Since we only care about equality or not-equality, we can avoid all
@@ -1005,7 +1005,7 @@ hashbpchar(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		result = hash_any((unsigned char *) keydata, keylen);
 	}
@@ -1061,7 +1061,7 @@ hashbpcharextended(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		result = hash_any_extended((unsigned char *) keydata, keylen,
 								   PG_GETARG_INT64(1));
diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index 2086c47e0f..bc79d71d70 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1223,7 +1223,7 @@ text_position_setup(text *t1, text *t2, Oid collid, TextPositionState *state)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (!pg_locale_deterministic(mylocale))
+	if (!mylocale->deterministic)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("nondeterministic collations are not supported for substring searches")));
@@ -1573,7 +1573,7 @@ varstr_cmp(const char *arg1, int len1, const char *arg2, int len2, Oid collid)
 		result = pg_strncoll(arg1, len1, arg2, len2, mylocale);
 
 		/* Break tie if necessary. */
-		if (result == 0 && pg_locale_deterministic(mylocale))
+		if (result == 0 && mylocale->deterministic)
 		{
 			result = memcmp(arg1, arg2, Min(len1, len2));
 			if ((result == 0) && (len1 != len2))
@@ -1624,7 +1624,7 @@ texteq(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		Datum		arg1 = PG_GETARG_DATUM(0);
 		Datum		arg2 = PG_GETARG_DATUM(1);
@@ -1679,7 +1679,7 @@ textne(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(mylocale))
+	if (mylocale->deterministic)
 	{
 		Datum		arg1 = PG_GETARG_DATUM(0);
 		Datum		arg2 = PG_GETARG_DATUM(1);
@@ -1792,7 +1792,7 @@ text_starts_with(PG_FUNCTION_ARGS)
 
 	mylocale = pg_newlocale_from_collation(collid);
 
-	if (!pg_locale_deterministic(mylocale))
+	if (!mylocale->deterministic)
 		ereport(ERROR,
 				(errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 				 errmsg("nondeterministic collations are not supported for substring searches")));
@@ -2204,7 +2204,7 @@ varstrfastcmp_locale(char *a1p, int len1, char *a2p, int len2, SortSupport ssup)
 	result = pg_strcoll(sss->buf1, sss->buf2, sss->locale);
 
 	/* Break tie if necessary. */
-	if (result == 0 && pg_locale_deterministic(sss->locale))
+	if (result == 0 && sss->locale->deterministic)
 		result = strcmp(sss->buf1, sss->buf2);
 
 	/* Cache result, perhaps saving an expensive strcoll() call next time */
@@ -2547,7 +2547,7 @@ btvarstrequalimage(PG_FUNCTION_ARGS)
 
 	locale = pg_newlocale_from_collation(collid);
 
-	if (pg_locale_deterministic(locale))
+	if (locale->deterministic)
 		PG_RETURN_BOOL(true);
 	else
 		PG_RETURN_BOOL(false);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index ab1c37a44b..faae868bfc 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -108,7 +108,6 @@ extern void make_icu_collator(const char *iculocstr,
 							  const char *icurules,
 							  struct pg_locale_struct *resultp);
 
-extern bool pg_locale_deterministic(pg_locale_t locale);
 extern void init_database_collation(void);
 extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 
-- 
2.43.0

From 9b47f1a5190dd8dcd182c9f8b380242bbc510ff9 Mon Sep 17 00:00:00 2001
From: Andreas Karlsson <andr...@proxel.se>
Date: Wed, 14 Aug 2024 00:03:17 +0200
Subject: [PATCH 6/6] Slightly refactor varstr_sortsupport() to improve
 readability

---
 src/backend/utils/adt/varlena.c | 36 ++++++++++++++++-----------------
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/src/backend/utils/adt/varlena.c b/src/backend/utils/adt/varlena.c
index bc79d71d70..a6a150e90e 100644
--- a/src/backend/utils/adt/varlena.c
+++ b/src/backend/utils/adt/varlena.c
@@ -1911,25 +1911,25 @@ varstr_sortsupport(SortSupport ssup, Oid typid, Oid collid)
 		}
 		else
 			ssup->comparator = varlenafastcmp_locale;
-	}
 
-	/*
-	 * Unfortunately, it seems that abbreviation for non-C collations is
-	 * broken on many common platforms; see pg_strxfrm_enabled().
-	 *
-	 * Even apart from the risk of broken locales, it's possible that there
-	 * are platforms where the use of abbreviated keys should be disabled at
-	 * compile time.  Having only 4 byte datums could make worst-case
-	 * performance drastically more likely, for example.  Moreover, macOS's
-	 * strxfrm() implementation is known to not effectively concentrate a
-	 * significant amount of entropy from the original string in earlier
-	 * transformed blobs.  It's possible that other supported platforms are
-	 * similarly encumbered.  So, if we ever get past disabling this
-	 * categorically, we may still want or need to disable it for particular
-	 * platforms.
-	 */
-	if (!collate_c && !pg_strxfrm_enabled(locale))
-		abbreviate = false;
+		/*
+		 * Unfortunately, it seems that abbreviation for non-C collations is
+		 * broken on many common platforms; see pg_strxfrm_enabled().
+		 *
+		 * Even apart from the risk of broken locales, it's possible that there
+		 * are platforms where the use of abbreviated keys should be disabled at
+		 * compile time.  Having only 4 byte datums could make worst-case
+		 * performance drastically more likely, for example.  Moreover, macOS's
+		 * strxfrm() implementation is known to not effectively concentrate a
+		 * significant amount of entropy from the original string in earlier
+		 * transformed blobs.  It's possible that other supported platforms are
+		 * similarly encumbered.  So, if we ever get past disabling this
+		 * categorically, we may still want or need to disable it for particular
+		 * platforms.
+		 */
+		if (!pg_strxfrm_enabled(locale))
+			abbreviate = false;
+	}
 
 	/*
 	 * If we're using abbreviated keys, or if we're using a locale-aware
-- 
2.43.0

Reply via email to