On Tue, 2025-11-25 at 09:44 +0800, Chao Li wrote:
> I was curious why “inline” is needed, then I figured out when I tried
> to build. Without “inline”, compile will raise warnings of “unused
> function”. So I guess it’s better to explain why “inline” is used in
> the function comment, otherwise other readers may get the same
> confusion.

That's a typical pattern: to make it "inline", move it to a .h file and
declare it as "static inline". For common patterns like that, I don't
think we should explain them in comments, because it would mean we
would start adding comments in zillions of places.

> With “change in length”, I confirmed “Unicode 5.18.2” means the
> Unicode Standard Section 5.18.2 “Complications for Case Mapping”. Why
> don’t we just give the URL in the comment.
> https://www.unicode.org/versions/Unicode17.0.0/core-spec/chapter-5/#G29675

Done, thank you. (Though since we haven't actually moved to 17 yet, I
linked to 16 instead.)

> I don’t get this change. In old code, depending on locale->ctype-
> >strfold, it calls strfold or strlower. But in this patch, it only
> calls strfold. Why? If that’s intentional, maybe better to add a
> comment to explain that.

I thought it would be slightly cleaner to just define the strfold
method in the libc provider as the same as strlower. I agree it's worth
a comment, so I added some in pg_locale_libc.c.

> In pg_strfold, the ctype==NULL fallback code is exactly the same as
> pg_strlower. I guess you intentionally to not call pg_strlower here
> for performance consideration, is that true?

I made some static functions to clean that up, and added some comments.
Thank you.

New series attached with only these changes and a rebase.

Regards,
        Jeff Davis

From a92c2595a8781bb972a301fa962083cc43ba6739 Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Thu, 20 Nov 2025 13:09:26 -0800
Subject: [PATCH v10 01/11] Inline pg_ascii_tolower() and pg_ascii_toupper().

Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Chao Li <[email protected]>
---
 src/include/port.h      | 25 +++++++++++++++++++++++--
 src/port/pgstrcasecmp.c | 26 --------------------------
 2 files changed, 23 insertions(+), 28 deletions(-)

diff --git a/src/include/port.h b/src/include/port.h
index 3964d3b1293..159c2bcd7e3 100644
--- a/src/include/port.h
+++ b/src/include/port.h
@@ -169,8 +169,29 @@ extern int	pg_strcasecmp(const char *s1, const char *s2);
 extern int	pg_strncasecmp(const char *s1, const char *s2, size_t n);
 extern unsigned char pg_toupper(unsigned char ch);
 extern unsigned char pg_tolower(unsigned char ch);
-extern unsigned char pg_ascii_toupper(unsigned char ch);
-extern unsigned char pg_ascii_tolower(unsigned char ch);
+
+/*
+ * Fold a character to upper case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_toupper(unsigned char ch)
+{
+	if (ch >= 'a' && ch <= 'z')
+		ch += 'A' - 'a';
+	return ch;
+}
+
+/*
+ * Fold a character to lower case, following C/POSIX locale rules.
+ */
+static inline unsigned char
+pg_ascii_tolower(unsigned char ch)
+{
+	if (ch >= 'A' && ch <= 'Z')
+		ch += 'a' - 'A';
+	return ch;
+}
+
 
 /*
  * Beginning in v12, we always replace snprintf() and friends with our own
diff --git a/src/port/pgstrcasecmp.c b/src/port/pgstrcasecmp.c
index ec2b3a75c3d..17e93180381 100644
--- a/src/port/pgstrcasecmp.c
+++ b/src/port/pgstrcasecmp.c
@@ -13,10 +13,6 @@
  *
  * NB: this code should match downcase_truncate_identifier() in scansup.c.
  *
- * We also provide strict ASCII-only case conversion functions, which can
- * be used to implement C/POSIX case folding semantics no matter what the
- * C library thinks the locale is.
- *
  *
  * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
  *
@@ -127,25 +123,3 @@ pg_tolower(unsigned char ch)
 		ch = tolower(ch);
 	return ch;
 }
-
-/*
- * Fold a character to upper case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_toupper(unsigned char ch)
-{
-	if (ch >= 'a' && ch <= 'z')
-		ch += 'A' - 'a';
-	return ch;
-}
-
-/*
- * Fold a character to lower case, following C/POSIX locale rules.
- */
-unsigned char
-pg_ascii_tolower(unsigned char ch)
-{
-	if (ch >= 'A' && ch <= 'Z')
-		ch += 'a' - 'A';
-	return ch;
-}
-- 
2.43.0

From bccff163e0ce2c0b1f9cb0cfdeecb32ededb898d Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Wed, 19 Nov 2025 10:11:52 -0800
Subject: [PATCH v10 02/11] Add #define for UNICODE_CASEMAP_BUFSZ.

Useful for mapping a single codepoint at a time into a
statically-allocated buffer.

Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Chao Li <[email protected]>
---
 src/include/utils/pg_locale.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 683e1a0eef8..54193a17a90 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -26,6 +26,20 @@
 /* use for libc locale names */
 #define LOCALE_NAME_BUFLEN 128
 
+/*
+ * Maximum number of bytes needed to map a single codepoint. Useful for
+ * mapping and processing a single input codepoint at a time with a
+ * statically-allocated buffer.
+ *
+ * With full case mapping, an input codepoint may be mapped to as many as
+ * three output codepoints. See Unicode 16.0.0, section 5.18.2, "Change in
+ * Length":
+ *
+ * https://www.unicode.org/versions/Unicode16.0.0/core-spec/chapter-5/#G29675
+ */
+#define UNICODE_CASEMAP_LEN		3
+#define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
+
 /* GUC settings */
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
-- 
2.43.0

From bc9bd2209a7fb31993b04be9a79b82d0ed67c676 Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Mon, 24 Nov 2025 09:09:06 -0800
Subject: [PATCH v10 03/11] Change some callers to use pg_ascii_toupper().

The input is ASCII anyway, so it's better to be clear that it's not
locale-dependent.

Discussion: https://postgr.es/m/[email protected]
---
 src/backend/access/transam/xlogfuncs.c | 2 +-
 src/backend/utils/adt/cash.c           | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/backend/access/transam/xlogfuncs.c b/src/backend/access/transam/xlogfuncs.c
index 3e45fce43ed..a50345f9bf7 100644
--- a/src/backend/access/transam/xlogfuncs.c
+++ b/src/backend/access/transam/xlogfuncs.c
@@ -479,7 +479,7 @@ pg_split_walfile_name(PG_FUNCTION_ARGS)
 
 	/* Capitalize WAL file name. */
 	for (p = fname_upper; *p; p++)
-		*p = pg_toupper((unsigned char) *p);
+		*p = pg_ascii_toupper((unsigned char) *p);
 
 	if (!IsXLogFileName(fname_upper))
 		ereport(ERROR,
diff --git a/src/backend/utils/adt/cash.c b/src/backend/utils/adt/cash.c
index 611d23f3cb0..623f6eec056 100644
--- a/src/backend/utils/adt/cash.c
+++ b/src/backend/utils/adt/cash.c
@@ -1035,7 +1035,7 @@ cash_words(PG_FUNCTION_ARGS)
 	appendStringInfoString(&buf, m0 == 1 ? " cent" : " cents");
 
 	/* capitalize output */
-	buf.data[0] = pg_toupper((unsigned char) buf.data[0]);
+	buf.data[0] = pg_ascii_toupper((unsigned char) buf.data[0]);
 
 	/* return as text datum */
 	res = cstring_to_text_with_len(buf.data, buf.len);
-- 
2.43.0

From 9d1c50e406099ebe82de8a8e03d9cb2f564d76eb Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Fri, 7 Nov 2025 12:11:34 -0800
Subject: [PATCH v10 04/11] Allow pg_locale_t APIs to work when ctype_is_c.

Previously, the caller needed to check ctype_is_c first for some
routines and not others. Now, the APIs consistently work, and the
caller can just check ctype_is_c for optimization purposes.

Discussion: https://postgr.es/m/[email protected]
Reviewed-by: Chao Li <[email protected]>
---
 src/backend/utils/adt/like_support.c   | 34 ++++-------
 src/backend/utils/adt/pg_locale.c      | 78 ++++++++++++++++++++++++--
 src/backend/utils/adt/pg_locale_libc.c |  6 ++
 3 files changed, 88 insertions(+), 30 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 999f23f86d5..0debccfa67b 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -99,8 +99,6 @@ static Selectivity like_selectivity(const char *patt, int pattlen,
 static Selectivity regex_selectivity(const char *patt, int pattlen,
 									 bool case_insensitive,
 									 int fixed_prefix_len);
-static int	pattern_char_isalpha(char c, bool is_multibyte,
-								 pg_locale_t locale);
 static Const *make_greater_string(const Const *str_const, FmgrInfo *ltproc,
 								  Oid collation);
 static Datum string_to_datum(const char *str, Oid datatype);
@@ -995,7 +993,6 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 	Oid			typeid = patt_const->consttype;
 	int			pos,
 				match_pos;
-	bool		is_multibyte = (pg_database_encoding_max_length() > 1);
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1055,9 +1052,16 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				break;
 		}
 
-		/* Stop if case-varying character (it's sort of a wildcard) */
-		if (case_insensitive &&
-			pattern_char_isalpha(patt[pos], is_multibyte, locale))
+		/*
+		 * Stop if case-varying character (it's sort of a wildcard).
+		 *
+		 * In multibyte character sets or with non-libc providers, we can't
+		 * use isalpha, and it does not seem worth trying to convert to
+		 * wchar_t or char32_t.  Instead, just pass the single byte to the
+		 * provider, which will assume any non-ASCII char is potentially
+		 * case-varying.
+		 */
+		if (case_insensitive && char_is_cased(patt[pos], locale))
 			break;
 
 		match[match_pos++] = patt[pos];
@@ -1481,24 +1485,6 @@ regex_selectivity(const char *patt, int pattlen, bool case_insensitive,
 	return sel;
 }
 
-/*
- * Check whether char is a letter (and, hence, subject to case-folding)
- *
- * In multibyte character sets or with ICU, we can't use isalpha, and it does
- * not seem worth trying to convert to wchar_t to use iswalpha or u_isalpha.
- * Instead, just assume any non-ASCII char is potentially case-varying, and
- * hard-wire knowledge of which ASCII chars are letters.
- */
-static int
-pattern_char_isalpha(char c, bool is_multibyte,
-					 pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else
-		return char_is_cased(c, locale);
-}
-
 
 /*
  * For bytea, the increment function need only increment the current byte
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index b14c7837938..48f9d44a5f7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1257,35 +1257,99 @@ get_collation_actual_version(char collprovider, const char *collcollate)
 	return collversion;
 }
 
+/* lowercasing/casefolding in C locale */
+static size_t
+strlower_c(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	int			i;
+
+	srclen = (srclen >= 0) ? srclen : strlen(src);
+	for (i = 0; i < srclen && i < dstsize; i++)
+		dst[i] = pg_ascii_tolower(src[i]);
+	if (i < dstsize)
+		dst[i] = '\0';
+	return srclen;
+}
+
+/* titlecasing in C locale */
+static size_t
+strtitle_c(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	bool		wasalnum = false;
+	int			i;
+
+	srclen = (srclen >= 0) ? srclen : strlen(src);
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		char		c = src[i];
+
+		if (wasalnum)
+			dst[i] = pg_ascii_tolower(c);
+		else
+			dst[i] = pg_ascii_toupper(c);
+
+		wasalnum = ((c >= '0' && c <= '9') ||
+					(c >= 'A' && c <= 'Z') ||
+					(c >= 'a' && c <= 'z'));
+	}
+	if (i < dstsize)
+		dst[i] = '\0';
+	return srclen;
+}
+
+/* uppercasing in C locale */
+static size_t
+strupper_c(char *dst, size_t dstsize, const char *src, ssize_t srclen)
+{
+	int			i;
+
+	srclen = (srclen >= 0) ? srclen : strlen(src);
+	for (i = 0; i < srclen && i < dstsize; i++)
+		dst[i] = pg_ascii_toupper(src[i]);
+	if (i < dstsize)
+		dst[i] = '\0';
+	return srclen;
+}
+
 size_t
 pg_strlower(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+		return strlower_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strtitle(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+		return strtitle_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->strtitle(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strupper(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 			pg_locale_t locale)
 {
-	return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
+	if (locale->ctype == NULL)
+		return strupper_c(dst, dstsize, src, srclen);
+	else
+		return locale->ctype->strupper(dst, dstsize, src, srclen, locale);
 }
 
 size_t
 pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		   pg_locale_t locale)
 {
-	if (locale->ctype->strfold)
-		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
+	/* in the C locale, casefolding is the same as lowercasing */
+	if (locale->ctype == NULL)
+		return strlower_c(dst, dstsize, src, srclen);
 	else
-		return locale->ctype->strlower(dst, dstsize, src, srclen, locale);
+		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
 /*
@@ -1560,6 +1624,8 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 bool
 char_is_cased(char ch, pg_locale_t locale)
 {
+	if (locale->ctype == NULL)
+		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 716f005066a..26ba1be73f1 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -326,6 +326,8 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
+	/* in libc, casefolding is the same as lowercasing */
+	.strfold = strlower_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -351,6 +353,8 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	/* in libc, casefolding is the same as lowercasing */
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -372,6 +376,8 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strlower = strlower_libc_mb,
 	.strtitle = strtitle_libc_mb,
 	.strupper = strupper_libc_mb,
+	/* in libc, casefolding is the same as lowercasing */
+	.strfold = strlower_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
-- 
2.43.0

From 6cbcec89d329892a6db3b7b8efba2e0fda4a566e Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Fri, 21 Nov 2025 12:41:47 -0800
Subject: [PATCH v10 05/11] Make regex "max_chr" depend on encoding, not
 provider.

The previous per-provider "max_chr" field was there as a hack to
preserve the exact prior behavior, which depended on the
provider. Change to depend on the encoding, which makes more sense,
and remove the per-provider logic.

The only difference is for ICU: previously it always used
MAX_SIMPLE_CHR (0x7FF) regardless of the encoding; whereas now it will
match libc and use MAX_SIMPLE_CHR for UTF-8, and MAX_UCHAR for other
encodings. That's possibly a loss for non-UTF8 multibyte encodings,
but a win for single-byte encodings. Regardless, this distinction was
not worth the complexity.

Discussion: https://postgr.es/m/[email protected]
---
 src/backend/regex/regc_pg_locale.c     | 18 ++++++++++--------
 src/backend/utils/adt/pg_locale_libc.c |  2 --
 src/include/utils/pg_locale.h          |  6 ------
 3 files changed, 10 insertions(+), 16 deletions(-)

diff --git a/src/backend/regex/regc_pg_locale.c b/src/backend/regex/regc_pg_locale.c
index 4698f110a0c..bb0e3f1d139 100644
--- a/src/backend/regex/regc_pg_locale.c
+++ b/src/backend/regex/regc_pg_locale.c
@@ -320,16 +320,18 @@ regc_ctype_get_cache(regc_wc_probefunc probefunc, int cclasscode)
 		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
 #endif
 	}
+	else if (GetDatabaseEncoding() == PG_UTF8)
+	{
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+	}
 	else
 	{
-		if (pg_regex_locale->ctype->max_chr != 0 &&
-			pg_regex_locale->ctype->max_chr <= MAX_SIMPLE_CHR)
-		{
-			max_chr = pg_regex_locale->ctype->max_chr;
-			pcc->cv.cclasscode = -1;
-		}
-		else
-			max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#if MAX_SIMPLE_CHR >= UCHAR_MAX
+		max_chr = (pg_wchar) UCHAR_MAX;
+		pcc->cv.cclasscode = -1;
+#else
+		max_chr = (pg_wchar) MAX_SIMPLE_CHR;
+#endif
 	}
 
 	/*
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 26ba1be73f1..69b336070e9 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -342,7 +342,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 /*
@@ -369,7 +368,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
-	.max_chr = UCHAR_MAX,
 };
 
 static const struct ctype_methods ctype_methods_libc_utf8 = {
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 54193a17a90..42e21e7fb8a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -134,12 +134,6 @@ struct ctype_methods
 	 * pg_strlower().
 	 */
 	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
-
-	/*
-	 * For regex and pattern matching efficiency, the maximum char value
-	 * supported by the above methods. If zero, limit is set by regex code.
-	 */
-	pg_wchar	max_chr;
 };
 
 /*
-- 
2.43.0

From dcd90b2751f4aaf11d6164ea9fec3309e0817125 Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Wed, 19 Nov 2025 10:20:36 -0800
Subject: [PATCH v10 06/11] Fix inconsistency between ltree_strncasecmp() and
 ltree_crc32_sz().

Previously, ltree_strncasecmp() used lowercasing with the default
collation; while ltree_crc32_sz used tolower() directly. These were
equivalent only if the default collation provider was libc and the
encoding is single-byte.

Change both to use casefolding with the default collation.

Discussion: https://postgr.es/m/[email protected]
---
 contrib/ltree/crc32.c     | 46 ++++++++++++++++++++++++++++++++-------
 contrib/ltree/lquery_op.c | 31 ++++++++++++++++++++++++--
 2 files changed, 67 insertions(+), 10 deletions(-)

diff --git a/contrib/ltree/crc32.c b/contrib/ltree/crc32.c
index 134f46a805e..3918d4a0ec2 100644
--- a/contrib/ltree/crc32.c
+++ b/contrib/ltree/crc32.c
@@ -10,31 +10,61 @@
 #include "postgres.h"
 #include "ltree.h"
 
+#include "crc32.h"
+#include "utils/pg_crc.h"
 #ifdef LOWER_NODE
-#include <ctype.h>
-#define TOLOWER(x)	tolower((unsigned char) (x))
-#else
-#define TOLOWER(x)	(x)
+#include "utils/pg_locale.h"
 #endif
 
-#include "crc32.h"
-#include "utils/pg_crc.h"
+#ifdef LOWER_NODE
 
 unsigned int
 ltree_crc32_sz(const char *buf, int size)
 {
 	pg_crc32	crc;
 	const char *p = buf;
+	static pg_locale_t locale = NULL;
+
+	if (!locale)
+		locale = pg_database_locale();
 
 	INIT_TRADITIONAL_CRC32(crc);
 	while (size > 0)
 	{
-		char		c = (char) TOLOWER(*p);
+		char		foldstr[UNICODE_CASEMAP_BUFSZ];
+		int			srclen = pg_mblen(p);
+		size_t		foldlen;
+
+		/* fold one codepoint at a time */
+		foldlen = pg_strfold(foldstr, UNICODE_CASEMAP_BUFSZ, p, srclen,
+							 locale);
+
+		COMP_TRADITIONAL_CRC32(crc, foldstr, foldlen);
+
+		size -= srclen;
+		p += srclen;
+	}
+	FIN_TRADITIONAL_CRC32(crc);
+	return (unsigned int) crc;
+}
+
+#else
 
-		COMP_TRADITIONAL_CRC32(crc, &c, 1);
+unsigned int
+ltree_crc32_sz(const char *buf, int size)
+{
+	pg_crc32	crc;
+	const char *p = buf;
+
+	INIT_TRADITIONAL_CRC32(crc);
+	while (size > 0)
+	{
+		COMP_TRADITIONAL_CRC32(crc, p, 1);
 		size--;
 		p++;
 	}
 	FIN_TRADITIONAL_CRC32(crc);
 	return (unsigned int) crc;
 }
+
+#endif							/* !LOWER_NODE */
diff --git a/contrib/ltree/lquery_op.c b/contrib/ltree/lquery_op.c
index a6466f575fd..d6754eb613f 100644
--- a/contrib/ltree/lquery_op.c
+++ b/contrib/ltree/lquery_op.c
@@ -77,10 +77,37 @@ compare_subnode(ltree_level *t, char *qn, int len, int (*cmpptr) (const char *,
 int
 ltree_strncasecmp(const char *a, const char *b, size_t s)
 {
-	char	   *al = str_tolower(a, s, DEFAULT_COLLATION_OID);
-	char	   *bl = str_tolower(b, s, DEFAULT_COLLATION_OID);
+	static pg_locale_t locale = NULL;
+	size_t		al_sz = s + 1;
+	char	   *al = palloc(al_sz);
+	size_t		bl_sz = s + 1;
+	char	   *bl = palloc(bl_sz);
+	size_t		needed;
 	int			res;
 
+	if (!locale)
+		locale = pg_database_locale();
+
+	needed = pg_strfold(al, al_sz, a, s, locale);
+	if (needed + 1 > al_sz)
+	{
+		/* grow buffer if needed and retry */
+		al_sz = needed + 1;
+		al = repalloc(al, al_sz);
+		needed = pg_strfold(al, al_sz, a, s, locale);
+		Assert(needed + 1 <= al_sz);
+	}
+
+	needed = pg_strfold(bl, bl_sz, b, s, locale);
+	if (needed + 1 > bl_sz)
+	{
+		/* grow buffer if needed and retry */
+		bl_sz = needed + 1;
+		bl = repalloc(bl, bl_sz);
+		needed = pg_strfold(bl, bl_sz, b, s, locale);
+		Assert(needed + 1 <= bl_sz);
+	}
+
 	res = strncmp(al, bl, s);
 
 	pfree(al);
-- 
2.43.0

From 04045352820f56b8403aca7a3e319532e71ca66d Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Wed, 19 Nov 2025 18:16:41 -0800
Subject: [PATCH v10 07/11] Remove char_tolower() API.

It's only useful for an ILIKE optimization for the libc provider using
a single-byte encoding and a non-C locale, but it creates significant
internal complexity.

Discussion: https://postgr.es/m/[email protected]
---
 src/backend/utils/adt/like.c           | 42 +++++++++-----------------
 src/backend/utils/adt/like_match.c     | 18 ++++++-----
 src/backend/utils/adt/pg_locale.c      | 22 --------------
 src/backend/utils/adt/pg_locale_libc.c | 10 ------
 src/include/utils/pg_locale.h          |  9 ------
 5 files changed, 25 insertions(+), 76 deletions(-)

diff --git a/src/backend/utils/adt/like.c b/src/backend/utils/adt/like.c
index 4216ac17f43..4a7fc583c71 100644
--- a/src/backend/utils/adt/like.c
+++ b/src/backend/utils/adt/like.c
@@ -43,8 +43,8 @@ static text *MB_do_like_escape(text *pat, text *esc);
 static int	UTF8_MatchText(const char *t, int tlen, const char *p, int plen,
 						   pg_locale_t locale);
 
-static int	SB_IMatchText(const char *t, int tlen, const char *p, int plen,
-						  pg_locale_t locale);
+static int	C_IMatchText(const char *t, int tlen, const char *p, int plen,
+						 pg_locale_t locale);
 
 static int	GenericMatchText(const char *s, int slen, const char *p, int plen, Oid collation);
 static int	Generic_Text_IC_like(text *str, text *pat, Oid collation);
@@ -84,22 +84,10 @@ wchareq(const char *p1, const char *p2)
  * of getting a single character transformed to the system's wchar_t format.
  * So now, we just downcase the strings using lower() and apply regular LIKE
  * comparison.  This should be revisited when we install better locale support.
- */
-
-/*
- * We do handle case-insensitive matching for single-byte encodings using
+ *
+ * We do handle case-insensitive matching for the C locale using
  * fold-on-the-fly processing, however.
  */
-static char
-SB_lower_char(unsigned char c, pg_locale_t locale)
-{
-	if (locale->ctype_is_c)
-		return pg_ascii_tolower(c);
-	else if (locale->is_default)
-		return pg_tolower(c);
-	else
-		return char_tolower(c, locale);
-}
 
 
 #define NextByte(p, plen)	((p)++, (plen)--)
@@ -131,9 +119,9 @@ SB_lower_char(unsigned char c, pg_locale_t locale)
 #include "like_match.c"
 
 /* setup to compile like_match.c for single byte case insensitive matches */
-#define MATCH_LOWER(t, locale) SB_lower_char((unsigned char) (t), locale)
+#define MATCH_LOWER
 #define NextChar(p, plen) NextByte((p), (plen))
-#define MatchText SB_IMatchText
+#define MatchText C_IMatchText
 
 #include "like_match.c"
 
@@ -202,22 +190,17 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 				 errmsg("nondeterministic collations are not supported for ILIKE")));
 
 	/*
-	 * For efficiency reasons, in the single byte case we don't call lower()
-	 * on the pattern and text, but instead call SB_lower_char on each
-	 * character.  In the multi-byte case we don't have much choice :-(. Also,
-	 * ICU does not support single-character case folding, so we go the long
-	 * way.
+	 * For efficiency reasons, in the C locale we don't call lower() on the
+	 * pattern and text, but instead call SB_lower_char on each character.
 	 */
 
-	if (locale->ctype_is_c ||
-		(char_tolower_enabled(locale) &&
-		 pg_database_encoding_max_length() == 1))
+	if (locale->ctype_is_c)
 	{
 		p = VARDATA_ANY(pat);
 		plen = VARSIZE_ANY_EXHDR(pat);
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
-		return SB_IMatchText(s, slen, p, plen, locale);
+		return C_IMatchText(s, slen, p, plen, locale);
 	}
 	else
 	{
@@ -229,10 +212,13 @@ Generic_Text_IC_like(text *str, text *pat, Oid collation)
 													 PointerGetDatum(str)));
 		s = VARDATA_ANY(str);
 		slen = VARSIZE_ANY_EXHDR(str);
+
 		if (GetDatabaseEncoding() == PG_UTF8)
 			return UTF8_MatchText(s, slen, p, plen, 0);
-		else
+		else if (pg_database_encoding_max_length() > 1)
 			return MB_MatchText(s, slen, p, plen, 0);
+		else
+			return SB_MatchText(s, slen, p, plen, 0);
 	}
 }
 
diff --git a/src/backend/utils/adt/like_match.c b/src/backend/utils/adt/like_match.c
index 892f8a745ea..54846c9541d 100644
--- a/src/backend/utils/adt/like_match.c
+++ b/src/backend/utils/adt/like_match.c
@@ -70,10 +70,14 @@
  *--------------------
  */
 
+/*
+ * MATCH_LOWER is defined for ILIKE in the C locale as an optimization. Other
+ * locales must casefold the inputs before matching.
+ */
 #ifdef MATCH_LOWER
-#define GETCHAR(t, locale) MATCH_LOWER(t, locale)
+#define GETCHAR(t) pg_ascii_tolower(t)
 #else
-#define GETCHAR(t, locale) (t)
+#define GETCHAR(t) (t)
 #endif
 
 static int
@@ -105,7 +109,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 				ereport(ERROR,
 						(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 						 errmsg("LIKE pattern must not end with escape character")));
-			if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+			if (GETCHAR(*p) != GETCHAR(*t))
 				return LIKE_FALSE;
 		}
 		else if (*p == '%')
@@ -167,14 +171,14 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					ereport(ERROR,
 							(errcode(ERRCODE_INVALID_ESCAPE_SEQUENCE),
 							 errmsg("LIKE pattern must not end with escape character")));
-				firstpat = GETCHAR(p[1], locale);
+				firstpat = GETCHAR(p[1]);
 			}
 			else
-				firstpat = GETCHAR(*p, locale);
+				firstpat = GETCHAR(*p);
 
 			while (tlen > 0)
 			{
-				if (GETCHAR(*t, locale) == firstpat || (locale && !locale->deterministic))
+				if (GETCHAR(*t) == firstpat || (locale && !locale->deterministic))
 				{
 					int			matched = MatchText(t, tlen, p, plen, locale);
 
@@ -342,7 +346,7 @@ MatchText(const char *t, int tlen, const char *p, int plen, pg_locale_t locale)
 					NextChar(t1, t1len);
 			}
 		}
-		else if (GETCHAR(*p, locale) != GETCHAR(*t, locale))
+		else if (GETCHAR(*p) != GETCHAR(*t))
 		{
 			/* non-wildcard pattern char fails to match text char */
 			return LIKE_FALSE;
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 48f9d44a5f7..5aba277ba99 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1629,28 +1629,6 @@ char_is_cased(char ch, pg_locale_t locale)
 	return locale->ctype->char_is_cased(ch, locale);
 }
 
-/*
- * char_tolower_enabled()
- *
- * Does the provider support char_tolower()?
- */
-bool
-char_tolower_enabled(pg_locale_t locale)
-{
-	return (locale->ctype->char_tolower != NULL);
-}
-
-/*
- * char_tolower()
- *
- * Convert char (single-byte encoding) to lowercase.
- */
-char
-char_tolower(unsigned char ch, pg_locale_t locale)
-{
-	return locale->ctype->char_tolower(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 69b336070e9..545ee9a3099 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -248,13 +248,6 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
-static char
-char_tolower_libc(unsigned char ch, pg_locale_t locale)
-{
-	Assert(pg_database_encoding_max_length() == 1);
-	return tolower_l(ch, locale->lt);
-}
-
 static bool
 char_is_cased_libc(char ch, pg_locale_t locale)
 {
@@ -339,7 +332,6 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -365,7 +357,6 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -387,7 +378,6 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
 	.char_is_cased = char_is_cased_libc,
-	.char_tolower = char_tolower_libc,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 42e21e7fb8a..50520e50127 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -127,13 +127,6 @@ struct ctype_methods
 
 	/* required */
 	bool		(*char_is_cased) (char ch, pg_locale_t locale);
-
-	/*
-	 * Optional. If defined, will only be called for single-byte encodings. If
-	 * not defined, or if the encoding is multibyte, will fall back to
-	 * pg_strlower().
-	 */
-	char		(*char_tolower) (unsigned char ch, pg_locale_t locale);
 };
 
 /*
@@ -185,8 +178,6 @@ extern pg_locale_t pg_newlocale_from_collation(Oid collid);
 extern char *get_collation_actual_version(char collprovider, const char *collcollate);
 
 extern bool char_is_cased(char ch, pg_locale_t locale);
-extern bool char_tolower_enabled(pg_locale_t locale);
-extern char char_tolower(unsigned char ch, pg_locale_t locale);
 extern size_t pg_strlower(char *dst, size_t dstsize,
 						  const char *src, ssize_t srclen,
 						  pg_locale_t locale);
-- 
2.43.0

From 94a0e519f13cfc8554d11cf46ed7bbef8aad2ed3 Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Fri, 21 Nov 2025 12:14:21 -0800
Subject: [PATCH v10 08/11] Use multibyte-aware extraction of pattern prefixes.

Previously, like_fixed_prefix() used char-at-a-time logic, which
forced it to be too conservative for case-insensitive matching.

Now, use pg_wchar-at-a-time loop for text types, along with proper
detection of cased characters; and preserve and char-at-a-time logic
for bytea.

Removes the pg_locale_t char_is_cased() single-byte method and
replaces it with a proper multibyte pg_iswcased() method.

Discussion: https://postgr.es/m/[email protected]
---
 src/backend/utils/adt/like_support.c      | 111 +++++++++++++---------
 src/backend/utils/adt/pg_locale.c         |  26 +++--
 src/backend/utils/adt/pg_locale_builtin.c |   7 +-
 src/backend/utils/adt/pg_locale_icu.c     |  15 ++-
 src/backend/utils/adt/pg_locale_libc.c    |  23 +++--
 src/include/utils/pg_locale.h             |   5 +-
 6 files changed, 103 insertions(+), 84 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 0debccfa67b..e7255fa652a 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -987,12 +987,11 @@ static Pattern_Prefix_Status
 like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 				  Const **prefix_const, Selectivity *rest_selec)
 {
-	char	   *match;
 	char	   *patt;
 	int			pattlen;
 	Oid			typeid = patt_const->consttype;
-	int			pos,
-				match_pos;
+	int			pos;
+	int			match_pos = 0;
 	pg_locale_t locale = 0;
 
 	/* the right-hand const is type text or bytea */
@@ -1020,67 +1019,91 @@ like_fixed_prefix(Const *patt_const, bool case_insensitive, Oid collation,
 		locale = pg_newlocale_from_collation(collation);
 	}
 
+	/* for text types, use pg_wchar; for BYTEA, use char */
 	if (typeid != BYTEAOID)
 	{
-		patt = TextDatumGetCString(patt_const->constvalue);
-		pattlen = strlen(patt);
+		text	   *val = DatumGetTextPP(patt_const->constvalue);
+		pg_wchar   *wpatt;
+		pg_wchar   *wmatch;
+		char	   *match;
+
+		patt = VARDATA_ANY(val);
+		pattlen = VARSIZE_ANY_EXHDR(val);
+		wpatt = palloc((pattlen + 1) * sizeof(pg_wchar));
+		wmatch = palloc((pattlen + 1) * sizeof(pg_wchar));
+		pg_mb2wchar_with_len(patt, wpatt, pattlen);
+
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
+		{
+			/* % and _ are wildcard characters in LIKE */
+			if (wpatt[pos] == '%' ||
+				wpatt[pos] == '_')
+				break;
+
+			/* Backslash escapes the next character */
+			if (wpatt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
+
+			/*
+			 * For ILIKE, stop if it's a case-varying character (it's sort of
+			 * a wildcard).
+			 */
+			if (case_insensitive && pg_iswcased(wpatt[pos], locale))
+				break;
+
+			wmatch[match_pos++] = wpatt[pos];
+		}
+
+		wmatch[match_pos] = '\0';
+
+		pg_wchar2mb_with_len(wmatch, match, pattlen);
+
+		pfree(wpatt);
+		pfree(wmatch);
+
+		*prefix_const = string_to_const(match, typeid);
 	}
 	else
 	{
 		bytea	   *bstr = DatumGetByteaPP(patt_const->constvalue);
+		char	   *match;
 
+		patt = VARDATA_ANY(bstr);
 		pattlen = VARSIZE_ANY_EXHDR(bstr);
-		patt = (char *) palloc(pattlen);
-		memcpy(patt, VARDATA_ANY(bstr), pattlen);
-		Assert((Pointer) bstr == DatumGetPointer(patt_const->constvalue));
-	}
 
-	match = palloc(pattlen + 1);
-	match_pos = 0;
-	for (pos = 0; pos < pattlen; pos++)
-	{
-		/* % and _ are wildcard characters in LIKE */
-		if (patt[pos] == '%' ||
-			patt[pos] == '_')
-			break;
-
-		/* Backslash escapes the next character */
-		if (patt[pos] == '\\')
+		match = palloc(pattlen + 1);
+		for (pos = 0; pos < pattlen; pos++)
 		{
-			pos++;
-			if (pos >= pattlen)
+			/* % and _ are wildcard characters in LIKE */
+			if (patt[pos] == '%' ||
+				patt[pos] == '_')
 				break;
-		}
 
-		/*
-		 * Stop if case-varying character (it's sort of a wildcard).
-		 *
-		 * In multibyte character sets or with non-libc providers, we can't
-		 * use isalpha, and it does not seem worth trying to convert to
-		 * wchar_t or char32_t.  Instead, just pass the single byte to the
-		 * provider, which will assume any non-ASCII char is potentially
-		 * case-varying.
-		 */
-		if (case_insensitive && char_is_cased(patt[pos], locale))
-			break;
-
-		match[match_pos++] = patt[pos];
-	}
+			/* Backslash escapes the next character */
+			if (patt[pos] == '\\')
+			{
+				pos++;
+				if (pos >= pattlen)
+					break;
+			}
 
-	match[match_pos] = '\0';
+			match[match_pos++] = pos;
+		}
 
-	if (typeid != BYTEAOID)
-		*prefix_const = string_to_const(match, typeid);
-	else
 		*prefix_const = string_to_bytea_const(match, match_pos);
 
+		pfree(match);
+	}
+
 	if (rest_selec != NULL)
 		*rest_selec = like_selectivity(&patt[pos], pattlen - pos,
 									   case_insensitive);
 
-	pfree(patt);
-	pfree(match);
-
 	/* in LIKE, an empty pattern is an exact match! */
 	if (pos == pattlen)
 		return Pattern_Prefix_Exact;	/* reached end of pattern, so exact */
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 5aba277ba99..c4e89502f85 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1588,6 +1588,17 @@ pg_iswxdigit(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_isxdigit(wc, locale);
 }
 
+bool
+pg_iswcased(pg_wchar wc, pg_locale_t locale)
+{
+	/* for the C locale, Cased and Alpha are equivalent */
+	if (locale->ctype == NULL)
+		return (wc <= (pg_wchar) 127 &&
+				(pg_char_properties[wc] & PG_ISALPHA));
+	else
+		return locale->ctype->wc_iscased(wc, locale);
+}
+
 pg_wchar
 pg_towupper(pg_wchar wc, pg_locale_t locale)
 {
@@ -1614,21 +1625,6 @@ pg_towlower(pg_wchar wc, pg_locale_t locale)
 		return locale->ctype->wc_tolower(wc, locale);
 }
 
-/*
- * char_is_cased()
- *
- * Fuzzy test of whether the given char is case-varying or not. The argument
- * is a single byte, so in a multibyte encoding, just assume any non-ASCII
- * char is case-varying.
- */
-bool
-char_is_cased(char ch, pg_locale_t locale)
-{
-	if (locale->ctype == NULL)
-		return (ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-	return locale->ctype->char_is_cased(ch, locale);
-}
-
 /*
  * Return required encoding ID for the given locale, or -1 if any encoding is
  * valid for the locale.
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 1021e0d129b..0c2920112bb 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -186,10 +186,9 @@ wc_isxdigit_builtin(pg_wchar wc, pg_locale_t locale)
 }
 
 static bool
-char_is_cased_builtin(char ch, pg_locale_t locale)
+wc_iscased_builtin(pg_wchar wc, pg_locale_t locale)
 {
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
+	return pg_u_prop_cased(to_char32(wc));
 }
 
 static pg_wchar
@@ -219,7 +218,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.wc_ispunct = wc_ispunct_builtin,
 	.wc_isspace = wc_isspace_builtin,
 	.wc_isxdigit = wc_isxdigit_builtin,
-	.char_is_cased = char_is_cased_builtin,
+	.wc_iscased = wc_iscased_builtin,
 	.wc_tolower = wc_tolower_builtin,
 	.wc_toupper = wc_toupper_builtin,
 };
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index f5a0cc8fe41..18d026deda8 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -121,13 +121,6 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 									 const char *locale,
 									 UErrorCode *pErrorCode);
 
-static bool
-char_is_cased_icu(char ch, pg_locale_t locale)
-{
-	return IS_HIGHBIT_SET(ch) ||
-		(ch >= 'A' && ch <= 'Z') || (ch >= 'a' && ch <= 'z');
-}
-
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
  * UChar32, which is correct for the UTF-8 encoding, but not in general.
@@ -223,6 +216,12 @@ wc_isxdigit_icu(pg_wchar wc, pg_locale_t locale)
 	return u_isxdigit(wc);
 }
 
+static bool
+wc_iscased_icu(pg_wchar wc, pg_locale_t locale)
+{
+	return u_hasBinaryProperty(wc, UCHAR_CASED);
+}
+
 static const struct ctype_methods ctype_methods_icu = {
 	.strlower = strlower_icu,
 	.strtitle = strtitle_icu,
@@ -238,7 +237,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.wc_ispunct = wc_ispunct_icu,
 	.wc_isspace = wc_isspace_icu,
 	.wc_isxdigit = wc_isxdigit_icu,
-	.char_is_cased = char_is_cased_icu,
+	.wc_iscased = wc_iscased_icu,
 	.wc_toupper = toupper_icu,
 	.wc_tolower = tolower_icu,
 };
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index 545ee9a3099..fa419863fa7 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -184,6 +184,13 @@ wc_isxdigit_libc_sb(pg_wchar wc, pg_locale_t locale)
 #endif
 }
 
+static bool
+wc_iscased_libc_sb(pg_wchar wc, pg_locale_t locale)
+{
+	return isupper_l((unsigned char) wc, locale->lt) ||
+		islower_l((unsigned char) wc, locale->lt);
+}
+
 static bool
 wc_isdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
@@ -249,14 +256,10 @@ wc_isxdigit_libc_mb(pg_wchar wc, pg_locale_t locale)
 }
 
 static bool
-char_is_cased_libc(char ch, pg_locale_t locale)
+wc_iscased_libc_mb(pg_wchar wc, pg_locale_t locale)
 {
-	bool		is_multibyte = pg_database_encoding_max_length() > 1;
-
-	if (is_multibyte && IS_HIGHBIT_SET(ch))
-		return true;
-	else
-		return isalpha_l((unsigned char) ch, locale->lt);
+	return iswupper_l((wint_t) wc, locale->lt) ||
+		iswlower_l((wint_t) wc, locale->lt);
 }
 
 static pg_wchar
@@ -331,7 +334,7 @@ static const struct ctype_methods ctype_methods_libc_sb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -356,7 +359,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.wc_ispunct = wc_ispunct_libc_sb,
 	.wc_isspace = wc_isspace_libc_sb,
 	.wc_isxdigit = wc_isxdigit_libc_sb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_sb,
 	.wc_toupper = toupper_libc_sb,
 	.wc_tolower = tolower_libc_sb,
 };
@@ -377,7 +380,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.wc_ispunct = wc_ispunct_libc_mb,
 	.wc_isspace = wc_isspace_libc_mb,
 	.wc_isxdigit = wc_isxdigit_libc_mb,
-	.char_is_cased = char_is_cased_libc,
+	.wc_iscased = wc_iscased_libc_mb,
 	.wc_toupper = toupper_libc_mb,
 	.wc_tolower = tolower_libc_mb,
 };
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 50520e50127..01f891def7a 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -122,11 +122,9 @@ struct ctype_methods
 	bool		(*wc_ispunct) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isspace) (pg_wchar wc, pg_locale_t locale);
 	bool		(*wc_isxdigit) (pg_wchar wc, pg_locale_t locale);
+	bool		(*wc_iscased) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_toupper) (pg_wchar wc, pg_locale_t locale);
 	pg_wchar	(*wc_tolower) (pg_wchar wc, pg_locale_t locale);
-
-	/* required */
-	bool		(*char_is_cased) (char ch, pg_locale_t locale);
 };
 
 /*
@@ -214,6 +212,7 @@ extern bool pg_iswprint(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswpunct(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswspace(pg_wchar wc, pg_locale_t locale);
 extern bool pg_iswxdigit(pg_wchar wc, pg_locale_t locale);
+extern bool pg_iswcased(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towupper(pg_wchar wc, pg_locale_t locale);
 extern pg_wchar pg_towlower(pg_wchar wc, pg_locale_t locale);
 
-- 
2.43.0

From c32561a8447abda8fa03e45f3bfc857227b3d43a Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Wed, 19 Nov 2025 13:24:38 -0800
Subject: [PATCH v10 09/11] fuzzystrmatch: use pg_ascii_toupper().

fuzzystrmatch is designed for ASCII, so no need to rely on the global
LC_CTYPE setting.

Discussion: https://postgr.es/m/[email protected]
---
 contrib/fuzzystrmatch/dmetaphone.c    |  2 +-
 contrib/fuzzystrmatch/fuzzystrmatch.c | 16 ++++++++--------
 2 files changed, 9 insertions(+), 9 deletions(-)

diff --git a/contrib/fuzzystrmatch/dmetaphone.c b/contrib/fuzzystrmatch/dmetaphone.c
index 6627b2b8943..bb5d3e90756 100644
--- a/contrib/fuzzystrmatch/dmetaphone.c
+++ b/contrib/fuzzystrmatch/dmetaphone.c
@@ -284,7 +284,7 @@ MakeUpper(metastring *s)
 	char	   *i;
 
 	for (i = s->str; *i; i++)
-		*i = toupper((unsigned char) *i);
+		*i = pg_ascii_toupper((unsigned char) *i);
 }
 
 
diff --git a/contrib/fuzzystrmatch/fuzzystrmatch.c b/contrib/fuzzystrmatch/fuzzystrmatch.c
index e7cc314b763..7f07efc2c35 100644
--- a/contrib/fuzzystrmatch/fuzzystrmatch.c
+++ b/contrib/fuzzystrmatch/fuzzystrmatch.c
@@ -62,7 +62,7 @@ static const char *const soundex_table = "01230120022455012623010202";
 static char
 soundex_code(char letter)
 {
-	letter = toupper((unsigned char) letter);
+	letter = pg_ascii_toupper((unsigned char) letter);
 	/* Defend against non-ASCII letters */
 	if (letter >= 'A' && letter <= 'Z')
 		return soundex_table[letter - 'A'];
@@ -124,7 +124,7 @@ getcode(char c)
 {
 	if (isalpha((unsigned char) c))
 	{
-		c = toupper((unsigned char) c);
+		c = pg_ascii_toupper((unsigned char) c);
 		/* Defend against non-ASCII letters */
 		if (c >= 'A' && c <= 'Z')
 			return _codes[c - 'A'];
@@ -301,18 +301,18 @@ metaphone(PG_FUNCTION_ARGS)
  * accessing the array directly... */
 
 /* Look at the next letter in the word */
-#define Next_Letter (toupper((unsigned char) word[w_idx+1]))
+#define Next_Letter (pg_ascii_toupper((unsigned char) word[w_idx+1]))
 /* Look at the current letter in the word */
-#define Curr_Letter (toupper((unsigned char) word[w_idx]))
+#define Curr_Letter (pg_ascii_toupper((unsigned char) word[w_idx]))
 /* Go N letters back. */
 #define Look_Back_Letter(n) \
-	(w_idx >= (n) ? toupper((unsigned char) word[w_idx-(n)]) : '\0')
+	(w_idx >= (n) ? pg_ascii_toupper((unsigned char) word[w_idx-(n)]) : '\0')
 /* Previous letter.  I dunno, should this return null on failure? */
 #define Prev_Letter (Look_Back_Letter(1))
 /* Look two letters down.  It makes sure you don't walk off the string. */
 #define After_Next_Letter \
-	(Next_Letter != '\0' ? toupper((unsigned char) word[w_idx+2]) : '\0')
-#define Look_Ahead_Letter(n) toupper((unsigned char) Lookahead(word+w_idx, n))
+	(Next_Letter != '\0' ? pg_ascii_toupper((unsigned char) word[w_idx+2]) : '\0')
+#define Look_Ahead_Letter(n) pg_ascii_toupper((unsigned char) Lookahead(word+w_idx, n))
 
 
 /* Allows us to safely look ahead an arbitrary # of letters */
@@ -742,7 +742,7 @@ _soundex(const char *instr, char *outstr)
 	}
 
 	/* Take the first letter as is */
-	*outstr++ = (char) toupper((unsigned char) *instr++);
+	*outstr++ = (char) pg_ascii_toupper((unsigned char) *instr++);
 
 	count = 1;
 	while (*instr && count < SOUNDEX_LEN)
-- 
2.43.0

From 66bcc8c8de81ba83e90fd43a48acdb214c85139e Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Mon, 20 Oct 2025 16:32:18 -0700
Subject: [PATCH v10 10/11] downcase_identifier(): use method table from locale
 provider.

Previously, libc's tolower() was always used for identifier case
folding, regardless of the database locale (though only characters
beyond 127 in single-byte encodings were affected). Refactor to allow
each provider to supply its own implementation of identifier
casefolding.

For historical compatibility, when using a single-byte encoding, ICU
still relies on tolower().

One minor behavior change is that, before the database default locale
is initialized, it uses ASCII semantics to fold the
identifiers. Previously, it would use the postmaster's LC_CTYPE
setting from the environment. While that could have some effect during
GUC processing, for example, it would have been fragile to rely on the
environment setting anyway. (Also, it only matters when the encoding
is single-byte.)

Discussion: https://postgr.es/m/[email protected]
---
 src/backend/parser/scansup.c              | 39 +++++++---------
 src/backend/utils/adt/pg_locale.c         | 32 +++++++++++++
 src/backend/utils/adt/pg_locale_builtin.c | 24 ++++++++++
 src/backend/utils/adt/pg_locale_icu.c     | 36 ++++++++++++++-
 src/backend/utils/adt/pg_locale_libc.c    | 55 +++++++++++++++++++++++
 src/include/utils/pg_locale.h             |  5 +++
 6 files changed, 166 insertions(+), 25 deletions(-)

diff --git a/src/backend/parser/scansup.c b/src/backend/parser/scansup.c
index 2feb2b6cf5a..0bd049643d1 100644
--- a/src/backend/parser/scansup.c
+++ b/src/backend/parser/scansup.c
@@ -18,6 +18,7 @@
 
 #include "mb/pg_wchar.h"
 #include "parser/scansup.h"
+#include "utils/pg_locale.h"
 
 
 /*
@@ -46,35 +47,25 @@ char *
 downcase_identifier(const char *ident, int len, bool warn, bool truncate)
 {
 	char	   *result;
-	int			i;
-	bool		enc_is_single_byte;
-
-	result = palloc(len + 1);
-	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	size_t		dstsize;
+	size_t		needed pg_attribute_unused();
 
 	/*
-	 * SQL99 specifies Unicode-aware case normalization, which we don't yet
-	 * have the infrastructure for.  Instead we use tolower() to provide a
-	 * locale-aware translation.  However, there are some locales where this
-	 * is not right either (eg, Turkish may do strange things with 'i' and
-	 * 'I').  Our current compromise is to use tolower() for characters with
-	 * the high bit set, as long as they aren't part of a multi-byte
-	 * character, and use an ASCII-only downcasing for 7-bit characters.
+	 * Preserves string length.
+	 *
+	 * NB: if we decide to support Unicode-aware identifier case folding, then
+	 * we need to account for a change in string length.
 	 */
-	for (i = 0; i < len; i++)
-	{
-		unsigned char ch = (unsigned char) ident[i];
+	dstsize = len + 1;
+	result = palloc(dstsize);
 
-		if (ch >= 'A' && ch <= 'Z')
-			ch += 'a' - 'A';
-		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
-			ch = tolower(ch);
-		result[i] = (char) ch;
-	}
-	result[i] = '\0';
+	needed = pg_strfold_ident(result, dstsize, ident, len);
+	Assert(needed + 1 == dstsize);
+	Assert(needed == len);
+	Assert(result[len] == '\0');
 
-	if (i >= NAMEDATALEN && truncate)
-		truncate_identifier(result, i, warn);
+	if (len >= NAMEDATALEN && truncate)
+		truncate_identifier(result, len, warn);
 
 	return result;
 }
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index c4e89502f85..9167018c85b 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1352,6 +1352,38 @@ pg_strfold(char *dst, size_t dstsize, const char *src, ssize_t srclen,
 		return locale->ctype->strfold(dst, dstsize, src, srclen, locale);
 }
 
+/*
+ * Fold an identifier using the database default locale.
+ *
+ * For historical reasons, does not use ordinary locale behavior. Should only
+ * be used for identifier folding. XXX: can we make this equivalent to
+ * pg_strfold(..., default_locale)?
+ */
+size_t
+pg_strfold_ident(char *dest, size_t destsize, const char *src, ssize_t srclen)
+{
+	if (default_locale == NULL || default_locale->ctype == NULL)
+	{
+		int			i;
+
+		for (i = 0; i < srclen && i < destsize; i++)
+		{
+			unsigned char ch = (unsigned char) src[i];
+
+			if (ch >= 'A' && ch <= 'Z')
+				ch += 'a' - 'A';
+			dest[i] = (char) ch;
+		}
+
+		if (i < destsize)
+			dest[i] = '\0';
+
+		return srclen;
+	}
+	return default_locale->ctype->strfold_ident(dest, destsize, src, srclen,
+												default_locale);
+}
+
 /*
  * pg_strcoll
  *
diff --git a/src/backend/utils/adt/pg_locale_builtin.c b/src/backend/utils/adt/pg_locale_builtin.c
index 0c2920112bb..659e588d513 100644
--- a/src/backend/utils/adt/pg_locale_builtin.c
+++ b/src/backend/utils/adt/pg_locale_builtin.c
@@ -125,6 +125,29 @@ strfold_builtin(char *dest, size_t destsize, const char *src, ssize_t srclen,
 						   locale->builtin.casemap_full);
 }
 
+static size_t
+strfold_ident_builtin(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	Assert(GetDatabaseEncoding() == PG_UTF8);
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static bool
 wc_isdigit_builtin(pg_wchar wc, pg_locale_t locale)
 {
@@ -208,6 +231,7 @@ static const struct ctype_methods ctype_methods_builtin = {
 	.strtitle = strtitle_builtin,
 	.strupper = strupper_builtin,
 	.strfold = strfold_builtin,
+	.strfold_ident = strfold_ident_builtin,
 	.wc_isdigit = wc_isdigit_builtin,
 	.wc_isalpha = wc_isalpha_builtin,
 	.wc_isalnum = wc_isalnum_builtin,
diff --git a/src/backend/utils/adt/pg_locale_icu.c b/src/backend/utils/adt/pg_locale_icu.c
index 18d026deda8..39b153a4262 100644
--- a/src/backend/utils/adt/pg_locale_icu.c
+++ b/src/backend/utils/adt/pg_locale_icu.c
@@ -61,6 +61,8 @@ static size_t strupper_icu(char *dest, size_t destsize, const char *src,
 						   ssize_t srclen, pg_locale_t locale);
 static size_t strfold_icu(char *dest, size_t destsize, const char *src,
 						  ssize_t srclen, pg_locale_t locale);
+static size_t strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+								ssize_t srclen, pg_locale_t locale);
 static int	strncoll_icu(const char *arg1, ssize_t len1,
 						 const char *arg2, ssize_t len2,
 						 pg_locale_t locale);
@@ -123,7 +125,7 @@ static int32_t u_strFoldCase_default(UChar *dest, int32_t destCapacity,
 
 /*
  * XXX: many of the functions below rely on casts directly from pg_wchar to
- * UChar32, which is correct for the UTF-8 encoding, but not in general.
+ * UChar32, which is correct for UTF-8 and LATIN1, but not in general.
  */
 
 static pg_wchar
@@ -227,6 +229,7 @@ static const struct ctype_methods ctype_methods_icu = {
 	.strtitle = strtitle_icu,
 	.strupper = strupper_icu,
 	.strfold = strfold_icu,
+	.strfold_ident = strfold_ident_icu,
 	.wc_isdigit = wc_isdigit_icu,
 	.wc_isalpha = wc_isalpha_icu,
 	.wc_isalnum = wc_isalnum_icu,
@@ -564,6 +567,37 @@ strfold_icu(char *dest, size_t destsize, const char *src, ssize_t srclen,
 	return result_len;
 }
 
+/*
+ * For historical compatibility, behavior is not multibyte-aware.
+ *
+ * NB: uses libc tolower() for single-byte encodings (also for historical
+ * compatibility), and therefore relies on the global LC_CTYPE setting.
+ */
+static size_t
+strfold_ident_icu(char *dst, size_t dstsize, const char *src,
+				  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+	bool		enc_is_single_byte;
+
+	enc_is_single_byte = pg_database_encoding_max_length() == 1;
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (enc_is_single_byte && IS_HIGHBIT_SET(ch) && isupper(ch))
+			ch = tolower(ch);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 /*
  * strncoll_icu_utf8
  *
diff --git a/src/backend/utils/adt/pg_locale_libc.c b/src/backend/utils/adt/pg_locale_libc.c
index fa419863fa7..cb6b573dd34 100644
--- a/src/backend/utils/adt/pg_locale_libc.c
+++ b/src/backend/utils/adt/pg_locale_libc.c
@@ -318,12 +318,65 @@ tolower_libc_mb(pg_wchar wc, pg_locale_t locale)
 		return wc;
 }
 
+/*
+ * Characters A..Z always fold to a..z, even in the Turkish locale. Characters
+ * beyond 127 use tolower().
+ */
+static size_t
+strfold_ident_libc_sb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	locale_t	loc = locale->lt;
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		else if (IS_HIGHBIT_SET(ch) && isupper_l(ch, loc))
+			ch = tolower_l(ch, loc);
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
+/*
+ * For historical reasons, not multibyte-aware; uses plain ASCII semantics.
+ */
+static size_t
+strfold_ident_libc_mb(char *dst, size_t dstsize, const char *src,
+					  ssize_t srclen, pg_locale_t locale)
+{
+	int			i;
+
+	for (i = 0; i < srclen && i < dstsize; i++)
+	{
+		unsigned char ch = (unsigned char) src[i];
+
+		if (ch >= 'A' && ch <= 'Z')
+			ch += 'a' - 'A';
+		dst[i] = (char) ch;
+	}
+
+	if (i < dstsize)
+		dst[i] = '\0';
+
+	return srclen;
+}
+
 static const struct ctype_methods ctype_methods_libc_sb = {
 	.strlower = strlower_libc_sb,
 	.strtitle = strtitle_libc_sb,
 	.strupper = strupper_libc_sb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_sb,
+	.strfold_ident = strfold_ident_libc_sb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -349,6 +402,7 @@ static const struct ctype_methods ctype_methods_libc_other_mb = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_sb,
 	.wc_isalpha = wc_isalpha_libc_sb,
 	.wc_isalnum = wc_isalnum_libc_sb,
@@ -370,6 +424,7 @@ static const struct ctype_methods ctype_methods_libc_utf8 = {
 	.strupper = strupper_libc_mb,
 	/* in libc, casefolding is the same as lowercasing */
 	.strfold = strlower_libc_mb,
+	.strfold_ident = strfold_ident_libc_mb,
 	.wc_isdigit = wc_isdigit_libc_mb,
 	.wc_isalpha = wc_isalpha_libc_mb,
 	.wc_isalnum = wc_isalnum_libc_mb,
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 01f891def7a..53574d2ef85 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -110,6 +110,9 @@ struct ctype_methods
 	size_t		(*strfold) (char *dest, size_t destsize,
 							const char *src, ssize_t srclen,
 							pg_locale_t locale);
+	size_t		(*strfold_ident) (char *dest, size_t destsize,
+								  const char *src, ssize_t srclen,
+								  pg_locale_t locale);
 
 	/* required */
 	bool		(*wc_isdigit) (pg_wchar wc, pg_locale_t locale);
@@ -188,6 +191,8 @@ extern size_t pg_strupper(char *dst, size_t dstsize,
 extern size_t pg_strfold(char *dst, size_t dstsize,
 						 const char *src, ssize_t srclen,
 						 pg_locale_t locale);
+extern size_t pg_strfold_ident(char *dst, size_t dstsize,
+							   const char *src, ssize_t srclen);
 extern int	pg_strcoll(const char *arg1, const char *arg2, pg_locale_t locale);
 extern int	pg_strncoll(const char *arg1, ssize_t len1,
 						const char *arg2, ssize_t len2, pg_locale_t locale);
-- 
2.43.0

From c9661114336dc16e81761b48a151c476cc4d4dea Mon Sep 17 00:00:00 2001
From: Jeff Davis <[email protected]>
Date: Mon, 24 Nov 2025 14:00:52 -0800
Subject: [PATCH v10 11/11] Control LC_COLLATE with GUC.

Now that the global LC_COLLATE setting is not used for any in-core
purpose at all (see commit 5e6e42e44f), allow it to be set with a
GUC. This may be useful for extensions or procedural languages that
still depend on the global LC_COLLATE setting.

Discussion: https://postgr.es/m/[email protected]
---
 src/backend/utils/adt/pg_locale.c             | 59 +++++++++++++++++++
 src/backend/utils/init/postinit.c             |  2 +
 src/backend/utils/misc/guc_parameters.dat     |  9 +++
 src/backend/utils/misc/postgresql.conf.sample |  2 +
 src/bin/initdb/initdb.c                       |  3 +
 src/include/utils/guc_hooks.h                 |  2 +
 src/include/utils/pg_locale.h                 |  1 +
 7 files changed, 78 insertions(+)

diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 9167018c85b..91e7eba2eac 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -81,6 +81,7 @@ extern pg_locale_t create_pg_locale_libc(Oid collid, MemoryContext context);
 extern char *get_collation_actual_version_libc(const char *collcollate);
 
 /* GUC settings */
+char	   *locale_collate;
 char	   *locale_messages;
 char	   *locale_monetary;
 char	   *locale_numeric;
@@ -369,6 +370,64 @@ assign_locale_time(const char *newval, void *extra)
 	CurrentLCTimeValid = false;
 }
 
+/*
+ * We allow LC_COLLATE to actually be set globally.
+ *
+ * Note: we normally disallow value = "" because it wouldn't have consistent
+ * semantics (it'd effectively just use the previous value).  However, this
+ * is the value passed for PGC_S_DEFAULT, so don't complain in that case,
+ * not even if the attempted setting fails due to invalid environment value.
+ * The idea there is just to accept the environment setting *if possible*
+ * during startup, until we can read the proper value from postgresql.conf.
+ */
+bool
+check_locale_collate(char **newval, void **extra, GucSource source)
+{
+	int			locale_enc;
+	int			db_enc;
+
+	if (**newval == '\0')
+	{
+		if (source == PGC_S_DEFAULT)
+			return true;
+		else
+			return false;
+	}
+
+	locale_enc = pg_get_encoding_from_locale(*newval, true);
+	db_enc = GetDatabaseEncoding();
+
+	if (!(locale_enc == db_enc ||
+		  locale_enc == PG_SQL_ASCII ||
+		  db_enc == PG_SQL_ASCII ||
+		  locale_enc == -1))
+	{
+		if (source == PGC_S_FILE)
+		{
+			guc_free(*newval);
+			*newval = guc_strdup(LOG, "C");
+			if (!*newval)
+				return false;
+		}
+		else if (source != PGC_S_TEST)
+		{
+			ereport(WARNING,
+					(errmsg("encoding mismatch"),
+					 errdetail("Locale \"%s\" uses encoding \"%s\", which does not match database encoding \"%s\".",
+							   *newval, pg_encoding_to_char(locale_enc), pg_encoding_to_char(db_enc))));
+			return false;
+		}
+	}
+
+	return check_locale(LC_COLLATE, *newval, NULL);
+}
+
+void
+assign_locale_collate(const char *newval, void *extra)
+{
+	(void) pg_perm_setlocale(LC_COLLATE, newval);
+}
+
 /*
  * We allow LC_MESSAGES to actually be set globally.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 98f9598cd78..c99d57eba48 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -404,6 +404,8 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	 * the pg_database tuple.
 	 */
 	SetDatabaseEncoding(dbform->encoding);
+	/* Reset lc_collate to check encoding, and fall back to C if necessary */
+	SetConfigOption("lc_collate", locale_collate, PGC_POSTMASTER, PGC_S_FILE);
 	/* Record it as a GUC internal option, too */
 	SetConfigOption("server_encoding", GetDatabaseEncodingName(),
 					PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
diff --git a/src/backend/utils/misc/guc_parameters.dat b/src/backend/utils/misc/guc_parameters.dat
index 1128167c025..a3da16eadb1 100644
--- a/src/backend/utils/misc/guc_parameters.dat
+++ b/src/backend/utils/misc/guc_parameters.dat
@@ -1450,6 +1450,15 @@
   boot_val => 'PG_KRB_SRVTAB',
 },
 
+{ name => 'lc_collate', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
+  short_desc => 'Sets the locale for text ordering in extensions.',
+  long_desc => 'An empty string means use the operating system setting.',
+  variable => 'locale_collate',
+  boot_val => '""',
+  check_hook => 'check_locale_collate',
+  assign_hook => 'assign_locale_collate',
+},
+
 { name => 'lc_messages', type => 'string', context => 'PGC_SUSET', group => 'CLIENT_CONN_LOCALE',
   short_desc => 'Sets the language in which messages are displayed.',
   long_desc => 'An empty string means use the operating system setting.',
diff --git a/src/backend/utils/misc/postgresql.conf.sample b/src/backend/utils/misc/postgresql.conf.sample
index dc9e2255f8a..19332e39e82 100644
--- a/src/backend/utils/misc/postgresql.conf.sample
+++ b/src/backend/utils/misc/postgresql.conf.sample
@@ -798,6 +798,8 @@
                                         # encoding
 
 # These settings are initialized by initdb, but they can be changed.
+#lc_collate = ''                        # locale for text ordering (only affects
+                                        # extensions)
 #lc_messages = ''                       # locale for system error message
                                         # strings
 #lc_monetary = 'C'                      # locale for monetary formatting
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 92fe2f531f7..8b2e7bfab6f 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1312,6 +1312,9 @@ setup_config(void)
 	conflines = replace_guc_value(conflines, "shared_buffers",
 								  repltok, false);
 
+	conflines = replace_guc_value(conflines, "lc_collate",
+								  lc_collate, false);
+
 	conflines = replace_guc_value(conflines, "lc_messages",
 								  lc_messages, false);
 
diff --git a/src/include/utils/guc_hooks.h b/src/include/utils/guc_hooks.h
index 82ac8646a8d..8a20f76eec8 100644
--- a/src/include/utils/guc_hooks.h
+++ b/src/include/utils/guc_hooks.h
@@ -65,6 +65,8 @@ extern bool check_huge_page_size(int *newval, void **extra, GucSource source);
 extern void assign_io_method(int newval, void *extra);
 extern bool check_io_max_concurrency(int *newval, void **extra, GucSource source);
 extern const char *show_in_hot_standby(void);
+extern bool check_locale_collate(char **newval, void **extra, GucSource source);
+extern void assign_locale_collate(const char *newval, void *extra);
 extern bool check_locale_messages(char **newval, void **extra, GucSource source);
 extern void assign_locale_messages(const char *newval, void *extra);
 extern bool check_locale_monetary(char **newval, void **extra, GucSource source);
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index 53574d2ef85..276be4c1fef 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -41,6 +41,7 @@
 #define UNICODE_CASEMAP_BUFSZ	(UNICODE_CASEMAP_LEN * sizeof(char32_t))
 
 /* GUC settings */
+extern PGDLLIMPORT char *locale_collate;
 extern PGDLLIMPORT char *locale_messages;
 extern PGDLLIMPORT char *locale_monetary;
 extern PGDLLIMPORT char *locale_numeric;
-- 
2.43.0

Reply via email to