Hi

While studying Jeff's new crop of collation patches I noticed in
passing that check_strxfrm_bug() must surely by now be unnecessary.
The buffer overrun bugs were fixed a decade ago, and the relevant
systems are way out of support.  If you're worried that the bugs might
come back, then the test is insufficient: modern versions of both OSes
have strxfrm_l(), which we aren't checking.  In any case, we also
completely disable this stuff because of bugs and quality problems in
every other known implementation, via TRUST_STRXFRM (or rather the
lack of it).  So I think it's time to remove that function; please see
attached.

Just by the way, if you like slow motion domino runs, check this out:

* Original pgsql-bugs investigation into strxfrm() inconsistencies
  
https://www.postgresql.org/message-id/flat/111d0e27-a8f3-4a84-a4e0-b0fb70386...@s24.com

* I happened to test that on bleeding-edge FreeBSD 11 (wasn't released
yet), because at that time FreeBSD was in the process of adopting
illumos's new collation code, and reported teething problems:
  https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208266

* FreeBSD, DragonFly and illumos's trees were then partially fixed by
the authors, but our strcolltest.c still showed some remaining
problems in some locales (and it still does on my local FreeBSD
battlestation):
  
https://github.com/freebsd/freebsd-src/commit/c48dc2a193b9befceda8dfc6f894d73251cc00a4
  https://www.illumos.org/rb/r/402/

* The authors traced the remaining problem to flaws in the Unicode
project's CLDR/POSIX data, and the report was accepted:
  https://www.illumos.org/issues/7962
  https://unicode-org.atlassian.net/browse/CLDR-10394

Eventually that'll be fixed, and (I guess) trigger at least a CLDR
minor version bump affecting all downstream consumers (ICU, ...).
Then... maybe... at least FreeBSD will finally pass that test.  I do
wonder whether other consumer libraries are also confused by that
problem source data, and if not, why not; are glibc's problems related
or just random code or data quality problems in different areas?  (I
also don't know why a problem in that data should affect strxfrm() and
strcoll() differently, but I don't plan to find out owing to an acute
shortage of round tuits).

But in the meantime, I still can't recommend turning on TRUST_STRXFRM
on any OS that I know of!  The strcolltest.c program certainly still
finds fault with glibc 2.36 despite the last update on that redhat
bugzilla ticket that suggested that the big resync back in 2.28 was
going to fix it.

To be fair, macOS does actually pass that test for all locales, but
the strxfrm() result is too narrow to be useful, according to comments
in our tree.  I would guess that a couple of other OSes with the old
Berkeley locale code are similar.
From 8a70596a7de680e02fb2a6db59e622a653f097c2 Mon Sep 17 00:00:00 2001
From: Thomas Munro <thomas.mu...@gmail.com>
Date: Thu, 15 Dec 2022 13:38:57 +1300
Subject: [PATCH] Remove obsolete defense against strxfrm() bugs.

Old versions of Solaris and illumos had buffer overrun bugs in their
strxfrm() implementations.  The bugs were fixed a decade ago and the
relevant releases are long out of vendor support.  It's time to remove
the defense added by commit be8b06c3.

diff --git a/src/backend/main/main.c b/src/backend/main/main.c
index 37989ba567..bf6992d9fe 100644
--- a/src/backend/main/main.c
+++ b/src/backend/main/main.c
@@ -137,8 +137,6 @@ main(int argc, char *argv[])
 	 */
 	unsetenv("LC_ALL");
 
-	check_strxfrm_bug();
-
 	/*
 	 * Catch standard options before doing much else, in particular before we
 	 * insist on not being root.
diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c
index 2b42d9ccd8..aa60008ad7 100644
--- a/src/backend/utils/adt/pg_locale.c
+++ b/src/backend/utils/adt/pg_locale.c
@@ -1134,64 +1134,6 @@ IsoLocaleName(const char *winlocname)
 #endif							/* WIN32 && LC_MESSAGES */
 
 
-/*
- * Detect aging strxfrm() implementations that, in a subset of locales, write
- * past the specified buffer length.  Affected users must update OS packages
- * before using PostgreSQL 9.5 or later.
- *
- * Assume that the bug can come and go from one postmaster startup to another
- * due to physical replication among diverse machines.  Assume that the bug's
- * presence will not change during the life of a particular postmaster.  Given
- * those assumptions, call this no less than once per postmaster startup per
- * LC_COLLATE setting used.  No known-affected system offers strxfrm_l(), so
- * there is no need to consider pg_collation locales.
- */
-void
-check_strxfrm_bug(void)
-{
-	char		buf[32];
-	const int	canary = 0x7F;
-	bool		ok = true;
-
-	/*
-	 * Given a two-byte ASCII string and length limit 7, 8 or 9, Solaris 10
-	 * 05/08 returns 18 and modifies 10 bytes.  It respects limits above or
-	 * below that range.
-	 *
-	 * The bug is present in Solaris 8 as well; it is absent in Solaris 10
-	 * 01/13 and Solaris 11.2.  Affected locales include is_IS.ISO8859-1,
-	 * en_US.UTF-8, en_US.ISO8859-1, and ru_RU.KOI8-R.  Unaffected locales
-	 * include de_DE.UTF-8, de_DE.ISO8859-1, zh_TW.UTF-8, and C.
-	 */
-	buf[7] = canary;
-	(void) strxfrm(buf, "ab", 7);
-	if (buf[7] != canary)
-		ok = false;
-
-	/*
-	 * illumos bug #1594 was present in the source tree from 2010-10-11 to
-	 * 2012-02-01.  Given an ASCII string of any length and length limit 1,
-	 * affected systems ignore the length limit and modify a number of bytes
-	 * one less than the return value.  The problem inputs for this bug do not
-	 * overlap those for the Solaris bug, hence a distinct test.
-	 *
-	 * Affected systems include smartos-20110926T021612Z.  Affected locales
-	 * include en_US.ISO8859-1 and en_US.UTF-8.  Unaffected locales include C.
-	 */
-	buf[1] = canary;
-	(void) strxfrm(buf, "a", 1);
-	if (buf[1] != canary)
-		ok = false;
-
-	if (!ok)
-		ereport(ERROR,
-				(errcode(ERRCODE_SYSTEM_ERROR),
-				 errmsg_internal("strxfrm(), in locale \"%s\", writes past the specified array length",
-								 setlocale(LC_COLLATE, NULL)),
-				 errhint("Apply system library package updates.")));
-}
-
-
 /*
  * Cache mechanism for collation information.
  *
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index a990c833c5..d07adb0a20 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -475,8 +475,6 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect
 	SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
 	SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT);
 
-	check_strxfrm_bug();
-
 	ReleaseSysCache(tup);
 }
 
diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h
index a875942123..60ccd7e34d 100644
--- a/src/include/utils/pg_locale.h
+++ b/src/include/utils/pg_locale.h
@@ -50,7 +50,6 @@ extern PGDLLIMPORT char *localized_full_months[];
 
 extern bool check_locale(int category, const char *locale, char **canonname);
 extern char *pg_perm_setlocale(int category, const char *locale);
-extern void check_strxfrm_bug(void);
 
 extern bool lc_collate_is_c(Oid collation);
 extern bool lc_ctype_is_c(Oid collation);
-- 
2.38.1

Reply via email to