Hi While studying Jeff's new crop of collation patches I noticed in passing that check_strxfrm_bug() must surely by now be unnecessary. The buffer overrun bugs were fixed a decade ago, and the relevant systems are way out of support. If you're worried that the bugs might come back, then the test is insufficient: modern versions of both OSes have strxfrm_l(), which we aren't checking. In any case, we also completely disable this stuff because of bugs and quality problems in every other known implementation, via TRUST_STRXFRM (or rather the lack of it). So I think it's time to remove that function; please see attached.
Just by the way, if you like slow motion domino runs, check this out: * Original pgsql-bugs investigation into strxfrm() inconsistencies https://www.postgresql.org/message-id/flat/111d0e27-a8f3-4a84-a4e0-b0fb70386...@s24.com * I happened to test that on bleeding-edge FreeBSD 11 (wasn't released yet), because at that time FreeBSD was in the process of adopting illumos's new collation code, and reported teething problems: https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=208266 * FreeBSD, DragonFly and illumos's trees were then partially fixed by the authors, but our strcolltest.c still showed some remaining problems in some locales (and it still does on my local FreeBSD battlestation): https://github.com/freebsd/freebsd-src/commit/c48dc2a193b9befceda8dfc6f894d73251cc00a4 https://www.illumos.org/rb/r/402/ * The authors traced the remaining problem to flaws in the Unicode project's CLDR/POSIX data, and the report was accepted: https://www.illumos.org/issues/7962 https://unicode-org.atlassian.net/browse/CLDR-10394 Eventually that'll be fixed, and (I guess) trigger at least a CLDR minor version bump affecting all downstream consumers (ICU, ...). Then... maybe... at least FreeBSD will finally pass that test. I do wonder whether other consumer libraries are also confused by that problem source data, and if not, why not; are glibc's problems related or just random code or data quality problems in different areas? (I also don't know why a problem in that data should affect strxfrm() and strcoll() differently, but I don't plan to find out owing to an acute shortage of round tuits). But in the meantime, I still can't recommend turning on TRUST_STRXFRM on any OS that I know of! The strcolltest.c program certainly still finds fault with glibc 2.36 despite the last update on that redhat bugzilla ticket that suggested that the big resync back in 2.28 was going to fix it. To be fair, macOS does actually pass that test for all locales, but the strxfrm() result is too narrow to be useful, according to comments in our tree. I would guess that a couple of other OSes with the old Berkeley locale code are similar.
From 8a70596a7de680e02fb2a6db59e622a653f097c2 Mon Sep 17 00:00:00 2001 From: Thomas Munro <thomas.mu...@gmail.com> Date: Thu, 15 Dec 2022 13:38:57 +1300 Subject: [PATCH] Remove obsolete defense against strxfrm() bugs. Old versions of Solaris and illumos had buffer overrun bugs in their strxfrm() implementations. The bugs were fixed a decade ago and the relevant releases are long out of vendor support. It's time to remove the defense added by commit be8b06c3. diff --git a/src/backend/main/main.c b/src/backend/main/main.c index 37989ba567..bf6992d9fe 100644 --- a/src/backend/main/main.c +++ b/src/backend/main/main.c @@ -137,8 +137,6 @@ main(int argc, char *argv[]) */ unsetenv("LC_ALL"); - check_strxfrm_bug(); - /* * Catch standard options before doing much else, in particular before we * insist on not being root. diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index 2b42d9ccd8..aa60008ad7 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1134,64 +1134,6 @@ IsoLocaleName(const char *winlocname) #endif /* WIN32 && LC_MESSAGES */ -/* - * Detect aging strxfrm() implementations that, in a subset of locales, write - * past the specified buffer length. Affected users must update OS packages - * before using PostgreSQL 9.5 or later. - * - * Assume that the bug can come and go from one postmaster startup to another - * due to physical replication among diverse machines. Assume that the bug's - * presence will not change during the life of a particular postmaster. Given - * those assumptions, call this no less than once per postmaster startup per - * LC_COLLATE setting used. No known-affected system offers strxfrm_l(), so - * there is no need to consider pg_collation locales. - */ -void -check_strxfrm_bug(void) -{ - char buf[32]; - const int canary = 0x7F; - bool ok = true; - - /* - * Given a two-byte ASCII string and length limit 7, 8 or 9, Solaris 10 - * 05/08 returns 18 and modifies 10 bytes. It respects limits above or - * below that range. - * - * The bug is present in Solaris 8 as well; it is absent in Solaris 10 - * 01/13 and Solaris 11.2. Affected locales include is_IS.ISO8859-1, - * en_US.UTF-8, en_US.ISO8859-1, and ru_RU.KOI8-R. Unaffected locales - * include de_DE.UTF-8, de_DE.ISO8859-1, zh_TW.UTF-8, and C. - */ - buf[7] = canary; - (void) strxfrm(buf, "ab", 7); - if (buf[7] != canary) - ok = false; - - /* - * illumos bug #1594 was present in the source tree from 2010-10-11 to - * 2012-02-01. Given an ASCII string of any length and length limit 1, - * affected systems ignore the length limit and modify a number of bytes - * one less than the return value. The problem inputs for this bug do not - * overlap those for the Solaris bug, hence a distinct test. - * - * Affected systems include smartos-20110926T021612Z. Affected locales - * include en_US.ISO8859-1 and en_US.UTF-8. Unaffected locales include C. - */ - buf[1] = canary; - (void) strxfrm(buf, "a", 1); - if (buf[1] != canary) - ok = false; - - if (!ok) - ereport(ERROR, - (errcode(ERRCODE_SYSTEM_ERROR), - errmsg_internal("strxfrm(), in locale \"%s\", writes past the specified array length", - setlocale(LC_COLLATE, NULL)), - errhint("Apply system library package updates."))); -} - - /* * Cache mechanism for collation information. * diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index a990c833c5..d07adb0a20 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -475,8 +475,6 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect SetConfigOption("lc_collate", collate, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT); SetConfigOption("lc_ctype", ctype, PGC_INTERNAL, PGC_S_DYNAMIC_DEFAULT); - check_strxfrm_bug(); - ReleaseSysCache(tup); } diff --git a/src/include/utils/pg_locale.h b/src/include/utils/pg_locale.h index a875942123..60ccd7e34d 100644 --- a/src/include/utils/pg_locale.h +++ b/src/include/utils/pg_locale.h @@ -50,7 +50,6 @@ extern PGDLLIMPORT char *localized_full_months[]; extern bool check_locale(int category, const char *locale, char **canonname); extern char *pg_perm_setlocale(int category, const char *locale); -extern void check_strxfrm_bug(void); extern bool lc_collate_is_c(Oid collation); extern bool lc_ctype_is_c(Oid collation); -- 2.38.1