pattern_char_isalpha() doesn't check for the PG_C_UTF8 builtin collation provider, and ends up falling through to isalpha() for characters in the ASCII range.
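
For anyone who wants to check how isalpha() treats the ASCII range on their own platform, here is a minimal sketch. The helper names (ascii_is_alpha, count_ascii_isalpha_mismatches) are made up for illustration and are not part of PostgreSQL; the sketch only compares libc's isalpha() against the hard-coded [A-Za-z] range under a locale name you pass in.

```c
#include <assert.h>
#include <ctype.h>
#include <locale.h>
#include <stdbool.h>
#include <stddef.h>

/* Locale-independent test: the same [A-Za-z] range the patch relies on. */
static bool
ascii_is_alpha(unsigned char c)
{
	return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
}

/*
 * Hypothetical helper: count the ASCII bytes (0..127) for which isalpha()
 * under the named locale disagrees with the hard-coded range.
 *
 * Returns -1 if setlocale() rejects the locale name. Note that this changes
 * the process-wide LC_CTYPE setting as a side effect.
 */
static int
count_ascii_isalpha_mismatches(const char *locale_name)
{
	int			mismatches = 0;

	/* setlocale(LC_CTYPE, NULL) would only query, so require a real name */
	assert(locale_name != NULL);

	if (setlocale(LC_CTYPE, locale_name) == NULL)
		return -1;

	for (int c = 0; c < 128; c++)
	{
		bool		libc_says = (isalpha(c) != 0);

		if (libc_says != ascii_is_alpha((unsigned char) c))
			mismatches++;
	}

	return mismatches;
}
```

Calling count_ascii_isalpha_mismatches() with each locale name installed on the system (for example, those listed by `locale -a`) and looking for a nonzero result is one way to spot a platform where the fall-through would matter.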

I don't think this is an actual correctness bug, because:

(a) For all locales I tested on Linux and Mac, isalpha() has identical behavior for the ASCII range.

(b) To be an actual correctness bug, it would need to be a false negative; that is, to say that a character is not case-varying when it is. The only case-varying characters in the ASCII range for PG_C_UTF8 are [A-Za-z], and it seems unlikely that any locale would treat those as non-alphabetic.

But I think we should fix and backport to 17, because there's no reason we should be calling libc at all when using PG_C_UTF8, and it might cause an issue on some platform that I didn't test.

Fix attached (slightly different on master and 17). I intend to commit soon.

Regards,
	Jeff Davis

From c65eb2fda1d7c9a29846d61bdb0358a0e73e2226 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Wed, 9 Oct 2024 22:28:15 -0700
Subject: [PATCH v17] Fix missed case for builtin collation provider.

A missed check for the builtin collation provider could result in
falling through to call isalpha(). This does not appear to have
practical consequences because it only happens for characters in the
ASCII range. Regardless, the builtin provider should not be calling
libc functions, so backpatch.

Backpatch-through: 17
---
 src/backend/utils/adt/like_support.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 2635050861..6cd21ba8fe 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1505,7 +1505,7 @@ pattern_char_isalpha(char c, bool is_multibyte,
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else if (is_multibyte && IS_HIGHBIT_SET(c))
 		return true;
-	else if (locale && locale->provider == COLLPROVIDER_ICU)
+	else if (locale && locale->provider != COLLPROVIDER_LIBC)
 		return IS_HIGHBIT_SET(c) ||
 			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else if (locale && locale->provider == COLLPROVIDER_LIBC)
-- 
2.34.1

From 51c86d422942152c960d02c7478483d3b21f1390 Mon Sep 17 00:00:00 2001
From: Jeff Davis <j...@j-davis.com>
Date: Wed, 9 Oct 2024 22:28:15 -0700
Subject: [PATCH v18] Fix missed case for builtin collation provider.

A missed check for the builtin collation provider could result in
falling through to call isalpha(). This does not appear to have
practical consequences because it only happens for characters in the
ASCII range. Regardless, the builtin provider should not be calling
libc functions, so backpatch.

Backpatch-through: 17
---
 src/backend/utils/adt/like_support.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/src/backend/utils/adt/like_support.c b/src/backend/utils/adt/like_support.c
index 79c4ddc757..8b15509a3b 100644
--- a/src/backend/utils/adt/like_support.c
+++ b/src/backend/utils/adt/like_support.c
@@ -1500,13 +1500,11 @@ pattern_char_isalpha(char c, bool is_multibyte,
 		return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
 	else if (is_multibyte && IS_HIGHBIT_SET(c))
 		return true;
-	else if (locale->provider == COLLPROVIDER_ICU)
+	else if (locale->provider != COLLPROVIDER_LIBC)
 		return IS_HIGHBIT_SET(c) ||
 			(c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z');
-	else if (locale->provider == COLLPROVIDER_LIBC)
-		return isalpha_l((unsigned char) c, locale->info.lt);
 	else
-		return isalpha((unsigned char) c);
+		return isalpha_l((unsigned char) c, locale->info.lt);
 }
-- 
2.34.1