On Fri, Jan 17, 2025 at 04:06:20PM -0800, Jeff Davis wrote:
> Committed 0001 and 0002.
>
> Upon reviewing the discussion threads, I removed the Unicode "adjust to
> Cased" behavior when titlecasing. As Peter pointed out[1], it doesn't
> match the documentation or expectations for INITCAP().

While commit d3d0983 changed most of the non-test pg_u_*() "bool posix"
arguments, it left a pg_u_isalnum(u, true) in the strtitle_builtin()
subroutine initcap_wbnext().  The above paragraph may or may not be saying
that's intentional.  Example of the consequence at non-ASCII decimal digits:

SELECT
  str, re,
  regexp_count(str COLLATE pg_c_utf8, re) AS count_c_utf8,
  regexp_count(str COLLATE pg_unicode_fast, re) AS count_unicode_fast,
  regexp_count(str COLLATE unicode, re) AS count_unicode,
  initcap(str COLLATE pg_c_utf8) AS initcap_c_utf8,
  initcap(str COLLATE pg_unicode_fast) AS initcap_unicode_fast,
  initcap(str COLLATE unicode) AS initcap_unicode
FROM
  (VALUES (U&'foo\0661bar baz')) AS str_t(str),
  (VALUES ('[[:digit:]]')) AS re_t(re)
ORDER BY 1, 2;

str                  │ foo١bar baz
re                   │ [[:digit:]]
count_c_utf8         │ 0
count_unicode_fast   │ 1
count_unicode        │ 1
initcap_c_utf8       │ Foo١Bar Baz
initcap_unicode_fast │ Foo١Bar Baz
initcap_unicode      │ Foo١bar Baz

With pg_unicode_fast, the regex treats U+0661 as a digit, yet initcap()
treats it as a word boundary.  Should initcap_wbnext() pass in a
locale-dependent "bool posix" argument like the other calls the commit
changed?  Related message from the development of pg_c_utf8, which you
shared downthread:
https://www.postgresql.org/message-id/610d7f1b-c68c-4eb8-a03d-1515da304c58%40manitou-mail.org

Long-term, pg_u_isword() should have a "bool posix" argument.  Currently,
only tests call that function.  If it got a non-test caller,
https://www.unicode.org/reports/tr18/#word would have pg_u_isword() follow
the choice of posix compatibility like pg_u_isalnum() does.
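
For anyone who wants to see the initcap() difference above without a server
build, here is a toy model in Python.  It is not the PostgreSQL code: the
names is_alnum() and initcap() are mine, str.isalpha() only approximates the
Unicode Alphabetic property, and I'm assuming that "posix" narrows the digit
class to ASCII 0-9, which is what the regexp_count() results suggest.

```python
import unicodedata


def is_alnum(ch: str, posix: bool) -> bool:
    """Toy stand-in for an alnum test with a "posix" switch (assumption)."""
    if ch.isalpha():
        return True
    if posix:
        # Assumed POSIX-compatible behavior: ASCII digits only.
        return "0" <= ch <= "9"
    # Assumed Unicode behavior: any decimal digit (general category Nd).
    return unicodedata.category(ch) == "Nd"


def initcap(s: str, posix: bool) -> str:
    """Uppercase the first alnum character of each word, lowercase the rest."""
    out = []
    in_word = False
    for ch in s:
        if is_alnum(ch, posix):
            out.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            # Any non-alnum character ends the current word.
            out.append(ch)
            in_word = False
    return "".join(out)


s = "foo\u0661bar baz"
print(initcap(s, posix=True))   # U+0661 splits the word
print(initcap(s, posix=False))  # U+0661 is part of the word
```

With posix=true, U+0661 is not alnum, so it ends the word and the "b" after
it is capitalized, like the pg_unicode_fast row.  With posix=false, the digit
stays inside the word, like the ICU "unicode" row.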