Re: Initcap works differently with different locale providers

Oleg Tselebrovskiy Tue, 05 Aug 2025 02:01:18 -0700

Jeff Davis wrote at 2025-08-05 03:59:

One more thing: we should also change it to "... to upper case (or
title case) and the rest to lower case...". Title case is for scripts
that have characters like 'ǅ' (U+01C5).


Done, based on the second version of the previous patch. Again, there are
two versions: the first one mentions digraphs, like 'ǅ' (U+01C5), and the
second one doesn't. And again, I don't know which version is better.
Title case without a mention of digraphs could be interpreted as "don't
capitalize articles and prepositions" or just "don't capitalize
articles", since the definition of "title case" is vague.
We have a "write your own function" clause, but still.

Maybe we should add an example of a digraph to the first patch to
make it clearer, if we go down that path.
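
For illustration, the difference between upper case and title case for a
digraph can be seen outside PostgreSQL too. The following is a small Python
sketch, not part of the patch. It relies only on Python's built-in Unicode
case mappings and says nothing about what a particular locale provider does.

```python
# Illustration only: Unicode defines three case forms for the digraph DZ with caron.
lower = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON

upper = lower.upper()  # upper case form: both parts of the digraph capitalized
title = lower.title()  # title case form: only the first part capitalized

print(f"upper: U+{ord(upper):04X}, title: U+{ord(title):04X}")
# prints "upper: U+01C4, title: U+01C5"
```

The title case form is the 'ǅ' (U+01C5) mentioned above, which is why
"upper case" alone does not fully describe the first letter of a word.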

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..b32ec6e2cea 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3147,12 +3147,15 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
         <returnvalue>text</returnvalue>
        </para>
        <para>
-        Converts the first letter of each word to upper case and the
-        rest to lower case. When using the <literal>libc</literal> locale
-        provider, words are sequences of alphanumeric characters separated
-        by non-alphanumeric characters; when using the ICU locale provider,
-        words are separated according to
-        <ulink url="https://www.unicode.org/reports/tr29/#Word_Boundaries">Unicode Standard Annex #29</ulink>.
+        Converts the first letter of each word to upper case (or title case
+        if the letter is a digraph) and the rest to lower case.
+       </para>
+       <para>
+        This function is primarily used for convenient
+        display, and the specific result should not be relied upon because of
+        the differences between locale providers and between different
+        ICU versions. If specific word boundary rules are desired,
+        it is recommended to write a custom function.
        </para>
        <para>
         <literal>initcap('hi THOMAS')</literal>

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..f799b34dca7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3147,12 +3147,15 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
         <returnvalue>text</returnvalue>
        </para>
        <para>
-        Converts the first letter of each word to upper case and the
-        rest to lower case. When using the <literal>libc</literal> locale
-        provider, words are sequences of alphanumeric characters separated
-        by non-alphanumeric characters; when using the ICU locale provider,
-        words are separated according to
-        <ulink url="https://www.unicode.org/reports/tr29/#Word_Boundaries">Unicode Standard Annex #29</ulink>.
+        Converts the first letter of each word to upper case (or title case)
+        and the rest to lower case.
+       </para>
+       <para>
+        This function is primarily used for convenient
+        display, and the specific result should not be relied upon because of
+        the differences between locale providers and between different
+        ICU versions. If specific word boundary rules are desired,
+        it is recommended to write a custom function.
        </para>
        <para>
         <literal>initcap('hi THOMAS')</literal>
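
As a rough sketch of what the "write a custom function" advice could look
like, here is a Python version (not part of the patch) that uses the simple
rule from the removed text: words are runs of alphanumeric characters. The
name initcap_alnum is hypothetical, and Python's isalnum() is only assumed
to be close to, not identical to, what libc classifies as alphanumeric.

```python
def initcap_alnum(text: str) -> str:
    """Title-case the first character of each alphanumeric run, lower-case the rest."""
    out = []
    at_word_start = True
    for ch in text:
        if ch.isalnum():
            # The first character of a word gets its title case form, others lower case.
            out.append(ch.title() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Any non-alphanumeric character ends the current word.
            out.append(ch)
            at_word_start = True
    return "".join(out)


print(initcap_alnum("hi THOMAS"))  # prints "Hi Thomas"
```

With this rule an apostrophe splits a word, so "o'neil" becomes "O'Neil".
That is the kind of boundary decision a custom function lets the caller pin
down instead of depending on the locale provider.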
