Alexander Korotkov wrote at 2025-07-28 17:23:
On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov
<aekorot...@gmail.com> wrote:
On 25 Sep 2024, at 18:13, Oleg Tselebrovskiy
<o.tselebrovs...@postgrespro.ru> wrote:
Greetings, everyone!
One of our clients has found a difference in the behaviour of the initcap
function when using different locale providers, shown below:
postgres=# create database test_db_1 locale_provider=icu locale="ru_RU.UTF-8" template=template0;
NOTICE: using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
CREATE DATABASE
postgres=# \c test_db_1;
You are now connected to database "test_db_1" as user "postgres".
test_db_1=# select initcap('ЧиЮ А.Ю.');
initcap
----------
Чию А.ю.
(1 row)
test_db_1=# select initcap('joHn d.e.');
initcap
-----------
John D.e.
(1 row)
postgres=# create database test_db_2 locale_provider=libc locale="ru_RU.UTF-8" template=template0;
CREATE DATABASE
postgres=# \c test_db_2
You are now connected to database "test_db_2" as user "postgres".
test_db_2=# select initcap('ЧиЮ А.Ю.');
initcap
----------
Чию А.Ю.
(1 row)
test_db_2=# select initcap('joHn d.e.');
initcap
-----------
John D.E.
(1 row)
And an easier reproduction (should work for REL_12_STABLE and up):
postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
initcap
--------------
First.second
(1 row)
postgres=# SELECT initcap('first.second' COLLATE "en_US");
initcap
--------------
First.Second
(1 row)
This behaviour is reproducible from REL_12_STABLE up to master.
I don't believe that this is erroneous behaviour, just differing
behaviour, hence I am only proposing a documentation change.
I suggest adding a clarification that this function works differently
with the libc and ICU providers, because they differ in what they
consider a "word".
In libc, a word is a sequence of alphanumeric characters, separated by
non-alphanumeric characters (as the documentation states right now).
In ICU, words are divided according to Unicode® Standard Annex #29 [1].
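To make the two word definitions concrete, here is a minimal Python sketch. It is not how PostgreSQL, libc, or ICU implement this; the function names and the `mid` parameter are hypothetical. The first function follows the "run of alphanumeric characters" rule. The second is only a crude approximation of one UAX #29 rule (a mid-word character such as '.' does not end a word when it sits between two letters), which is enough to reproduce the outputs above.

```python
def initcap_alnum(text: str) -> str:
    """libc-style rule: each maximal run of alphanumeric characters is a word."""
    out = []
    at_word_start = True
    for ch in text:
        if ch.isalnum():
            # First character of a word is upper-cased, the rest lower-cased.
            out.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Any non-alphanumeric character ends the current word.
            out.append(ch)
            at_word_start = True
    return "".join(out)


def initcap_midletter(text: str, mid: str = ".") -> str:
    """Crude approximation of UAX #29 word boundaries (assumption: only the
    'mid-word character between two letters' rule is modelled)."""
    out = []
    at_word_start = True
    for i, ch in enumerate(text):
        if ch.isalnum():
            out.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # A character from `mid` flanked by letters stays inside the word.
            joins = (
                ch in mid
                and 0 < i < len(text) - 1
                and text[i - 1].isalpha()
                and text[i + 1].isalpha()
            )
            out.append(ch)
            if not joins:
                at_word_start = True
    return "".join(out)


if __name__ == "__main__":
    for sample in ("ЧиЮ А.Ю.", "joHn d.e.", "first.second"):
        print(sample, "|", initcap_alnum(sample), "|", initcap_midletter(sample))
```

With the alphanumeric rule 'first.second' splits into two words, while with the approximated UAX #29 rule the '.' keeps it a single word, so only the first letter is capitalized.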
A similar issue was briefly discussed in [2].
The suggested documentation patch is attached (one version for
REL_13_STABLE and later, and one for REL_12_STABLE only).
[1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
[2]: https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com
Oleg Tselebrovskiy, Postgres Professional
<v1-0001-string-functions.patch> <v1-0002-string-functions-REL_12.patch>
I can confirm initcap works with libc and libicu as you stated. The
documentation patch looks good to me. I've written a commit message.
The REL_12_STABLE branch is not relevant anymore as it's out of
support. I'm going to push this if there are no objections.
I'm sorry for so many messages. My email client just went crazy.
It should be fixed now.
------
Regards,
Alexander Korotkov
Supabase
The commit message looks good to me, and no objections to ignoring
REL_12_STABLE :)
Thank you!
Regards, Oleg Tselebrovskiy