ICU_LOCALE set database default icu collation but not working as intended.

jian he Thu, 26 May 2022 08:24:40 -0700

Base on this thread:
https://www.postgresql.org/message-id/20220305083830.lpz3k3yku5lmm5xs%40jrouhaud
ordering reference:
https://unicode-org.github.io/cldr-staging/charts/latest/collation/en_US_POSIX.html


<https://www.postgresql.org/message-id/20220305083830.lpz3k3yku5lmm5xs%40jrouhaud>CREATE
DATABASE dbicu1 LOCALE_PROVIDER icu LOCALE 'en_US.UTF-8' ICU_LOCALE
'en-u-kf-upper' TEMPLATE 'template0';
CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE 'en_US.UTF-8' ICU_LOCALE
'en-u-kr-latn-digit' TEMPLATE 'template0';
--same script apply to dbicu1 dbicu2
BEGIN;
CREATE COLLATION upperfirst (
    provider = icu,
    locale = 'en-u-kf-upper'
);
CREATE TABLE icu (
    def text,
    en text COLLATE "en_US",
    upfirst text COLLATE upperfirst,
    test_kr text
);
INSERT INTO icu
    VALUES ('a', 'a', 'a', '1 a'), ('b', 'b', 'b', 'A 11'), ('A', 'A', 'A',
'A 19'), ('B', 'B', 'B', '8 p');
INSERT INTO icu
    VALUES ('a', 'a', 'a', 'a 7');
INSERT INTO icu
    VALUES ('a', 'a', 'a', 'Œ 1');
COMMIT;
-----------------------
--dbicu1
SELECT def AS def FROM icu ORDER BY def; --since only character. all works
fine.
SELECT test_kr FROM icu  ORDER BY def;
/*
  test_kr
---------
 A 19
 1 a
 a 7
 Œ 1
 8 p
 A 11
 */


--dbicu2
SELECT def AS def FROM icu ORDER BY def; --since only character. all works
fine.
SELECT test_kr FROM icu  ORDER BY def;
/*
 test_kr
---------
 1 a
 a 7
 Œ 1
 A 19
 A 11
 8 p
(6 rows)
*/

Since dbicu1 and dbicu2 set the default collation then
In dbicu1, I should expect the ordering:
 number >> Upper case alphabet letter  >> lower case alphabet letter>>
character Œ (U+0153)

In dbicu2, I should expect the ordering:
 lower case  alphabet letter >> Upper case alphabet letter >> number >>
character Œ (U+0153)

As you can see one letter works fine for dbicu1, dbicu2. However, it does
not work on more characters.
Or The result is correct, but something I misunderstood?

I am not sure this is my personal misunderstanding.
In the above examples, the first character of column *test_kr*
is so different that the comparison is based on the first letter.
If the first letter is the same then compute the second letter..
So for whatever collation, I should expect 'A 19' to be adjacent with 'A
11'?



-- 
 I recommend David Deutsch's <<The Beginning of Infinity>>

  Jian

ICU_LOCALE set database default icu collation but not working as intended.

Reply via email to