When I looked at the bug:
https://postgr.es/m/CALDQics_oBEYfOnu_zH6yw9WR1waPCmcrqxQ8+39hK3Op=z...@mail.gmail.com

I noticed that the DDL around collations is inconsistent. For instance, CREATE COLLATION[1] uses LOCALE, LC_COLLATE, and LC_CTYPE parameters to specify either libc locales or an ICU locale; whereas CREATE DATABASE[2] uses LOCALE, LC_COLLATE, and LC_CTYPE always for libc, and ICU_LOCALE if the default collation is ICU.

The catalog representation is strange in a different way: datcollate/collcollate are always for libc, and daticulocale is for ICU. That means anything that deals with those fields needs to pick the right one based on the provider.

If this were a clean slate, it would make more sense if it were something like:

  datcollate/collcollate: to instantiate pg_locale_t
  datctype/collctype: to instantiate pg_locale_t
  datlibccollate: used by libc elsewhere
  datlibcctype: used by libc elsewhere
  daticulocale/colliculocale: remove these fields

That way, if you are instantiating a pg_locale_t, you always just pass datcollate/datctype/collcollate/collctype, regardless of the provider (pg_newlocale_from_collation() would figure it out). And if you are going to do something straight with libc, you always use datlibccollate/datlibcctype.

Aside: why don't we support different collate/ctype with ICU? It appears that u_strToTitle/u_strToUpper/u_strToLower[3] just accept a string "locale", and it would be easy enough to pass it whatever is in datctype/collctype, right? We should validate that it's a valid locale; but other than that, I don't see the problem.

Thoughts? Implementation-wise, I suppose this could create some annoyances in pg_dump.

[1] https://www.postgresql.org/docs/devel/sql-createcollation.html
[2] https://www.postgresql.org/docs/devel/sql-createdatabase.html
[3] https://unicode-org.github.io/icu-docs/apidoc/released/icu4c/ustring_8h.html

--
Jeff Davis
PostgreSQL Contributor Team - AWS
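
To make the difference between the two catalog layouts concrete, here is a minimal sketch in C. Everything in it is hypothetical and for illustration only: the struct DbLocaleFields and the three functions are not PostgreSQL code, and the struct simply carries the fields of both layouts side by side. The provider codes 'c' (libc) and 'i' (ICU) are assumed to match the ones stored in the catalog.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical, simplified stand-in for the locale-related fields of a
 * pg_database row. It is not the real catalog struct; it holds the fields
 * of both the current and the proposed layout so they can be compared.
 */
typedef struct DbLocaleFields
{
    char        provider;        /* assumed codes: 'c' = libc, 'i' = ICU */
    const char *datcollate;
    const char *datctype;
    const char *daticulocale;    /* exists only in the current layout */
    const char *datlibccollate;  /* exists only in the proposed layout */
    const char *datlibcctype;    /* exists only in the proposed layout */
} DbLocaleFields;

/*
 * Current layout: whoever wants the collation locale for pg_locale_t has
 * to branch on the provider to pick the right field.
 */
const char *
collation_locale_current(const DbLocaleFields *row)
{
    assert(row != NULL);
    return (row->provider == 'i') ? row->daticulocale : row->datcollate;
}

/*
 * Proposed layout: datcollate always feeds pg_locale_t, whatever the
 * provider is, so the branch disappears from the caller.
 */
const char *
collation_locale_proposed(const DbLocaleFields *row)
{
    assert(row != NULL);
    return row->datcollate;
}

/*
 * Proposed layout: code that talks to libc directly reads the dedicated
 * libc field and never looks at the provider.
 */
const char *
libc_collate_proposed(const DbLocaleFields *row)
{
    assert(row != NULL);
    return row->datlibccollate;
}
```

In the proposed layout the provider check moves out of every caller and into whatever builds the pg_locale_t, which is the simplification the field list above is aiming for.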