On 08.03.23 19:25, Jeff Davis wrote:
> Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic" that makes some sense, because the implementation only works in UTF-8. But here we are using ICU, and the "und" locale should work for any ICU-supported encoding. I suggest that we use collencoding=-1 for "unicode", and the docs can just add a note next to "ucs_basic" that it only works for UTF-8, because that's the weird case.

Makes sense.

> For the docs, I suggest that you clarify that "ucs_basic" has the same behavior as the C locale does *in the UTF-8 encoding*. Not all users might pick up on the subtlety that the C locale has different behaviors in different encodings.
Ok, word-smithed a bit more. How about this patch version?
From a8e33d010f60cceb9442123bd0531451875df313 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <pe...@eisentraut.org>
Date: Thu, 9 Mar 2023 11:14:28 +0100
Subject: [PATCH v2] Add standard collation UNICODE

Discussion: https://www.postgresql.org/message-id/flat/1293e382-2093-a2bf-a397-c04e8f83d...@enterprisedb.com
---
 doc/src/sgml/charset.sgml | 31 ++++++++++++++++++++++++++++---
 src/bin/initdb/initdb.c   | 10 +++++++---
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 3032392b80..12fabb7372 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -659,9 +659,34 @@ <title>Standard Collations</title>
    </para>

    <para>
-    Additionally, the SQL standard collation name <literal>ucs_basic</literal>
-    is available for encoding <literal>UTF8</literal>. It is equivalent
-    to <literal>C</literal> and sorts by Unicode code point.
+    Additionally, two SQL standard collation names are available:
+
+    <variablelist>
+     <varlistentry>
+      <term><literal>unicode</literal></term>
+      <listitem>
+       <para>
+        This collation sorts using the Unicode Collation Algorithm with the
+        Default Unicode Collation Element Table. It is available in all
+        encodings. ICU support is required to use this collation. (This
+        collation has the same behavior as the ICU root locale; see <xref
+        linkend="collation-managing-predefined-icu-und-x-icu"/>.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><literal>ucs_basic</literal></term>
+      <listitem>
+       <para>
+        This collation sorts by Unicode code point. It is only available for
+        encoding <literal>UTF8</literal>. (This collation has the same
+        behavior as the libc locale specification <literal>C</literal> in
+        <literal>UTF8</literal> encoding.)
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
    </para>
   </sect3>

diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 5e3c6a27c4..d303cc5609 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1486,10 +1486,14 @@ static void
 setup_collation(FILE *cmdfd)
 {
 	/*
-	 * Add an SQL-standard name. We don't want to pin this, so it doesn't go
-	 * in pg_collation.h. But add it before reading system collations, so
-	 * that it wins if libc defines a locale named ucs_basic.
+	 * Add SQL-standard names. We don't want to pin these, so they don't go
+	 * in pg_collation.dat. But add them before reading system collations, so
+	 * that they win if libc defines a locale with the same name.
 	 */
+	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)"
+				  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n",
+				  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU);
+
 	PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
 				  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
 				  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);

base-commit: 36ea345f8fa616fd9b40576310e54145aa70c1a1
-- 
2.39.2
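
For anyone wondering why the ucs_basic wording in the docs is qualified with "in UTF8 encoding", here is a rough standalone sketch, not part of the patch. It assumes that the C locale compares strings byte by byte (as strcmp does) and uses KOI8-R purely as an example of a non-UTF-8 encoding. The helper names c_locale_sort and code_point_sort are made up for illustration.

```python
# Sketch: bytewise ("C locale") order equals code point order in UTF-8,
# but not necessarily in other encodings.
# Assumption: the C locale compares the encoded bytes, like strcmp().

def c_locale_sort(words, encoding):
    """Sort words by their encoded bytes, mimicking bytewise comparison."""
    return sorted(words, key=lambda w: w.encode(encoding))

def code_point_sort(words):
    """Sort words by Unicode code point, the order ucs_basic specifies."""
    return sorted(words, key=lambda w: [ord(ch) for ch in w])

words = ["ц", "д", "B", "a"]

utf8_order = c_locale_sort(words, "utf-8")
koi8r_order = c_locale_sort(words, "koi8_r")
ucs_order = code_point_sort(words)

# UTF-8 was designed so that byte order preserves code point order.
print("UTF-8 matches code point order: ", utf8_order == ucs_order)
# KOI8-R places the Cyrillic letters in a different byte order.
print("KOI8-R matches code point order:", koi8r_order == ucs_order)
```

So the statement "same behavior as C" only holds when the database encoding is UTF8, which is why ucs_basic keeps collencoding set to PG_UTF8 while unicode can use -1.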