Re: Add standard collation UNICODE

Peter Eisentraut Thu, 09 Mar 2023 02:21:50 -0800

On 08.03.23 19:25, Jeff Davis wrote:

> Why is "unicode" only provided for the UTF-8 encoding? For "ucs_basic"
> that makes some sense, because the implementation only works in UTF-8.
> But here we are using ICU, and the "und" locale should work for any
> ICU-supported encoding. I suggest that we use collencoding=-1 for
> "unicode", and the docs can just add a note next to "ucs_basic" that it
> only works for UTF-8, because that's the weird case.


Makes sense.

> For the docs, I suggest that you clarify that "ucs_basic" has the same
> behavior as the C locale does *in the UTF-8 encoding*. Not all users
> might pick up on the subtlety that the C locale has different behaviors
> in different encodings.


Ok, word-smithed a bit more.

How about this patch version?

From a8e33d010f60cceb9442123bd0531451875df313 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut <[email protected]>
Date: Thu, 9 Mar 2023 11:14:28 +0100
Subject: [PATCH v2] Add standard collation UNICODE

Discussion: https://www.postgresql.org/message-id/flat/[email protected]
---
 doc/src/sgml/charset.sgml | 31 ++++++++++++++++++++++++++++---
 src/bin/initdb/initdb.c   | 10 +++++++---
 2 files changed, 35 insertions(+), 6 deletions(-)

diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 3032392b80..12fabb7372 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -659,9 +659,34 @@ <title>Standard Collations</title>
    </para>
 
    <para>
-    Additionally, the SQL standard collation name <literal>ucs_basic</literal>
-    is available for encoding <literal>UTF8</literal>.  It is equivalent
-    to <literal>C</literal> and sorts by Unicode code point.
+    Additionally, two SQL standard collation names are available:
+
+    <variablelist>
+     <varlistentry>
+      <term><literal>unicode</literal></term>
+      <listitem>
+       <para>
+        This collation sorts using the Unicode Collation Algorithm with the
+        Default Unicode Collation Element Table.  It is available in all
+        encodings.  ICU support is required to use this collation.  (This
+        collation has the same behavior as the ICU root locale; see <xref
+        linkend="collation-managing-predefined-icu-und-x-icu"/>.)
+       </para>
+      </listitem>
+     </varlistentry>
+
+     <varlistentry>
+      <term><literal>ucs_basic</literal></term>
+      <listitem>
+       <para>
+        This collation sorts by Unicode code point.  It is only available for
+        encoding <literal>UTF8</literal>.  (This collation has the same
+        behavior as the libc locale specification <literal>C</literal> in
+        <literal>UTF8</literal> encoding.)
+       </para>
+      </listitem>
+     </varlistentry>
+    </variablelist>
    </para>
   </sect3>
 
diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c
index 5e3c6a27c4..d303cc5609 100644
--- a/src/bin/initdb/initdb.c
+++ b/src/bin/initdb/initdb.c
@@ -1486,10 +1486,14 @@ static void
 setup_collation(FILE *cmdfd)
 {
        /*
-        * Add an SQL-standard name.  We don't want to pin this, so it doesn't go
-        * in pg_collation.h.  But add it before reading system collations, so
-        * that it wins if libc defines a locale named ucs_basic.
+        * Add SQL-standard names.  We don't want to pin these, so they don't go
+        * in pg_collation.dat.  But add them before reading system collations, so
+        * that they win if libc defines a locale with the same name.
         */
+       PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, colliculocale)"
+                                 "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'unicode', 'pg_catalog'::regnamespace, %u, '%c', true, -1, 'und');\n\n",
+                                 BOOTSTRAP_SUPERUSERID, COLLPROVIDER_ICU);
+
        PG_CMD_PRINTF("INSERT INTO pg_collation (oid, collname, collnamespace, collowner, collprovider, collisdeterministic, collencoding, collcollate, collctype)"
                                  "VALUES (pg_nextoid('pg_catalog.pg_collation', 'oid', 'pg_catalog.pg_collation_oid_index'), 'ucs_basic', 'pg_catalog'::regnamespace, %u, '%c', true, %d, 'C', 'C');\n\n",
                                  BOOTSTRAP_SUPERUSERID, COLLPROVIDER_LIBC, PG_UTF8);

base-commit: 36ea345f8fa616fd9b40576310e54145aa70c1a1
-- 
2.39.2
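
As a side illustration of why the docs can say that "ucs_basic" sorts by
Unicode code point while behaving like the C locale in UTF8: comparing
UTF-8 strings byte by byte gives the same order as comparing their code
points. Below is a minimal standalone sketch of that property. It is not
PostgreSQL code; the helper names encode_utf8 and utf8_byte_cmp are made
up for this example, and it only handles single code points that are
valid Unicode values (up to U+10FFFF).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not PostgreSQL code): encode one Unicode code point
 * as UTF-8 into out[] and return the number of bytes written (1 to 4).
 */
static size_t
encode_utf8(uint32_t cp, unsigned char *out)
{
    assert(cp <= 0x10FFFF);

    if (cp < 0x80)
    {
        out[0] = (unsigned char) cp;
        return 1;
    }
    if (cp < 0x800)
    {
        out[0] = (unsigned char) (0xC0 | (cp >> 6));
        out[1] = (unsigned char) (0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000)
    {
        out[0] = (unsigned char) (0xE0 | (cp >> 12));
        out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
        out[2] = (unsigned char) (0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (unsigned char) (0xF0 | (cp >> 18));
    out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
    out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[3] = (unsigned char) (0x80 | (cp & 0x3F));
    return 4;
}

/*
 * Hypothetical helper: compare the UTF-8 encodings of two code points
 * byte by byte, the way a plain memcmp-style comparison would.
 * Returns a negative value, zero, or a positive value.
 */
static int
utf8_byte_cmp(uint32_t a, uint32_t b)
{
    unsigned char ba[4];
    unsigned char bb[4];
    size_t      la = encode_utf8(a, ba);
    size_t      lb = encode_utf8(b, bb);
    size_t      n = (la < lb) ? la : lb;
    int         c = memcmp(ba, bb, n);

    if (c != 0)
        return c;
    /* Equal common prefix: the shorter encoding sorts first. */
    return (la > lb) - (la < lb);
}
```

For example, utf8_byte_cmp(0x5A, 0x61) is negative, so "Z" (U+005A) sorts
before "a" (U+0061) under byte order, just as it does by code point. The
root collation order used by "unicode" instead places "a" before "Z",
which is the user-visible difference between the two collations.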
