Re: Multi-byte character case-folding

Daniel Verite Tue, 07 Jul 2020 04:34:38 -0700

Tom Lane wrote:

> CREATE TABLE public."myÉclass" (
>    f1 text
> );
> 
> If we start to case-fold É, then the only way to access this table will
> be by double-quoting its name, which the application probably is not
> expecting (else it would have double-quoted in the original CREATE TABLE).


This problem already exists when migrating from a database with a
single-byte encoding to one with a multi-byte encoding, since
downcase_identifier() does use tolower() on high-bit characters in
single-byte encodings.

db9=# show server_encoding ;
 server_encoding 
-----------------
 LATIN9
(1 row)

db9=# create table MYÉCLASS (f1 text);
CREATE TABLE

db9=# \d
          List of relations
 Schema |   Name   | Type  |  Owner   
--------+----------+-------+----------
 public | myéclass | table | postgres
(1 row)

db9=# select * from MYÉCLASS;
 f1 
----
(0 rows)

pg_dump will dump this as

CREATE TABLE public."myéclass" (
    f1 text
);

So far so good. But after importing this into a UTF-8 database,
the same "select * from MYÉCLASS" that used to work now fails:

u8=# show server_encoding ;
 server_encoding 
-----------------
 UTF8
(1 row)

u8=# select * from MYÉCLASS;
ERROR:  relation "myÉclass" does not exist
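
To make the difference between the two sessions concrete, the folding
rule can be modelled in a few lines of Python. This is only a sketch
under stated assumptions, not the actual C code: the function name and
its parameters are made up for the example, and str.lower() on a single
decoded byte stands in for the locale's tolower().

```python
def downcase_identifier_sketch(ident: str, encoding: str, single_byte: bool) -> str:
    """Hypothetical model of the folding rule discussed in this thread.

    ASCII letters are always folded. Bytes with the high bit set are
    folded only when the server encoding is single-byte.
    """
    out = bytearray()
    for byte in ident.encode(encoding):
        if 0x41 <= byte <= 0x5A:
            # 'A'..'Z': ASCII-only downcasing, independent of the encoding.
            byte += 0x20
        elif single_byte and byte >= 0x80:
            # Assumption: str.lower() approximates the locale's tolower().
            # The result is kept only if it still fits in one byte.
            try:
                lowered = bytes([byte]).decode(encoding).lower().encode(encoding)
            except UnicodeError:
                lowered = bytes([byte])
            if len(lowered) == 1:
                byte = lowered[0]
        # In a multi-byte encoding, high-bit bytes pass through untouched.
        out.append(byte)
    return out.decode(encoding)


# LATIN9: 'É' is the single byte 0xC9, so it gets folded.
print(downcase_identifier_sketch("MYÉCLASS", "iso8859-15", single_byte=True))
# UTF-8: 'É' is the two bytes 0xC3 0x89, which are left alone.
print(downcase_identifier_sketch("MYÉCLASS", "utf-8", single_byte=False))
```

Under these assumptions, the first call yields the name that was stored
in the LATIN9 database, and the second yields the name that the UTF-8
server looks up and fails to find.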


The compromise described in the comment in downcase_identifier() to
justify this inconsistency is not very convincing, because the
case-folding issues caused by linguistic differences exist in both
single-byte and multi-byte encodings. For instance, if it's fine to
trust the locale to downcase 'İ' in a LATIN5 db, it should be okay in
a UTF-8 db too.


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite
