On Wed, 2023-06-14 at 15:55 -0700, Jeff Davis wrote:
> The locale "C" (and equivalently, "POSIX") is not really a libc
> locale; it's implemented internally with memcmp for collation and
> pg_ascii_tolower, etc., for ctype.
>
> The attached patch implements a new collation provider, "builtin",
> which only supports "C" and "POSIX".
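To make the quoted description concrete, here is a minimal C sketch of what "C" semantics amount to. Collation is a byte-wise memcmp() over the common prefix, with the shorter string sorting first on a tie, and case mapping touches only ASCII letters. The names `c_collate` and `ascii_tolower_sketch` are hypothetical, and the length-counted inputs and prefix tie-break are assumptions for illustration; PostgreSQL's own code paths (including `pg_ascii_tolower`) differ in detail.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical sketch of "C" collation: compare raw byte values.
 * Inputs are length-counted (not NUL-terminated). When one string is a
 * prefix of the other, the shorter one sorts first (assumed tie-break).
 */
static int
c_collate(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t      minlen = alen < blen ? alen : blen;
    int         cmp;

    assert((a != NULL || alen == 0) && (b != NULL || blen == 0));

    /* memcmp() compares bytes as unsigned char; skip it for empty input */
    cmp = minlen > 0 ? memcmp(a, b, minlen) : 0;
    if (cmp != 0)
        return cmp;

    /* Equal over the common prefix: order by length */
    return (alen > blen) - (alen < blen);
}

/*
 * Hypothetical sketch of "C" ctype case folding: only 'A'..'Z' change;
 * every other byte, including non-ASCII bytes, is returned unchanged.
 */
static unsigned char
ascii_tolower_sketch(unsigned char ch)
{
    if (ch >= 'A' && ch <= 'Z')
        ch += 'a' - 'A';
    return ch;
}
```

Because the comparison is on raw bytes, uppercase ASCII sorts before lowercase (`"B"` before `"a"`), and multibyte characters order by their encoded bytes, which is one way the result can depend on the database encoding.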
Rebased patch attached.

I got some generally positive comments, but it needs more feedback on the specifics to be committable.

This might be a good time to summarize my thoughts on collation after my work in v16:

* Picking a database default collation other than UCS_BASIC (a.k.a. "C", a.k.a. memcmp(), a.k.a. provider=builtin) is something that should be done intentionally. It's an impactful choice that affects semantics, performance, and upgrades/deployment. Beyond that, our implementation still lacks a good way to manage versions of collation provider libraries and to track object dependencies safely enough to prevent index corruption, so the safest choice is really just to use stable memcmp() semantics.

* The defaults for initdb seem bad in a number of ways, but it's too hard to change them now (I tried in v16 and reverted it). So the job of making reasonable choices is left to higher-level tools and documentation.

* We can handle collation and character classification independently. The main use case is to set the collation to memcmp() semantics (for stability and performance) and set the character classification to something interesting (on the grounds that it's more likely to be stable, and less likely to be used in an index, than a collation). Right now the only way to do that is to use the libc provider and set the collation to C and the ctype to a libc locale. But there is also a use case for having ICU as the provider for character classification. One option is to have separate datcolprovider=b (builtin provider) and datctypeprovider=i, so that collation would be handled with memcmp and character classification by daticulocale. It feels like we're growing the fields in pg_database a little too much, but the use case seems valid, and perhaps we can reorganize the catalog representation a bit.

--
Jeff Davis
PostgreSQL Contributor Team - AWS
From ec35f2c1e31c3d793f5eb4982cb562b701ce71fd Mon Sep 17 00:00:00 2001 From: Jeff Davis <j...@j-davis.com> Date: Mon, 1 May 2023 15:38:29 -0700 Subject: [PATCH v12] Introduce collation provider "builtin". Only supports locale "C" (or equivalently, "POSIX"). Provides locale-unaware semantics that are implemented as fast byte operations in Postgres, independent of the operating system or any provider libraries. Equivalent (in semantics and implementation) to the libc provider with locale "C", except that LC_COLLATE and LC_CTYPE can be set independently. Use provider "builtin" for built-in collation "ucs_basic" instead of libc. Discussion: https://postgr.es/m/ab925f69-5f9d-f85e-b87c-bd2a44798...@joeconway.com --- doc/src/sgml/charset.sgml | 89 +++++++++++++++++---- doc/src/sgml/ref/create_collation.sgml | 11 ++- doc/src/sgml/ref/create_database.sgml | 8 +- doc/src/sgml/ref/createdb.sgml | 2 +- doc/src/sgml/ref/initdb.sgml | 7 +- src/backend/catalog/pg_collation.c | 7 +- src/backend/commands/collationcmds.c | 103 +++++++++++++++++++++---- src/backend/commands/dbcommands.c | 69 ++++++++++++++--- src/backend/utils/adt/pg_locale.c | 27 ++++++- src/backend/utils/init/postinit.c | 10 ++- src/bin/initdb/initdb.c | 24 ++++-- src/bin/initdb/t/001_initdb.pl | 49 +++++++++++- src/bin/pg_dump/pg_dump.c | 15 +++- src/bin/pg_upgrade/t/002_pg_upgrade.pl | 35 +++++++-- src/bin/psql/describe.c | 4 +- src/bin/scripts/createdb.c | 2 +- src/bin/scripts/t/020_createdb.pl | 56 ++++++++++++++ src/include/catalog/pg_collation.dat | 3 +- src/include/catalog/pg_collation.h | 3 + src/test/icu/t/010_database.pl | 18 ++--- src/test/regress/expected/collate.out | 25 +++++- src/test/regress/sql/collate.sql | 10 +++ 22 files changed, 493 insertions(+), 84 deletions(-) diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index ed84465996..b38cf82f83 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -342,22 +342,14 @@ initdb --locale=sv_SE <title>Locale 
Providers</title> <para> - <productname>PostgreSQL</productname> supports multiple <firstterm>locale - providers</firstterm>. This specifies which library supplies the locale - data. One standard provider name is <literal>libc</literal>, which uses - the locales provided by the operating system C library. These are the - locales used by most tools provided by the operating system. Another - provider is <literal>icu</literal>, which uses the external - ICU<indexterm><primary>ICU</primary></indexterm> library. ICU locales can - only be used if support for ICU was configured when PostgreSQL was built. + A locale provider specifies which library defines the locale behavior for + collations and character classifications. </para> <para> The commands and tools that select the locale settings, as described - above, each have an option to select the locale provider. The examples - shown earlier all use the <literal>libc</literal> provider, which is the - default. Here is an example to initialize a database cluster using the - ICU provider: + above, each have an option to select the locale provider. Here is an + example to initialize a database cluster using the ICU provider: <programlisting> initdb --locale-provider=icu --icu-locale=en </programlisting> @@ -370,12 +362,75 @@ initdb --locale-provider=icu --icu-locale=en </para> <para> - Which locale provider to use depends on individual requirements. For most - basic uses, either provider will give adequate results. For the libc - provider, it depends on what the operating system offers; some operating - systems are better than others. For advanced uses, ICU offers more locale - variants and customization options. + Regardless of the locale provider, the operating system is still used to + provide some locale-aware behavior, such as messages (see <xref + linkend="guc-lc-messages"/>). </para> + + <para> + The available locale providers are listed below. 
+ </para> + + <sect3 id="locale-provider-builtin"> + <title>Builtin</title> + <para> + The <literal>builtin</literal> provider uses simple built-in operations + which are not locale-aware. Only the <literal>C</literal> (or + equivalently, <literal>POSIX</literal>) locales are supported for this + provider. + </para> + <para> + The collation and character classification behavior is equivalent to + using the <literal>libc</literal> provider with locale + <literal>C</literal>, except that <literal>LC_COLLATE</literal> and + <literal>LC_CTYPE</literal> can be set independently. + </para> + <note> + <para> + When using the <literal>builtin</literal> locale provider, behavior may + depend on the database encoding. + </para> + </note> + </sect3> + <sect3 id="locale-provider-icu"> + <title>ICU</title> + <para> + The <literal>icu</literal> provider uses the external + ICU<indexterm><primary>ICU</primary></indexterm> + library. <productname>PostgreSQL</productname> must have been configured + with support. + </para> + <para> + ICU provides collation and character classification behavior that is + independent of the operating system and database encoding, which is + preferable if you expect to transition to other platforms without any + change in results. <literal>LC_COLLATE</literal> and + <literal>LC_CTYPE</literal> can be set independently of the ICU locale. + </para> + <note> + <para> + For the ICU provider, results may depend on the version of the ICU + library used, as it is updated to reflect changes in natural language + over time. + </para> + </note> + </sect3> + <sect3 id="locale-provider-libc"> + <title>libc</title> + <para> + The <literal>libc</literal> provider uses the operating system's C + library. The collation and character classification behavior is + controlled by the settings <literal>LC_COLLATE</literal> and + <literal>LC_CTYPE</literal>, so they cannot be set independently. 
+ </para> + <note> + <para> + The same locale name may have different behavior on different platforms + when using the libc provider. + </para> + </note> + </sect3> + </sect2> <sect2 id="icu-locales"> <title>ICU Locales</title> diff --git a/doc/src/sgml/ref/create_collation.sgml b/doc/src/sgml/ref/create_collation.sgml index b86a9bbb9c..8cc3b525e6 100644 --- a/doc/src/sgml/ref/create_collation.sgml +++ b/doc/src/sgml/ref/create_collation.sgml @@ -96,6 +96,11 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace <replaceable>locale</replaceable>, you cannot specify either of those parameters. </para> + <para> + If <replaceable>provider</replaceable> is <literal>builtin</literal>, + then <replaceable>locale</replaceable> must be specified and set to + either <literal>C</literal> or <literal>POSIX</literal>. + </para> </listitem> </varlistentry> @@ -129,9 +134,9 @@ CREATE COLLATION [ IF NOT EXISTS ] <replaceable>name</replaceable> FROM <replace <listitem> <para> Specifies the provider to use for locale services associated with this - collation. Possible values are - <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> - (if the server was built with ICU support) or <literal>libc</literal>. + collation. Possible values are <literal>builtin</literal>, + <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if + the server was built with ICU support) or <literal>libc</literal>. <literal>libc</literal> is the default. See <xref linkend="locale-providers"/> for details. </para> diff --git a/doc/src/sgml/ref/create_database.sgml b/doc/src/sgml/ref/create_database.sgml index b2c8aef1ad..dbf2a8b771 100644 --- a/doc/src/sgml/ref/create_database.sgml +++ b/doc/src/sgml/ref/create_database.sgml @@ -162,6 +162,12 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable> linkend="create-database-lc-ctype"/>, or <xref linkend="create-database-icu-locale"/> individually. 
</para> + <para> + If <xref linkend="create-database-locale-provider"/> is + <literal>builtin</literal>, then <replaceable>locale</replaceable> + must be specified and set to either <literal>C</literal> or + <literal>POSIX</literal>. + </para> <tip> <para> The other locale settings <xref linkend="guc-lc-messages"/>, <xref @@ -245,7 +251,7 @@ CREATE DATABASE <replaceable class="parameter">name</replaceable> <listitem> <para> Specifies the provider to use for the default collation in this - database. Possible values are + database. Possible values are <literal>builtin</literal>, <literal>icu</literal><indexterm><primary>ICU</primary></indexterm> (if the server was built with ICU support) or <literal>libc</literal>. By default, the provider is the same as that of the <xref diff --git a/doc/src/sgml/ref/createdb.sgml b/doc/src/sgml/ref/createdb.sgml index e4647d5ce7..d3e815f659 100644 --- a/doc/src/sgml/ref/createdb.sgml +++ b/doc/src/sgml/ref/createdb.sgml @@ -171,7 +171,7 @@ PostgreSQL documentation </varlistentry> <varlistentry> - <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term> + <term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term> <listitem> <para> Specifies the locale provider for the database's default collation. diff --git a/doc/src/sgml/ref/initdb.sgml b/doc/src/sgml/ref/initdb.sgml index 22f1011781..95e6529c8a 100644 --- a/doc/src/sgml/ref/initdb.sgml +++ b/doc/src/sgml/ref/initdb.sgml @@ -286,6 +286,11 @@ PostgreSQL documentation environment that <command>initdb</command> runs in. Locale support is described in <xref linkend="locale"/>. </para> + <para> + If <option>--locale-provider</option> is <literal>builtin</literal>, + <option>--locale</option> must be specified and set to + <literal>C</literal> (or equivalently, <literal>POSIX</literal>). 
+ </para> </listitem> </varlistentry> @@ -315,7 +320,7 @@ PostgreSQL documentation </varlistentry> <varlistentry id="app-initdb-option-locale-provider"> - <term><option>--locale-provider={<literal>libc</literal>|<literal>icu</literal>}</option></term> + <term><option>--locale-provider={<literal>builtin</literal>|<literal>libc</literal>|<literal>icu</literal>}</option></term> <listitem> <para> This option sets the locale provider for databases created in the new diff --git a/src/backend/catalog/pg_collation.c b/src/backend/catalog/pg_collation.c index fd022e6fc2..0b1ba359b6 100644 --- a/src/backend/catalog/pg_collation.c +++ b/src/backend/catalog/pg_collation.c @@ -68,7 +68,12 @@ CollationCreate(const char *collname, Oid collnamespace, Assert(collname); Assert(collnamespace); Assert(collowner); - Assert((collcollate && collctype) || colliculocale); + Assert((collprovider == COLLPROVIDER_BUILTIN && + !collcollate && !collctype && !colliculocale) || + (collprovider == COLLPROVIDER_LIBC && + collcollate && collctype && !colliculocale) || + (collprovider == COLLPROVIDER_ICU && + !collcollate && !collctype && colliculocale)); /* * Make sure there is no existing collation of same name & encoding. 
diff --git a/src/backend/commands/collationcmds.c b/src/backend/commands/collationcmds.c index cc239b4d14..c0cda5b5b0 100644 --- a/src/backend/commands/collationcmds.c +++ b/src/backend/commands/collationcmds.c @@ -66,6 +66,7 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e DefElem *deterministicEl = NULL; DefElem *rulesEl = NULL; DefElem *versionEl = NULL; + char *builtin_locale = NULL; char *collcollate; char *collctype; char *colliculocale; @@ -215,7 +216,9 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e if (collproviderstr) { - if (pg_strcasecmp(collproviderstr, "icu") == 0) + if (pg_strcasecmp(collproviderstr, "builtin") == 0) + collprovider = COLLPROVIDER_BUILTIN; + else if (pg_strcasecmp(collproviderstr, "icu") == 0) collprovider = COLLPROVIDER_ICU; else if (pg_strcasecmp(collproviderstr, "libc") == 0) collprovider = COLLPROVIDER_LIBC; @@ -230,7 +233,11 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e if (localeEl) { - if (collprovider == COLLPROVIDER_LIBC) + if (collprovider == COLLPROVIDER_BUILTIN) + { + builtin_locale = defGetString(localeEl); + } + else if (collprovider == COLLPROVIDER_LIBC) { collcollate = defGetString(localeEl); collctype = defGetString(localeEl); @@ -245,7 +252,22 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e if (lcctypeEl) collctype = defGetString(lcctypeEl); - if (collprovider == COLLPROVIDER_LIBC) + if (collprovider == COLLPROVIDER_BUILTIN) + { + if (!builtin_locale) + ereport(ERROR, + (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), + errmsg("parameter \"locale\" must be specified"))); + + if (strcmp(builtin_locale, "C") != 0 && + strcmp(builtin_locale, "POSIX") != 0) + ereport(ERROR, + (errcode(ERRCODE_FEATURE_NOT_SUPPORTED), + errmsg("collation provider \"builtin\" does not support locale \"%s\"", + builtin_locale), + errhint("The built-in collation provider only supports the \"C\" and 
\"POSIX\" locales."))); + } + else if (collprovider == COLLPROVIDER_LIBC) { if (!collcollate) ereport(ERROR, @@ -302,7 +324,17 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e (errcode(ERRCODE_INVALID_OBJECT_DEFINITION), errmsg("ICU rules cannot be specified unless locale provider is ICU"))); - if (collprovider == COLLPROVIDER_ICU) + if (collprovider == COLLPROVIDER_BUILTIN) + { + /* + * Behavior may be different in different encodings, so set + * collencoding to the current database encoding. No validation is + * required, because the "builtin" provider is compatible with any + * encoding. + */ + collencoding = GetDatabaseEncoding(); + } + else if (collprovider == COLLPROVIDER_ICU) { #ifdef USE_ICU /* @@ -331,7 +363,18 @@ DefineCollation(ParseState *pstate, List *names, List *parameters, bool if_not_e } if (!collversion) - collversion = get_collation_actual_version(collprovider, collprovider == COLLPROVIDER_ICU ? colliculocale : collcollate); + { + char *locale; + + if (collprovider == COLLPROVIDER_ICU) + locale = colliculocale; + else if (collprovider == COLLPROVIDER_LIBC) + locale = collcollate; + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + collversion = get_collation_actual_version(collprovider, locale); + } newoid = CollationCreate(collName, collNamespace, @@ -406,6 +449,7 @@ AlterCollation(AlterCollationStmt *stmt) Form_pg_collation collForm; Datum datum; bool isnull; + char *locale; char *oldversion; char *newversion; ObjectAddress address; @@ -430,8 +474,20 @@ AlterCollation(AlterCollationStmt *stmt) datum = SysCacheGetAttr(COLLOID, tup, Anum_pg_collation_collversion, &isnull); oldversion = isnull ? NULL : TextDatumGetCString(datum); - datum = SysCacheGetAttrNotNull(COLLOID, tup, collForm->collprovider == COLLPROVIDER_ICU ? 
Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate); - newversion = get_collation_actual_version(collForm->collprovider, TextDatumGetCString(datum)); + if (collForm->collprovider == COLLPROVIDER_ICU) + { + datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_colliculocale); + locale = TextDatumGetCString(datum); + } + else if (collForm->collprovider == COLLPROVIDER_LIBC) + { + datum = SysCacheGetAttrNotNull(COLLOID, tup, Anum_pg_collation_collcollate); + locale = TextDatumGetCString(datum); + } + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + newversion = get_collation_actual_version(collForm->collprovider, locale); /* cannot change from NULL to non-NULL or vice versa */ if ((!oldversion && newversion) || (oldversion && !newversion)) @@ -495,11 +551,18 @@ pg_collation_actual_version(PG_FUNCTION_ARGS) provider = ((Form_pg_database) GETSTRUCT(dbtup))->datlocprovider; - datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, - provider == COLLPROVIDER_ICU ? - Anum_pg_database_daticulocale : Anum_pg_database_datcollate); - - locale = TextDatumGetCString(datum); + if (provider == COLLPROVIDER_ICU) + { + datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_daticulocale); + locale = TextDatumGetCString(datum); + } + else if (provider == COLLPROVIDER_LIBC) + { + datum = SysCacheGetAttrNotNull(DATABASEOID, dbtup, Anum_pg_database_datcollate); + locale = TextDatumGetCString(datum); + } + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ ReleaseSysCache(dbtup); } @@ -516,11 +579,19 @@ pg_collation_actual_version(PG_FUNCTION_ARGS) provider = ((Form_pg_collation) GETSTRUCT(colltp))->collprovider; Assert(provider != COLLPROVIDER_DEFAULT); - datum = SysCacheGetAttrNotNull(COLLOID, colltp, - provider == COLLPROVIDER_ICU ? 
- Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate); - locale = TextDatumGetCString(datum); + if (provider == COLLPROVIDER_ICU) + { + datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_colliculocale); + locale = TextDatumGetCString(datum); + } + else if (provider == COLLPROVIDER_LIBC) + { + datum = SysCacheGetAttrNotNull(COLLOID, colltp, Anum_pg_collation_collcollate); + locale = TextDatumGetCString(datum); + } + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ ReleaseSysCache(colltp); } diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c index 307729ab7e..ee497c2b29 100644 --- a/src/backend/commands/dbcommands.c +++ b/src/backend/commands/dbcommands.c @@ -909,7 +909,9 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) { char *locproviderstr = defGetString(dlocprovider); - if (pg_strcasecmp(locproviderstr, "icu") == 0) + if (pg_strcasecmp(locproviderstr, "builtin") == 0) + dblocprovider = COLLPROVIDER_BUILTIN; + else if (pg_strcasecmp(locproviderstr, "icu") == 0) dblocprovider = COLLPROVIDER_ICU; else if (pg_strcasecmp(locproviderstr, "libc") == 0) dblocprovider = COLLPROVIDER_LIBC; @@ -1194,9 +1196,17 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) */ if (src_collversion && !dcollversion) { - char *actual_versionstr; + char *actual_versionstr; + char *locale; - actual_versionstr = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? 
dbiculocale : dbcollate); + if (dblocprovider == COLLPROVIDER_ICU) + locale = dbiculocale; + else if (dblocprovider == COLLPROVIDER_LIBC) + locale = dbcollate; + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + actual_versionstr = get_collation_actual_version(dblocprovider, locale); if (!actual_versionstr) ereport(ERROR, (errmsg("template database \"%s\" has a collation version, but no actual collation version could be determined", @@ -1224,7 +1234,18 @@ createdb(ParseState *pstate, const CreatedbStmt *stmt) * collation version, which is normally only the case for template0. */ if (dbcollversion == NULL) - dbcollversion = get_collation_actual_version(dblocprovider, dblocprovider == COLLPROVIDER_ICU ? dbiculocale : dbcollate); + { + char *locale; + + if (dblocprovider == COLLPROVIDER_ICU) + locale = dbiculocale; + else if (dblocprovider == COLLPROVIDER_LIBC) + locale = dbcollate; + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + dbcollversion = get_collation_actual_version(dblocprovider, locale); + } /* Resolve default tablespace for new database */ if (dtablespacename && dtablespacename->arg) @@ -2444,6 +2465,7 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt) ObjectAddress address; Datum datum; bool isnull; + char *locale; char *oldversion; char *newversion; @@ -2470,10 +2492,24 @@ AlterDatabaseRefreshColl(AlterDatabaseRefreshCollStmt *stmt) datum = heap_getattr(tuple, Anum_pg_database_datcollversion, RelationGetDescr(rel), &isnull); oldversion = isnull ? NULL : TextDatumGetCString(datum); - datum = heap_getattr(tuple, datForm->datlocprovider == COLLPROVIDER_ICU ? 
Anum_pg_database_daticulocale : Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull); - if (isnull) - elog(ERROR, "unexpected null in pg_database"); - newversion = get_collation_actual_version(datForm->datlocprovider, TextDatumGetCString(datum)); + if (datForm->datlocprovider == COLLPROVIDER_ICU) + { + datum = heap_getattr(tuple, Anum_pg_database_daticulocale, RelationGetDescr(rel), &isnull); + if (isnull) + elog(ERROR, "unexpected null in pg_database"); + locale = TextDatumGetCString(datum); + } + else if (datForm->datlocprovider == COLLPROVIDER_LIBC) + { + datum = heap_getattr(tuple, Anum_pg_database_datcollate, RelationGetDescr(rel), &isnull); + if (isnull) + elog(ERROR, "unexpected null in pg_database"); + locale = TextDatumGetCString(datum); + } + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + newversion = get_collation_actual_version(datForm->datlocprovider, locale); /* cannot change from NULL to non-NULL or vice versa */ if ((!oldversion && newversion) || (oldversion && !newversion)) @@ -2658,6 +2694,7 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS) HeapTuple tp; char datlocprovider; Datum datum; + char *locale; char *version; tp = SearchSysCache1(DATABASEOID, ObjectIdGetDatum(dbid)); @@ -2668,8 +2705,20 @@ pg_database_collation_actual_version(PG_FUNCTION_ARGS) datlocprovider = ((Form_pg_database) GETSTRUCT(tp))->datlocprovider; - datum = SysCacheGetAttrNotNull(DATABASEOID, tp, datlocprovider == COLLPROVIDER_ICU ? 
Anum_pg_database_daticulocale : Anum_pg_database_datcollate); - version = get_collation_actual_version(datlocprovider, TextDatumGetCString(datum)); + if (datlocprovider == COLLPROVIDER_ICU) + { + datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_daticulocale); + locale = TextDatumGetCString(datum); + } + else if (datlocprovider == COLLPROVIDER_LIBC) + { + datum = SysCacheGetAttrNotNull(DATABASEOID, tp, Anum_pg_database_datcollate); + locale = TextDatumGetCString(datum); + } + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + version = get_collation_actual_version(datlocprovider, locale); ReleaseSysCache(tp); diff --git a/src/backend/utils/adt/pg_locale.c b/src/backend/utils/adt/pg_locale.c index aa9da99308..b6b832344b 100644 --- a/src/backend/utils/adt/pg_locale.c +++ b/src/backend/utils/adt/pg_locale.c @@ -1265,7 +1265,12 @@ lookup_collation_cache(Oid collation, bool set_flags) elog(ERROR, "cache lookup failed for collation %u", collation); collform = (Form_pg_collation) GETSTRUCT(tp); - if (collform->collprovider == COLLPROVIDER_LIBC) + if (collform->collprovider == COLLPROVIDER_BUILTIN) + { + cache_entry->collate_is_c = true; + cache_entry->ctype_is_c = true; + } + else if (collform->collprovider == COLLPROVIDER_LIBC) { Datum datum; const char *collcollate; @@ -1318,6 +1323,9 @@ lc_collate_is_c(Oid collation) static int result = -1; char *localeptr; + if (default_locale.provider == COLLPROVIDER_BUILTIN) + return true; + if (default_locale.provider == COLLPROVIDER_ICU) return false; @@ -1371,6 +1379,9 @@ lc_ctype_is_c(Oid collation) static int result = -1; char *localeptr; + if (default_locale.provider == COLLPROVIDER_BUILTIN) + return true; + if (default_locale.provider == COLLPROVIDER_ICU) return false; @@ -1518,8 +1529,10 @@ pg_newlocale_from_collation(Oid collid) { if (default_locale.provider == COLLPROVIDER_ICU) return &default_locale; - else + else if (default_locale.provider == COLLPROVIDER_LIBC) return (pg_locale_t) 0; + else + 
elog(ERROR, "cannot open collation with provider \"builtin\""); } cache_entry = lookup_collation_cache(collid, false); @@ -1544,7 +1557,11 @@ pg_newlocale_from_collation(Oid collid) result.provider = collform->collprovider; result.deterministic = collform->collisdeterministic; - if (collform->collprovider == COLLPROVIDER_LIBC) + if (collform->collprovider == COLLPROVIDER_BUILTIN) + { + elog(ERROR, "cannot open collation with provider \"builtin\""); + } + else if (collform->collprovider == COLLPROVIDER_LIBC) { const char *collcollate; const char *collctype pg_attribute_unused(); @@ -1623,6 +1640,7 @@ pg_newlocale_from_collation(Oid collid) collversionstr = TextDatumGetCString(datum); + Assert(collform->collprovider != COLLPROVIDER_BUILTIN); datum = SysCacheGetAttrNotNull(COLLOID, tp, collform->collprovider == COLLPROVIDER_ICU ? Anum_pg_collation_colliculocale : Anum_pg_collation_collcollate); actual_versionstr = get_collation_actual_version(collform->collprovider, @@ -1674,6 +1692,9 @@ get_collation_actual_version(char collprovider, const char *collcollate) { char *collversion = NULL; + if (collprovider == COLLPROVIDER_BUILTIN) + return NULL; + #ifdef USE_ICU if (collprovider == COLLPROVIDER_ICU) { diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c index f31b85c013..e0114269a7 100644 --- a/src/backend/utils/init/postinit.c +++ b/src/backend/utils/init/postinit.c @@ -461,10 +461,18 @@ CheckMyDatabase(const char *name, bool am_superuser, bool override_allow_connect { char *actual_versionstr; char *collversionstr; + char *locale; collversionstr = TextDatumGetCString(datum); - actual_versionstr = get_collation_actual_version(dbform->datlocprovider, dbform->datlocprovider == COLLPROVIDER_ICU ? 
iculocale : collate); + if (dbform->datlocprovider == COLLPROVIDER_ICU) + locale = iculocale; + else if (dbform->datlocprovider == COLLPROVIDER_LIBC) + locale = collate; + else + locale = NULL; /* COLLPROVIDER_BUILTIN */ + + actual_versionstr = get_collation_actual_version(dbform->datlocprovider, locale); if (!actual_versionstr) /* should not happen */ elog(WARNING, diff --git a/src/bin/initdb/initdb.c b/src/bin/initdb/initdb.c index 3f4167682a..f36209c0ce 100644 --- a/src/bin/initdb/initdb.c +++ b/src/bin/initdb/initdb.c @@ -2448,7 +2448,7 @@ usage(const char *progname) " set default locale in the respective category for\n" " new databases (default taken from environment)\n")); printf(_(" --no-locale equivalent to --locale=C\n")); - printf(_(" --locale-provider={libc|icu}\n" + printf(_(" --locale-provider={builtin|libc|icu}\n" " set default locale provider for new databases\n")); printf(_(" --pwfile=FILE read password for the new superuser from file\n")); printf(_(" -T, --text-search-config=CFG\n" @@ -2598,7 +2598,15 @@ setup_locale_encoding(void) { setlocales(); - if (locale_provider == COLLPROVIDER_LIBC && + if (locale_provider == COLLPROVIDER_BUILTIN && + strcmp(lc_ctype, "C") == 0 && + strcmp(lc_collate, "C") == 0 && + strcmp(lc_time, "C") == 0 && + strcmp(lc_numeric, "C") == 0 && + strcmp(lc_monetary, "C") == 0 && + strcmp(lc_messages, "C") == 0) + printf(_("The database cluster will be initialized with no locale.\n")); + else if (locale_provider == COLLPROVIDER_LIBC && strcmp(lc_ctype, lc_collate) == 0 && strcmp(lc_ctype, lc_time) == 0 && strcmp(lc_ctype, lc_numeric) == 0 && @@ -2609,9 +2617,11 @@ setup_locale_encoding(void) else { printf(_("The database cluster will be initialized with this locale configuration:\n")); - printf(_(" provider: %s\n"), collprovider_name(locale_provider)); - if (icu_locale) - printf(_(" ICU locale: %s\n"), icu_locale); + printf(_(" default collation provider: %s\n"), collprovider_name(locale_provider)); + if (locale_provider == 
COLLPROVIDER_BUILTIN)
+		printf(_(" default collation locale: %s\n"), "C");
+	else if (locale_provider == COLLPROVIDER_ICU)
+		printf(_(" default collation locale: %s\n"), icu_locale);
 	printf(_(" LC_COLLATE: %s\n"
 			 " LC_CTYPE: %s\n"
 			 " LC_MESSAGES: %s\n"
@@ -3272,7 +3282,9 @@ main(int argc, char *argv[])
 						"-c debug_discard_caches=1");
 				break;
 			case 15:
-				if (strcmp(optarg, "icu") == 0)
+				if (strcmp(optarg, "builtin") == 0)
+					locale_provider = COLLPROVIDER_BUILTIN;
+				else if (strcmp(optarg, "icu") == 0)
 					locale_provider = COLLPROVIDER_ICU;
 				else if (strcmp(optarg, "libc") == 0)
 					locale_provider = COLLPROVIDER_LIBC;
diff --git a/src/bin/initdb/t/001_initdb.pl b/src/bin/initdb/t/001_initdb.pl
index 2d7469d2fc..9d5a093f59 100644
--- a/src/bin/initdb/t/001_initdb.pl
+++ b/src/bin/initdb/t/001_initdb.pl
@@ -126,7 +126,7 @@ if ($ENV{with_icu} eq 'yes')
 			'--lc-monetary=C', '--lc-time=C',
 			"$tempdir/data4"
 		],
-		qr/^\s+ICU locale:\s+und\n/ms,
+		qr/^\s+default collation locale:\s+und\n/ms,
 		'options --locale-provider=icu --locale=und --lc-*=C');
 
 	command_fails_like(
@@ -172,6 +172,53 @@ else
 		'locale provider ICU fails since no ICU support');
 }
 
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', "$tempdir/data6"
+	],
+	'locale provider builtin'
+);
+
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--locale=C',
+		"$tempdir/data7"
+	],
+	'locale provider builtin with --locale'
+);
+
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--lc-collate=C',
+		"$tempdir/data8"
+	],
+	'locale provider builtin with --lc-collate'
+);
+
+command_ok(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--lc-ctype=C',
+		"$tempdir/data9"
+	],
+	'locale provider builtin with --lc-ctype'
+);
+
+command_fails(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--icu-locale=en',
+		"$tempdir/dataX"
+	],
+	'fails for locale provider builtin with ICU locale'
+);
+
+command_fails(
+	[
+		'initdb', '--no-sync', '--locale-provider=builtin', '--icu-rules=""',
+		"$tempdir/dataX"
+	],
+	'fails for locale provider builtin with ICU rules'
+);
+
 command_fails(
 	[ 'initdb', '--no-sync', '--locale-provider=xyz', "$tempdir/dataX" ],
 	'fails for invalid locale provider');
diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c
index c73e9a11da..56c9863972 100644
--- a/src/bin/pg_dump/pg_dump.c
+++ b/src/bin/pg_dump/pg_dump.c
@@ -3070,7 +3070,9 @@ dumpDatabase(Archive *fout)
 	}
 
 	appendPQExpBufferStr(creaQry, " LOCALE_PROVIDER = ");
-	if (datlocprovider[0] == 'c')
+	if (datlocprovider[0] == 'b')
+		appendPQExpBufferStr(creaQry, "builtin");
+	else if (datlocprovider[0] == 'c')
 		appendPQExpBufferStr(creaQry, "libc");
 	else if (datlocprovider[0] == 'i')
 		appendPQExpBufferStr(creaQry, "icu");
@@ -13444,7 +13446,9 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 					  fmtQualifiedDumpable(collinfo));
 
 	appendPQExpBufferStr(q, "provider = ");
-	if (collprovider[0] == 'c')
+	if (collprovider[0] == 'b')
+		appendPQExpBufferStr(q, "builtin");
+	else if (collprovider[0] == 'c')
 		appendPQExpBufferStr(q, "libc");
 	else if (collprovider[0] == 'i')
 		appendPQExpBufferStr(q, "icu");
@@ -13465,6 +13469,13 @@ dumpCollation(Archive *fout, const CollInfo *collinfo)
 
 		/* no locale -- the default collation cannot be reloaded anyway */
 	}
+	else if (collprovider[0] == 'b')
+	{
+		if (collcollate || collctype || colliculocale || collicurules)
+			pg_log_warning("invalid collation \"%s\"", qcollname);
+
+		appendPQExpBufferStr(q, ", locale = 'C'");
+	}
 	else if (collprovider[0] == 'i')
 	{
 		if (fout->remoteVersion >= 150000)
diff --git a/src/bin/pg_upgrade/t/002_pg_upgrade.pl b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
index a5688a1cf2..2c484c7b17 100644
--- a/src/bin/pg_upgrade/t/002_pg_upgrade.pl
+++ b/src/bin/pg_upgrade/t/002_pg_upgrade.pl
@@ -114,22 +114,45 @@ my $original_locale = "C";
 my $original_iculocale = "";
 my $provider_field = "'c' AS datlocprovider";
 my $iculocale_field = "NULL AS daticulocale";
-if ($oldnode->pg_version >= 15 && $ENV{with_icu} eq 'yes')
+if ($oldnode->pg_version >= 15)
 {
 	$provider_field = "datlocprovider";
 	$iculocale_field = "daticulocale";
-	$original_provider = "i";
-	$original_iculocale = "fr-CA";
+
+	if ($ENV{with_icu} eq 'yes')
+	{
+		$original_provider = "i";
+		$original_iculocale = "fr-CA";
+	}
+}
+
+# use builtin provider instead of libc, if supported
+if ($oldnode->pg_version >= 16 && $ENV{with_icu} ne 'yes')
+{
+	$original_provider = "b";
 }
 
 my @initdb_params = @custom_opts;
 
 push @initdb_params, ('--encoding', 'UTF-8');
 push @initdb_params, ('--locale', $original_locale);
-if ($original_provider eq "i")
+
+# add --locale-provider, if supported
+if ($oldnode->pg_version >= 15)
 {
-	push @initdb_params, ('--locale-provider', 'icu');
-	push @initdb_params, ('--icu-locale', 'fr-CA');
+	if ($original_provider eq "b")
+	{
+		push @initdb_params, ('--locale-provider', 'builtin');
+	}
+	elsif ($original_provider eq "i")
+	{
+		push @initdb_params, ('--locale-provider', 'icu');
+		push @initdb_params, ('--icu-locale', 'fr-CA');
+	}
+	elsif ($original_provider eq "c")
+	{
+		push @initdb_params, ('--locale-provider', 'libc');
+	}
 }
 
 $node_params{extra} = \@initdb_params;
diff --git a/src/bin/psql/describe.c b/src/bin/psql/describe.c
index 45f6a86b87..5ad920e0f0 100644
--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -932,7 +932,7 @@ listAllDbs(const char *pattern, bool verbose)
 					  gettext_noop("Encoding"));
 	if (pset.sversion >= 150000)
 		appendPQExpBuffer(&buf,
-						  " CASE d.datlocprovider WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  " CASE d.datlocprovider WHEN 'b' THEN 'builtin' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Locale Provider"));
 	else
 		appendPQExpBuffer(&buf,
@@ -4934,7 +4934,7 @@ listCollations(const char *pattern, bool verbose, bool showSystem)
 
 	if (pset.sversion >= 100000)
 		appendPQExpBuffer(&buf,
-						  " CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
+						  " CASE c.collprovider WHEN 'd' THEN 'default' WHEN 'b' THEN 'builtin' WHEN 'c' THEN 'libc' WHEN 'i' THEN 'icu' END AS \"%s\",\n",
 						  gettext_noop("Provider"));
 	else
 		appendPQExpBuffer(&buf,
diff --git a/src/bin/scripts/createdb.c b/src/bin/scripts/createdb.c
index 9ca86a3e53..8f8995964c 100644
--- a/src/bin/scripts/createdb.c
+++ b/src/bin/scripts/createdb.c
@@ -296,7 +296,7 @@ help(const char *progname)
 	printf(_(" --lc-ctype=LOCALE LC_CTYPE setting for the database\n"));
 	printf(_(" --icu-locale=LOCALE ICU locale setting for the database\n"));
 	printf(_(" --icu-rules=RULES ICU rules setting for the database\n"));
-	printf(_(" --locale-provider={libc|icu}\n"
+	printf(_(" --locale-provider={builtin|libc|icu}\n"
 			 " locale provider for the database's default collation\n"));
 	printf(_(" -O, --owner=OWNER database user to own the new database\n"));
 	printf(_(" -S, --strategy=STRATEGY database creation strategy wal_log or file_copy\n"));
diff --git a/src/bin/scripts/t/020_createdb.pl b/src/bin/scripts/t/020_createdb.pl
index 40291924e5..768169305b 100644
--- a/src/bin/scripts/t/020_createdb.pl
+++ b/src/bin/scripts/t/020_createdb.pl
@@ -105,6 +105,62 @@ else
 		'create database with ICU fails since no ICU support');
 }
 
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'tbuiltin1'
+	],
+	'create database with provider "builtin"'
+);
+
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--locale=C', 'tbuiltin2'
+	],
+	'create database with provider "builtin" and locale "C"'
+);
+
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--lc-collate=C', 'tbuiltin3'
+	],
+	'create database with provider "builtin" and LC_COLLATE=C'
+);
+
+$node->command_ok(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--lc-ctype=C', 'tbuiltin4'
+	],
+	'create database with provider "builtin" and LC_CTYPE=C'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--icu-locale=en', 'tbuiltin5'
+	],
+	'create database with provider "builtin" and ICU_LOCALE="en"'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template0', '--locale-provider=builtin',
+		'--icu-rules=""', 'tbuiltin6'
+	],
+	'create database with provider "builtin" and ICU_RULES=""'
+);
+
+$node->command_fails(
+	[
+		'createdb', '-T', 'template1', '--locale-provider=builtin',
+		'--locale=C', 'tbuiltin7'
+	],
+	'create database with provider "builtin" not matching template'
+);
+
 $node->command_fails([ 'createdb', 'foobar1' ],
 	'fails if database already exists');
 
diff --git a/src/include/catalog/pg_collation.dat b/src/include/catalog/pg_collation.dat
index b6a69d1d42..cfb53807ed 100644
--- a/src/include/catalog/pg_collation.dat
+++ b/src/include/catalog/pg_collation.dat
@@ -24,8 +24,7 @@
   collname => 'POSIX', collprovider => 'c', collencoding => '-1',
   collcollate => 'POSIX', collctype => 'POSIX' },
 { oid => '962', descr => 'sorts by Unicode code point',
-  collname => 'ucs_basic', collprovider => 'c', collencoding => '6',
-  collcollate => 'C', collctype => 'C' },
+  collname => 'ucs_basic', collprovider => 'b', collencoding => '6' },
 { oid => '963',
   descr => 'sorts using the Unicode Collation Algorithm with default settings',
   collname => 'unicode', collprovider => 'i', collencoding => '-1',
diff --git a/src/include/catalog/pg_collation.h b/src/include/catalog/pg_collation.h
index bfa3568451..4009c4ec93 100644
--- a/src/include/catalog/pg_collation.h
+++ b/src/include/catalog/pg_collation.h
@@ -65,6 +65,7 @@ DECLARE_UNIQUE_INDEX_PKEY(pg_collation_oid_index, 3085, CollationOidIndexId, on
 #ifdef EXPOSE_TO_CLIENT_CODE
 
 #define COLLPROVIDER_DEFAULT 'd'
+#define COLLPROVIDER_BUILTIN 'b'
 #define COLLPROVIDER_ICU 'i'
 #define COLLPROVIDER_LIBC 'c'
 
@@ -73,6 +74,8 @@ collprovider_name(char c)
 {
 	switch (c)
 	{
+		case COLLPROVIDER_BUILTIN:
+			return "builtin";
 		case COLLPROVIDER_ICU:
 			return "icu";
 		case COLLPROVIDER_LIBC:
diff --git a/src/test/icu/t/010_database.pl b/src/test/icu/t/010_database.pl
index 0e9446cebe..df3fba13ed 100644
--- a/src/test/icu/t/010_database.pl
+++ b/src/test/icu/t/010_database.pl
@@ -59,14 +59,14 @@ is( $node1->psql(
 	0,
 	"C locale works for ICU");
 
-# Test that LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE
-# are specified
-is( $node1->psql(
-		'postgres',
-		q{CREATE DATABASE dbicu2 LOCALE_PROVIDER icu LOCALE '@colStrength=primary'
-		  LC_COLLATE='C' LC_CTYPE='C' TEMPLATE template0 ENCODING UTF8}
-	),
-	0,
-	"LOCALE works for ICU locales if LC_COLLATE and LC_CTYPE are specified");
+my ($ret, $stdout, $stderr) = $node1->psql('postgres',
+	q{CREATE DATABASE dbicu LOCALE_PROVIDER builtin LOCALE 'C' TEMPLATE dbicu}
+);
+isnt($ret, 0,
+	"locale provider must match template: exit code not 0");
+like(
+	$stderr,
+	qr/ERROR:  new locale provider \(builtin\) does not match locale provider of the template database \(icu\)/,
+	"locale provider must match template: error message");
 
 done_testing();
diff --git a/src/test/regress/expected/collate.out b/src/test/regress/expected/collate.out
index 0649564485..5b28de1b47 100644
--- a/src/test/regress/expected/collate.out
+++ b/src/test/regress/expected/collate.out
@@ -650,6 +650,27 @@ EXPLAIN (COSTS OFF)
 (3 rows)
 
 -- CREATE/DROP COLLATION
+CREATE COLLATION builtin_c ( PROVIDER = builtin, LOCALE = "C" );
+CREATE COLLATION builtin_posix ( PROVIDER = builtin, LOCALE = "POSIX" );
+SELECT b FROM collate_test1 ORDER BY b COLLATE builtin_c;
+  b  
+-----
+ ABD
+ Abc
+ abc
+ bbc
+(4 rows)
+
+CREATE COLLATION builtin2 ( PROVIDER = builtin ); -- fails
+ERROR:  parameter "locale" must be specified
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "en_US" ); -- fails
+ERROR:  collation provider "builtin" does not support locale "en_US"
+HINT:  The built-in collation provider only supports the "C" and "POSIX" locales.
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LC_CTYPE = "C", LC_COLLATE = "C" ); -- fails
+ERROR:  parameter "locale" must be specified
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "POSIX", LC_CTYPE = "POSIX" ); -- fails
+ERROR:  conflicting or redundant options
+DETAIL:  LOCALE cannot be specified together with LC_COLLATE or LC_CTYPE.
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default"; -- intentionally unsupported
@@ -754,7 +775,7 @@ DETAIL:  FROM cannot be specified together with any other options.
 -- must get rid of them.
 --
 DROP SCHEMA collate_tests CASCADE;
-NOTICE:  drop cascades to 19 other objects
+NOTICE:  drop cascades to 21 other objects
 DETAIL:  drop cascades to table collate_test1
 drop cascades to table collate_test_like
 drop cascades to table collate_test2
@@ -771,6 +792,8 @@ drop cascades to function dup(anyelement)
 drop cascades to table collate_test20
 drop cascades to table collate_test21
 drop cascades to table collate_test22
+drop cascades to collation builtin_c
+drop cascades to collation builtin_posix
 drop cascades to collation mycoll2
 drop cascades to table collate_test23
 drop cascades to view collate_on_int
diff --git a/src/test/regress/sql/collate.sql b/src/test/regress/sql/collate.sql
index c3d40fc195..01d5c69fe4 100644
--- a/src/test/regress/sql/collate.sql
+++ b/src/test/regress/sql/collate.sql
@@ -244,6 +244,16 @@ EXPLAIN (COSTS OFF)
 
 -- CREATE/DROP COLLATION
 
+CREATE COLLATION builtin_c ( PROVIDER = builtin, LOCALE = "C" );
+CREATE COLLATION builtin_posix ( PROVIDER = builtin, LOCALE = "POSIX" );
+
+SELECT b FROM collate_test1 ORDER BY b COLLATE builtin_c;
+
+CREATE COLLATION builtin2 ( PROVIDER = builtin ); -- fails
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "en_US" ); -- fails
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LC_CTYPE = "C", LC_COLLATE = "C" ); -- fails
+CREATE COLLATION builtin2 ( PROVIDER = builtin, LOCALE = "POSIX", LC_CTYPE = "POSIX" ); -- fails
+
 CREATE COLLATION mycoll1 FROM "C";
 CREATE COLLATION mycoll2 ( LC_COLLATE = "POSIX", LC_CTYPE = "POSIX" );
 CREATE COLLATION mycoll3 FROM "default"; -- intentionally unsupported
-- 
2.34.1