Re: Documentation improvement patch
Thank you for your feedback, Daniel. My thoughts are below:

> - Change the definition of a replication slot.
> + Changes the definition of a replication slot.
>
> Reading this page it seems we are mixing tense in many places, some say
> "Change the" and "Read some" and elsewhere we use "Drops the". Maybe a more
> holistic approach would be better for this page to improve consistency?

I agree, let's add "s" in all cases for the sake of consistency.

> - Not enabled by default because it is resource intensive.
> + Not enabled by default because it is resource-intensive.
>
> We use both spellings in multiple places, shouldn't all be changed?

Agreed, changing all instances to "resource-intensive".

> - COPY and other file-access functions.
> + the COPY command and file-access functions.
> ...
> - COPY and other file-access functions.
> + the COPY command and file-access functions.
> ...
> - COPY and other functions which allow executing a
> + the COPY command and functions, which allow executing a
>
> I'm not sure about these, I think we use COPY without the "the COPY
> command" decoration in many places so I think it's more consistent like this.

I actually think we should add the decoration here because "COPY and other file-access functions" sounds a bit confusing: COPY is not a file-access function, yet the phrasing puts it in that list. I do agree that everybody knows COPY is a command, not a function.

> - to call functions defined in the standard internal library, by using an
> + to call functions defined in the standard internal function library by using an
>   interface similar to their SQL signature.
>
> Isn't it a bit redundant to say "internal function library" when we are
> already talking about function definitions?

I agree that it may seem redundant. I added "function" here for the sake of consistency with lines 1829/1830 (if applied to the master branch), where the documentation mentions "standard internal *function* library".

Please let me know what you think of the last two points so that I can send the updated patch.

--
Oleg Sibiryakov

On 10.10.2025 10:15, Daniel Gustafsson wrote:
> On 10 Sep 2025, at 09:54, Oleg wrote:
>> Dear all,
>> I have prepared a patch containing some minor inconsistencies in the
>> documentation. Please, take a look. I will be looking forward to your
>> feedback.
>
> Thanks for the patch, while most of these are obvious improvements I have a
> few comments on some:
>
> - Change the definition of a replication slot.
> + Changes the definition of a replication slot.
>
> Reading this page it seems we are mixing tense in many places, some say
> "Change the" and "Read some" and elsewhere we use "Drops the". Maybe a more
> holistic approach would be better for this page to improve consistency?
>
> - Not enabled by default because it is resource intensive.
> + Not enabled by default because it is resource-intensive.
>
> We use both spellings in multiple places, shouldn't all be changed?
>
> - COPY and other file-access functions.
> + the COPY command and file-access functions.
> ...
> - COPY and other file-access functions.
> + the COPY command and file-access functions.
> ...
> - COPY and other functions which allow executing a
> + the COPY command and functions, which allow executing a
>
> I'm not sure about these, I think we use COPY without the "the COPY
> command" decoration in many places so I think it's more consistent like this.
>
> - to call functions defined in the standard internal library, by using an
> + to call functions defined in the standard internal function library by using an
>   interface similar to their SQL signature.
>
> Isn't it a bit redundant to say "internal function library" when we are
> already talking about function definitions?
>
>> The patch shall be applied to the REL_18_STABLE branch.
>
> As you mentioned downthread, this is also for master. Our workflow is to
> always apply to master and backpatch from there.
>
> --
> Daniel Gustafsson
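As an aside on the "we use both spellings in multiple places" point: one way to see how widespread a mixed spelling is before deciding to change all instances is to count both forms. The following is only a minimal sketch, not part of the patch or of any PostgreSQL tooling; the helper name `count_spellings` and the sample text are hypothetical, and in practice the input would be the contents of the SGML files.

```python
import re
from collections import Counter


def count_spellings(text: str, words: tuple[str, str]) -> Counter:
    """Count "<a> <b>" (open) versus "<a>-<b>" (hyphenated) in text.

    Hypothetical helper for illustration only. The separator may be a hyphen
    or any run of whitespace, so a phrase wrapped across lines still counts.
    """
    first, second = words
    pattern = re.compile(
        rf"\b{re.escape(first)}(-|\s+){re.escape(second)}\b", re.IGNORECASE
    )
    counts: Counter = Counter()
    for match in pattern.finditer(text):
        key = "hyphenated" if match.group(1) == "-" else "open"
        counts[key] += 1
    return counts


# Sample text taken from the two spellings quoted in this thread.
sample = (
    "Not enabled by default because it is resource intensive.\n"
    "software to run, or are resource-intensive.\n"
)
result = count_spellings(sample, ("resource", "intensive"))
print(dict(result))
```

A non-zero count in both buckets suggests the inconsistency is worth fixing everywhere at once rather than line by line.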
Documentation improvement patch
Dear all,

I have prepared a patch that fixes some minor inconsistencies in the documentation. Please take a look. I look forward to your feedback.

The patch should be applied to the REL_18_STABLE branch.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e103ed779..e589a8d6884 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1234,7 +1234,7 @@ include_dir 'conf.d'
-The library/libraries to use for validating OAuth connection tokens. If
+Sets the library/libraries to use for validating OAuth connection tokens. If
 only one validator library is provided, it will be used by default for
 any OAuth connections; otherwise, all oauth HBA entries
@@ -1400,7 +1400,7 @@ include_dir 'conf.d'
 Specifies a list of cipher suites that are allowed by connections using
 TLS version 1.3. Multiple cipher suites can be
-specified by using a colon separated list. If left blank, the default
+specified by using a colon-separated list. If left blank, the default
 set of cipher suites in OpenSSL will be used.
@@ -2432,7 +2432,7 @@ include_dir 'conf.d'
-Sets the maximum number of open files each server subprocess is
+Sets the maximum number of files each server subprocess is
 allowed to open simultaneously; files already opened in the
 postmaster are not counted toward this limit. The default is one
 thousand files.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 781a01067f7..9b032fbf675 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -1226,7 +1226,7 @@ postgres=# SELECT postgres_fdw_disconnect_all();
 PostgresFdwCleanupResult
- Waiting for transaction abort on remote server.
+ Waiting for transaction abort on a remote server.
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..2101442c90f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -49,7 +49,7 @@ break is not needed in a wider output rendering.
-After you have successfully completed this tutorial you will want to
+After you have successfully completed this tutorial, you will want to
 read the section to gain a better understanding of the SQL
 language, or for information about developing applications with
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index d336ee38f58..80eadfc0e1a 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1636,7 +1636,7 @@ SELCT 1/0;
 Likewise the server expects the client to not begin
 the SSL negotiation until it receives the server's
- single byte response to the SSL request. If the
+ single-byte response to the SSL request. If the
 client begins the SSL negotiation immediately without
 waiting for the server response to be received it can reduce connection
 latency by one round-trip. However this comes at the cost of not being
@@ -2394,7 +2394,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
- Change the definition of a replication slot.
+ Changes the definition of a replication slot.
 See for more about replication slots.
 This command is currently only supported for logical replication slots.
diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index 263ebdeeab4..2a0de0cfb63 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -84,7 +84,7 @@ PostgreSQL documentation
-The --slot and --dbname are required
+The --slot and --dbname options are required
 for this action.
@@ -104,7 +104,7 @@ PostgreSQL documentation
-The --slot is required for this action.
+The --slot option is required for this action.
@@ -121,8 +121,8 @@ PostgreSQL documentation
-The --slot and --dbname,
---file are required for this action.
+The --slot, --dbname, and
+--file options are required for this action.
diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..e2b2a0ea26f 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2826,7 +2826,7 @@ statement latencies in milliseconds, failures and retries:
 start a connection to the database server / the socket for connecting
 the client to the database server has become invalid).
Re: Documentation improvement patch
Dear PostgreSQL Community,

This is a kind reminder regarding my documentation patch submitted a month ago. I am still very interested in contributing these improvements and would be grateful for a review when time permits.

The patch can also be applied to the master branch.

Thank you for your consideration.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

On 10.09.2025 10:54, Oleg wrote:
> Dear all,
> I have prepared a patch containing some minor inconsistencies in the
> documentation. Please, take a look. I will be looking forward to your
> feedback.
> The patch shall be applied to the REL_18_STABLE branch.
> --
> Regards,
> Oleg Sibiryakov
> Technical Writer
> Postgres Professional, The Russian Postgres Company
> https://postgrespro.ru
Re: Documentation improvement patch
Dear Daniel,

Thank you for your prompt feedback. Please find attached the updated documentation patch, which incorporates your suggestions from both the first and second rounds of review.

--
Oleg Sibiryakov

On 22.10.2025 11:02, Daniel Gustafsson wrote:
> On 13 Oct 2025, at 12:51, Oleg wrote:
>
>>> - COPY and other functions which allow executing a
>>> + the COPY command and functions, which allow executing a
>>>
>>> I'm not sure about these, I think we use COPY without the "the COPY
>>> command" decoration in many places so I think it's more consistent like this.
>>
>> I actually think we should add the decoration here because "COPY and other
>> file-access functions" sounds a bit confusing since COPY is not a
>> file-access function and we seem to put it in the list. Even though I agree
>> that everybody knows COPY is a command, not a function.
>
> We refer to SQL commands by just their names all over the documentation
> without saying "an EXPLAIN command" etc, and I think this falls in that same
> category.
>
>>> - to call functions defined in the standard internal library, by using an
>>> + to call functions defined in the standard internal function library by using an
>>>   interface similar to their SQL signature.
>>>
>>> Isn't it a bit redundant to say "internal function library" when we are
>>> already talking about function definitions?
>>
>> I agree that it may seem redundant, I added "function" here for the sake of
>> consistency with lines 1829/1830 (if applied to the master branch) where the
>> documentation mentions "standard internal function library".
>
> I hadn't seen that, but with that in mind I agree that being consistent is
> good so I'll withdraw that comment.
>
> --
> Daniel Gustafsson

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0a2a8b49fdb..71c2bbf7615 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1232,11 +1232,11 @@ include_dir 'conf.d'
 oauth_validator_libraries configuration parameter
-The library/libraries to use for validating OAuth connection tokens. If
+Sets the library/libraries to use for validating OAuth connection tokens. If
 only one validator library is provided, it will be used by default for
 any OAuth connections; otherwise, all oauth HBA entries
 must explicitly set a validator chosen from this list. If set to
 an empty string (the default), OAuth connections will be
@@ -1398,11 +1398,11 @@ include_dir 'conf.d'
 Specifies a list of cipher suites that are allowed by connections using
 TLS version 1.3. Multiple cipher suites can be
-specified by using a colon separated list. If left blank, the default
+specified by using a colon-separated list. If left blank, the default
 set of cipher suites in OpenSSL will be used.
 This parameter can only be set in the
@@ -2430,11 +2430,11 @@ include_dir 'conf.d'
 max_files_per_process configuration parameter
-Sets the maximum number of open files each server subprocess is
+Sets the maximum number of files each server subprocess is
 allowed to open simultaneously; files already opened in the
 postmaster are not counted toward this limit. The default is one
 thousand files.
diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 593202f4fb2..fe8d73e1f8c 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -3168,11 +3168,11 @@ ninja install
 -DPG_TEST_EXTRA=TEST_SUITES
 Enable additional test suites, which are not run by default because
 they are not secure to run on a multiuser system, require special
-software to run, or are resource intensive. The argument is a
+software to run, or are resource-intensive. The argument is a
 whitespace-separated list of tests to enable. See
 for details. If the
 PG_TEST_EXTRA environment variable is set when the
 tests are run, it overrides this setup-time option.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 781a01067f7..9b032fbf675 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -1224,11 +1224,11 @@ postgres=# SELECT postgres_fdw_disconnect_all();
 PostgresFdwCleanupResult
- Waiting for transaction abort on remote server.
+ Waiting for transaction abort on a remote server.
diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..2101442c90f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -47,11 +47,11 @@ break is not needed in a wider output rendering.
 important
Re: Documentation improvement patch
Dear Daniel,

Could you please provide your feedback on the last two points? Once I have it, I will send the updated patch immediately to finalize the improvements.

Thank you,
Oleg

On 13.10.2025 13:51, Oleg wrote:
> Thank you for your feedback, Daniel. My thoughts are below:
>
>> - Change the definition of a replication slot.
>> + Changes the definition of a replication slot.
>>
>> Reading this page it seems we are mixing tense in many places, some say
>> "Change the" and "Read some" and elsewhere we use "Drops the". Maybe a more
>> holistic approach would be better for this page to improve consistency?
>
> I agree, let's add "s" in all cases for the sake of consistency.
>
>> - Not enabled by default because it is resource intensive.
>> + Not enabled by default because it is resource-intensive.
>>
>> We use both spellings in multiple places, shouldn't all be changed?
>
> Agreed, changing all instances to "resource-intensive".
>
>> - COPY and other file-access functions.
>> + the COPY command and file-access functions.
>> ...
>> - COPY and other file-access functions.
>> + the COPY command and file-access functions.
>> ...
>> - COPY and other functions which allow executing a
>> + the COPY command and functions, which allow executing a
>>
>> I'm not sure about these, I think we use COPY without the "the COPY
>> command" decoration in many places so I think it's more consistent like this.
>
> I actually think we should add the decoration here because "COPY and other
> file-access functions" sounds a bit confusing since COPY is not a
> file-access function and we seem to put it in the list. Even though I agree
> that everybody knows COPY is a command, not a function.
>
>> - to call functions defined in the standard internal library, by using an
>> + to call functions defined in the standard internal function library by using an
>>   interface similar to their SQL signature.
>>
>> Isn't it a bit redundant to say "internal function library" when we are
>> already talking about function definitions?
>
> I agree that it may seem redundant, I added "function" here for the sake of
> consistency with lines 1829/1830 (if applied to the master branch) where the
> documentation mentions "standard internal *function* library".
>
> Please, let me know what you think of the last two points for me to send the
> updated patch.
>
> --
> Oleg Sibiryakov
>
> On 10.10.2025 10:15, Daniel Gustafsson wrote:
>> On 10 Sep 2025, at 09:54, Oleg wrote:
>>> Dear all,
>>> I have prepared a patch containing some minor inconsistencies in the
>>> documentation. Please, take a look. I will be looking forward to your
>>> feedback.
>>
>> Thanks for the patch, while most of these are obvious improvements I have a
>> few comments on some:
>>
>> - Change the definition of a replication slot.
>> + Changes the definition of a replication slot.
>>
>> Reading this page it seems we are mixing tense in many places, some say
>> "Change the" and "Read some" and elsewhere we use "Drops the". Maybe a more
>> holistic approach would be better for this page to improve consistency?
>>
>> - Not enabled by default because it is resource intensive.
>> + Not enabled by default because it is resource-intensive.
>>
>> We use both spellings in multiple places, shouldn't all be changed?
>>
>> - COPY and other file-access functions.
>> + the COPY command and file-access functions.
>> ...
>> - COPY and other file-access functions.
>> + the COPY command and file-access functions.
>> ...
>> - COPY and other functions which allow executing a
>> + the COPY command and functions, which allow executing a
>>
>> I'm not sure about these, I think we use COPY without the "the COPY
>> command" decoration in many places so I think it's more consistent like this.
>>
>> - to call functions defined in the standard internal library, by using an
>> + to call functions defined in the standard internal function library by using an
>>   interface similar to their SQL signature.
>>
>> Isn't it a bit redundant to say "internal function library" when we are
>> already talking about function definitions?
>>
>>> The patch shall be applied to the REL_18_STABLE branch.
>>
>> As you mentioned downthread, this is also for master. Our workflow is to
>> always apply to master and backpatch from there.
>>
>> --
>> Daniel Gustafsson
Documentation improvement patch
Dear all,

I have prepared a patch that fixes some minor inconsistencies in the documentation. Please take a look.

The inconsistencies were noticed by Ekaterina Kiryanova, Elena Indrupskaya, Maxim Yablokov, Anna Uraskova, Elena Karavaeva, and me. We look forward to your feedback.

The patch should be applied to the REL_17_STABLE branch.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a63cc71efa2..7a905fd6a3a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8029,7 +8029,7 @@ SCRAM-SHA-256$<iteration count>:&l
 If true, the associated replication slots (i.e. the main slot and the
- table sync slots) in the upstream database are enabled to be
+ table synchronization slots) in the upstream database are enabled to be
 synchronized to the standbys
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..a76e9579a14 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -382,8 +382,8 @@ initdb --locale-provider=icu --icu-locale=en
 The C locale behavior is identical to the
- C locale in the libc provider. When using this
- locale, the behavior may depend on the database encoding.
+ C locale in the libc provider. When
+ using this locale, the behavior may depend on the database encoding.
 The C.UTF-8 locale is available only for when the
@@ -400,7 +400,7 @@ initdb --locale-provider=icu --icu-locale=en
 The icu provider uses the external
- ICUICU
+ ICU
 library. PostgreSQL must have been configured
 with support.
@@ -862,8 +862,9 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 This SQL standard collation sorts using the Unicode Collation Algorithm
 with the Default Unicode Collation Element Table. It is available in
 all encodings. ICU support is required to use this
-collation, and behavior may change if Postgres is built with a
-different version of ICU. (This collation has the same behavior as
+collation, and behavior may change if
+PostgreSQL is built with a different version
+of ICU. (This collation has the same behavior as
 the ICU root locale; see .)
@@ -897,7 +898,7 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 expressions), it uses the POSIX Compatible variant of Unicode
 https://www.unicode.org/reports/tr18/#Compatibility_Properties Compatibility Properties.
 Behavior is efficient and stable within a
-Postgres major version. This collation is
+PostgreSQL major version. This collation is
 only available for encoding UTF8.
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7c60aeab4f6..2027308f89f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -708,9 +708,10 @@ include_dir 'conf.d'
-PostgreSQL sizes certain resources based directly on the value of
-max_connections. Increasing its value leads to
-higher allocation of those resources, including shared memory.
+PostgreSQL sizes certain resources based
+directly on the value of max_connections. Increasing
+its value leads to higher allocation of those resources, including
+shared memory.
@@ -9384,7 +9385,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
 If transaction_timeout is shorter or equal to
 idle_in_transaction_session_timeout or statement_timeout
-then the longer timeout is ignored.
+then the longer timeout is ignored.
@@ -10842,7 +10843,7 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 Turning this setting off is intended for environments where the
 configuration of PostgreSQL is managed by
 some external tool.
-In such environments, a well intentioned superuser might
+In such environments, a well-intentioned superuser might
 mistakenly use ALTER SYSTEM
 to change the configuration instead of using the external tool.
 This might result in unintended behavior, such as the external tool
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 220683b5eb4..1d2987e628d 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -313,7 +313,7 @@ INSERT INTO people (id, name, address) VALUE (DEFAULT, 'C',
 The data type of an identity column must be one of the data types supported
- by sequences. (See .) The properties
+ by sequences (see ). The properties
 of
Re: Documentation improvement patch
Thank you for your feedback.

1. Since we do not want to use here, I suggest we hyphenate it as "built-in". What's your take on it?

2. Leaving not-null is fine.

--
Oleg Sibiryakov

On 06.09.2024 16:20, Daniel Gustafsson wrote:
> On 5 Sep 2024, at 11:33, Oleg Sibiryakov wrote:
>> Dear all,
>> I have prepared a patch containing some minor inconsistencies in the
>> documentation. Please, take a look.
>> The inconsistencies were noticed by: Ekaterina Kiryanova, Elena
>> Indrupskaya, Maxim Yablokov, Anna Uraskova, Elena Karavaeva, and me. We will
>> be looking forward to your feedback.
>> The patch shall be applied to the REL_17_STABLE branch.
>
> Most of these seem fine, but I need another read-through to digest them
> fully. Just a few small comments:
>
> -Specifies the builtin provider locale for the database default
> -collation order and character classification, overriding the setting
> -. The builtin provider locale for the database
> +default collation order and character classification, overriding the
> +setting . The .
> +Specifies the locale name when the builtin provider
> +is used. Locale support is described in .
>
> I don't think this use of "builtin" refers to the config value but rather the
> type of locale, so I think it's correct to not use here.
>
> -for not-null constraints at all, so they are not
> +for NOT NULL constraints at all, so they are not
>
> This seems mostly to be a question of taste, I don't think not-null is
> incorrect here.
>
> --
> Daniel Gustafsson
Re: Documentation improvement patch
Here is a patch without the builtin/built-in corrections (attached). However, I still believe the issue should be discussed further. We actually have two options: either it is a spelling mistake (since built-in should be written with a hyphen), or we are missing the tag (since it is also a value). So I do think we cannot really leave it as is.

--
Oleg Sibiryakov

On 11.09.2024 12:53, Peter Eisentraut wrote:
> On 10.09.24 15:02, Daniel Gustafsson wrote:
>> On 10 Sep 2024, at 13:46, Oleg Sibiryakov wrote:
>>> 1. Since we do not want to use here, I suggest we hyphenate it as
>>> "built-in". What's your take on it?
>>
>> I think that's the right choice given the hyphenation used in the rest of
>> the docs. There are a few more places on that same page which should be
>> built-in rather than builtin to separate the concept from the parameter
>> value.
>
> I suspect that this would lead to the opposite confusion, people complaining
> that the provider is called "builtin" not "built-in". Arguably, the other
> providers are also "built in". There are no user-pluggable providers at this
> time.

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a63cc71efa2..7a905fd6a3a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8012,41 +8012,41 @@ SCRAM-SHA-256$<iteration count>:&l
 for authentication
 subrunasowner bool
 If true, the subscription will be run with the permissions
 of the subscription owner
 subfailover bool
 If true, the associated replication slots (i.e. the main slot and the
- table sync slots) in the upstream database are enabled to be
+ table synchronization slots) in the upstream database are enabled to be
 synchronized to the standbys
 subconninfo text
 Connection string to the upstream database
 subslotname name
 Name of the replication slot in the upstream database (also used
 for the local replication origin name);
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..a76e9579a14 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -365,59 +365,59 @@ initdb --locale-provider=icu --icu-locale=en
 Regardless of the locale provider, the operating system is still used to
 provide some locale-aware behavior, such as messages (see ).
 The available locale providers are listed below:
 builtin
 The builtin provider uses built-in operations. Only
 the C and C.UTF-8 locales are
 supported for this provider.
 The C locale behavior is identical to the
- C locale in the libc provider. When using this
- locale, the behavior may depend on the database encoding.
+ C locale in the libc provider. When
+ using this locale, the behavior may depend on the database encoding.
 The C.UTF-8 locale is available only for when the
 database encoding is UTF-8, and the behavior is
 based on Unicode. The collation uses the code point values only. The
 regular expression character classes are based on the "POSIX
 Compatible" semantics, and the case mapping is the "simple" variant.
 icu
 The icu provider uses the external
- ICUICU
+ ICU
 library. PostgreSQL must have been configured
 with support.
 ICU provides collation and character classification behavior that is
 independent of the operating system and database encoding, which is
 preferable if you expect to transition to other platforms without any
 change in results. LC_COLLATE and
 LC_CTYPE can be set independently of the ICU locale.
 For the ICU provider, results may depend on the version of the ICU
 library used, as it is updated to reflect changes in natural language
 over time.
@@ -845,76 +845,77 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 separate collate and ctype settings, so they are always the same.
 Also, ICU collations are independent of the encoding, so there is
 always only one ICU collation of a given name in a database.
 Standard Collations
 On all platforms, the following collations are supported:
 unicode
 This SQL standard collation sorts using the Unicode Collation Algorithm
 with the Default Unicode Collation Element Table. It is available in
 all encodings. ICU support is re
Re: [DOCS] Add JSON to the list of acronyms in the documentation's appendix
On Thu, Jan 11, 2018 at 7:22 PM, Bruce Momjian wrote:
> On Tue, Oct 24, 2017 at 08:18:49PM +, [email protected] wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/10/static/acronyms.html
>> Description:
>>
>> Hello all,
>>
>> I propose to add 'JSON' (for JavaScript Object Notation,
>> http://json.org) to
>> the list of acronyms in the documentation's appendix
>> (https://www.postgresql.org/account/comments/new/10/acronyms.html/).
>
> Good idea. Patch applied --- it will appear in PG 11.

Then why not add JSONB as well?

>
> --
> Bruce Momjian http://momjian.us
> EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I. As I am, so you will be. +
> + Ancient Roman grave inscription +
Re: Images in the official documentation
On Sat, Feb 24, 2018 at 4:04 AM, Peter Eisentraut wrote:
> On 2/23/18 11:21, Tom Lane wrote:
>> In the distant
>> past, as I recall, we had a GIF or two; but we abandoned that on the
>> grounds that it was unmaintainable and also incompatible with some
>> documentation output formats. I'm not too sure what the state of
>> play is on the latter point, now that we've switched to XML.
>
> The complications with the image formats in the past were mainly around
> what ((pdf)jade)tex would accept. The tools have shifted a bit now, and
> the zoo formats is a different one. Nothing that a few make rules
> couldn't address, though, I think.
>
> The issue of how to manage the sources is still the same, though.

SVG is an ASCII-based vector format. We made an experimental PDF with pictures:
http://www.sai.msu.su/~megera/postgres/files/postgres-11-diagram.pdf
(GIN AM diagram, Appendix L). Appendix L also demonstrates our sample database, with a step-by-step introduction to Postgres for beginners. We have a separate book for beginners, which we released under the BSD license; it is available in Russian and English. Our experience shows that people really appreciate it.

I hope we will have time at PGCon to discuss documentation.

>
> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>
Re: Images in the official documentation
On Mon, Feb 26, 2018 at 10:23 PM, Tom Lane wrote:
> Craig Ringer writes:
>> On 26 February 2018 at 12:16, Tom Lane wrote:
>>> How can we resolve these issues?
>
>> Question the assumptions and requirements. Why do we actually _need_
>> diffable, mergeable images? Sure, it'd be *nice*, but what's the real world
>> impact if we don't have it?
>
> Well, I'll tell you exactly why I'm being sticky about this: we've been
> down this road before. We used to have some figures in .gif format,
> and one of the problems with them was they were too hard to update.
> I don't buy the "they won't need updates" argument for a second, either.
> For instance, I recall that one of the images we had was a diagram of
> the system catalog cross-references, and it was constantly out of date
> because of the difficulty of updating it.
>
> Admittedly, this was 15+ years ago. Maybe the state of the art in
> figure editors has advanced to the point where it won't be so hard.
> But color me suspicious.

In case you missed it, we discussed this at PGCon a couple of years ago.

Heikki's version:
https://wiki.postgresql.org/wiki/Figures_%26_Pics_in_Docs

Emre suggested using Markdeep (BSD license):
http://casual-effects.com/markdeep/
http://www.sai.msu.su/~megera/postgres/gin-ascii-v2.md.html

It looks good for small diagrams, but will not work for complex material such as the pg_catalog structure.

>
> regards, tom lane
>
Re: Dead link in ltree documentation
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

On Wed, Apr 4, 2018 at 12:59 PM, PG Doc comments form wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/10/static/ltree.html
> Description:
>
> Hi,
>
> https://www.postgresql.org/docs/current/static/ltree.html links to
> www.dmoz.org which now returns a 403, since being closed down in 2017.
>
> Maybe it could link to the mirror https://dmoztools.net/ or the wikipedia
> page instead.

Attached is a small patch.

ltree.sgml.patch
Description: Binary data
Re: Dead link in ltree documentation
On Wed, Apr 4, 2018 at 8:17 PM, Alvaro Herrera wrote:
> David G. Johnston wrote:
>
>> I'm not seeing the value in providing a link, especially one that we don't
>> control, here. Furthermore, we could probably drop the whole "In
>> practice..." sentence. But if not at least put a period after "limitation"
>> and drop the example and link.
>
> +1 remove the sentence.

Attached is a new patch, which removes the whole sentence along with the example link.

>
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

ltree.sgml.patch
Description: Binary data
document json[b] limitation
Hi there,
Attached is a small patch that documents the maximum size of the json[b] types. It is probably worth patching the previous releases in which the types were introduced.
Best regards, Oleg
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: json.sgml.patch
Re: document json[b] limitation
On Wed, Apr 25, 2018 at 2:12 AM, Tom Lane wrote:
> Oleg Bartunov writes:
>> Attached is a small patch, which documents the maximum size of
>> json[b] types. Probably, it's worth to patch previous releases, where
>> the types were introduced.
>
> If you said "maximum size is 1GB", period, I'd believe it ... although
> I'm pretty sure that general limitation is already documented elsewhere.
> I don't believe that it's possible to make a 256 Gb jsonb. How will
> that fit in the varlena header?
Oops, it should be 256 Mb :)
>
> regards, tom lane
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: document json[b] limitation
On Wed, Apr 25, 2018 at 6:50 PM, Oleg Bartunov wrote:
> On Wed, Apr 25, 2018 at 2:12 AM, Tom Lane wrote:
>> Oleg Bartunov writes:
>>> Attached is a small patch, which documents the maximum size of
>>> json[b] types. Probably, it's worth to patch previous releases, where
>>> the types were introduced.
>>
>> If you said "maximum size is 1GB", period, I'd believe it ... although
>> I'm pretty sure that general limitation is already documented elsewhere.
>> I don't believe that it's possible to make a 256 Gb jsonb. How will
>> that fit in the varlena header?
>
> Oops, it should be 256 Mb :)
Patch attached.
>
>> regards, tom lane
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: json.sgml.patch
bloom documentation patch
Hi,
Please consider the attached patch, which improves the contrib/bloom documentation.
Best regards, Oleg
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: bloom.sgml.patch
Re: bloom documentation patch
On Mon, Oct 15, 2018 at 12:48 AM Thomas Munro wrote:
> On Mon, Oct 15, 2018 at 10:15 AM Oleg Bartunov wrote:
> > Please, consider attached patch, which improves contrib/bloom documentation.
>
> Hello Oleg, I have no comment on the technical details but here is
> some proof-reading of the English:
>
> + Length of each signature (index entry) in bits, it is rounded up to the nearest
> + multiple of 16. The default is 80 bits and maximum is
>
> s/, it is/. It is/
> s/and maximum/and the maximum/
>
> + Bloom AM doesn't supports unique indexes.
>
> s/supports/support/
>
> + Bloom AM doesn't supports NULL values.
>
> s/supports/support/
Thanks, Thomas, new patch attached.
> --
> Thomas Munro
> http://www.enterprisedb.com
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: bloom.sgml.patch
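As a side note on the "rounded up to the nearest multiple of 16" wording in the patch, the arithmetic can be sketched as below. This is only an illustration of the rounding rule as the text states it; the function name is hypothetical and the code is not taken from contrib/bloom.

```python
def round_up_to_multiple(value: int, multiple: int = 16) -> int:
    """Round value up to the nearest multiple (16 by default)."""
    # Adding (multiple - 1) before the integer division makes any
    # remainder push the result up to the next multiple.
    return (value + multiple - 1) // multiple * multiple


print(round_up_to_multiple(80))  # 80, already a multiple of 16
print(round_up_to_multiple(81))  # 96
```

So a requested length of 81 bits would, under this reading, occupy 96 bits, while the default of 80 is unchanged.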
Re: remove deprecated @@@ operator ?
On Sun, Oct 21, 2018 at 11:24 PM Tom Lane wrote:
> Oleg Bartunov writes:
> > The commit 9b5c8d45f62bd3d243a40cc84deb93893f2f5122 is now 10+ years
> > old, may be we could remove deprecated @@@ operator ?
>
> Is it actually causing any problem? AFAICS it's just a couple extra
> pg_operator entries, so why not leave it?
>
> I'd be +1 for removing it from the docs, though ...
Attached is a tiny patch for the docs.
>
> regards, tom lane
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Attachment: func.sgml.patch
Re: First SVG graphic
On Wed, Nov 28, 2018 at 8:33 PM Jürgen Purtz wrote:
> After one week no response at all? Neither positive nor negative. It seems
> that the community has little interest in the SVG issue. Or in my suggestion?
First of all, I am a BIG + for having diagrams in our documentation. I once estimated the number of diagrams in our official documentation and it was only 50 or so, which means it is possible to make them more or less centralized, at least for the initial version. If Jurgen+ agree to work on this, I would be happy to help them with the parts I was working on. For the initial version we could even provide the generated images along with the SVG source files.
>
> Jürgen Purtz
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: Return codes for archive and restore commands
On Thu, Nov 29, 2018 at 5:40 AM Stephen Frost wrote:
> Greetings,
>
> * Michael Paquier wrote:
> > On Wed, Nov 28, 2018 at 11:00:31AM +, PG Doc comments form wrote:
> > > For the archive command:
> > > <=128 There are not errors in the PostgreSQL log (messages with severity
> > > equal or higher than ERROR). Firstly 3 messages of type LOG about fault,
> > > then WARNING about this and pause for 1 minute, then repeated.
> > > >=129 FATAL error in the PostgreSQL log. The message about stopping an archive
> > > process, but not the database. Repeated after roughly 16 seconds.
> >
> > This code is around for some time, and comes from this commit:
> > commit: 3ad0728c817bf8abd2c76bd11d856967509b307c
> > author: Tom Lane
> > date: Tue, 21 Nov 2006 20:59:53 +
> > committer: Tom Lane
> > date: Tue, 21 Nov 2006 20:59:53 +
> > On systems that have setsid(2) (which should be just about everything except
> > Windows), arrange for each postmaster child process to be its own process
> > group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
> > process group not only the direct child process. This provides saner behavior
> > for archive and recovery scripts; in particular, it's possible to shut down a
> > warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
> > of SIGQUIT to the startup subprocess will result in killing the waiting
> > recovery_command. Also, this makes Query Cancel and statement_timeout apply
> > to scripts being run from backends via system(). (There is no support in the
> > core backend for that, but it's widely done using untrusted PLs.) Per gripe
> > from Stephen Harris and subsequent discussion.
> >
> > The relevant part is pgarch_archiveXlog() in pgarch.c, and this part
> > is most relevant:
> > * Per the Single Unix Spec, shells report exit status > 128 when a
> > * called command died on a signal.
> >
> > > In this case PostgreSQL tries confirm rules for return codes of a unix
> > > shell. A unix shell return 126 in the case of "command not executable", 127
> > > in the case "command not found", 128+# of signal in the case if application
> > > interrupted by uncaught signal.
> >
> > If you were to rewrite those paragraphs or make them more precise, how
> > would you actually shape your suggestions? I personally quite like the
> > current formulations, but I am rather used to it to be honest.
>
> This is another example, at least imv, of why we really need to move
> away from archive_command as an interface for doing WAL archiving.
+1
>
> Having discussed this quite a bit lately with David Steele and Magnus,
> it's pretty clear that we need to completely rip out how this works
> today and rewrite it based around an extension model where a background
> worker can start up and essentially take the place of the archiver
> process, with flexibility to jump forward through the WAL stream,
> communicate clearly with other processes, handle failure to do so
> gracefully based on the specific cases, etc.
>
> We could then possibly write an extension to be included that mimics
> what archive_command does today, but imv we should immediately consider
> it deprecated and encourage people to move off of it.
>
> Thanks!
>
> Stephen
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
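For readers unfamiliar with the shell convention quoted in this thread (126, 127, and 128 plus the signal number), here is a minimal sketch of how such a status could be interpreted. The function name and the wording of the messages are hypothetical; this illustrates only the shell convention and is not how PostgreSQL's archiver classifies return codes.

```python
import signal


def describe_shell_status(status: int) -> str:
    """Interpret an exit status using the common Unix shell convention."""
    if status == 0:
        return "success"
    if status == 126:
        return "command found but not executable"
    if status == 127:
        return "command not found"
    if status > 128:
        # By convention, 128 + N means the command died on signal N.
        signum = status - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Not a signal number known on this platform.
            name = f"signal {signum}"
        return f"command died on {name}"
    return f"command failed with exit code {status}"


print(describe_shell_status(127))
print(describe_shell_status(128 + signal.SIGTERM))
```

The sketch treats every status above 128 as signal-related, which matches the comment quoted from pgarch.c; a command is of course free to return such a value on its own, so the mapping is a heuristic.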
SQL-2016 in docs
I noticed that our docs for PG12 do not mention SQL-2016, although we have the JSON Path implementation committed, which is part of the SQL-2016 standard. The one missing feature is datetime support. Peter, will you add this, or shall I prepare the patch?
Oleg
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: SQL-2016 in docs
On Mon, May 27, 2019 at 2:33 PM Peter Eisentraut wrote:
> On 2019-05-12 10:14, Oleg Bartunov wrote:
> > I noticed that in our docs for PG12 there is no SQL-2016, but we actually
> > have JSON Path implementation committed, which is a part of SQL-2016
> > standard. One missing feature - is datetime support. Peter, will you
> > add this or I prepare the patch ?
>
> I did a rough check of the SQL:2016 JSON path specification versus our
> regression tests, and came up with the attached supported feature list.
> Would you like to confirm it?
I confirm it.
>
> --
> Peter Eisentraut http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
--
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: TOC: List of Figures
On 2 Jul 2019, at 11:13, Jürgen Purtz wrote:
> After the integration of figures into the documentation it may be helpful to
> extend the TOC with a 'List of Figures'. Any opinion? If yes: The same for
> 'List of Tables' and 'List of Examples'?
>
> There is a simple way to enable this feature: change line 56 of
> stylesheet-html-common.xsl to: "book toc,title,figure,table,example". As shown
> in a previous thread this leads to an ugly swelling of the TOC similar to the
> former handling of release notes - especially for tables and examples -, see
> attachment 1.
+1
> The alternative is a downshift of the postings by one level, see attachment 2.
> How to realize this behavior is shown in attachment 3.
Re: Documentation improvement patch
Thank you, Daniel.
--
Oleg Sibiryakov
On 02.10.2024 15:58, Daniel Gustafsson wrote:
> On 2 Oct 2024, at 10:09, Oleg Sibiryakov wrote:
>> Thank you for your kind feedback! I will take due note of the comments in
>> the next documentation patches as well. I have made all the changes as per
>> your feedback and also corrected paragraph reflow. The third version of the
>> patch is attached for your consideration.
> Thanks, I have gone over and applied most of these changes. I did leave out a
> few (like the libc one) where the current page had multiple different versions.
> --
> Daniel Gustafsson
Re: Documentation improvement patch
Thank you for your kind feedback! I will take due note of the comments in the next documentation patches as well. I have made all the changes as per your feedback and also corrected paragraph reflow. The third version of the patch is attached for your consideration. -- Oleg Sibiryakov On 01.10.2024 11:59, Daniel Gustafsson wrote: On 1 Oct 2024, at 10:04, Oleg Sibiryakov wrote: Here is a kind reminder of a small documentation improvement patch, which we started discussing a month ago. I removed all the controversial points touched upon in this thread. Please, take a look once again at your convenience. In general, when submitting a docs patch it's better to not reflow the paragraphs when a modified line becomes too long. Reading a 4 line diff where only one thing changed in the first becomes harder than reading a single line diff where the line is long. The committer can ensure the lines are reflowed prior to a commit, or it can be left as the final revision of a patch submission once all changes are discussed- A few comments on this version of the patch: - ICUICU + ICU I don't think removing the name of the library changing the sentence from "The icu provider uses the external ICU library" to "The icu provider uses the external library" is an improvement. - by sequences. (See .) The properties + by sequences (see ). The properties This is a common construction in our docs, if it's considered to be a bad practice the case should be argued (separately) for removing all of them instead. - Comma separated list of publication names for which to subscribe + Comma-separated list of publication names for which to subscribe There are two more cases of "comma separated" (config.sgml and copy.sgml), should they be changed too? - the failover if required, enable the subscription, and refresh the - subscription. See + the failover if required, enable the subscription, + and refresh the subscription. 
See This refers to the act of failing over, not the property value failover, and should not be in . -for not-null constraints at all, so they are not +for NOT NULL constraints at all, so they are not I'm still not convinced that this change makes the documentation more readable. - the MERGE command will perform a FULL - join between data_source - and the target table. For this to work, at least one + the MERGE command will perform a + FULL JOIN between + data_source and the target + table. For this to work, at least one This paragraph discuss various join types, keeping it lowercase "join" matches the remainder of the paragraph and makes it more readable IMHO. It's not discussing syntax the user is expected to type so need to make it so. -- Daniel Gustafsson diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index bfb97865e18..964c819a02d 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8024,41 +8024,41 @@ SCRAM-SHA-256$<iteration count>:&l for authentication subrunasowner bool If true, the subscription will be run with the permissions of the subscription owner subfailover bool If true, the associated replication slots (i.e. the main slot and the - table sync slots) in the upstream database are enabled to be + table synchronization slots) in the upstream database are enabled to be synchronized to the standbys subconninfo text Connection string to the upstream database subslotname name Name of the replication slot in the upstream database (also used for the local replication origin name); diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 834cb30c85a..dbbf7fc3726 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -365,41 +365,41 @@ initdb --locale-provider=icu --icu-locale=en Regardless of the locale provider, the operating system is still used to provide some locale-aware behavior, such as messages (see ). 
The available locale providers are listed below: builtin The builtin provider uses built-in operations. Only the C and C.UTF-8 locales are supported for this provider. The C locale behavior is identical to the - C locale in the libc provider. When using this + C locale in the libc provider. When using this locale, the behavior may depend on the database encoding. The C.UTF-8 locale is available only for when the database encoding is UTF-8, and
Re: Documentation improvement patch
Dear all, Here is a kind reminder of a small documentation improvement patch, which we started discussing a month ago. I removed all the controversial points touched upon in this thread. Please, take a look once again at your convenience. The patch shall be applied to the master branch this time. -- Regards, Oleg Sibiryakov Technical Writer Postgres Professional, The Russian Postgres Company https://postgrespro.ru On 13.09.2024 13:50, Oleg Sibiryakov wrote: Here is a patch without the builtin/built-in corrections (find attached). But I still believe the issue should be discussed further. We actually have two options: it is either a spelling mistake (since built-in should written with a hyphen), or we miss the tag (since it is actually also a value). So I do think we cannot really leave it as is. diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml index bfb97865e18..964c819a02d 100644 --- a/doc/src/sgml/catalogs.sgml +++ b/doc/src/sgml/catalogs.sgml @@ -8024,41 +8024,41 @@ SCRAM-SHA-256$<iteration count>:&l for authentication subrunasowner bool If true, the subscription will be run with the permissions of the subscription owner subfailover bool If true, the associated replication slots (i.e. the main slot and the - table sync slots) in the upstream database are enabled to be + table synchronization slots) in the upstream database are enabled to be synchronized to the standbys subconninfo text Connection string to the upstream database subslotname name Name of the replication slot in the upstream database (also used for the local replication origin name); diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml index 834cb30c85a..a76e9579a14 100644 --- a/doc/src/sgml/charset.sgml +++ b/doc/src/sgml/charset.sgml @@ -365,59 +365,59 @@ initdb --locale-provider=icu --icu-locale=en Regardless of the locale provider, the operating system is still used to provide some locale-aware behavior, such as messages (see ). 
The available locale providers are listed below: builtin The builtin provider uses built-in operations. Only the C and C.UTF-8 locales are supported for this provider. The C locale behavior is identical to the - C locale in the libc provider. When using this - locale, the behavior may depend on the database encoding. + C locale in the libc provider. When + using this locale, the behavior may depend on the database encoding. The C.UTF-8 locale is available only for when the database encoding is UTF-8, and the behavior is based on Unicode. The collation uses the code point values only. The regular expression character classes are based on the "POSIX Compatible" semantics, and the case mapping is the "simple" variant. icu The icu provider uses the external - ICUICU + ICU library. PostgreSQL must have been configured with support. ICU provides collation and character classification behavior that is independent of the operating system and database encoding, which is preferable if you expect to transition to other platforms without any change in results. LC_COLLATE and LC_CTYPE can be set independently of the ICU locale. For the ICU provider, results may depend on the version of the ICU library used, as it is updated to reflect changes in natural language over time. @@ -845,76 +845,77 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR"; separate collate and ctype settings, so they are always the same. Also, ICU collations are independent of the encoding, so there is always only one ICU collation of a given name in a database. Standard Collations On all platforms, the following collations are supported: unicode This SQL standard collation sorts using the Unicode Collation Algorithm with the Default Unicode Collation Element Table. It is available in all encodings. ICU support is required to use this -collation, and behavior may change if Postgres is built with a -different version of ICU. 
(This collation has the same behavior as +collation, and behavior may change if +PostgreSQL is built with a different version +of ICU. (This collation has the same behav
Initcap works differently with different locale providers
Greetings, everyone!
One of our clients has found a difference in the behaviour of the initcap
function when using different locale providers, as shown below:
postgres=# create database test_db_1 locale_provider=icu
locale="ru_RU.UTF-8" template=template0;
NOTICE: using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
CREATE DATABASE
postgres=# \c test_db_1;
You are now connected to database "test_db_1" as user "postgres".
test_db_1=# select initcap('ЧиЮ А.Ю.');
initcap
--
Чию А.ю.
(1 row)
test_db_1=# select initcap('joHn d.e.');
initcap
---
John D.e.
(1 row)
postgres=# create database test_db_2 locale_provider=libc
locale="ru_RU.UTF-8" template=template0;
CREATE DATABASE
postgres=# \c test_db_2
You are now connected to database "test_db_2" as user "postgres".
test_db_2=# select initcap('ЧиЮ А.Ю.');
initcap
--
Чию А.Ю.
(1 row)
test_db_2=# select initcap('joHn d.e.');
initcap
---
John D.E.
(1 row)
And an easier reproduction (should work for REL_12_STABLE and up)
postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
initcap
--
First.second
(1 row)
postgres=# SELECT initcap('first.second' COLLATE "en_US");
initcap
--
First.Second
(1 row)
This behaviour is reproducible on REL_12_STABLE and up to master.
I don't believe that this is erroneous behaviour, just a differing one,
hence just a documentation change proposal.
I suggest adding a clarification that this function works differently with
the libc and ICU providers, because they differ in what a "word" is.
In libc, a word is a sequence of alphanumeric characters, separated by
non-alphanumeric characters (as the documentation states right now).
In ICU, words are divided according to Unicode® Standard Annex #29 [1].
A similar issue was briefly discussed in [2].
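To make the libc-style rule concrete, here is a minimal sketch of "words are sequences of alphanumeric characters separated by non-alphanumeric characters". It is an illustration only: the function name is hypothetical, it relies on Python's Unicode-aware str.isalnum() as a stand-in for the C library's classification, and it is not PostgreSQL's implementation. The ICU rule (UAX #29) is not reproduced here.

```python
def initcap_alnum(text: str) -> str:
    """Capitalize words, where a word is a maximal run of alphanumerics."""
    result = []
    in_word = False  # True while we are inside an alphanumeric run
    for ch in text:
        if ch.isalnum():
            # The first character of a run is upper-cased, the rest lower-cased.
            result.append(ch.lower() if in_word else ch.upper())
            in_word = True
        else:
            # Any non-alphanumeric character ends the current word.
            result.append(ch)
            in_word = False
    return "".join(result)


print(initcap_alnum("first.second"))  # First.Second
print(initcap_alnum("joHn d.e."))     # John D.E.
```

Under this rule the period ends a word, so the letter after it is capitalized, which is consistent with the libc results shown above.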
The suggested documentation patch is attached (versions for REL_13_STABLE+
and for REL_12_STABLE only).
[1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
[2]:
https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com
Oleg Tselebrovskiy, Postgres Professional
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1bde4091ca6..3ce5ad1d1f1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3100,8 +3100,11 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
Converts the first letter of each word to upper case and the
-rest to lower case. Words are sequences of alphanumeric
-characters separated by non-alphanumeric characters.
+rest to lower case. When using the libc locale
+provider, words are sequences of alphanumeric characters separated
+by non-alphanumeric characters; when using the ICU locale provider,
+words are separated according to
+https://www.unicode.org/reports/tr29/#Word_Boundaries Unicode® Standard Annex #29.
initcap('hi THOMAS')
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 487bb103637..1cd281dd90b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1932,8 +1932,11 @@
text
Convert the first letter of each word to upper case and the
-rest to lower case. Words are sequences of alphanumeric
-characters separated by non-alphanumeric characters.
+rest to lower case. When using the libc locale
+provider, words are sequences of alphanumeric characters separated
+ by non-alphanumeric characters; when using the ICU locale provider,
+ words are separated according to
+ https://www.unicode.org/reports/tr29/#Word_Boundaries Unicode® Standard Annex #29.
initcap('hi THOMAS')
Hi Thomas
Re: Initcap works differently with different locale providers
Alexander Korotkov wrote at 2025-07-28 17:23:
On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov wrote:
On 25 Sep 2024, at 18:13, Oleg Tselebrovskiy wrote:
[original message quoted in full, see above]
I can confirm initcap works with libc and libicu as you stated. The
documentation patch looks good to me. I've written a commit message.
The REL_12_STABLE branch is not relevant anymore as it's out of
support. I'm going to push this if there are no objections.
I'm sorry for so many messages. My email client just went crazy.
It should be fixed now.
--
Regards,
Alexander Korotkov
Supabase
Commit message looks good to me, also no objections on ignoring
REL_12_STABLE :)
Thank you!
Regards, Oleg Tselebrovskiy
Re: Initcap works differently with different locale providers
Jeff Davis wrote at 2025-07-31 02:58:
Apologies for the late answer to the review.
> First, it doesn't mention the "builtin" provider, which uses the same
> word break rules as libc.
I completely forgot about the builtin provider in the first patch, my bad.
> Second, word boundaries can be complex, and I'm wondering if we should
> not be so precise about what ICU does or doesn't do. For instance, ICU
> has options like U_TITLECASE_ADJUST_TO_CASED,
> U_TITLECASE_NO_BREAK_ADJUSTMENT, etc., and I'm not sure exactly
> which one of those we use.
While [1] describes the default word boundary rules and could be useful
as a starting point, I agree that in reality it is probably more
complicated. I didn't find any place where
U_TITLECASE_ADJUST_TO_CASED and the like are set in non-test code, but
U_TITLECASE_ADJUST_TO_CASED was used as the default prior to ICU 60,
so initcap() will also behave differently depending on the ICU version.
> I'd prefer that we try to explain that INITCAP() is intended for
> convenient display, and the specific result should not be relied upon
> (at least for ICU; maybe for all providers). If you want specific word
> boundary rules, write your own function.
The first patch just adds this warning about not relying on the exact
result of initcap(). The second one is the same, but removes the "what is a
word" part, since it could be moot: we recommend writing custom functions,
so understanding what a word is is not strictly needed. I'm still on the
fence about which patch is better, though.
Thoughts?
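As an illustration of the "write your own function" suggestion, a custom function simply pins down the word rule explicitly so that it no longer depends on the locale provider. The sketch below is hypothetical (the function name and the chosen rule are assumptions, and in practice it would be written as a database function rather than in Python); it treats every whitespace-delimited token as one word.

```python
import re


def initcap_whitespace(text: str) -> str:
    """Capitalize words, where a word is a maximal run of non-whitespace."""

    def fix(match: re.Match) -> str:
        word = match.group(0)
        # Upper-case the first character of the token, lower-case the rest.
        return word[:1].upper() + word[1:].lower()

    return re.sub(r"\S+", fix, text)


print(initcap_whitespace("joHn d.e."))     # John D.e.
print(initcap_whitespace("first.second"))  # First.second
```

The rule here is chosen only because it is easy to state; it is not UAX #29 and makes no claim to match any provider in general.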
[1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
Regards, Oleg Tselebrovskiy
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..8a44e0ae593 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3148,12 +3148,19 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
+rest to lower case. When using the libc or
+ builtin locale provider, words are sequences
+of alphanumeric characters separated by non-alphanumeric characters;
+when using the ICU locale provider, words are separated according to
 https://www.unicode.org/reports/tr29/#Word_Boundaries Unicode Standard Annex #29.
+
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.
+
initcap('hi THOMAS')
Hi Thomas
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..c071d6df366 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3148,11 +3148,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
-https://www.unicode.org/reports/tr29/#Word_Boundaries Unicode Standard Annex #29.
+rest to lower case.
+
+
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.
initcap('hi THOMAS')
Re: Initcap works differently with different locale providers
Jeff Davis wrote at 2025-08-05 03:59:
> One more thing: we should also change it to "... to upper case (or
> title case) and the rest to lower case...". Title case is for scripts
> that have characters like 'Dž' (U+01C5).
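The difference between upper case and title case for such a digraph can be seen with the Unicode character database. The snippet below is just an illustration using Python's built-in string methods and the unicodedata module; it says nothing about how initcap() is implemented.

```python
import unicodedata

small = "\u01c6"  # LATIN SMALL LETTER DZ WITH CARON

upper = small.upper()  # upper case maps to U+01C4
title = small.title()  # title case maps to U+01C5

for label, ch in (("lower", small), ("upper", upper), ("title", title)):
    # Print the code point, general category, and official name.
    print(label, f"U+{ord(ch):04X}", unicodedata.category(ch), unicodedata.name(ch))
```

The title-case form has general category Lt (titlecase letter), distinct from both the upper-case (Lu) and lower-case (Ll) forms, which is why "upper case" alone does not describe the result for these characters.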
Done based upon second version of previous patch. Again, there are two
versions - the first one has a mention of digraphs, like 'Dž' (U+01C5),
and the second one doesn't. And again, don't know which version is
better - title case without mentioning digraphs could be interpreted
as "don't capitalise articles and prepositions" or just "don't
capitalize articles", since the definition of "title case" is vague.
We have a "write your own function" clause, but still.
Maybe we should add an example of a digraph to the first patch to
make it more clear, if we go that path.
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..b32ec6e2cea 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3147,12 +3147,15 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
text
-Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
-https://www.unicode.org/reports/tr29/#Word_Boundaries Unicode Standard Annex #29.
+Converts the first letter of each word to upper case (or title case
+if the letter is a digraph) and the rest to lower case.
+
+
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.
initcap('hi THOMAS')
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..f799b34dca7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3147,12 +3147,15 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
text
-Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
-https://www.unicode.org/reports/tr29/#Word_Boundaries Unicode Standard Annex #29.
+Converts the first letter of each word to upper case (or title case)
+and the rest to lower case.
+
+
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.
initcap('hi THOMAS')
