Re: Documentation improvement patch

2025-10-18 Thread Oleg

Thank you for your feedback, Daniel.

My thoughts are below:

> -   Change the definition of a replication slot.
> +   Changes the definition of a replication slot.
> Reading this page it seems we are mixing tense in many places, some say "Change
> the" and "Read some" and elsewhere we use "Drops the".  Maybe a more holistic
> approach would be better for this page to improve consistency?

I agree, let's add "s" in all cases for the sake of consistency.


> -   Not enabled by default because it is resource intensive.
> +   Not enabled by default because it is resource-intensive.
> We use both spellings in multiple places, shouldn't all be changed?


Agreed, changing all instances to "resource-intensive".
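
For a sweep like this, a small script can list every un-hyphenated occurrence before the patch is regenerated. The following is only a rough sketch and not part of the patch: the helper name `find_unhyphenated` and the sample text are made up for illustration. It assumes the two words are separated by whitespace only, which also covers a compound split across a line break.

```python
import re


def find_unhyphenated(text, first, second):
    """Return (line_number, matched_text) for each 'first second' occurrence
    that is written with whitespace instead of a hyphen."""
    # \s+ also matches a newline, so compounds split across lines are found.
    pattern = re.compile(
        rf"\b{re.escape(first)}\s+{re.escape(second)}\b", re.IGNORECASE
    )
    hits = []
    for match in pattern.finditer(text):
        # Count the newlines before the match to get a 1-based line number.
        line_number = text.count("\n", 0, match.start()) + 1
        hits.append((line_number, match.group(0)))
    return hits


# Hypothetical sample text; a real run would read the documentation sources.
sample = (
    "Not enabled by default because it is resource intensive.\n"
    "This check is resource-intensive.\n"
    "software to run, or are resource\n"
    "intensive.\n"
)
for line_number, found in find_unhyphenated(sample, "resource", "intensive"):
    print(line_number, repr(found))
```

Each hit still needs a manual look, since some occurrences may sit in example output rather than in prose.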

> -   COPY and other file-access functions.
> +   the COPY command and file-access functions.
>  ...
> -   COPY and other file-access functions.
> +   the COPY command and file-access functions.
>  ...
> -   COPY and other functions which allow executing a
> +   the COPY command and functions, which allow executing a
> I'm not sure about these, I think we use COPY without the "the COPY
> command" decoration in many places so I think it's more consistent like this.

I actually think we should add the decoration here, because "COPY and other
file-access functions" sounds a bit confusing: COPY is not a file-access
function, yet we seem to put it in that list. That said, I agree that everybody
knows COPY is a command, not a function.

> - to call functions defined in the standard internal library, by using an
> + to call functions defined in the standard internal function library by using an
>    interface similar to their SQL signature.
> Isn't it a bit redundant to say "internal function library" when we are already
> talking about function definitions?


I agree that it may seem redundant; I added "function" here for the sake of
consistency with lines 1829/1830 (if applied to the master branch),
where the documentation mentions "standard internal function library".

Please let me know what you think of the last two points so that I can send the
updated patch.

--
Oleg Sibiryakov

On 10.10.2025 10:15, Daniel Gustafsson wrote:

On 10 Sep 2025, at 09:54, Oleg wrote:

Dear all,
I have prepared a patch fixing some minor inconsistencies in the
documentation. Please take a look.
I will be looking forward to your feedback.

Thanks for the patch, while most of these are obvious improvements I have a few
comments on some:


-   Change the definition of a replication slot.
+   Changes the definition of a replication slot.
Reading this page it seems we are mixing tense in many places, some say "Change
the" and "Read some" and elsewhere we use "Drops the".  Maybe a more holistic
approach would be better for this page to improve consistency?


-   Not enabled by default because it is resource intensive.
+   Not enabled by default because it is resource-intensive.
We use both spellings in multiple places, shouldn't all be changed?


-   COPY and other file-access functions.
+   the COPY command and file-access functions.
 ...
-   COPY and other file-access functions.
+   the COPY command and file-access functions.
 ...
-   COPY and other functions which allow executing a
+   the COPY command and functions, which allow executing a
I'm not sure about these, I think we use COPY without the "the COPY
command" decoration in many places so I think it's more consistent like this.


- to call functions defined in the standard internal library, by using an
+ to call functions defined in the standard internal function library by using an
   interface similar to their SQL signature.
Isn't it a bit redundant to say "internal function library" when we are already
talking about function definitions?


The patch should be applied to the REL_18_STABLE branch.

As you mentioned downthread, this is also for master.  Our workflow is to
always apply to master and backpatch from there.

--
Daniel Gustafsson




Documentation improvement patch

2025-09-10 Thread Oleg

Dear all,

I have prepared a patch fixing some minor inconsistencies in the
documentation. Please take a look.

I will be looking forward to your feedback.

The patch should be applied to the REL_18_STABLE branch.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index d1e103ed779..e589a8d6884 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1234,7 +1234,7 @@ include_dir 'conf.d'
   
   

-The library/libraries to use for validating OAuth connection tokens. If
+Sets the library/libraries to use for validating OAuth connection tokens. If
 only one validator library is provided, it will be used by default for
 any OAuth connections; otherwise, all
 oauth HBA entries
@@ -1400,7 +1400,7 @@ include_dir 'conf.d'

 Specifies a list of cipher suites that are allowed by connections using
 TLS version 1.3.  Multiple cipher suites can be
-specified by using a colon separated list. If left blank, the default
+specified by using a colon-separated list. If left blank, the default
 set of cipher suites in OpenSSL will be used.

 
@@ -2432,7 +2432,7 @@ include_dir 'conf.d'
   
   

-Sets the maximum number of open files each server subprocess is
+Sets the maximum number of files each server subprocess is
 allowed to open simultaneously;  files already opened in the
 postmaster are not counted toward this limit. The default is one
 thousand files.
diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 781a01067f7..9b032fbf675 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -1226,7 +1226,7 @@ postgres=# SELECT postgres_fdw_disconnect_all();
 PostgresFdwCleanupResult
 
  
-  Waiting for transaction abort on remote server.
+  Waiting for transaction abort on a remote server.
  
 

diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..2101442c90f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -49,7 +49,7 @@ break is not needed in a wider output rendering.

 

-After you have successfully completed this tutorial you will want to
+After you have successfully completed this tutorial, you will want to
 read the  section to gain a better understanding
 of the SQL language, or  for
 information about developing applications with
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index d336ee38f58..80eadfc0e1a 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -1636,7 +1636,7 @@ SELCT 1/0;

  Likewise the server expects the client to not begin
  the SSL negotiation until it receives the server's
- single byte response to the SSL request.  If the
+ single-byte response to the SSL request.  If the
  client begins the SSL negotiation immediately without
  waiting for the server response to be received it can reduce connection
  latency by one round-trip.  However this comes at the cost of not being
@@ -2394,7 +2394,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;"
  
  
   
-   Change the definition of a replication slot.
+   Changes the definition of a replication slot.
See  for more about
replication slots. This command is currently only supported for logical
replication slots.
diff --git a/doc/src/sgml/ref/pg_recvlogical.sgml b/doc/src/sgml/ref/pg_recvlogical.sgml
index 263ebdeeab4..2a0de0cfb63 100644
--- a/doc/src/sgml/ref/pg_recvlogical.sgml
+++ b/doc/src/sgml/ref/pg_recvlogical.sgml
@@ -84,7 +84,7 @@ PostgreSQL documentation

 

-The --slot and --dbname are required
+The --slot and --dbname options are required
 for this action.

 
@@ -104,7 +104,7 @@ PostgreSQL documentation

 

-The --slot is required for this action.
+The --slot option is required for this action.

   
  
@@ -121,8 +121,8 @@ PostgreSQL documentation

 

-The --slot and --dbname,
---file are required for this action.
+The --slot, --dbname, and
+--file options are required for this action.

 

diff --git a/doc/src/sgml/ref/pgbench.sgml b/doc/src/sgml/ref/pgbench.sgml
index ab252d9fc74..e2b2a0ea26f 100644
--- a/doc/src/sgml/ref/pgbench.sgml
+++ b/doc/src/sgml/ref/pgbench.sgml
@@ -2826,7 +2826,7 @@ statement latencies in milliseconds, failures and retries:
  start a connection to the database server / the socket for connecting
  the client to the database server has become invalid). 

Re: Documentation improvement patch

2025-10-09 Thread Oleg

Dear PostgreSQL Community,

This is a kind reminder regarding my documentation patch submitted a 
month ago.


I am still very interested in contributing these improvements and would 
be grateful for a review when time permits.


The patch can also be applied to the master branch.

Thank you for your consideration.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

On 10.09.2025 10:54, Oleg wrote:


Dear all,

I have prepared a patch fixing some minor inconsistencies in the
documentation. Please take a look.

I will be looking forward to your feedback.

The patch should be applied to the REL_18_STABLE branch.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru


Re: Documentation improvement patch

2025-10-25 Thread Oleg

Dear Daniel,

Thank you for your prompt feedback.

Attached, please find the updated documentation patch, which 
incorporates your suggestions from both the first and second rounds of 
review.


--
Oleg Sibiryakov

On 22.10.2025 11:02, Daniel Gustafsson wrote:

> On 13 Oct 2025, at 12:51, Oleg wrote:
>
>>> - COPY and other functions which allow executing a
>>> + the COPY command and functions, which allow executing a
>>> I'm not sure about these, I think we use COPY without the "the COPY
>>> command" decoration in many places so I think it's more consistent like this.
>>
>> I actually think we should add the decoration here because "COPY and other
>> file-access functions" sounds a bit confusing since COPY is not a file-access
>> function and we seem to put it in the list. Even though I agree that everybody
>> knows COPY is a command, not a function.
>
> We refer to SQL commands by just their names all over the documentation without
> saying "an EXPLAIN command" etc, and I think this falls in that same category.
>
>>> - to call functions defined in the standard internal library, by using an
>>> + to call functions defined in the standard internal function library by using an
>>> interface similar to their SQL signature.
>>> Isn't it a bit redundant to say "internal function library" when we are already
>>> talking about function definitions?
>>
>> I agree that it may seem redundant, I added "function" here for the sake of
>> consistency with lines 1829/1830 (if applied to the master branch)
>> where the documentation mentions "standard internal function library".
>
> I hadn't seen that, but with that in mind I agree that being consistent is good
> so I'll withdraw that comment.
>
> --
> Daniel Gustafsson


diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 0a2a8b49fdb..71c2bbf7615 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -1232,11 +1232,11 @@ include_dir 'conf.d'
oauth_validator_libraries configuration parameter
   
   
   

-The library/libraries to use for validating OAuth connection tokens. If
+Sets the library/libraries to use for validating OAuth connection tokens. If
 only one validator library is provided, it will be used by default for
 any OAuth connections; otherwise, all
 oauth HBA entries
 must explicitly set a validator chosen from this
 list. If set to an empty string (the default), OAuth connections will be
@@ -1398,11 +1398,11 @@ include_dir 'conf.d'
   
   

 Specifies a list of cipher suites that are allowed by connections using
 TLS version 1.3.  Multiple cipher suites can be
-specified by using a colon separated list. If left blank, the default
+specified by using a colon-separated list. If left blank, the default
 set of cipher suites in OpenSSL will be used.

 

 This parameter can only be set in the
@@ -2430,11 +2430,11 @@ include_dir 'conf.d'
max_files_per_process configuration parameter
   
   
   

-Sets the maximum number of open files each server subprocess is
+Sets the maximum number of files each server subprocess is
 allowed to open simultaneously;  files already opened in the
 postmaster are not counted toward this limit. The default is one
 thousand files.


diff --git a/doc/src/sgml/installation.sgml b/doc/src/sgml/installation.sgml
index 593202f4fb2..fe8d73e1f8c 100644
--- a/doc/src/sgml/installation.sgml
+++ b/doc/src/sgml/installation.sgml
@@ -3168,11 +3168,11 @@ ninja install
   -DPG_TEST_EXTRA=TEST_SUITES
   

 Enable additional test suites, which are not run by default because
 they are not secure to run on a multiuser system, require special
-software to run, or are resource intensive.  The argument is a
+software to run, or are resource-intensive.  The argument is a
 whitespace-separated list of tests to enable. See
  for details. If the
 PG_TEST_EXTRA environment variable is set when the
 tests are run, it overrides this setup-time option.

diff --git a/doc/src/sgml/postgres-fdw.sgml b/doc/src/sgml/postgres-fdw.sgml
index 781a01067f7..9b032fbf675 100644
--- a/doc/src/sgml/postgres-fdw.sgml
+++ b/doc/src/sgml/postgres-fdw.sgml
@@ -1224,11 +1224,11 @@ postgres=# SELECT postgres_fdw_disconnect_all();
   

 PostgresFdwCleanupResult
 
  
-  Waiting for transaction abort on remote server.
+  Waiting for transaction abort on a remote server.
  
 

 

diff --git a/doc/src/sgml/postgres.sgml b/doc/src/sgml/postgres.sgml
index af476c82fcc..2101442c90f 100644
--- a/doc/src/sgml/postgres.sgml
+++ b/doc/src/sgml/postgres.sgml
@@ -47,11 +47,11 @@ break is not needed in a wider output rendering.
 important

Re: Documentation improvement patch

2025-10-21 Thread Oleg

Dear Daniel,

Could you please provide your feedback on the last two points?
Once I have it, I will send the updated patch immediately to finalize 
the improvements.


Thank you,
Oleg

On 13.10.2025 13:51, Oleg wrote:

Thank you for your feedback, Daniel.
My thoughts are below:
> -   Change the definition of a replication slot.
> +   Changes the definition of a replication slot.
> Reading this page it seems we are mixing tense in many places, some say "Change
> the" and "Read some" and elsewhere we use "Drops the".  Maybe a more holistic
> approach would be better for this page to improve consistency?

I agree, let's add "s" in all cases for the sake of consistency.


> -   Not enabled by default because it is resource intensive.
> +   Not enabled by default because it is resource-intensive.
> We use both spellings in multiple places, shouldn't all be changed?


Agreed, changing all instances to "resource-intensive".

> -   COPY and other file-access functions.
> +   the COPY command and file-access functions.
>  ...
> -   COPY and other file-access functions.
> +   the COPY command and file-access functions.
>  ...
> -   COPY and other functions which allow executing a
> +   the COPY command and functions, which allow executing a
> I'm not sure about these, I think we use COPY without the "the COPY
> command" decoration in many places so I think it's more consistent like this.

I actually think we should add the decoration here, because "COPY and other
file-access functions" sounds a bit confusing: COPY is not a file-access
function, yet we seem to put it in that list. That said, I agree that everybody
knows COPY is a command, not a function.

> - to call functions defined in the standard internal library, by using an
> + to call functions defined in the standard internal function library by using an
>    interface similar to their SQL signature.
> Isn't it a bit redundant to say "internal function library" when we are already
> talking about function definitions?


I agree that it may seem redundant; I added "function" here for the sake of
consistency with lines 1829/1830 (if applied to the master branch),
where the documentation mentions "standard internal function library".

Please let me know what you think of the last two points so that I can send the
updated patch.

--
Oleg Sibiryakov
On 10.10.2025 10:15, Daniel Gustafsson wrote:

On 10 Sep 2025, at 09:54, Oleg wrote:

Dear all,
I have prepared a patch fixing some minor inconsistencies in the
documentation. Please take a look.
I will be looking forward to your feedback.

Thanks for the patch, while most of these are obvious improvements I have a few
comments on some:


-   Change the definition of a replication slot.
+   Changes the definition of a replication slot.
Reading this page it seems we are mixing tense in many places, some say "Change
the" and "Read some" and elsewhere we use "Drops the".  Maybe a more holistic
approach would be better for this page to improve consistency?


-   Not enabled by default because it is resource intensive.
+   Not enabled by default because it is resource-intensive.
We use both spellings in multiple places, shouldn't all be changed?


-   COPY and other file-access functions.
+   the COPY command and file-access functions.
 ...
-   COPY and other file-access functions.
+   the COPY command and file-access functions.
 ...
-   COPY and other functions which allow executing a
+   the COPY command and functions, which allow executing a
I'm not sure about these, I think we use COPY without the "the COPY
command" decoration in many places so I think it's more consistent like this.


- to call functions defined in the standard internal library, by using an
+ to call functions defined in the standard internal function library by using an
   interface similar to their SQL signature.
Isn't it a bit redundant to say "internal function library" when we are already
talking about function definitions?


The patch should be applied to the REL_18_STABLE branch.

As you mentioned downthread, this is also for master.  Our workflow is to
always apply to master and backpatch from there.

--
Daniel Gustafsson




Documentation improvement patch

2024-09-05 Thread Oleg Sibiryakov

Dear all,

I have prepared a patch fixing some minor inconsistencies in the
documentation. Please take a look.

The inconsistencies were noticed by: Ekaterina Kiryanova, Elena
Indrupskaya, Maxim Yablokov, Anna Uraskova, Elena Karavaeva, and me.

We will be looking forward to your feedback.

The patch should be applied to the REL_17_STABLE branch.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a63cc71efa2..7a905fd6a3a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8029,7 +8029,7 @@ SCRAM-SHA-256$<iteration count>:&l
   
   
If true, the associated replication slots (i.e. the main slot and the
-   table sync slots) in the upstream database are enabled to be
+   table synchronization slots) in the upstream database are enabled to be
synchronized to the standbys
   
  
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..a76e9579a14 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -382,8 +382,8 @@ initdb --locale-provider=icu --icu-locale=en
   
   
The C locale behavior is identical to the
-   C locale in the libc provider. When using this
-   locale, the behavior may depend on the database encoding.
+   C locale in the libc provider. When
+   using this locale, the behavior may depend on the database encoding.
   
   
The C.UTF-8 locale is available only for when the
@@ -400,7 +400,7 @@ initdb --locale-provider=icu --icu-locale=en
  
   
The icu provider uses the external
-   ICUICU
+   ICU
library. PostgreSQL must have been
configured with support.
   
@@ -862,8 +862,9 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 This SQL standard collation sorts using the Unicode Collation
 Algorithm with the Default Unicode Collation Element Table.  It is
 available in all encodings.  ICU support is required to use this
-collation, and behavior may change if Postgres is built with a
-different version of ICU.  (This collation has the same behavior as
+collation, and behavior may change if
+PostgreSQL is built with a different version
+of ICU.  (This collation has the same behavior as
 the ICU root locale; see .)

@@ -897,7 +898,7 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 expressions), it uses the POSIX Compatible variant of Unicode https://www.unicode.org/reports/tr18/#Compatibility_Properties";>Compatibility
 Properties.  Behavior is efficient and stable within a
-Postgres major version.  This collation is
+PostgreSQL major version.  This collation is
 only available for encoding UTF8.

   
diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index 7c60aeab4f6..2027308f89f 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -708,9 +708,10 @@ include_dir 'conf.d'

 

-PostgreSQL sizes certain resources based directly on the value of
-max_connections. Increasing its value leads to
-higher allocation of those resources, including shared memory.
+PostgreSQL sizes certain resources based
+directly on the value of max_connections. Increasing
+its value leads to higher allocation of those resources, including
+shared memory.

 

@@ -9384,7 +9385,7 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;

 If transaction_timeout is shorter or equal to
 idle_in_transaction_session_timeout or statement_timeout
-then the longer timeout is ignored.
+then the longer timeout is ignored.

 

@@ -10842,7 +10843,7 @@ dynamic_library_path = 'C:\tools\postgresql;H:\my_project\lib;$libdir'
 Turning this setting off is intended for environments where the
 configuration of PostgreSQL is managed by
 some external tool.
-In such environments, a well intentioned superuser might
+In such environments, a well-intentioned superuser might
 mistakenly use ALTER SYSTEM
 to change the configuration instead of using the external tool.
 This might result in unintended behavior, such as the external tool
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 220683b5eb4..1d2987e628d 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -313,7 +313,7 @@ INSERT INTO people (id, name, address) VALUE (DEFAULT, 'C',
 
   
The data type of an identity column must be one of the data types supported
-   by sequences.  (See .)  The properties
+   by sequences (see ).  The properties
of

Re: Documentation improvement patch

2024-09-10 Thread Oleg Sibiryakov

Thank you for your feedback.

1. Since we do not want to use  here, I suggest we hyphenate it 
as "built-in". What's your take on it?

2. Leaving not-null is fine.

--
Oleg Sibiryakov

On 06.09.2024 16:20, Daniel Gustafsson wrote:

On 5 Sep 2024, at 11:33, Oleg Sibiryakov  wrote:

Dear all,
I have prepared a patch fixing some minor inconsistencies in the
documentation. Please take a look.
The inconsistencies were noticed by: Ekaterina Kiryanova, Elena Indrupskaya,
Maxim Yablokov, Anna Uraskova, Elena Karavaeva, and me.
We will be looking forward to your feedback.
The patch should be applied to the REL_17_STABLE branch.

Most of these seem fine, but I need another read-through to digest them fully.
Just a few small comments:

-Specifies the builtin provider locale for the database default
-collation order and character classification, overriding the setting
-.  The builtin provider locale for the 
database
+default collation order and character classification, overriding the
+setting .  The .
+Specifies the locale name when the builtin provider
+is used. Locale support is described in .


I don't think this use of "builtin" refers to the config value but rather the
type of locale, so I think it's correct to not use  here.


-for not-null constraints at all, so they are not
+for NOT NULL constraints at all, so they are not

This seems mostly to be a question of taste, I don't think not-null is
incorrect here.

--
Daniel Gustafsson








Re: Documentation improvement patch

2024-09-13 Thread Oleg Sibiryakov

Attached is a patch without the builtin/built-in corrections.

But I still believe the issue should be discussed further.
We actually have two options: it is either a spelling mistake (since
built-in should be written with a hyphen), or we are missing the  tag
(since it is actually also a value).


So I do think we cannot really leave it as is.

--
Oleg Sibiryakov

On 11.09.2024 12:53, Peter Eisentraut wrote:

> On 10.09.24 15:02, Daniel Gustafsson wrote:
>> On 10 Sep 2024, at 13:46, Oleg Sibiryakov wrote:
>>
>>> 1. Since we do not want to use  here, I suggest we
>>> hyphenate it as "built-in". What's your take on it?
>>
>> I think that's the right choice given the hyphenation used in the rest of the
>> docs.  There are a few more places on that same page which should be built-in
>> rather than builtin to separate the concept from the parameter value.
>
> I suspect that this would lead to the opposite confusion, people
> complaining that the provider is called "builtin" not "built-in".
>
> Arguably, the other providers are also "built in".  There are no
> user-pluggable providers at this time.



diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index a63cc71efa2..7a905fd6a3a 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8012,41 +8012,41 @@ SCRAM-SHA-256$<iteration count>:&l
for authentication
   
  
 
  
   
subrunasowner bool
   
   
If true, the subscription will be run with the permissions
of the subscription owner
   
  
 
  
   
subfailover bool
   
   
If true, the associated replication slots (i.e. the main slot and the
-   table sync slots) in the upstream database are enabled to be
+   table synchronization slots) in the upstream database are enabled to be
synchronized to the standbys
   
  
 
  
   
subconninfo text
   
   
Connection string to the upstream database
   
  
 
  
   
subslotname name
   
   
Name of the replication slot in the upstream database (also used
for the local replication origin name);
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..a76e9579a14 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -365,59 +365,59 @@ initdb --locale-provider=icu --icu-locale=en
 Regardless of the locale provider, the operating system is still used to
 provide some locale-aware behavior, such as messages (see ).

 

 The available locale providers are listed below:

 

 
  builtin
  
   
The builtin provider uses built-in operations. Only
the C and C.UTF-8 locales are
supported for this provider.
   
   
The C locale behavior is identical to the
-   C locale in the libc provider. When using this
-   locale, the behavior may depend on the database encoding.
+   C locale in the libc provider. When
+   using this locale, the behavior may depend on the database encoding.
   
   
The C.UTF-8 locale is available only for when the
database encoding is UTF-8, and the behavior is
based on Unicode. The collation uses the code point values only. The
regular expression character classes are based on the "POSIX
Compatible" semantics, and the case mapping is the "simple" variant.
   
  
 
 
 
  icu
  
   
The icu provider uses the external
-   ICUICU
+   ICU
library. PostgreSQL must have been
configured with support.
   
   
ICU provides collation and character classification behavior that is
independent of the operating system and database encoding, which is
preferable if you expect to transition to other platforms without any
change in results. LC_COLLATE and
LC_CTYPE can be set independently of the ICU
locale.
   
   

 For the ICU provider, results may depend on the version of the ICU
 library used, as it is updated to reflect changes in natural language
 over time.

   
  
 
@@ -845,76 +845,77 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 separate collate and ctype settings, so
 they are always the same.  Also, ICU collations are independent of the
 encoding, so there is always only one ICU collation of a given name in
 a database.

 

 Standard Collations
 

 On all platforms, the following collations are supported:
 
 
  
   unicode
   

 This SQL standard collation sorts using the Unicode Collation
 Algorithm with the Default Unicode Collation Element Table.  It is
 available in all encodings.  ICU support is re

Re: [DOCS] Add JSON to the list of acronyms in the documentation's appendix

2018-01-11 Thread Oleg Bartunov
On Thu, Jan 11, 2018 at 7:22 PM, Bruce Momjian  wrote:
> On Tue, Oct 24, 2017 at 08:18:49PM +, [email protected] wrote:
>> The following documentation comment has been logged on the website:
>>
>> Page: https://www.postgresql.org/docs/10/static/acronyms.html
>> Description:
>>
>> Hello all,
>>
>> I propose to add 'JSON' (for JavaScript Object Notation, 
>> http://json.org) to
>> the list of acronyms in the documentation's appendix
>> (https://www.postgresql.org/account/comments/new/10/acronyms.html/).
>
> Good idea.  Patch applied --- it will appear in PG 11.

Then why not add JSONB as well?

>
> --
>   Bruce Momjian  http://momjian.us
>   EnterpriseDB http://enterprisedb.com
>
> + As you are, so once was I.  As I am, so you will be. +
> +  Ancient Roman grave inscription +



Re: Images in the official documentation

2018-02-24 Thread Oleg Bartunov
On Sat, Feb 24, 2018 at 4:04 AM, Peter Eisentraut wrote:
> On 2/23/18 11:21, Tom Lane wrote:
>> In the distant
>> past, as I recall, we had a GIF or two; but we abandoned that on the
>> grounds that it was unmaintainable and also incompatible with some
>> documentation output formats.  I'm not too sure what the state of
>> play is on the latter point, now that we've switched to XML.
>
> The complications with the image formats in the past were mainly around
> what ((pdf)jade)tex would accept.  The tools have shifted a bit now, and
> the zoo of formats is a different one.  Nothing that a few make rules
> couldn't address, though, I think.
>
> The issue of how to manage the sources is still the same, though.

SVG is an ASCII-based vector format. We made an experimental PDF with pictures:
http://www.sai.msu.su/~megera/postgres/files/postgres-11-diagram.pdf
(GIN AM diagram, Appendix L).

Appendix L also demonstrates our sample database with a step-by-step
introduction to Postgres for beginners.  We have a separate book for beginners,
which we released under the BSD license; it is available in Russian and
English.
Our experience shows that people really appreciate it. I hope we will
have time at PGCon to discuss documentation somehow.

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
>



Re: Images in the official documentation

2018-02-27 Thread Oleg Bartunov
On Mon, Feb 26, 2018 at 10:23 PM, Tom Lane  wrote:
> Craig Ringer  writes:
>> On 26 February 2018 at 12:16, Tom Lane  wrote:
>>> How can we resolve these issues?
>
>> Question the assumptions and requirements. Why do we actually _need_
>> diffable, mergeable images? Sure, it'd be *nice*, but what's the real world
>> impact if we don't have it?
>
> Well, I'll tell you exactly why I'm being sticky about this: we've been
> down this road before.  We used to have some figures in .gif format,
> and one of the problems with them was they were too hard to update.
> I don't buy the "they won't need updates" argument for a second, either.
> For instance, I recall that one of the images we had was a diagram of
> the system catalog cross-references, and it was constantly out of date
> because of the difficulty of updating it.
>
> Admittedly, this was 15+ years ago.  Maybe the state of the art in
> figure editors has advanced to the point where it won't be so hard.
> But color me suspicious.


In case you missed it, we discussed this at PGCon a couple of years ago:

Heikki's version:
https://wiki.postgresql.org/wiki/Figures_%26_Pics_in_Docs

Emre suggested using Markdeep (BSD license):
http://casual-effects.com/markdeep/
http://www.sai.msu.su/~megera/postgres/gin-ascii-v2.md.html

It looks good for small diagrams, but will not work for complex stuff,
such as pg_catalog structure.


>
> regards, tom lane
>



Re: Dead link in ltree documentation

2018-04-04 Thread Oleg Bartunov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


On Wed, Apr 4, 2018 at 12:59 PM, PG Doc comments form
 wrote:
> The following documentation comment has been logged on the website:
>
> Page: https://www.postgresql.org/docs/10/static/ltree.html
> Description:
>
> Hi,
>
> https://www.postgresql.org/docs/current/static/ltree.html links to
> www.dmoz.org which now returns a 403, since being closed down in 2017.
>
> Maybe it could link to the mirror https://dmoztools.net/ or the wikipedia
> page instead.

Attached is a small patch.


ltree.sgml.patch
Description: Binary data


Re: Dead link in ltree documentation

2018-04-05 Thread Oleg Bartunov
On Wed, Apr 4, 2018 at 8:17 PM, Alvaro Herrera  wrote:
> David G. Johnston wrote:
>
>> I'm not seeing the value in providing a link, especially one that we don't
>> control, here.  Furthermore, we could probably drop the whole "In
>> practice..." sentence.  But if not at least put a period after "limitation"
>> and drop the example and link.
>
> +1 remove the sentence.

Attached is a new patch, which removes the whole sentence along with the example link.

>
> --
> Álvaro Herrera  https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


ltree.sgml.patch
Description: Binary data


document json[b] limitation

2018-04-24 Thread Oleg Bartunov
Hi there,

Attached is a small patch, which documents the maximum size of the
json[b] types. It is probably worth patching the previous releases in which
the types were introduced.

Best regards,
Oleg

-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


json.sgml.patch
Description: Binary data


Re: document json[b] limitation

2018-04-25 Thread Oleg Bartunov
On Wed, Apr 25, 2018 at 2:12 AM, Tom Lane  wrote:
> Oleg Bartunov  writes:
>> Attached is a small patch, which documents the maximum size of the
>> json[b] types. It is probably worth patching the previous releases in which
>> the types were introduced.
>
> If you said "maximum size is 1GB", period, I'd believe it ... although
> I'm pretty sure that general limitation is already documented elsewhere.
> I don't believe that it's possible to make a 256 GB jsonb.  How will
> that fit in the varlena header?

Oops, it should be 256 MB :)
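
As a rough sanity check of the sizes discussed above, here is a minimal Python sketch. It assumes that the 4-byte varlena header leaves 30 bits for the length, which is where the general 1 GB limit comes from; the constant and function names are illustrative only and are not PostgreSQL identifiers.

```python
# Illustrative arithmetic only; the names below are hypothetical.
VARLENA_LENGTH_BITS = 30  # assumption: a 4-byte varlena header keeps 30 bits for the length
MAX_VARLENA_BYTES = 2**VARLENA_LENGTH_BITS - 1  # just under 1 GB

MB = 1024**2
GB = 1024**3


def fits_in_varlena(size_bytes: int) -> bool:
    """Return True if a value of this size could be stored as one varlena datum."""
    return size_bytes <= MAX_VARLENA_BYTES


print(fits_in_varlena(256 * MB))  # True
print(fits_in_varlena(256 * GB))  # False
```

Under that assumption a 256 MB value is well within range, while a 256 GB value could never be represented.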

>
> regards, tom lane
>



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: document json[b] limitation

2018-04-25 Thread Oleg Bartunov
On Wed, Apr 25, 2018 at 6:50 PM, Oleg Bartunov  wrote:
> On Wed, Apr 25, 2018 at 2:12 AM, Tom Lane  wrote:
>> Oleg Bartunov  writes:
>>> Attached is a small patch, which documents the maximum size of the
>>> json[b] types. It is probably worth patching the previous releases in which
>>> the types were introduced.
>>
>> If you said "maximum size is 1GB", period, I'd believe it ... although
>> I'm pretty sure that general limitation is already documented elsewhere.
>> I don't believe that it's possible to make a 256 GB jsonb.  How will
>> that fit in the varlena header?
>
> Oops, it should be 256 MB :)

patch attached.

>
>>
>> regards, tom lane
>>
>
>
>
> --
> Postgres Professional: http://www.postgrespro.com
> The Russian Postgres Company



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


json.sgml.patch
Description: Binary data


bloom documentation patch

2018-10-14 Thread Oleg Bartunov
Hi,

Please consider the attached patch, which improves the contrib/bloom documentation.

Best regards,
Oleg
-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


bloom.sgml.patch
Description: Binary data


Re: bloom documentation patch

2018-10-17 Thread Oleg Bartunov
On Mon, Oct 15, 2018 at 12:48 AM Thomas Munro
 wrote:
>
> On Mon, Oct 15, 2018 at 10:15 AM Oleg Bartunov  
> wrote:
> > Please consider the attached patch, which improves the contrib/bloom documentation.
>
> Hello Oleg, I have no comment on the technical details but here is
> some proof-reading of the English:
>
> +  Length of each signature (index entry) in bits, it is rounded up to the nearest
> +  multiple of 16. The default is 80 bits and maximum is
>
> s/, it is/.  It is/
> s/and maximum/and the maximum/
>
> +   Bloom AM doesn't supports unique indexes.
>
> s/supports/support/
>
> +   Bloom AM doesn't supports NULL values.
>
> s/supports/support/
>

Thanks, Thomas, new patch attached.
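
For readers unfamiliar with the rounding rule in the quoted patch text ("rounded up to the nearest multiple of 16"), here is a minimal Python sketch. The function name is hypothetical, and this only illustrates the arithmetic; it is not the contrib/bloom code.

```python
def round_up_signature_length(bits: int, multiple: int = 16) -> int:
    """Round a requested signature length up to the nearest multiple (16 by default)."""
    if bits <= 0:
        raise ValueError("signature length must be positive")
    # Ceiling division, then scale back up to a multiple.
    return -(-bits // multiple) * multiple


for requested in (1, 80, 81, 100):
    print(requested, "->", round_up_signature_length(requested))
```

The default of 80 bits is already a multiple of 16, so it is left unchanged, while a request for 81 bits becomes 96.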

> --
> Thomas Munro
> http://www.enterprisedb.com
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


bloom.sgml.patch
Description: Binary data


Re: remove deprecated @@@ operator ?

2018-10-21 Thread Oleg Bartunov
On Sun, Oct 21, 2018 at 11:24 PM Tom Lane  wrote:
>
> Oleg Bartunov  writes:
> > The commit 9b5c8d45f62bd3d243a40cc84deb93893f2f5122 is now 10+ years
> > old; maybe we could remove the deprecated @@@ operator?
>
> Is it actually causing any problem?  AFAICS it's just a couple extra
> pg_operator entries, so why not leave it?
>
> I'd be +1 for removing it from the docs, though ...

Attached is a tiny patch for the docs.

>
> regards, tom lane



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


func.sgml.patch
Description: Binary data


Re: First SVG graphic

2018-11-29 Thread Oleg Bartunov
On Wed, Nov 28, 2018 at 8:33 PM Jürgen Purtz  wrote:
>
> After one week no response at all? Neither positive nor negative. It seems 
> that the community has little interest in the SVG issue. Or in my suggestion?

First of all, I am a big +1 for having diagrams in our documentation.

I once estimated the number of diagrams in our official documentation,
and it was only 50 or so. That means it is possible to make them
more or less centralized, at least for the initial version. If Jürgen and
others agree to work on this, I would be happy to help them with the parts
I was working on. For the initial version we could even provide the
generated images along with the SVG source files.

>
> Jürgen Purtz
>
>


-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: Return codes for archive and restore commands

2018-11-29 Thread Oleg Bartunov
On Thu, Nov 29, 2018 at 5:40 AM Stephen Frost  wrote:
>
> Greetings,
>
> * Michael Paquier ([email protected]) wrote:
> > On Wed, Nov 28, 2018 at 11:00:31AM +, PG Doc comments form wrote:
> > > For the archive command:
> > > <=128 There are no errors in the PostgreSQL log (messages with severity
> > > equal to or higher than ERROR). First, 3 messages of type LOG about the fault,
> > > then a WARNING about this and a pause for 1 minute, then repeated.
> > > >=129 FATAL error in the PostgreSQL log. The message is about stopping the archive
> > > process, but not the database. Repeated after roughly 16 seconds.
> >
> > This code is around for some time, and comes from this commit:
> > commit: 3ad0728c817bf8abd2c76bd11d856967509b307c
> > author: Tom Lane 
> > date: Tue, 21 Nov 2006 20:59:53 +
> > committer: Tom Lane 
> > date: Tue, 21 Nov 2006 20:59:53 +
> > On systems that have setsid(2) (which should be just about everything except
> > Windows), arrange for each postmaster child process to be its own process
> > group leader, and deliver signals SIGINT, SIGTERM, SIGQUIT to the whole
> > process group not only the direct child process.  This provides saner behavior
> > for archive and recovery scripts; in particular, it's possible to shut down a
> > warm-standby recovery server using "pg_ctl stop -m immediate", since delivery
> > of SIGQUIT to the startup subprocess will result in killing the waiting
> > recovery_command.  Also, this makes Query Cancel and statement_timeout apply
> > to scripts being run from backends via system().  (There is no support in the
> > core backend for that, but it's widely done using untrusted PLs.)  Per gripe
> > from Stephen Harris and subsequent discussion.
> >
> > The relevant part is pgarch_archiveXlog() in pgarch.c, and this part
> > is most relevant:
> > * Per the Single Unix Spec, shells report exit status > 128 when a
> > * called command died on a signal.
> >
> > > In this case PostgreSQL tries to conform to the rules for return codes of a Unix
> > > shell. A Unix shell returns 126 in the case of "command not executable",
> > > 127 in the case of "command not found", and 128 + the signal number if the
> > > application was interrupted by an uncaught signal.
> >
> > If you were to rewrite those paragraphs or make them more precise, how
> > would you actually shape your suggestions?  I personally quite like the
> > current formulations, but I am rather used to it to be honest.
>
> This is another example, at least imv, of why we really need to move
> away from archive_command as an interface for doing WAL archiving.

+1
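
To make the exit-status convention quoted above concrete, here is a minimal Python sketch. The function name is hypothetical; it only classifies a status the way the quoted report describes it (126, 127, 128 + signal number) and is not how pgarch.c is implemented.

```python
def describe_shell_exit_status(status: int) -> str:
    """Classify a shell exit status using the convention quoted above."""
    if status == 0:
        return "success"
    if status == 126:
        return "command not executable"
    if status == 127:
        return "command not found"
    if status > 128:
        # Shells report 128 + N when the called command died on signal N.
        return f"terminated by signal {status - 128}"
    return "command failed"


for status in (0, 1, 126, 127, 143):
    print(status, describe_shell_exit_status(status))
```

An archive script that wants predictable handling would have to keep its own failure codes at or below 128, since anything above that is read as death by signal.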

>
> Having discussed this quite a bit lately with David Steele and Magnus,
> it's pretty clear that we need to completely rip out how this works
> today and rewrite it based around an extension model where a background
> worker can start up and essentially take the place of the archiver
> process, with flexibility to jump forward through the WAL stream,
> communicate clearly with other processes, handle failure to do so
> gracefully based on the specific cases, etc.
>
> We could then possibly write an extension to be included that mimics
> what archive_command does today, but imv we should immediately consider
> it deprecated and encourage people to move off of it.
>
> Thanks!
>
> Stephen



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



SQL-2016 in docs

2019-05-12 Thread Oleg Bartunov
I noticed that our docs for PG12 do not mention SQL-2016, although we have
actually committed a JSON Path implementation, which is part of the SQL-2016
standard. The one missing feature is datetime support.  Peter, will you
add this, or should I prepare the patch?

Oleg
-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: SQL-2016 in docs

2019-05-30 Thread Oleg Bartunov
On Mon, May 27, 2019 at 2:33 PM Peter Eisentraut
 wrote:
>
> On 2019-05-12 10:14, Oleg Bartunov wrote:
> > I noticed that our docs for PG12 do not mention SQL-2016, although we have
> > actually committed a JSON Path implementation, which is part of the SQL-2016
> > standard. The one missing feature is datetime support.  Peter, will you
> > add this, or should I prepare the patch?
>
> I did a rough check of the SQL:2016 JSON path specification versus our
> regression tests, and came up with the attached supported feature list.
> Would you like to confirm it?

I confirm it.

>
> --
> Peter Eisentraut  http://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company




Re: TOC: List of Figures

2019-07-02 Thread Oleg Bartunov
On 2 Jul 2019, at 11:13, Jürgen Purtz  wrote:

After the integration of figures into the documentation it may be helpful
to extend the TOC with a 'List of Figures'. Any opinion?

If yes: The same for 'List of Tables' and 'List of Examples'?

There is a simple way to enable this feature: change line 56 of
stylesheet-html-common.xsl to: "book toc,title,figure,table,example". As
shown in a previous thread this leads to an ugly swelling of the TOC,
similar to the former handling of release notes, especially for tables
and examples; see attachment 1.

+1

The alternative is a downshift of the postings by one level, see attachment
2. How to realize this behavior is shown in attachment 3.





Re: Documentation improvement patch

2024-10-07 Thread Oleg Sibiryakov

Thank you, Daniel.

--
Oleg Sibiryakov

On 02.10.2024 15:58, Daniel Gustafsson wrote:

On 2 Oct 2024, at 10:09, Oleg Sibiryakov  wrote:

Thank you for your kind feedback! I will take due note of the comments in the 
next documentation patches as well.

I have made all the changes as per your feedback and also corrected paragraph 
reflow.

The third version of the patch is attached for your consideration.

Thanks, I have gone over and applied most of these changes.  I did leave out a
few (like the libc one) where the current page had multiple different versions.

--
Daniel Gustafsson








Re: Documentation improvement patch

2024-10-02 Thread Oleg Sibiryakov
Thank you for your kind feedback! I will take due note of the comments 
in the next documentation patches as well.


I have made all the changes as per your feedback and also corrected 
paragraph reflow.


The third version of the patch is attached for your consideration.

--
Oleg Sibiryakov

On 01.10.2024 11:59, Daniel Gustafsson wrote:

On 1 Oct 2024, at 10:04, Oleg Sibiryakov  wrote:
Here is a kind reminder of a small documentation improvement patch, which we 
started discussing a month ago.

I removed all the controversial points touched upon in this thread. Please, 
take a look once again at your convenience.

In general, when submitting a docs patch it's better to not reflow the
paragraphs when a modified line becomes too long.  Reading a 4 line diff where
only one thing changed in the first becomes harder than reading a single line
diff where the line is long.  The committer can ensure the lines are reflowed
prior to a commit, or it can be left as the final revision of a patch
submission once all changes are discussed.

A few comments on this version of the patch:


-   ICUICU
+   ICU

I don't think removing the name of the library changing the sentence from "The
icu provider uses the external ICU library" to "The icu provider uses the
external library" is an improvement.


-   by sequences.  (See .)  The properties
+   by sequences (see ).  The properties

This is a common construction in our docs, if it's considered to be a bad
practice the case should be argued (separately) for removing all of them
instead.


-   Comma separated list of publication names for which to subscribe
+   Comma-separated list of publication names for which to subscribe

There are two more cases of "comma separated" (config.sgml and copy.sgml),
should they be changed too?


-  the failover if required, enable the subscription, and refresh the
-  subscription. See
+  the failover if required, enable the subscription,
+  and refresh the subscription. See

This refers to the act of failing over, not the property value failover, and
should not be in .


-for not-null constraints at all, so they are not
+for NOT NULL constraints at all, so they are not

I'm still not convinced that this change makes the documentation more readable.


-   the MERGE command will perform a FULL
-   join between data_source
-   and the target table.  For this to work, at least one
+   the MERGE command will perform a
+   FULL JOIN between
+   data_source and the target
+   table. For this to work, at least one

This paragraph discusses various join types; keeping it as lowercase "join" matches
the remainder of the paragraph and makes it more readable IMHO.  It's not
discussing syntax the user is expected to type, so there is no need to make it so.

--
Daniel Gustafsson
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index bfb97865e18..964c819a02d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8024,41 +8024,41 @@ SCRAM-SHA-256$<iteration count>:&l
for authentication
   
  
 
  
   
subrunasowner bool
   
   
If true, the subscription will be run with the permissions
of the subscription owner
   
  
 
  
   
subfailover bool
   
   
If true, the associated replication slots (i.e. the main slot and the
-   table sync slots) in the upstream database are enabled to be
+   table synchronization slots) in the upstream database are enabled to be
synchronized to the standbys
   
  
 
  
   
subconninfo text
   
   
Connection string to the upstream database
   
  
 
  
   
subslotname name
   
   
Name of the replication slot in the upstream database (also used
for the local replication origin name);
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..dbbf7fc3726 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -365,41 +365,41 @@ initdb --locale-provider=icu --icu-locale=en
 Regardless of the locale provider, the operating system is still used to
 provide some locale-aware behavior, such as messages (see ).

 

 The available locale providers are listed below:

 

 
  builtin
  
   
The builtin provider uses built-in operations. Only
the C and C.UTF-8 locales are
supported for this provider.
   
   
The C locale behavior is identical to the
-   C locale in the libc provider. When using this
+   C locale in the libc provider. When using this
locale, the behavior may depend on the database encoding.
   
   
The C.UTF-8 locale is available only for when the
database encoding is UTF-8, and 

Re: Documentation improvement patch

2024-10-01 Thread Oleg Sibiryakov

Dear all,

Here is a kind reminder of a small documentation improvement patch, 
which we started discussing a month ago.


I removed all the controversial points touched upon in this thread. 
Please, take a look once again at your convenience.


The patch shall be applied to the master branch this time.

--
Regards,
Oleg Sibiryakov
Technical Writer
Postgres Professional, The Russian Postgres Company
https://postgrespro.ru

On 13.09.2024 13:50, Oleg Sibiryakov wrote:

Here is a patch without the builtin/built-in corrections (find attached).

But I still believe the issue should be discussed further.
We actually have two options: it is either a spelling mistake (since 
built-in should be written with a hyphen), or we miss the  tag 
(since it is actually also a value).


So I do think we cannot really leave it as is.
diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index bfb97865e18..964c819a02d 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -8024,41 +8024,41 @@ SCRAM-SHA-256$<iteration count>:&l
for authentication
   
  
 
  
   
subrunasowner bool
   
   
If true, the subscription will be run with the permissions
of the subscription owner
   
  
 
  
   
subfailover bool
   
   
If true, the associated replication slots (i.e. the main slot and the
-   table sync slots) in the upstream database are enabled to be
+   table synchronization slots) in the upstream database are enabled to be
synchronized to the standbys
   
  
 
  
   
subconninfo text
   
   
Connection string to the upstream database
   
  
 
  
   
subslotname name
   
   
Name of the replication slot in the upstream database (also used
for the local replication origin name);
diff --git a/doc/src/sgml/charset.sgml b/doc/src/sgml/charset.sgml
index 834cb30c85a..a76e9579a14 100644
--- a/doc/src/sgml/charset.sgml
+++ b/doc/src/sgml/charset.sgml
@@ -365,59 +365,59 @@ initdb --locale-provider=icu --icu-locale=en
 Regardless of the locale provider, the operating system is still used to
 provide some locale-aware behavior, such as messages (see ).

 

 The available locale providers are listed below:

 

 
  builtin
  
   
The builtin provider uses built-in operations. Only
the C and C.UTF-8 locales are
supported for this provider.
   
   
The C locale behavior is identical to the
-   C locale in the libc provider. When using this
-   locale, the behavior may depend on the database encoding.
+   C locale in the libc provider. When
+   using this locale, the behavior may depend on the database encoding.
   
   
The C.UTF-8 locale is available only for when the
database encoding is UTF-8, and the behavior is
based on Unicode. The collation uses the code point values only. The
regular expression character classes are based on the "POSIX
Compatible" semantics, and the case mapping is the "simple" variant.
   
  
 
 
 
  icu
  
   
The icu provider uses the external
-   ICUICU
+   ICU
library. PostgreSQL must have been
configured with support.
   
   
ICU provides collation and character classification behavior that is
independent of the operating system and database encoding, which is
preferable if you expect to transition to other platforms without any
change in results. LC_COLLATE and
LC_CTYPE can be set independently of the ICU
locale.
   
   

 For the ICU provider, results may depend on the version of the ICU
 library used, as it is updated to reflect changes in natural language
 over time.

   
  
 
@@ -845,76 +845,77 @@ SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
 separate collate and ctype settings, so
 they are always the same.  Also, ICU collations are independent of the
 encoding, so there is always only one ICU collation of a given name in
 a database.

 

 Standard Collations
 

 On all platforms, the following collations are supported:
 
 
  
   unicode
   

 This SQL standard collation sorts using the Unicode Collation
 Algorithm with the Default Unicode Collation Element Table.  It is
 available in all encodings.  ICU support is required to use this
-collation, and behavior may change if Postgres is built with a
-different version of ICU.  (This collation has the same behavior as
+collation, and behavior may change if
+PostgreSQL is built with a different version
+of ICU.  (This collation has the same behav

Initcap works differently with different locale providers

2024-09-25 Thread Oleg Tselebrovskiy

Greetings, everyone!

One of our clients has found a difference in the behaviour of the initcap function
when using different locale providers, shown below:

postgres=# create database test_db_1 locale_provider=icu locale="ru_RU.UTF-8" template=template0;
NOTICE:  using standard form "ru-RU" for ICU locale "ru_RU.UTF-8"
CREATE DATABASE
postgres=# \c test_db_1;
You are now connected to database "test_db_1" as user "postgres".
test_db_1=# select initcap('ЧиЮ А.Ю.');
initcap
--
Чию А.ю.
(1 row)
test_db_1=# select initcap('joHn d.e.');
initcap
---
John D.e.
(1 row)
postgres=# create database test_db_2 locale_provider=libc locale="ru_RU.UTF-8" template=template0;
CREATE DATABASE
postgres=# \c test_db_2
You are now connected to database "test_db_2" as user "postgres".
test_db_2=# select initcap('ЧиЮ А.Ю.');
initcap
--
Чию А.Ю.
(1 row)
test_db_2=# select initcap('joHn d.e.');
initcap
---
John D.E.
(1 row)

And an easier reproduction (should work for REL_12_STABLE and up)

postgres=# SELECT initcap('first.second' COLLATE "en-x-icu");
initcap
--
First.second
(1 row)
postgres=# SELECT initcap('first.second' COLLATE "en_US");
initcap
--
First.Second
(1 row)

This behaviour is reproducible on REL_12_STABLE and up to master

I don't believe that this is an erroneous behaviour, just a differing one, hence
just a documentation change proposition.

I suggest adding a clarification that this function works differently with the libc
and ICU providers because there is a difference in what a "word" is between them.

In libc a word is a sequence of alphanumeric characters, separated by
non-alphanumeric characters (as it is written in the documentation right now).

In ICU words are divided according to Unicode® Standard Annex #29 [1]
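
The libc-style rule described above can be sketched in a few lines of Python. This is only an illustration of the "alphanumeric run" definition of a word, using a hypothetical function name; it is not PostgreSQL's implementation, and Python's isalnum() is an assumption standing in for the provider's character classification.

```python
def initcap_alnum(text: str) -> str:
    """Upper-case the first character of each alphanumeric run, lower-case the rest."""
    result = []
    at_word_start = True
    for ch in text:
        if ch.isalnum():
            result.append(ch.upper() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Any non-alphanumeric character ends the current word.
            result.append(ch)
            at_word_start = True
    return "".join(result)


print(initcap_alnum("first.second"))  # First.Second
print(initcap_alnum("joHn d.e."))     # John D.E.
```

With this definition every dot starts a new word, which matches the libc results shown above; the ICU results differ because its word-boundary rules do not break at those dots.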

Similar issue was briefly discussed in [2]

The suggested documentation patch is attached (versions for REL_13_STABLE+ and
for REL_12_STABLE only)

[1]: https://www.unicode.org/reports/tr29/#Word_Boundaries
[2]: https://www.postgresql.org/message-id/CAEwbS1R8pwhRkwRo3XsPt24ErBNtFWuReAZhVPJwA3oqo148tA%40mail.gmail.com


Oleg Tselebrovskiy, Postgres Professional
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 1bde4091ca6..3ce5ad1d1f1 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3100,8 +3100,11 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in


 Converts the first letter of each word to upper case and the
-rest to lower case. Words are sequences of alphanumeric
-characters separated by non-alphanumeric characters.
+rest to lower case. When using the libc locale
+provider, words are sequences of alphanumeric characters separated
+by non-alphanumeric characters; when using the ICU locale provider,
+words are separated according to
+https://www.unicode.org/reports/tr29/#Word_Boundaries";>Unicode® Standard Annex #29.


 initcap('hi THOMAS')
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 487bb103637..1cd281dd90b 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -1932,8 +1932,11 @@
text

 Convert the first letter of each word to upper case and the
-rest to lower case. Words are sequences of alphanumeric
-characters separated by non-alphanumeric characters.
+rest to lower case. When using the libc locale
+provider, words are sequences of alphanumeric characters separated
+		by non-alphanumeric characters; when using the ICU locale provider,
+		words are separated according to
+		https://www.unicode.org/reports/tr29/#Word_Boundaries";>Unicode® Standard Annex #29.

initcap('hi THOMAS')
Hi Thomas


Re: Initcap works differently with different locale providers

2025-07-28 Thread Oleg Tselebrovskiy

Alexander Korotkov wrote at 2025-07-28 17:23:
On Mon, Jul 28, 2025 at 1:20 PM Alexander Korotkov 
 wrote:





I can confirm initcap works with libc and libicu as you stated.  The 
documentation patch looks good to me.  I’ve written a commit message.  
The REL_12_STABLE branch is not relevant anymore as it’s out of 
support.  I’m going to push this if there are no objections.


I'm sorry for so many messages.  My email client has just gone crazy.
It should be fixed now.

--
Regards,
Alexander Korotkov
Supabase


Commit message looks good to me, also no objections on ignoring 
REL_12_STABLE :)

Thank you!

Regards, Oleg Tselebrovskiy




Re: Initcap works differently with different locale providers

2025-08-03 Thread Oleg Tselebrovskiy

Jeff Davis wrote at 2025-07-31 02:58:

Apologies for the late answer to a review


First, it doesn't mention the "builtin" provider, which uses the same
word break rules as libc.


Completely forgot about builtin provider in the first patch, my bad


Second, word boundaries can be complex, and I'm wondering if we should
not be so precise about what ICU does or doesn't do. For instance, ICU
has options like U_TITLECASE_ADJUST_TO_CASED,
U_TITLECASE_NO_BREAK_ADJUSTMENT, etc., and I'm not sure exactly
which one of those we use.


While [1] describes the default word boundary rules and could be useful
as a starting point, I agree that in reality it probably is more
complicated. I didn't exactly find any place where
U_TITLECASE_ADJUST_TO_CASED and the like are set in non-test code, but
U_TITLECASE_ADJUST_TO_CASED was used as a default prior to ICU 60,
so initcap() will also behave differently depending on ICU version


I'd prefer that we try to explain that INITCAP() is intended for
convenient display, and the specific result should not be relied upon
(at least for ICU; maybe for all providers). If you want specific word
boundary rules, write your own function.


The first patch just adds this warning about not relying on the exact result of
initcap(). The second one is the same, but removes the "what is a word" part,
since it could be moot: we recommend writing custom functions, so understanding
what a word is is not exactly needed. I'm still on the fence about which patch
is better, though.

Thoughts?

[1]: https://www.unicode.org/reports/tr29/#Word_Boundaries

Regards, Oleg Tselebrovskiy
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..8a44e0ae593 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3148,12 +3148,19 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in


 Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
+rest to lower case. When using the libc or
+ builtin  locale provider, words are sequences
+of alphanumeric characters separated by non-alphanumeric characters;
+when using the ICU locale provider, words are separated according to
 https://www.unicode.org/reports/tr29/#Word_Boundaries";>Unicode Standard Annex #29.

+   
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.
+   

 initcap('hi THOMAS')
 Hi Thomas
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..c071d6df366 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3148,11 +3148,14 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in


 Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
-https://www.unicode.org/reports/tr29/#Word_Boundaries";>Unicode Standard Annex #29.
+rest to lower case.
+   
+   
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.


 initcap('hi THOMAS')


Re: Initcap works differently with different locale providers

2025-08-05 Thread Oleg Tselebrovskiy

Jeff Davis wrote at 2025-08-05 03:59:

One more thing: we should also change it to "... to upper case (or
title case) and the rest to lower case...". Title case is for scripts
that have characters like 'Dž' (U+01C5).


Done, based on the second version of the previous patch. Again, there are
two versions: the first mentions digraphs, like 'Dž' (U+01C5), and the
second doesn't. And again, I don't know which version is better. Title case
without a mention of digraphs could be interpreted as "don't capitalize
articles and prepositions" or just "don't capitalize articles", since the
definition of "title case" is vague.
We have a "write your own function" clause, but still.

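To make the upper case vs. title case distinction concrete, and to show what
a "write your own function" replacement might look like, here is a minimal
Python sketch. It is not part of the patch. The name initcap_alnum is
hypothetical, and the word rule is an assumption taken from the existing
documentation text (words are runs of alphanumeric characters). It relies on
Python's str.title(), which applies the Unicode titlecase mapping.

```python
def initcap_alnum(text: str) -> str:
    """Hypothetical initcap with one fixed rule: a word is a run of
    alphanumeric characters; anything else separates words."""
    out = []
    at_word_start = True
    for ch in text:
        if ch.isalnum():
            # First character of a word gets its titlecase form,
            # the rest of the word is lowercased.
            out.append(ch.title() if at_word_start else ch.lower())
            at_word_start = False
        else:
            # Separators are copied through and start a new word.
            out.append(ch)
            at_word_start = True
    return "".join(out)


# U+01C6 is the lowercase digraph; its uppercase form is U+01C4,
# while its titlecase form is U+01C5 (the character mentioned above).
lower_dz = "\u01c6"
print(lower_dz.upper() == "\u01c4")  # upper case: both letters capitalized
print(lower_dz.title() == "\u01c5")  # title case: only the first letter
print(initcap_alnum("hi THOMAS"))
```

Because the word rule is fixed in the function itself, the result does not
depend on the locale provider or the ICU version.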
Maybe we should add an example of a digraph to the first patch to
make it clearer, if we go down that path.

diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..b32ec6e2cea 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3147,12 +3147,15 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
 text


-Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
-Unicode Standard Annex #29 (https://www.unicode.org/reports/tr29/#Word_Boundaries).
+Converts the first letter of each word to upper case (or title case
+if the letter is a digraph) and the rest to lower case.
+   
+   
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.


 initcap('hi THOMAS')
diff --git a/doc/src/sgml/func.sgml b/doc/src/sgml/func.sgml
index 74a16af04ad..f799b34dca7 100644
--- a/doc/src/sgml/func.sgml
+++ b/doc/src/sgml/func.sgml
@@ -3147,12 +3147,15 @@ SELECT NOT(ROW(table.*) IS NOT NULL) FROM TABLE; -- detect at least one null in
 text


-Converts the first letter of each word to upper case and the
-rest to lower case. When using the libc locale
-provider, words are sequences of alphanumeric characters separated
-by non-alphanumeric characters; when using the ICU locale provider,
-words are separated according to
-Unicode Standard Annex #29 (https://www.unicode.org/reports/tr29/#Word_Boundaries).
+Converts the first letter of each word to upper case (or title case)
+and the rest to lower case.
+   
+   
+This function is primarily used for convenient
+display, and the specific result should not be relied upon because of
+the differences between locale providers and between different
+ICU versions. If specific word boundary rules are desired,
+it is recommended to write a custom function.


 initcap('hi THOMAS')