Re: Foreign key joins revisited

2021-12-29 Thread Peter Eisentraut

On 28.12.21 20:45, Vik Fearing wrote:

I don't particularly like this whole idea anyway, but if we're going to
have it, I would suggest

 JOIN ... USING KEY ...

Since USING currently requires a parenthesized list, that shouldn't
create any ambiguity.


In the 1990s, there were some SQL drafts that included syntax like

JOIN ... USING PRIMARY KEY | USING FOREIGN KEY | USING CONSTRAINT ...

AFAICT, these ideas just faded away because of other priorities, so if 
someone wants to revive it, some work already exists.
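
For illustration, a sketch of what that draft syntax might have looked
like (hypothetical spellings, not valid in PostgreSQL today; table and
constraint names invented):

    -- draft-style spellings quoted above (hypothetical):
    SELECT * FROM orders JOIN customers USING FOREIGN KEY;
    SELECT * FROM orders JOIN customers USING CONSTRAINT orders_customer_id_fkey;

    -- what one writes today, spelling out the key columns:
    SELECT * FROM orders JOIN customers ON customers.id = orders.customer_id;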






Re: Converting WAL to SQL

2021-12-29 Thread Peter Eisentraut

On 29.12.21 07:18, rajesh singarapu wrote:
I am wondering if we have a mechanism to convert WAL records to SQL 
statements.


I am able to use logical decoders like wal2json or test_decoding for 
converting WAL to readable format, but I am looking for a way to convert 
WAL to sql statements.


pglogical, when used in SPI mode, has logic for that.




Re: UNIQUE null treatment option

2021-12-29 Thread Peter Eisentraut

Here is a rebased version of this patch.

On 27.08.21 14:38, Peter Eisentraut wrote:

The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is apparently pretty easy; most of the patch is just to carry the flag 
around to all the places that need it.


The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

(I named all the internal flags, catalog columns, etc. in the
negative ("nulls not distinct") so that the default PostgreSQL
behavior is the default if the flag is false.  But perhaps the double
negatives make some code harder to read.)
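
For illustration, a minimal sketch of the behavior this adds, per the
description above (table names invented):

    CREATE TABLE t (a int UNIQUE NULLS NOT DISTINCT);
    INSERT INTO t VALUES (NULL);  -- ok
    INSERT INTO t VALUES (NULL);  -- fails: nulls are treated as equal

    CREATE TABLE u (a int UNIQUE);  -- default remains NULLS DISTINCT
    INSERT INTO u VALUES (NULL);  -- ok
    INSERT INTO u VALUES (NULL);  -- ok: multiple nulls still allowed
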
From ffd6c8e0ce24f3c56bd44e71588d4165e20e9157 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 29 Dec 2021 10:49:57 +0100
Subject: [PATCH v2] Add UNIQUE null treatment option

The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

XXX I named all the internal flags, catalog columns, etc. in the
negative ("nulls not distinct") so that the default PostgreSQL
behavior is the default if the flag is false.  But perhaps the double
negatives make some code harder to read.

Discussion: 
https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bd...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml | 13 +
 doc/src/sgml/ddl.sgml  | 29 --
 doc/src/sgml/information_schema.sgml   | 12 +
 doc/src/sgml/ref/alter_table.sgml  |  4 +-
 doc/src/sgml/ref/create_index.sgml | 13 +
 doc/src/sgml/ref/create_table.sgml | 11 ++--
 src/backend/access/nbtree/nbtinsert.c  | 10 ++--
 src/backend/access/nbtree/nbtsort.c| 15 +-
 src/backend/catalog/index.c|  7 +++
 src/backend/catalog/information_schema.sql |  9 +++-
 src/backend/catalog/sql_features.txt   |  1 +
 src/backend/catalog/toasting.c |  1 +
 src/backend/commands/indexcmds.c   |  3 +-
 src/backend/nodes/copyfuncs.c  |  2 +
 src/backend/nodes/equalfuncs.c |  2 +
 src/backend/nodes/makefuncs.c  |  3 +-
 src/backend/nodes/outfuncs.c   |  2 +
 src/backend/parser/gram.y  | 47 ++---
 src/backend/parser/parse_utilcmd.c |  3 ++
 src/backend/utils/adt/ruleutils.c  | 23 +---
 src/backend/utils/cache/relcache.c |  1 +
 src/backend/utils/sort/tuplesort.c |  8 ++-
 src/bin/pg_dump/pg_dump.c  | 19 +--
 src/bin/pg_dump/pg_dump.h  |  1 +
 src/bin/psql/describe.c| 19 +--
 src/include/catalog/pg_index.h |  1 +
 src/include/nodes/execnodes.h  |  1 +
 src/include/nodes/makefuncs.h  |  2 +-
 src/include/nodes/parsenodes.h |  2 +
 src/include/utils/tuplesort.h  |  1 +
 src/test/regress/expected/constraints.out  | 23 
 src/test/regress/expected/create_index.out | 61 ++
 src/test/regress/sql/constraints.sql   | 14 +
 src/test/regress/sql/create_index.sql  | 37 +
 34 files changed, 342 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 03e2537b07..f0d49e4841 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4275,6 +4275,19 @@ pg_index Columns
   
  
 
+ 
+  
+   indnullsnotdistinct bool
+  
+  
+   This value is only used for unique indexes.  If false, this unique
+   index will consider null values distinct (so the index can contain
+   multiple null values in a column, the default PostgreSQL behavior).  If
+   it is true, it will consider null values to be equal (so the index can
+   only contain one null value in a column).
+  
+ 
+
  
   
indisprimary bool
diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 64d9030652..f622285ba0 100644

[PATCH] allow src/tools/msvc/*.bat files to be called from the root of the source tree

2021-12-29 Thread Anton Voloshin

Hello,

Currently, on Windows/MSVC, the src\tools\msvc\*.bat files mostly require
the caller to be in the src\tools\msvc directory first.


I suggest an obvious fix:

diff --git a/src/tools/msvc/build.bat b/src/tools/msvc/build.bat
index 4001ac1d0d1..407b6559cfb 100755
--- a/src/tools/msvc/build.bat
+++ b/src/tools/msvc/build.bat
@@ -3,4 +3,4 @@ REM src/tools/msvc/build.bat
 REM all the logic for this now belongs in build.pl. This file really
 REM only exists so you don't have to type "perl build.pl"
 REM Resist any temptation to add any logic here.
-@perl build.pl %*
+@perl %~dp0\build.pl %*
diff --git a/src/tools/msvc/install.bat b/src/tools/msvc/install.bat
index d03277eff2b..98edf6bdffb 100644
--- a/src/tools/msvc/install.bat
+++ b/src/tools/msvc/install.bat
@@ -3,4 +3,4 @@ REM src/tools/msvc/install.bat
 REM all the logic for this now belongs in install.pl. This file really
 REM only exists so you don't have to type "perl install.pl"
 REM Resist any temptation to add any logic here.
-@perl install.pl %*
+@perl %~dp0\install.pl %*
diff --git a/src/tools/msvc/vcregress.bat b/src/tools/msvc/vcregress.bat
index a981d3a6aa1..0d65c823e13 100644
--- a/src/tools/msvc/vcregress.bat
+++ b/src/tools/msvc/vcregress.bat
@@ -3,4 +3,4 @@ REM src/tools/msvc/vcregress.bat
 REM all the logic for this now belongs in vcregress.pl. This file really
 REM only exists so you don't have to type "perl vcregress.pl"
 REM Resist any temptation to add any logic here.
-@perl vcregress.pl %*
+@perl %~dp0\vcregress.pl %*

This patch uses cmd's standard %~dp0 to get the complete path (drive,
"d", and path, "p") of the currently executing .bat file, in order to
build the proper path of the .pl file to execute. I find the following
link useful whenever I need to remember the details of cmd's
%-substitution rules: https://ss64.com/nt/syntax-args.html


With this change, one can call those .bat files, e.g. 
src\tools\msvc\build.bat, without leaving the root of the source tree.


Not sure whether a similar change should be applied to pgflex.bat and
pgbison.bat -- I have never used them on Windows and they seem to require
being called from the root, but perhaps they deserve a similar change.


If accepted, do you think this change is worthy of back-porting?

Please advise if you think this change is a beneficial one.

P.S. Yes, I am aware of the very probable upcoming move to meson, but
until then this little patch really helps me whenever I have to deal with
Windows and MSVC from the command line. Besides, it could help the old
branches as well.


--
Anton Voloshin

https://postgrespro.ru
Postgres Professional, The Russian Postgres Company


Re: automatically generating node support functions

2021-12-29 Thread Peter Eisentraut

On 12.10.21 15:52, Andrew Dunstan wrote:

I haven't been through the whole thing, but I did notice this: the
comment stripping code looks rather fragile. I think it would blow up if
there were a continuation line not starting with  qr/\s*\*/. It's a lot
simpler and more robust to do this if you slurp the file in whole.
Here's what we do in the buildfarm code:

    my $src = file_contents($_);
    # strip C comments
    # We used to use the recipe in perlfaq6 but there is actually no point.
    # We don't need to keep the quoted string values anyway, and
    # on some platforms the complex regex causes perl to barf and crash.
    $src =~ s{/\*.*?\*/}{}gs;

After you've done that splitting it into lines is pretty simple.


Here is an updated patch, with some general rebasing and the above
improvement.  It now also generates the #include lines necessary in
copyfuncs etc. to pull in all the node types it operates on.


Further, I have looked more into the "metadata" approach discussed in 
[0].  It's pretty easy to generate that kind of output from the data 
structures my script produces.  You just loop over all the node types 
and print stuff and keep a few counters.  I don't plan to work on that 
at this time, but I just wanted to point out that if people wanted to 
move in that direction, my patch wouldn't be in the way.



[0]: 
https://www.postgresql.org/message-id/flat/20190828234136.fk2ndqtld3onfrrp%40alap3.anarazel.de

From e2c08d8b793200a07b8fe5ae85dd23f401ddcef1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 29 Dec 2021 12:00:41 +0100
Subject: [PATCH v3] Automatically generate node support functions

Add a script to automatically generate the node support functions
(copy, equal, out, and read, as well as the node tags enum) from the
struct definitions.

For each of the four node support files, it creates two include files,
e.g., copyfuncs.inc1.c and copyfuncs.inc2.c, to include in the main
file.  All the scaffolding of the main file stays in place.

TODO: In this patch, I have only ifdef'ed out the code that could be
removed, mainly so that it won't constantly have merge conflicts.
Eventually, that should all be changed to delete the code.  When we do
that, some code comments should probably be preserved elsewhere, so
that will need another pass of consideration.

I have tried to mostly make the coverage of the output match what is
currently there.  For example, one could now do out/read coverage of
utility statement nodes, but I have manually excluded those for now.
The reason is mainly that it's easier to diff the before and after,
and adding a bunch of stuff like this might require a separate
analysis and review.

Subtyping (TidScan -> Scan) is supported.

For the hard cases, you can just write a manual function and exclude
generating one.  For the not so hard cases, there is a way of
annotating struct fields to get special behaviors.  For example,
pg_node_attr(equal_ignore) has the field ignored in equal functions.

Discussion: 
https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce%40enterprisedb.com
---
 src/backend/Makefile  |   8 +-
 src/backend/nodes/.gitignore  |   3 +
 src/backend/nodes/Makefile|  46 ++
 src/backend/nodes/copyfuncs.c |  19 +-
 src/backend/nodes/equalfuncs.c|  22 +-
 src/backend/nodes/gen_node_support.pl | 660 ++
 src/backend/nodes/outfuncs.c  |  30 +-
 src/backend/nodes/readfuncs.c |  23 +-
 src/include/nodes/.gitignore  |   2 +
 src/include/nodes/nodes.h |   8 +
 src/include/nodes/parsenodes.h|   2 +-
 src/include/nodes/pathnodes.h | 134 +++---
 src/include/nodes/plannodes.h |  90 ++--
 src/include/nodes/primnodes.h |  20 +-
 src/include/pg_config_manual.h|   4 +-
 src/include/utils/rel.h   |   6 +-
 src/tools/msvc/Solution.pm|  46 ++
 17 files changed, 976 insertions(+), 147 deletions(-)
 create mode 100644 src/backend/nodes/.gitignore
 create mode 100644 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/include/nodes/.gitignore

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..a33db1ae01 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -143,11 +143,15 @@ storage/lmgr/lwlocknames.h: 
storage/lmgr/generate-lwlocknames.pl storage/lmgr/lw
 submake-catalog-headers:
$(MAKE) -C catalog distprep generated-header-symlinks
 
+# run this unconditionally to avoid needing to know its dependencies here:
+submake-nodes-headers:
+   $(MAKE) -C nodes distprep generated-header-symlinks
+
 # run this unconditionally to avoid needing to know its dependencies here:
 submake-utils-headers:
$(MAKE) -C utils distprep generated-header-symlinks
 
-.PHONY: submake-catalog-headers submake-utils-headers
+.PHONY: submake-catalog-headers submake-nodes-headers submake-utils-headers
 
 # Make symlinks for these header

Re: Logical replication timeout problem

2021-12-29 Thread Fabrice Chapuis
I put the instance in high-level debug mode.
Here is my attempt at interpreting the logs: after the modifications
generated by the insert have been written to the snap files, these files
are read back (restored). One minute after this work starts, the worker
process exits with error code 1.
I see that keepalive messages were sent before the worker process exited.

2021-12-28 10:50:01.894 CET [55792] LOCATION:  WalSndKeepalive,
walsender.c:3365
...
2021-12-28 10:50:31.854 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG:  0: StartTransaction(1)
name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
2021-12-28 10:50:31.907 CET [55792] LOCATION:  ShowTransactionStateRec,
xact.c:5075
2021-12-28 10:50:31.907 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG:  0: spill 2271 changes in
XID 14312 to disk
2021-12-28 10:50:31.907 CET [55792] LOCATION:  ReorderBufferSerializeTXN,
reorderbuffer.c:2245
2021-12-28 10:50:31.907 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:32.110 CET [55792] DEBUG:  0: restored 4096/22603999
changes from disk
2021-12-28 10:50:32.110 CET [55792] LOCATION:  ReorderBufferIterTXNNext,
reorderbuffer.c:1156
2021-12-28 10:50:32.110 CET [55792] STATEMENT:  START_REPLICATION SLOT
"sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names
'"pub008_s012a00"')
2021-12-28 10:50:32.138 CET [55792] DEBUG:  0: restored 4096/22603999
changes from disk
...

2021-12-28 10:50:35.341 CET [55794] DEBUG:  0: sending replication keepalive
2021-12-28 10:50:35.341 CET [55794] LOCATION:  WalSndKeepalive,
walsender.c:3365
...
2021-12-28 10:51:31.995 CET [55791] ERROR:  XX000: terminating logical
replication worker due to timeout

2021-12-28 10:51:31.995 CET [55791] LOCATION:  LogicalRepApplyLoop,
worker.c:1267

Could this function in the apply main loop in worker.c help to find a
solution?

rc = WaitLatchOrSocket(MyLatch,
                       WL_SOCKET_READABLE | WL_LATCH_SET |
                       WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       fd, wait_time,
                       WAIT_EVENT_LOGICAL_APPLY_MAIN);
Thanks for your help

Fabrice

On Thu, Dec 23, 2021 at 11:52 AM Amit Kapila 
wrote:

> On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis 
> wrote:
> >
> > Hello Amit,
> >
> > I was able to reproduce the timeout problem in the lab.
> > After loading more than 20 millions of rows in a table which is not
> replicated (insert command ends without error), errors related to logical
> replication processes appear in the postgres log.
> > Approximately every 5 minutes worker process is restarted. The snap
> files in the slot directory are still present. The replication system seems
> to be blocked. Why these snap files are not removed. What do they contain?
> >
>
> These contain changes of insert. I think these are not removed for
> your case as your long transaction is never finished. As mentioned
> earlier, for such cases, it is better to use 'streaming' feature
> released as part of PG-14 but anyway here we are trying to debug
> timeout problem.
>
> > I will recompile postgres with your patch to debug.
> >
>
> Okay, that might help.
>
> --
> With Regards,
> Amit Kapila.
>


Re: Converting WAL to SQL

2021-12-29 Thread Fabrízio de Royes Mello
On Wed, 29 Dec 2021 at 03:18 rajesh singarapu 
wrote:

> Hi Hackers,
>
> I am wondering if we have a mechanism to convert WAL records to SQL
> statements.
>
> I am able to use logical decoders like wal2json or test_decoding for
> converting WAL to readable format, but I am looking for a way to convert
> WAL to sql statements.
>
>
>
Try this:
https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw
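
For anyone trying that plugin, a sketch of the usual SQL-level
logical-decoding workflow, assuming decoder_raw is built and installed
and wal_level = logical (the slot functions are standard PostgreSQL;
the slot name is invented):

    -- create a slot that uses the decoder_raw output plugin
    SELECT pg_create_logical_replication_slot('raw_slot', 'decoder_raw');
    -- ... run some DML ...
    -- fetch the decoded changes; decoder_raw emits them as SQL statements
    SELECT data FROM pg_logical_slot_get_changes('raw_slot', NULL, NULL);
    -- clean up
    SELECT pg_drop_replication_slot('raw_slot');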

-- 
   Fabrízio de Royes Mello
   Timbira - http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento


Re: generalized conveyor belt storage

2021-12-29 Thread Amul Sul
On Wed, Dec 15, 2021 at 9:04 PM Robert Haas  wrote:
>
> On Wed, Dec 15, 2021 at 10:03 AM Matthias van de Meent
>  wrote:
> [...]

Though the patch is WIP, here are a few comments that I found while
reading it and thought might help:

+   {
+   if (meta->cbm_oldest_index_segment ==
+   meta->cbm_newest_index_segment)
+   elog(ERROR, "must remove last index segment when only one remains");
+   meta->cbm_oldest_index_segment = segno;
+   }

How about adding an assert or elog to ensure that 'segno' is indeed
the successor?
--

+   if (meta->cbm_index[offset] != offset)
+   elog(ERROR,
+"index entry at offset %u was expected to be %u but found %u",
+offset, segno, meta->cbm_index[offset]);

The if condition should be: meta->cbm_index[offset] != segno ?
--

+   if (segno >= CB_METAPAGE_FREESPACE_BYTES * BITS_PER_BYTE)
+   elog(ERROR, "segment %u out of range for metapage fsm", segno);

I think CB_FSM_SEGMENTS_FOR_METAPAGE should be used here, as in
cb_metapage_set_fsm_bit().
--

+/*
+ * Increment the count of segments allocated.
+ */
+void
+cb_metapage_increment_next_segment(CBMetapageData *meta, CBSegNo segno)
+{
+   if (segno != meta->cbm_next_segment)
+   elog(ERROR, "extending to create segment %u but next segment is %u",
+segno, meta->cbm_next_segment);

I didn't understand this error; what does it mean?  It would be
helpful to add a brief note about what it means and why we are throwing
it, and/or to rephrase the error message a bit.
--

+++ b/src/backend/access/conveyor/cbxlog.c
@@ -0,0 +1,442 @@
+/*-
+ *
+ * cbxlog.c
+ *   XLOG support for conveyor belts.
+ *
+ * For each REDO function in this file, see cbmodify.c for the
+ * corresponding function that performs the modification during normal
+ * running and logs the record that we REDO here.
+ *
+ * Copyright (c) 2016-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/conveyor/cbmodify.c

Incorrect file name: s/cbmodify.c/cbxlog.c/
--

+   can_allocate_segment =
+   (free_segno_first_blkno < possibly_not_on_disk_blkno)

The condition should be '<=' ?
--

+ * ConveyorBeltPhysicalTruncate. For more aggressive cleanup options, see
+ * ConveyorBeltCompact or ConveyorBeltRewrite.

I didn't find the ConveyorBeltCompact or ConveyorBeltRewrite code;
perhaps it is yet to be implemented?
--

+ * the value computed here here if the entry at that offset is already

"here" twice.
--

And few typos:

othr
semgents
fucntion
refrenced
initialied
remve
extrordinarily
implemenation
--

Regards,
Amul




Re: Documenting when to retry on serialization failure

2021-12-29 Thread Simon Riggs
On Wed, 29 Dec 2021 at 03:30, Thomas Munro  wrote:
>
> On Fri, Dec 10, 2021 at 1:43 AM Simon Riggs
>  wrote:
> > "Applications using this level must be prepared to retry transactions
> > due to serialization failures."
> > ...
> > "When an application receives this error message, it should abort the
> > current transaction and retry the whole transaction from the
> > beginning."
> >
> > I note that the specific error codes this applies to are not
> > documented, so lets discuss what the docs for that would look like.
>
> +1 for naming the error.

I've tried to sum up the various points from everybody into this doc
patch. Thanks to all for the replies.
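
For concreteness, a sketch of the retry pattern being documented
(SQLSTATE 40001 is serialization_failure; 40P01 is deadlock_detected):

    BEGIN ISOLATION LEVEL SERIALIZABLE;
    -- ... reads and writes ...
    COMMIT;  -- can fail with SQLSTATE 40001 (serialization_failure);
             -- the application should then retry the whole transaction
             -- from the beginning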

-- 
Simon Riggs
http://www.EnterpriseDB.com/


retryable_error_docs.v1.patch
Description: Binary data


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread Stephen Frost
Greetings,

* SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote:
> On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar  wrote:
> > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
> > satyanarlapu...@gmail.com> wrote:
> >>> Actually all the WAL insertions are done under a critical section
> >>> (except few exceptions), that means if you see all the references of
> >>> XLogInsert(), it is always called under the critical section and that is 
> >>> my
> >>> main worry about hooking at XLogInsert level.
> >>>
> >>
> >> Got it, understood the concern. But can we document the limitations of
> >> the hook and let the hook take care of it? I don't expect an error to be
> >> thrown here since we are not planning to allocate memory or make file
> >> system calls but instead look at the shared memory state and add delays
> >> when required.
> >>
> >>
> > Yet another problem is that if we are in XlogInsert() that means we are
> > holding the buffer locks on all the pages we have modified, so if we add a
> > hook at that level which can make it wait then we would also block any of
> > the read operations needed to read from those buffers.  I haven't thought
> > what could be better way to do this but this is certainly not good.
> >
> 
> Yes, this is a problem. The other approach is adding a hook at
> XLogWrite/XLogFlush? All the other backends will be waiting behind the
> WALWriteLock. The process that is performing the write enters into a busy
> loop with small delays until the criteria are met. Inability to process the
> interrupts inside the critical section is a challenge in both approaches.
> Any other thoughts?

Why not have this work the exact same way sync replicas do, except that
it's based off of some byte/time lag for some set of async replicas?
That is, in RecordTransactionCommit(), perhaps right after the
SyncRepWaitForLSN() call, or maybe even add this to that function?  Sure
seems like there's a lot of similarity.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary

2021-12-29 Thread Stephen Frost
Greetings,

* Euler Taveira (eu...@eulerto.com) wrote:
> On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > pg_archivecleanup currently takes a WAL file name as input to delete
> > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > automatically detect the last checkpoint (from control file) LSN,
> > calculate the lowest restart_lsn required by the replication slots, if
> > any (by reading the replication slot info from pg_logical directory),
> > archive the unneeded (an archive_command similar to that of the one
> > provided in the server config can be provided as an input) WAL files
> > before finally deleting them? Making pg_archivecleanup tool as an
> > end-to-end solution will help greatly in disk full situations because
> > of WAL files growth (inactive replication slots, archive command
> > failures, infrequent checkpoint etc.).

The overall idea of having a tool for this isn't a bad idea, but ..

> pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> suggesting to use it for removing files from pg_wal directory too? No, thanks.

We definitely shouldn't have it be part of pg_archivecleanup for the
simple reason that it'll be really confusing and almost certainly will
be mis-used.  For my 2c, we should just remove pg_archivecleanup
entirely.

> WAL files are a key component for backup and replication. Hence, you cannot
> deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> wouldn't occur if you have a monitoring system and alerts and someone to keep
> an eye on it. If the disk full situation was caused by a failed archive 
> command
> or a disconnected standby, it is easy to figure out; the fix is simple.

This is perhaps a bit far: PG does, in fact, remove WAL files from
PGDATA.  Having a tool which will do this safely when the server isn't
able to be brought online due to lack of disk space would certainly be
helpful rather frequently.  I agree that monitoring and alerting are
things that everyone should implement and pay attention to, but that
doesn't happen and instead people end up just blowing away pg_wal and
corrupting their database when, had a tool existed, they could have
avoided that happening and brought the system back online in relatively
short order without any data loss.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Proposal: More structured logging

2021-12-29 Thread Justin Pryzby
> Subject: [PATCH v3 2/3] Add test module for the new tag functionality.
...
> +test_logging(PG_FUNCTION_ARGS)
> +{
...
> +  (errmsg("%s", message),
> +   ({
> + forboth(lk, keys, lv, values)
> + {
> + (errtag(lfirst(lk), "%s", (char *) lfirst(lv)));
> + }})
> + ));

The windows build fails with that syntax.
http://cfbot.cputube.org/ronan-dunklau.html
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157923

> Subject: [PATCH v3 3/3] Output error tags in CSV logs
> +++ b/doc/src/sgml/config.sgml
> @@ -7370,6 +7371,7 @@ CREATE TABLE postgres_log
>backend_type text,
>leader_pid integer,
>query_id bigint,
> +  tags jsonb
>PRIMARY KEY (session_id, session_line_num)
>  );
>  

That's invalid sql due to missing a trailing ",".

You should also update file-fdw.sgml - which I only think of since we forgot
to update it before e568ed0eb and 0830d21f5.  config.sgml should have a comment
as a reminder to do that.

-- 
Justin




Report checkpoint progress in server logs

2021-12-29 Thread Bharath Rupireddy
Hi,

At times, some checkpoint operations, such as removing old WAL files or
dealing with replication snapshot or mapping files, may take a while,
during which the server doesn't emit any logs or information; the only
logs emitted are LogCheckpointStart and LogCheckpointEnd. Often this
isn't a problem if the checkpoint is quick, but there can be extreme
situations which require users to know what's going on with the current
checkpoint.

Given that commit 9ce346ea [1] introduced a nice mechanism to
report the long-running operations of the startup process in the
server logs, I'm thinking we can have a similar progress mechanism
for checkpoints as well. There's another idea, suggested in a couple
of other threads, to have a pg_stat_progress_checkpoint similar to
pg_stat_progress_analyze/vacuum/etc. But the problem with that idea is
that during end-of-recovery or shutdown checkpoints the
pg_stat_progress_checkpoint view isn't accessible, as it requires a
connection to the server, which isn't allowed then.

Therefore, reporting the checkpoint progress in the server logs, much
like [1], seems to be the best way IMO. We can either 1) make
ereport_startup_progress and log_startup_progress_interval more
generic (something like ereport_log_progress and
log_progress_interval), move the code to elog.c, and use it for
checkpoint progress and, if required, for other time-consuming
operations, or 2) have an entirely different GUC and API for checkpoint
progress.

IMO, option (1) i.e. ereport_log_progress and log_progress_interval
(better names are welcome) seems a better idea.

Thoughts?

[1]
commit 9ce346eabf350a130bba46be3f8c50ba28506969
Author: Robert Haas 
Date:   Mon Oct 25 11:51:57 2021 -0400

Report progress of startup operations that take a long time.

Regards,
Bharath Rupireddy.




Re: Report checkpoint progress in server logs

2021-12-29 Thread Magnus Hagander
On Wed, Dec 29, 2021 at 3:31 PM Bharath Rupireddy
 wrote:
>
> Hi,
>
> At times, some of the checkpoint operations such as removing old WAL
> files, dealing with replication snapshot or mapping files etc. may
> take a while during which the server doesn't emit any logs or
> information, the only logs emitted are LogCheckpointStart and
> LogCheckpointEnd. Many times this isn't a problem if the checkpoint is
> quicker, but there can be extreme situations which require the users
> to know what's going on with the current checkpoint.
>
> Given that the commit 9ce346ea [1] introduced a nice mechanism to
> report the long running operations of the startup process in the
> server logs, I'm thinking we can have a similar progress mechanism for
> the checkpoint as well. There's another idea suggested in a couple of
> other threads to have a pg_stat_progress_checkpoint similar to
> pg_stat_progress_analyze/vacuum/etc. But the problem with this idea is
> during the end-of-recovery or shutdown checkpoints, the
> pg_stat_progress_checkpoint view isn't accessible as it requires a
> connection to the server which isn't allowed.
>
> Therefore, reporting the checkpoint progress in the server logs, much
> like [1], seems to be the best way IMO. We can 1) either make
> ereport_startup_progress and log_startup_progress_interval more
> generic (something like ereport_log_progress and
> log_progress_interval),  move the code to elog.c, use it for
> checkpoint progress and if required for other time-consuming
> operations 2) or have an entirely different GUC and API for checkpoint
> progress.
>
> IMO, option (1) i.e. ereport_log_progress and log_progress_interval
> (better names are welcome) seems a better idea.
>
> Thoughts?

I find progress reporting in the logfile to generally be a terrible
way of doing things, and the fact that we do it for the startup
process is/should be only because we have no other choice, not because
it's the right choice.

I think the right choice to solve the *general* problem is the
mentioned pg_stat_progress_checkpoints.

We may want to *additionally* have the ability to log the progress
specifically for the special cases when we're not able to use that
view. And in those cases, we can perhaps just use the existing
log_startup_progress_interval parameter for this as well -- at least
for the startup checkpoint.
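
For reference, the existing GUC from commit 9ce346ea can be enabled
like this; a generalized log_progress_interval, if adopted, would
presumably be configured the same way (that generalized name is
hypothetical at this point):

    ALTER SYSTEM SET log_startup_progress_interval = '10s';
    SELECT pg_reload_conf();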

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/




Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary

2021-12-29 Thread Bharath Rupireddy
On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost  wrote:
> > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote:
> > > pg_archivecleanup currently takes a WAL file name as input to delete
> > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in
> > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to
> > > automatically detect the last checkpoint (from control file) LSN,
> > > calculate the lowest restart_lsn required by the replication slots, if
> > > any (by reading the replication slot info from pg_logical directory),
> > > archive the unneeded (an archive_command similar to that of the one
> > > provided in the server config can be provided as an input) WAL files
> > > before finally deleting them? Making pg_archivecleanup tool as an
> > > end-to-end solution will help greatly in disk full situations because
> > > of WAL files growth (inactive replication slots, archive command
> > > failures, infrequent checkpoint etc.).
>
> The overall idea of having a tool for this isn't a bad idea, but ..
>
> > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you
> > suggesting to use it for removing files from pg_wal directory too? No, 
> > thanks.
>
> We definitely shouldn't have it be part of pg_archivecleanup for the
> simple reason that it'll be really confusing and almost certainly will
> be mis-used.

+1

> > WAL files are a key component for backup and replication. Hence, you cannot
> > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue
> > wouldn't occur if you have a monitoring system and alerts and someone to 
> > keep
> > an eye on it. If the disk full situation was caused by a failed archive 
> > command
> > or a disconnected standby, it is easy to figure out; the fix is simple.
>
> This is perhaps a bit far- PG does, in fact, remove WAL files from
> PGDATA.  Having a tool which will do this safely when the server isn't
> able to be brought online due to lack of disk space would certainly be
> helpful rather frequently.  I agree that monitoring and alerting are
> things that everyone should implement and pay attention to, but that
> doesn't happen and instead people end up just blowing away pg_wal and
> corrupting their database when, had a tool existed, they could have
> avoided that happening and brought the system back online in relatively
> short order without any data loss.

Thanks. Yes, an end-to-end tool is helpful in such emergency
situations, and having it in core is more helpful than every
postgres vendor developing their own solution, which is often hard
to get right. Also, I agree not to club this idea with
pg_archivecleanup. How about having a new tool like
pg_walcleanup/pg_xlogcleanup to help the developers/admins/users in
such situations?

Regards,
Bharath Rupireddy.




Re: [PATCH] allow src/tools/msvc/*.bat files to be called from the root of the source tree

2021-12-29 Thread Andrew Dunstan


On 12/29/21 05:16, Anton Voloshin wrote:
> Hello,
>
> currently, on Windows/MSVC, src\tools\msvc\*.bat files mostly require
> being in that src\tools\msvc directory first.
>
> I suggest an obvious fix:
[...]

> This patch uses standard windows cmd's %~dp0 to get the complete path
> (drive, "d", and path, "p") of the currently executing .bat file to
> get proper path of a .pl file to execute. I find the following link
> useful whenever I need to remember details on cmd's %-substitution
> rules: https://ss64.com/nt/syntax-args.html
>
> With this change, one can call those .bat files, e.g.
> src\tools\msvc\build.bat, without leaving the root of the source tree.
>
> Not sure if similar change should be applied to pgflex.bat and
> pgbison.bat -- never used them on Windows and they seem to require
> being called from the root, but perhaps they deserve a similar change.
>
> If accepted, do you think this change is worthy of back-porting?
>
> Please advise if you think this change is a beneficial one.
>
> P.S. Yes, I am aware of very probable upcoming move to meson, but
> until then this little patch really helps me whenever I have to deal
> with Windows and MSVC from the command line. Besides, it could help
> old branches as well.
>


Seems reasonable. I don't see any reason not to do it for pgbison.bat
and pgflex.bat, just for the sake of consistency.


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Foreign key joins revisited

2021-12-29 Thread Andrew Dunstan


On 12/28/21 15:10, Tom Lane wrote:
> Vik Fearing  writes:
>> On 12/28/21 8:26 PM, Joel Jacobson wrote:
>>> Can with think of some other suitable reserved keyword?
>> I don't particularly like this whole idea anyway, but if we're going to
>> have it, I would suggest
>> JOIN ... USING KEY ...
> That would read well, which is nice, but I wonder if it wouldn't induce
> confusion.  You'd have to explain that it didn't work like standard
> USING in the sense of merging the join-key columns.
>
> ... unless, of course, we wanted to make it do so.  Would that
> be sane?  Which name (referenced or referencing column) would
> the merged column have?
>
>   



I agree this would cause confusion. I think your earlier suggestion of


   JOIN ... FOREIGN KEY ...


seems reasonable.


cheers


andrew


--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: Foreign key joins revisited

2021-12-29 Thread Tom Lane
Peter Eisentraut  writes:
> In the 1990s, there were some SQL drafts that included syntax like
> JOIN ... USING PRIMARY KEY | USING FOREIGN KEY | USING CONSTRAINT ...
> AFAICT, these ideas just faded away because of other priorities, so if 
> someone wants to revive it, some work already exists.

Interesting!  One thing that bothered me about this whole line of
discussion is that we could get blindsided in future by the SQL
committee standardizing the same idea with slightly different
syntax/semantics.  I think borrowing this draft text would greatly
improve the odds of that not happening.  Do you have access to
full details?

regards, tom lane




Re: Report checkpoint progress in server logs

2021-12-29 Thread Tom Lane
Magnus Hagander  writes:
>> Therefore, reporting the checkpoint progress in the server logs, much
>> like [1], seems to be the best way IMO.

> I find progress reporting in the logfile to generally be a terrible
> way of doing things, and the fact that we do it for the startup
> process is/should be only because we have no other choice, not because
> it's the right choice.

I'm already pretty seriously unhappy about the log-spamming effects of
64da07c41 (default to log_checkpoints=on), and am willing to lay a side
bet that that gets reverted after we have some field experience with it.
This proposal seems far worse from that standpoint.  Keep in mind that
our out-of-the-box logging configuration still doesn't have any log
rotation ability, which means that the noisier the server is in normal
operation, the sooner you fill your disk.

> I think the right choice to solve the *general* problem is the
> mentioned pg_stat_progress_checkpoints.

+1

regards, tom lane




Re: PublicationActions - use bit flags.

2021-12-29 Thread Justin Pryzby
On Mon, Dec 20, 2021 at 11:18:41AM +1100, Peter Smith wrote:
> For some reason the current HEAD PublicationActions is a struct of
> boolean representing combinations of the 4 different "publication
> actions".
> 
> I felt it is more natural to implement boolean flag combinations using
> a bitmask instead of a struct of bools. IMO using the bitmask also
> simplifies assignment and checking of said flags.

Peter E made a suggestion to use a similar struct with bools last year for
REINDEX.
https://www.postgresql.org/message-id/7ec67c56-2377-cd05-51a0-691104404...@enterprisedb.com

Alvaro pointed out that the integer flags are better for ABI compatibility - it
would allow adding a new flag in backbranches, if that were needed for a
bugfix.

But that's not very compelling, since the bools have existed in v10...
Also, the booleans directly correspond with the catalog.

+   if (pubform->pubinsert) pub->pubactions |= PUBACTION_INSERT;

This is usually written like:

pub->pubactions |= (pubform->pubinsert ? PUBACTION_INSERT : 0)

-- 
Justin




Per-table storage parameters for TableAM/IndexAM extensions

2021-12-29 Thread Sadhuprasad Patro
Hi,

Currently, all the storage options for a table are specific to the
heap, but a different AM might need some user-defined, AM-specific
parameters to help tune it. So here is a patch which provides an
AM-level routine so that, instead of having parameters validated by
“heap_reloptions”, the registered AM routine is called.

e.g.:
-- create a new access method and a table using this access method
CREATE ACCESS METHOD myam TYPE TABLE HANDLER ;

CREATE TABLE mytest (a int) USING myam;

-- a new way to set a storage parameter only for myam, as below
ALTER TABLE mytest SET (squargle_size = '100MB');

The user-defined parameters will have meaning only for "myam";
otherwise an error will be thrown. Our relcache already allows an
AM-specific cache to be stored for each relation.

Open question: when a user changes the AM, what should the behavior be
for storage options the new AM does not support? Should we drop those
options and go with only the system storage options, or throw an error,
in which case the user has to clean up the added parameters?

Thanks & Regards
SadhuPrasad
http://www.EnterpriseDB.com/


v1-0001-PATCH-v1-Per-table-storage-parameters-for-TableAM.patch
Description: Binary data


Re: Add index scan progress to pg_stat_progress_vacuum

2021-12-29 Thread Justin Pryzby
http://cfbot.cputube.org/sami-imseih.html
You should run "make check" and update rules.out.

You should also use make check-world - usually something like:
make check-world -j4 >check-world.out 2>&1 ; echo ret $?

> indrelid: The relid of the index currently being vacuumed

I think it should be called indexrelid not indrelid, for consistency with
pg_index.

> S.param10 vacuum_cycle_ordinal_position,
> S.param13 index_rows_vacuumed

These should both say "AS" for consistency.

system_views.sql is using tabs, but should use spaces for consistency.

> #include "commands/progress.h"

The postgres convention is to alphabetize the includes.

>   /* VACCUM operation's longest index scan cycle */

VACCUM => VACUUM

Ultimately you'll also need to update the docs.




Re: Report checkpoint progress in server logs

2021-12-29 Thread SATYANARAYANA NARLAPURAM
Coincidentally, I was thinking about the same thing yesterday, after
getting tired of waiting for a checkpoint to complete on a server.

On Wed, Dec 29, 2021 at 7:41 AM Tom Lane  wrote:

> Magnus Hagander  writes:
> >> Therefore, reporting the checkpoint progress in the server logs, much
> >> like [1], seems to be the best way IMO.
>
> > I find progress reporting in the logfile to generally be a terrible
> > way of doing things, and the fact that we do it for the startup
> > process is/should be only because we have no other choice, not because
> > it's the right choice.
>
> I'm already pretty seriously unhappy about the log-spamming effects of
> 64da07c41 (default to log_checkpoints=on), and am willing to lay a side
> bet that that gets reverted after we have some field experience with it.
> This proposal seems far worse from that standpoint.  Keep in mind that
> our out-of-the-box logging configuration still doesn't have any log
> rotation ability, which means that the noisier the server is in normal
> operation, the sooner you fill your disk.
>

The server is not open for queries while running the end-of-recovery
checkpoint, and a catalog view may not help there, but a process-title
change or logging would be helpful in such cases. When the server is
running recovery, anxious customers ask several times for the ETA of
recovery completion, and not having visibility into these operations
makes life difficult for the DBA/operations.


>
> > I think the right choice to solve the *general* problem is the
> > mentioned pg_stat_progress_checkpoints.
>
> +1
>

+1 to this. We need at least a trace of the number of buffers to sync
(num_to_scan) before the checkpoint starts, instead of just emitting the
stats at the end.


Bharath, it would be good to show the buffers-synced counter and the
total buffers to sync, the checkpointer PID, the substep it is running,
whether it is on target for completion, and the checkpoint reason
(manual/timed/forced). BufferSync has several variables tracking the
sync progress locally, and we may need some refactoring here.
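
For illustration, a sketch of what such a view might expose, based on
the wish list above (the view and all column names are hypothetical;
nothing like this exists yet):

    SELECT pid,                            -- checkpointer pid
           checkpoint_reason,              -- manual/timed/forced
           buffers_total,                  -- num_to_scan
           buffers_synced,                 -- progress so far
           current_substep                 -- which substep is running
    FROM pg_stat_progress_checkpoint;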


>
> regards, tom lane
>
>
>


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread SATYANARAYANA NARLAPURAM
Stephen, thank you!

On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost  wrote:

> Greetings,
>
> * SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote:
> > On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar 
> wrote:
> > > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
> > > satyanarlapu...@gmail.com> wrote:
> > >>> Actually all the WAL insertions are done under a critical section
> > >>> (except few exceptions), that means if you see all the references of
> > >>> XLogInsert(), it is always called under the critical section and
> that is my
> > >>> main worry about hooking at XLogInsert level.
> > >>>
> > >>
> > >> Got it, understood the concern. But can we document the limitations of
> > >> the hook and let the hook take care of it? I don't expect an error to
> be
> > >> thrown here since we are not planning to allocate memory or make file
> > >> system calls but instead look at the shared memory state and add
> delays
> > >> when required.
> > >>
> > >>
> > > Yet another problem is that if we are in XlogInsert() that means we are
> > > holding the buffer locks on all the pages we have modified, so if we
> add a
> > > hook at that level which can make it wait then we would also block any
> of
> > > the read operations needed to read from those buffers.  I haven't
> thought
> > > what could be better way to do this but this is certainly not good.
> > >
> >
> > Yes, this is a problem. The other approach is adding a hook at
> > XLogWrite/XLogFlush? All the other backends will be waiting behind the
> > WALWriteLock. The process that is performing the write enters into a busy
> > loop with small delays until the criteria are met. Inability to process
> the
> > interrupts inside the critical section is a challenge in both approaches.
> > Any other thoughts?
>
> Why not have this work the exact same way sync replicas do, except that
> it's based off of some byte/time lag for some set of async replicas?
> That is, in RecordTransactionCommit(), perhaps right after the
> SyncRepWaitForLSN() call, or maybe even add this to that function?  Sure
> seems like there's a lot of similarity.
>

I was thinking of achieving log governance (throttling WAL MB/sec) and
also providing RPO guarantees. In this model, it is hard to throttle the
WAL generation of a long-running transaction (for example copy/select
into). However, this meets my RPO needs. Are you in support of adding a
hook or the actual change? IMHO, the hook allows more creative options.
I can go ahead and make a patch accordingly.




> Thanks,
>
> Stephen
>


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread Stephen Frost
Greetings,

On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
satyanarlapu...@gmail.com> wrote:

> Stephen, thank you!
>
> On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost  wrote:
>
>> Greetings,
>>
>> * SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote:
>> > On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar 
>> wrote:
>> > > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
>> > > satyanarlapu...@gmail.com> wrote:
>> > >>> Actually all the WAL insertions are done under a critical section
>> > >>> (except few exceptions), that means if you see all the references of
>> > >>> XLogInsert(), it is always called under the critical section and
>> that is my
>> > >>> main worry about hooking at XLogInsert level.
>> > >>>
>> > >>
>> > >> Got it, understood the concern. But can we document the limitations
>> of
>> > >> the hook and let the hook take care of it? I don't expect an error
>> to be
>> > >> thrown here since we are not planning to allocate memory or make file
>> > >> system calls but instead look at the shared memory state and add
>> delays
>> > >> when required.
>> > >>
>> > >>
>> > > Yet another problem is that if we are in XlogInsert() that means we
>> are
>> > > holding the buffer locks on all the pages we have modified, so if we
>> add a
>> > > hook at that level which can make it wait then we would also block
>> any of
>> > > the read operations needed to read from those buffers.  I haven't
>> thought
>> > > what could be better way to do this but this is certainly not good.
>> > >
>> >
>> > Yes, this is a problem. The other approach is adding a hook at
>> > XLogWrite/XLogFlush? All the other backends will be waiting behind the
>> > WALWriteLock. The process that is performing the write enters into a
>> busy
>> > loop with small delays until the criteria are met. Inability to process
>> the
>> > interrupts inside the critical section is a challenge in both
>> approaches.
>> > Any other thoughts?
>>
>> Why not have this work the exact same way sync replicas do, except that
>> it's based off of some byte/time lag for some set of async replicas?
>> That is, in RecordTransactionCommit(), perhaps right after the
>> SyncRepWaitForLSN() call, or maybe even add this to that function?  Sure
>> seems like there's a lot of similarity.
>>
>
> I was thinking of achieving log governance (throttling WAL MB/sec) and
> also providing RPO guarantees. In this model, it is hard to throttle WAL
> generation of a long running transaction (for example copy/select into).
>

Long running transactions have a lot of downsides and are best discouraged.
I don’t know that we should be designing this for that case specifically,
particularly given the complications it would introduce as discussed on
this thread already.

However, this meets my RPO needs. Are you in support of adding a hook or
> the actual change? IMHO, the hook allows more creative options. I can go
> ahead and make a patch accordingly.
>

I would think this would make more sense as part of core rather than a
hook, as that then requires an extension and additional setup to get going,
which raises the bar quite a bit when it comes to actually being used.

Thanks,

Stephen

>


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread SATYANARAYANA NARLAPURAM
On Wed, Dec 29, 2021 at 11:16 AM Stephen Frost  wrote:

> Greetings,
>
> On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM <
> satyanarlapu...@gmail.com> wrote:
>
>> Stephen, thank you!
>>
>> On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost  wrote:
>>
>>> Greetings,
>>>
>>> * SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote:
>>> > On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar 
>>> wrote:
>>> > > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM <
>>> > > satyanarlapu...@gmail.com> wrote:
>>> > >>> Actually all the WAL insertions are done under a critical section
>>> > >>> (except few exceptions), that means if you see all the references
>>> of
>>> > >>> XLogInsert(), it is always called under the critical section and
>>> that is my
>>> > >>> main worry about hooking at XLogInsert level.
>>> > >>>
>>> > >>
>>> > >> Got it, understood the concern. But can we document the limitations
>>> of
>>> > >> the hook and let the hook take care of it? I don't expect an error
>>> to be
>>> > >> thrown here since we are not planning to allocate memory or make
>>> file
>>> > >> system calls but instead look at the shared memory state and add
>>> delays
>>> > >> when required.
>>> > >>
>>> > >>
>>> > > Yet another problem is that if we are in XlogInsert() that means we
>>> are
>>> > > holding the buffer locks on all the pages we have modified, so if we
>>> add a
>>> > > hook at that level which can make it wait then we would also block
>>> any of
>>> > > the read operations needed to read from those buffers.  I haven't
>>> thought
>>> > > what could be better way to do this but this is certainly not good.
>>> > >
>>> >
>>> > Yes, this is a problem. The other approach is adding a hook at
>>> > XLogWrite/XLogFlush? All the other backends will be waiting behind the
>>> > WALWriteLock. The process that is performing the write enters into a
>>> busy
>>> > loop with small delays until the criteria are met. Inability to
>>> process the
>>> > interrupts inside the critical section is a challenge in both
>>> approaches.
>>> > Any other thoughts?
>>>
>>> Why not have this work the exact same way sync replicas do, except that
>>> it's based off of some byte/time lag for some set of async replicas?
>>> That is, in RecordTransactionCommit(), perhaps right after the
>>> SyncRepWaitForLSN() call, or maybe even add this to that function?  Sure
>>> seems like there's a lot of similarity.
>>>
>>
>> I was thinking of achieving log governance (throttling WAL MB/sec) and
>> also providing RPO guarantees. In this model, it is hard to throttle WAL
>> generation of a long running transaction (for example copy/select into).
>>
>
> Long running transactions have a lot of downsides and are best
> discouraged. I don’t know that we should be designing this for that case
> specifically, particularly given the complications it would introduce as
> discussed on this thread already.
>
> However, this meets my RPO needs. Are you in support of adding a hook or
>> the actual change? IMHO, the hook allows more creative options. I can go
>> ahead and make a patch accordingly.
>>
>
> I would think this would make more sense as part of core rather than a
> hook, as that then requires an extension and additional setup to get going,
> which raises the bar quite a bit when it comes to actually being used.
>

Sounds good, I will work on making the changes accordingly.

>
> Thanks,
>
> Stephen
>
>>


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread Andres Freund
Hi,

On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote:
> > Yet another problem is that if we are in XlogInsert() that means we are
> > holding the buffer locks on all the pages we have modified, so if we add a
> > hook at that level which can make it wait then we would also block any of
> > the read operations needed to read from those buffers.  I haven't thought
> > what could be better way to do this but this is certainly not good.
> >
> 
> Yes, this is a problem. The other approach is adding a hook at
> XLogWrite/XLogFlush?

That's pretty much the same - XLogInsert() can trigger an
XLogWrite()/Flush().

I think it's a complete no-go to add throttling to these places. It's quite
possible that it'd cause new deadlocks, and it's almost guaranteed to have
unintended consequences (e.g. replication falling further behind because
XLogFlush() is being throttled).

I also don't think it's a sane thing to add hooks to these places. It's
complicated enough as-is, adding the chance for random other things to happen
during such crucial operations will make it even harder to maintain.

Greetings,

Andres Freund




Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread SATYANARAYANA NARLAPURAM
On Wed, Dec 29, 2021 at 11:31 AM Andres Freund  wrote:

> Hi,
>
> On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote:
> > > Yet another problem is that if we are in XlogInsert() that means we are
> > > holding the buffer locks on all the pages we have modified, so if we
> add a
> > > hook at that level which can make it wait then we would also block any
> of
> > > the read operations needed to read from those buffers.  I haven't
> thought
> > > what could be better way to do this but this is certainly not good.
> > >
> >
> > Yes, this is a problem. The other approach is adding a hook at
> > XLogWrite/XLogFlush?
>
> That's pretty much the same - XLogInsert() can trigger an
> XLogWrite()/Flush().
>
> I think it's a complete no-go to add throttling to these places. It's quite
> possible that it'd cause new deadlocks, and it's almost guaranteed to have
> unintended consequences (e.g. replication falling further behind because
> XLogFlush() is being throttled).
>
> I also don't think it's a sane thing to add hooks to these places. It's
> complicated enough as-is, adding the chance for random other things to happen
> during such crucial operations will make it even harder to maintain.
>

Andres, thanks for the comments. Agreed on this based on the previous
discussions on this thread. Could you please share your thoughts on adding
it after SyncRepWaitForLSN()?


>
> Greetings,
>
> Andres Freund
>


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread Andres Freund
Hi,

On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote:
> On Wed, Dec 29, 2021 at 11:31 AM Andres Freund  wrote:
> Andres, thanks for the comments. Agreed on this based on the previous
> discussions on this thread. Could you please share your thoughts on adding
> it after SyncRepWaitForLSN()?

I don't think that's good either - you're delaying transaction commit
(i.e. xact becoming visible / locks being released). That also has the danger
of increasing lock contention (albeit more likely to be heavyweight locks /
serializable state). It'd have to be after the transaction actually committed.

Greetings,

Andres Freund




Re: WIP: WAL prefetch (another approach)

2021-12-29 Thread Andres Freund
Hi,

On 2021-12-29 17:29:52 +1300, Thomas Munro wrote:
> > FWIW I don't think we include updates to typedefs.list in patches.
> 
> Seems pretty harmless? And useful to keep around in development
> branches because I like to pgindent stuff...

I think it's even helpful. As long as it's done with a bit of manual
oversight, I don't see a meaningful downside of doing so. One needs to be
careful to not remove platform-dependent typedefs, but that's it. And
especially for long-lived feature branches it's much less work to keep the
typedefs.list changes in the tree, rather than coming up with them locally
over and over / across multiple people working on a branch.

Greetings,

Andres Freund




Re: Adding CI to our tree

2021-12-29 Thread Andres Freund
Hi,

On 2021-12-20 11:21:05 -0800, Andres Freund wrote:
> Attached is v4 of the CI patch.

I'd like to push this - any objections? It's not disruptive to anything but
cfbot, so we can incrementally improve it further.

I'll try to sync pushing with Thomas, so that he can adjust cfbot to not add
the CI changes anymore / adjust the links to the CI status URLs etc.

Greetings,

Andres Freund




Strange path from pgarch_readyXlog()

2021-12-29 Thread Thomas Munro
Hi,

Isn't this a corrupted pathname?

2021-12-29 03:39:55.708 CST [79851:1] WARNING:  removal of orphan
archive status file
"pg_wal/archive_status/00010003.0028.backup00010004.ready"
failed too many times, will try again later

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2021-12-29%2009%3A20%3A54




Re: Add Boolean node

2021-12-29 Thread Andres Freund
On 2021-12-27 09:53:32 -0500, Tom Lane wrote:
> Didn't really read the patch in any detail, but I did have one idea:
> I think that the different things-that-used-to-be-Value-nodes ought to
> use different field names, say ival, rval, bval, sval not just "val".
> That makes it more likely that you'd catch any code that is doing the
> wrong thing and not going through one of the access macros.

If we go around changing all these places, it might be worth also changing
Integer to be an int64 instead of an int.




Re: Add Boolean node

2021-12-29 Thread Tom Lane
Andres Freund  writes:
> If we go around changing all these places, it might be worth also changing
> Integer to be an int64 instead of an int.

Meh ... that would have some non-obvious consequences, I think,
at least if you tried to make the grammar make use of the extra
width (it'd change the type resolution behavior for integer-ish
literals).  I think it's better to keep it as plain int.

regards, tom lane




Re: Add Boolean node

2021-12-29 Thread Andres Freund
Hi,

On 2021-12-27 10:02:14 +0100, Peter Eisentraut wrote:
> This patch adds a new node type Boolean, to go alongside the "value" nodes
> Integer, Float, String, etc.  This seems appropriate given that Boolean
> values are a fundamental part of the system and are used a lot.
> 
> Before, SQL-level Boolean constants were represented by a string with
> a cast, and internal Boolean values in DDL commands were usually represented
> by Integer nodes.  This takes the place of both of these uses, making the
> intent clearer and having some amount of type safety.

This has annoyed me plenty of times before, so +many.


> From 4e1ef56b5443fa11d981eb6e407dfc7c244dc60e Mon Sep 17 00:00:00 2001
> From: Peter Eisentraut 
> Date: Mon, 27 Dec 2021 09:52:05 +0100
> Subject: [PATCH v1] Add Boolean node
> 
> Before, SQL-level boolean constants were represented by a string with
> a cast, and internal Boolean values in DDL commands were usually
> represented by Integer nodes.  This takes the place of both of these
> uses, making the intent clearer and having some amount of type safety.
> ---
> ...
>  20 files changed, 210 insertions(+), 126 deletions(-)

This might be easier to review if there were one patch adding the Boolean
type, and then a separate one converting users?


> diff --git a/src/backend/commands/tsearchcmds.c b/src/backend/commands/tsearchcmds.c
> index c47a05d10d..b7261a88d4 100644
> --- a/src/backend/commands/tsearchcmds.c
> +++ b/src/backend/commands/tsearchcmds.c
> @@ -1742,6 +1742,15 @@ buildDefItem(const char *name, const char *val, bool was_quoted)
>  			return makeDefElem(pstrdup(name),
>  							   (Node *) makeFloat(pstrdup(val)),
>  							   -1);
> +
> +		if (strcmp(val, "true") == 0)
> +			return makeDefElem(pstrdup(name),
> +							   (Node *) makeBoolean(true),
> +							   -1);
> +		if (strcmp(val, "false") == 0)
> +			return makeDefElem(pstrdup(name),
> +							   (Node *) makeBoolean(false),
> +							   -1);
>  	}
>  	/* Just make it a string */
>  	return makeDefElem(pstrdup(name),

Hm. defGetBoolean() interprets "true", "false", "on", "off" as booleans. ISTM
we shouldn't invent different behaviours for individual subsystems?


> --- a/src/backend/nodes/outfuncs.c
> +++ b/src/backend/nodes/outfuncs.c
> @@ -3433,6 +3433,12 @@ _outFloat(StringInfo str, const Float *node)
>   appendStringInfoString(str, node->val);
>  }
>  
> +static void
> +_outBoolean(StringInfo str, const Boolean *node)
> +{
> + appendStringInfoString(str, node->val ? "true" : "false");
> +}

Any reason not to use 't' and 'f' instead? It seems unnecessary to bloat the
node output by the longer strings, and it makes parsing more expensive
too:

> --- a/src/backend/nodes/read.c
> +++ b/src/backend/nodes/read.c
> @@ -283,6 +283,8 @@ nodeTokenType(const char *token, int length)
>  		retval = RIGHT_PAREN;
>  	else if (*token == '{')
>  		retval = LEFT_BRACE;
> +	else if (strcmp(token, "true") == 0 || strcmp(token, "false") == 0)
> +		retval = T_Boolean;
>  	else if (*token == '"' && length > 1 && token[length - 1] == '"')
>  		retval = T_String;
>  	else if (*token == 'b')

Before, this could be implemented as a jump table, but now it can't easily be
anymore.


> diff --git a/src/test/isolation/expected/ri-trigger.out 
> b/src/test/isolation/expected/ri-trigger.out
> index 842df80a90..db85618bef 100644
> --- a/src/test/isolation/expected/ri-trigger.out
> +++ b/src/test/isolation/expected/ri-trigger.out
> @@ -4,9 +4,9 @@ starting permutation: wxry1 c1 r2 wyrx2 c2
>  step wxry1: INSERT INTO child (parent_id) VALUES (0);
>  step c1: COMMIT;
>  step r2: SELECT TRUE;
> -bool
> -----
> -t   
> +?column?
> +--------
> +t   
>  (1 row)

This doesn't seem great. It might be more consistent ("SELECT 1" doesn't end
up with 'integer' as column name), but this still seems like an unnecessarily
large user-visible change for an internal data-representation change?

Greetings,

Andres Freund




Re: Strange path from pgarch_readyXlog()

2021-12-29 Thread Bossart, Nathan
On 12/29/21, 12:22 PM, "Thomas Munro"  wrote:
> Isn't this a corrupted pathname?
>
> 2021-12-29 03:39:55.708 CST [79851:1] WARNING:  removal of orphan
> archive status file
> "pg_wal/archive_status/00010003.0028.backup00010004.ready"
> failed too many times, will try again later

I bet this was a simple mistake in beb4e9b.

Nathan

diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c
index 434939be9b..b5b0d4e12f 100644
--- a/src/backend/postmaster/pgarch.c
+++ b/src/backend/postmaster/pgarch.c
@@ -113,7 +113,7 @@ static PgArchData *PgArch = NULL;
  * is empty, at which point another directory scan must be performed.
  */
 static binaryheap *arch_heap = NULL;
-static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS];
+static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS + 1];
 static char *arch_files[NUM_FILES_PER_DIRECTORY_SCAN];
 static int arch_files_size = 0;



Re: Strange path from pgarch_readyXlog()

2021-12-29 Thread Tom Lane
"Bossart, Nathan"  writes:
> I bet this was a simple mistake in beb4e9b.

> -static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS];
> +static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS + 1];

Hm, yeah, that looks like a pretty obvious bug.

While we're here, I wonder if we ought to get rid of the static-ness of
these arrays.  I realize that they're only eating a few kB, but they're
doing so in every postgres process, when they'll only be used in the
archiver.

regards, tom lane




Logging replication state changes

2021-12-29 Thread SATYANARAYANA NARLAPURAM
Hi hackers,

I noticed that the critical replication state changes below are logged at
DEBUG1 level. Any concern about changing the log level of the two messages
below to LOG? If this is too verbose, we can introduce a new GUC,
log_replication_state_changes, that logs replication state changes when
enabled, irrespective of the log level.

1/

/*
 * If we're in catchup state, move to streaming.  This is an
 * important state change for users to know about, since before
 * this point data loss might occur if the primary dies and we
 * need to failover to the standby. The state change is also
 * important for synchronous replication, since commits that
 * started to wait at that point might wait for some time.
 */
if (MyWalSnd->state == WALSNDSTATE_CATCHUP)
{
	ereport(DEBUG1,
			(errmsg_internal("\"%s\" has now caught up with upstream server",
							 application_name)));
	WalSndSetState(WALSNDSTATE_STREAMING);
}

2/

ereport(DEBUG1,
		(errmsg_internal("standby \"%s\" now has synchronous standby priority %u",
						 application_name, priority)));


Thanks,
Satya


Re: Logging replication state changes

2021-12-29 Thread Tom Lane
SATYANARAYANA NARLAPURAM  writes:
> I noticed that the critical replication state changes below are logged at
> DEBUG1 level. Any concern about changing the log level of the two messages
> below to LOG?

Why?  These seem like perfectly routine messages.

regards, tom lane




Re: Adding CI to our tree

2021-12-29 Thread Daniel Gustafsson
> On 29 Dec 2021, at 21:17, Andres Freund  wrote:
> On 2021-12-20 11:21:05 -0800, Andres Freund wrote:

>> Attached is v4 of the CI patch.
> 
> I'd like to push this - any objections? It's not disruptive to anything but
> cfbot, so we can incrementally improve it further.

No objection, I'm +1 on getting this in.

--
Daniel Gustafsson   https://vmware.com/





Re: Logging replication state changes

2021-12-29 Thread SATYANARAYANA NARLAPURAM
On Wed, Dec 29, 2021 at 2:04 PM Tom Lane  wrote:

> SATYANARAYANA NARLAPURAM  writes:
> > I noticed that the critical replication state changes below are logged at
> > DEBUG1 level. Any concern about changing the log level of the two messages
> > below to LOG?
>
> Why?  These seem like perfectly routine messages.
>

Consider a scenario with a primary and two sync standbys (s1 and s2), where
s1 is the preferred failover target and s2 is next, with
synchronous_standby_names = 'FIRST 1 (s1, s2)'. If s1's streaming replication
is broken and reestablished because of a planned or unplanned event, then s2
participates in the sync commits and makes sure the writes are not stalled on
the primary. I would like to know the time window during which s1 is not
actively acknowledging commits and the writes depend on s2. Also, if the
service layer decides to fail over to s2 instead of s1 because s1 is lagging,
I need evidence in the log to explain that behavior.
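
In the meantime, that window can only be observed by polling; a monitoring
sketch using the standard pg_stat_replication view (not part of the proposal):

-- sync_state is 'sync' for the standby currently participating in
-- FIRST 1 (s1, s2), 'potential' for the next candidate; the row
-- disappears entirely while a standby's walsender is disconnected.
SELECT application_name, state, sync_priority, sync_state
FROM pg_stat_replication
ORDER BY sync_priority;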



> regards, tom lane
>


Re: Strange path from pgarch_readyXlog()

2021-12-29 Thread Tom Lane
"Bossart, Nathan"  writes:
> On 12/29/21, 1:04 PM, "Tom Lane"  wrote:
>> While we're here, I wonder if we ought to get rid of the static-ness of
>> these arrays.  I realize that they're only eating a few kB, but they're
>> doing so in every postgres process, when they'll only be used in the
>> archiver.

> This crossed my mind, too.  I also think one of the arrays can be
> eliminated in favor of just using the heap (after rebuilding with a
> reversed comparator).  Here is a minimally-tested patch that
> demonstrates what I'm thinking.  

I already pushed a patch that de-static-izes those arrays, so this
needs a rebase at least.  However, now that you mention it, it does
seem like maybe the intermediate arch_files[] array could be dropped
in favor of just pulling the next file from the heap.

The need to reverse the heap's sort order seems like a problem
though.  I really dislike the kluge you used here with a static flag
that inverts the comparator's sort order behind the back of the
binary-heap mechanism.  It seems quite accidental that that doesn't
fall foul of asserts or optimizations in binaryheap.c.  For
instance, if binaryheap_build decided it needn't do anything when
bh_has_heap_property is already true, this code would fail.  In any
case, we'd need to spend O(n) time inverting the heap's sort order,
so this'd likely be slower than the current code.

On the whole I'm inclined not to bother trying to optimize this
further.  The main thing that concerned me is that somebody would
bump up NUM_FILES_PER_DIRECTORY_SCAN to the point where the static
space consumption becomes really problematic, and we've fixed that.

regards, tom lane




SELECT documentation

2021-12-29 Thread Joel Jacobson
Hi,

The Examples section in the documentation for the SELECT command [1]
only contains a single example of how to join two tables,
which is written in SQL-89 style:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d, films f
WHERE f.did = d.did

I think it's good to keep this example query as it is,
and suggest we add the following equivalent queries:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f ON f.did = d.did

SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
JOIN films f USING (did)

SELECT f.title, f.did, d.name, f.date_prod, f.kind
FROM distributors d
NATURAL JOIN films f

I also think it would be an improvement to break up the from_item below into
three separate items, since the optional NATURAL cannot occur in combination
with ON or USING.

from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) [ AS join_using_alias ] ]

Suggestion:

from_item join_type from_item ON join_condition
from_item join_type from_item USING ( join_column [, ...] ) [ AS join_using_alias ]
from_item NATURAL join_type from_item
 
This would be more readable imo.
I picked the order ON, USING, NATURAL to match the order they are described in 
the FROM Clause section.
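
To illustrate why the combined one-liner is misleading: NATURAL already
supplies the join condition, so combining it with ON (or USING) is simply
rejected by the grammar. A quick sketch (the exact error wording may vary):

SELECT f.title, d.name
FROM distributors d
NATURAL JOIN films f ON f.did = d.did;  -- syntax error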

/Joel

[1] https://www.postgresql.org/docs/current/sql-select.html

Re: UNIQUE null treatment option

2021-12-29 Thread Zhihong Yu
Hi,

 	bool		isunique;
+	bool		nulls_not_distinct;
 } BTSpool;

Looking at the other fields in BTSpool, there are no underscores in the field
names. I think the new field could be named nullsdistinct; this way, the
double negative is avoided.

Similar comment for new fields in BTShared and BTLeader.

And the naming would be consistent with information_schema.sql where
nulls_distinct is used:

+   CAST('YES' AS yes_or_no) AS enforced,
+   CAST(NULL AS yes_or_no) AS nulls_distinct
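
As an aside for readers following along, the user-visible behavior under
review is roughly this (a sketch based on the patch's proposed syntax):

-- Default (NULLS DISTINCT): two NULLs never conflict.
CREATE TABLE t1 (a int UNIQUE);
INSERT INTO t1 VALUES (NULL), (NULL);  -- OK

-- With the new option, NULLs are treated as equal for uniqueness.
CREATE TABLE t2 (a int UNIQUE NULLS NOT DISTINCT);
INSERT INTO t2 VALUES (NULL);          -- OK
INSERT INTO t2 VALUES (NULL);          -- unique violation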

Cheers


Re: Column Filtering in Logical Replication

2021-12-29 Thread Alvaro Herrera
On 2021-Dec-28, Alvaro Herrera wrote:

> There are still some XXX comments.  The one that bothers me most is the
> lack of an implementation that allows changing the column list in a
> publication without having to remove the table from the publication
> first.

OK, I made some progress on this front; I added new forms of ALTER
PUBLICATION to support it:

ALTER PUBLICATION pub1 ALTER TABLE tbl SET COLUMNS (a, b, c);
ALTER PUBLICATION pub1 ALTER TABLE tbl SET COLUMNS ALL;

(not wedded to this syntax; other suggestions welcome)
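
For concreteness, a usage sketch under the proposed syntax (table and column
names made up):

CREATE PUBLICATION pub1 FOR TABLE tbl (a, b);

-- Later, widen or reset the column list without dropping the table
-- from the publication:
ALTER PUBLICATION pub1 ALTER TABLE tbl SET COLUMNS (a, b, c);
ALTER PUBLICATION pub1 ALTER TABLE tbl SET COLUMNS ALL;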

In order to implement it I changed the haphazardly chosen use of
DEFELEM actions to a new enum.  I also noticed that the division of
labor between pg_publication.c and publicationcmds.c is quite broken
(code to translate column names to numbers is in the former, should be
in the latter; some code that deals with pg_publication tuples is in the
latter, should be in the former, such as CreatePublication,
AlterPublicationOptions).

This new stuff is not yet finished.  For example I didn't refactor
handling of REPLICA IDENTITY, so the new command does not correctly
check everything, such as the REPLICA IDENTITY FULL stuff.  Also, no
tests have been added yet.  In manual tests it seems to behave as
expected.

I noticed that prattrs is inserted in user-specified order instead of
catalog order, which is innocuous but quite weird.

-- 
Álvaro Herrera PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada."
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 34a7034282..5bc2e7a591 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -6877,7 +6877,9 @@ Relation
 
 
 
-Next, the following message part appears for each column (except generated columns):
+Next, the following message part appears for each column (except
+generated columns and other columns that don't appear in the column
+filter list, for tables that have one):
 
 
 
diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index bb4ef5e5e2..4951343f6f 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -25,12 +25,13 @@ ALTER PUBLICATION name ADD name SET publication_object [, ...]
 ALTER PUBLICATION name DROP publication_object [, ...]
 ALTER PUBLICATION name SET ( publication_parameter [= value] [, ... ] )
+ALTER PUBLICATION name ALTER TABLE publication_object SET COLUMNS { ( name [, ...] ) | ALL }
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name
 
 where publication_object is one of:
 
-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
@@ -62,6 +63,11 @@ ALTER PUBLICATION name RENAME TO 
 
+  
+   The ALTER TABLE ... SET COLUMNS variant allows changing
+   the set of columns that are included in the publication.
+  
+
   
The remaining variants change the owner and the name of the publication.
   
@@ -110,6 +116,8 @@ ALTER PUBLICATION name RENAME TO new_name
   * can be specified after the table
   name to explicitly indicate that descendant tables are included.
+  Optionally, a column list can be specified.  See  for details.
  
 

@@ -164,9 +172,15 @@ ALTER PUBLICATION noinsert SET (publish = 'update, delete');
   
 
   
-   Add some tables to the publication:
+   Add tables to the publication:
 
-ALTER PUBLICATION mypublication ADD TABLE users, departments;
+ALTER PUBLICATION mypublication ADD TABLE users (user_id, firstname), departments;
+
+
+  
+   Change the set of columns published for a table:
+
+ALTER PUBLICATION mypublication ALTER TABLE users SET COLUMNS (user_id, firstname, lastname);
 
 
   
diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index d805e8e77a..73a23cbb02 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -28,7 +28,7 @@ CREATE PUBLICATION name
 
 where publication_object is one of:
 
-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ] [ ( column_name [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]
 
  
@@ -78,6 +78,15 @@ CREATE PUBLICATION name
   publication, so they are never explicitly added to the publication.
  
 
+ 
+  When a column list is specified, only the listed columns are replicated;
+  any other columns are ignored for the purpose of replication through
+  this publication.  If no column list is specified, all columns of the
+  table are replicated through this publication, including any columns
+  added later.  If a column list is specified, it must include the replica
+  identity columns.
+ 
+
  
  Only persistent base tables and partitioned tables can be part of a publication.

Re: PublicationActions - use bit flags.

2021-12-29 Thread Peter Smith
On Thu, Dec 30, 2021 at 3:30 AM Justin Pryzby  wrote:
>
> On Mon, Dec 20, 2021 at 11:18:41AM +1100, Peter Smith wrote:
> > For some reason the current HEAD PublicationActions is a struct of
> > booleans representing combinations of the 4 different "publication
> > actions".
> >
> > I felt it is more natural to implement boolean flag combinations using
> > a bitmask instead of a struct of bools. IMO using the bitmask also
> > simplifies assignment and checking of said flags.
>
> Peter E made a suggestion to use a similar struct with bools last year for
> REINDEX.
> https://www.postgresql.org/message-id/7ec67c56-2377-cd05-51a0-691104404...@enterprisedb.com
>
> Alvaro pointed out that the integer flags are better for ABI compatibility -
> it would allow adding a new flag in backbranches, if that were needed for a
> bugfix.
>
> But that's not very compelling, since the bools have existed in v10...
> Also, the booleans directly correspond with the catalog.
>
> +   if (pubform->pubinsert) pub->pubactions |= PUBACTION_INSERT;
>
> This is usually written like:
>
> pub->pubactions |= (pubform->pubinsert ? PUBACTION_INSERT : 0)
>

Thanks for the info, I've modified those assignment styles as suggested.

--
Kind Regards,
Peter Smith.
Fujitsu Australia.


v3-0001-PublicationActions-use-bit-flags.patch
Description: Binary data


Re: PublicationActions - use bit flags.

2021-12-29 Thread Tom Lane
Peter Smith  writes:
> On Thu, Dec 30, 2021 at 3:30 AM Justin Pryzby  wrote:
>> +   if (pubform->pubinsert) pub->pubactions |= PUBACTION_INSERT;
>> This is usually written like:
>> pub->pubactions |= (pubform->pubinsert ? PUBACTION_INSERT : 0)

> Thanks for the info, I've modified those assignment styles as suggested.

FWIW, I think it's utter nonsense to claim that the second way is
preferred over the first.  There may be some people who think
the second way is more legible, but I don't; and I'm pretty sure
that the first way is significantly more common in the PG codebase.

regards, tom lane




Tests "with" and "alter_table" suffer from name clash

2021-12-29 Thread Thomas Munro
Hi,

With unlucky scheduling you can get a failure like this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-12-22%2010%3A51%3A32

Suggested fix attached.
From 3991f040e9c9afc4d7cfd4980b5f27f4113dbd1f Mon Sep 17 00:00:00 2001
From: Thomas Munro 
Date: Thu, 30 Dec 2021 14:43:57 +1300
Subject: [PATCH] Fix racy "with" test.

with.sql asserted that there was no table "test", but alter_table.sql
creates and drops such a table and might run at the same time.  Use a
unique name.  Per build farm.

Back-patch all the way.
---
 src/test/regress/expected/with.out | 16 
 src/test/regress/sql/with.sql  | 10 +-
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 75e61460d9..f15ece3bd1 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -3149,19 +3149,19 @@ with ordinality as (select 1 as x) select * from ordinality;
 (1 row)
 
 -- check sane response to attempt to modify CTE relation
-WITH test AS (SELECT 42) INSERT INTO test VALUES (1);
-ERROR:  relation "test" does not exist
-LINE 1: WITH test AS (SELECT 42) INSERT INTO test VALUES (1);
- ^
+WITH with_test AS (SELECT 42) INSERT INTO with_test VALUES (1);
+ERROR:  relation "with_test" does not exist
+LINE 1: WITH with_test AS (SELECT 42) INSERT INTO with_test VALUES (...
+  ^
 -- check response to attempt to modify table with same name as a CTE (perhaps
 -- surprisingly it works, because CTEs don't hide tables from data-modifying
 -- statements)
-create temp table test (i int);
-with test as (select 42) insert into test select * from test;
-select * from test;
+create temp table with_test (i int);
+with with_test as (select 42) insert into with_test select * from with_test;
+select * from with_test;
  i  
 ----
  42
 (1 row)
 
-drop table test;
+drop table with_test;
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 46668a903e..7ff9de97a5 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -1459,12 +1459,12 @@ create table foo (with ordinality);  -- fail, WITH is a reserved word
 with ordinality as (select 1 as x) select * from ordinality;
 
 -- check sane response to attempt to modify CTE relation
-WITH test AS (SELECT 42) INSERT INTO test VALUES (1);
+WITH with_test AS (SELECT 42) INSERT INTO with_test VALUES (1);
 
 -- check response to attempt to modify table with same name as a CTE (perhaps
 -- surprisingly it works, because CTEs don't hide tables from data-modifying
 -- statements)
-create temp table test (i int);
-with test as (select 42) insert into test select * from test;
-select * from test;
-drop table test;
+create temp table with_test (i int);
+with with_test as (select 42) insert into with_test select * from with_test;
+select * from with_test;
+drop table with_test;
-- 
2.30.2



Re: Tests "with" and "alter_table" suffer from name clash

2021-12-29 Thread Tom Lane
Thomas Munro  writes:
> With unlucky scheduling you can get a failure like this:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-12-22%2010%3A51%3A32
> Suggested fix attached.

Looks reasonable.  We really should avoid using such common
names for short-lived tables in any case --- it's an invitation
to trouble.  So I'd vote for changing the other use of "test", too.

regards, tom lane




Re: Tests "with" and "alter_table" suffer from name clash

2021-12-29 Thread Thomas Munro
On Thu, Dec 30, 2021 at 3:27 PM Tom Lane  wrote:
> Looks reasonable.  We really should avoid using such common
> names for short-lived tables in any case --- it's an invitation
> to trouble.  So I'd vote for changing the other use of "test", too.

In fact only REL_10_STABLE had the problem, because commit 2cf8c7aa
already fixed the other instance in later branches.  I'd entirely
forgotten that earlier discussion, which apparently didn't quite go
far enough.   So I only needed to push the with.sql change.  Done.




Re: Tests "with" and "alter_table" suffer from name clash

2021-12-29 Thread Tom Lane
Thomas Munro  writes:
> In fact only REL_10_STABLE had the problem, because commit 2cf8c7aa
> already fixed the other instance in later branches.  I'd entirely
> forgotten that earlier discussion, which apparently didn't quite go
> far enough.   So I only needed to push the with.sql change.  Done.

Hah, I thought this felt familiar.  So the real problem is that
my backpatch (b15a8c963) only fixed half of the hazard.  Sigh.

regards, tom lane




Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread Dilip Kumar
On Thu, Dec 30, 2021 at 1:09 AM Andres Freund  wrote:

> Hi,
>
> On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote:
> > On Wed, Dec 29, 2021 at 11:31 AM Andres Freund 
> wrote:
> > Andres, thanks for the comments. Agreed on this based on the previous
> > discussions on this thread. Could you please share your thoughts on
> adding
> > it after SyncRepWaitForLSN()?
>
> I don't think that's good either - you're delaying transaction commit
> (i.e. xact becoming visible / locks being released).


Agree with that.


> That also has the danger
> of increasing lock contention (albeit more likely to be heavyweight locks /
> serializable state). It'd have to be after the transaction actually
> committed.
>

Yeah, I think that would make sense, even though we will be allowing a new
backend to get connected, insert WAL, and get committed, but after that it
will be throttled.  However, if the number of max connections is very high,
then even after we detect a lag a significant amount of WAL could be
generated, even if we keep long-running transactions aside.  But I think it
will still serve the purpose of what Satya is trying to achieve.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


RE: Confused comment about drop replica identity index

2021-12-29 Thread houzj.f...@fujitsu.com
On Tues, Dec 21, 2021 8:47 AM Michael Paquier  wrote:
> On Mon, Dec 20, 2021 at 11:57:32AM -0300, Euler Taveira wrote:
> > What do you think about the attached patch? It forbids the DROP INDEX.
> > We might add a detail message but I didn't in this patch.
> 
> Yeah.  I'd agree about doing something like that on HEAD, and that would help
> with some of the logirep-related patch currently being worked on, as far as I
> understood.

Hi,

I think forbidding DROP INDEX might not completely solve this problem,
because the user could still use other commands to delete the index, for
example ALTER TABLE DROP COLUMN. After dropping the column, the index on it
will also be dropped.

Besides, the user can also ALTER REPLICA IDENTITY USING INDEX with the
primary key index, and in that case, when they ALTER TABLE DROP CONSTRAINT on
the primary key, the replica identity index will also be dropped.
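
A sketch of the two scenarios (hypothetical table names):

CREATE TABLE t (a int NOT NULL, b int NOT NULL);
CREATE UNIQUE INDEX t_a_idx ON t (a);
ALTER TABLE t REPLICA IDENTITY USING INDEX t_a_idx;
ALTER TABLE t DROP COLUMN a;            -- silently drops t_a_idx as well

CREATE TABLE u (a int PRIMARY KEY);
ALTER TABLE u REPLICA IDENTITY USING INDEX u_pkey;
ALTER TABLE u DROP CONSTRAINT u_pkey;   -- drops the replica identity index too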

Best regards,
Hou zj





Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread SATYANARAYANA NARLAPURAM
On Wed, Dec 29, 2021 at 10:38 PM Dilip Kumar  wrote:

> On Thu, Dec 30, 2021 at 1:09 AM Andres Freund  wrote:
>
>> Hi,
>>
>> On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote:
>> > On Wed, Dec 29, 2021 at 11:31 AM Andres Freund 
>> wrote:
>> > Andres, thanks for the comments. Agreed on this based on the previous
>> > discussions on this thread. Could you please share your thoughts on
>> adding
>> > it after SyncRepWaitForLSN()?
>>
>> I don't think that's good either - you're delaying transaction commit
>> (i.e. xact becoming visible / locks being released).
>
>
> Agree with that.
>
>
>> That also has the danger
>> of increasing lock contention (albeit more likely to be heavyweight locks
>> /
>> serializable state). It'd have to be after the transaction actually
>> committed.
>>
>
> Yeah, I think that would make sense, even though we will be allowing a new
> backend to get connected, insert WAL, and get committed, but after that it
> will be throttled.  However, if the number of max connections is very high,
> then even after we detect a lag a significant amount of WAL could be
> generated, even if we keep long-running transactions aside.  But I think it
> will still serve the purpose of what Satya is trying to achieve.
>

I am afraid there are problems with making the RPO check after releasing the
locks. By this time the transaction is committed and visible to the other
backends (ProcArrayEndTransaction is already called), though the intention
is to block committing transactions that violate the defined RPO. Even if we
block existing connections from starting a new transaction, it is possible to
do writes by opening a new connection / canceling the query. I am not too
worried about the lock contention, as the system is already hosed because of
the policy. This behavior is very similar to what happens when the sync
standby is not responding. Thoughts?




>
> --
> Regards,
> Dilip Kumar
> EnterpriseDB: http://www.enterprisedb.com
>


Re: Per-table storage parameters for TableAM/IndexAM extensions

2021-12-29 Thread Dilip Kumar
On Wed, Dec 29, 2021 at 10:38 PM Sadhuprasad Patro 
wrote:

> Hi,
>
> Currently all the storage options for a table are very much specific
> to the heap but a different AM might need some user defined AM
> specific parameters to help tune the AM. So here is a patch which
> provides an AM level routine so that instead of getting parameters
> validated using “heap_reloptions” it will call the registered AM
> routine.
>

+1 for the idea.


>
> e.g:
> -- create a new access method and table using this access method
> CREATE ACCESS METHOD myam TYPE TABLE HANDLER ;
>
> CREATE TABLE mytest (a int) USING myam ;
>
> -- a new way to set a storage parameter only for myam, as below
> ALTER TABLE mytest SET (squargle_size = '100MB');
>

This will work for CREATE TABLE as well, I guess, the same way normal
relation storage parameters work now, right?


> The user-defined parameters will have meaning only for the "myam",
> otherwise error will be thrown. Our relcache already allows the
> AM-specific cache to be stored for each relation.
>
> Open question: when a user changes the AM, what should be the behavior
> for unsupported storage options? Should we drop the options and go with
> only system storage options, or throw an error, in which case the user
> has to clean up the added parameters?
>

IMHO, if the user is changing the access method for the table, then it
should be fine to throw an error if there are some parameters which are not
supported by the new AM, so that the user can make an informed call and
first remove those storage options before changing the AM.
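
Under that behavior, the flow might look like this (a sketch: squargle_size
is the made-up option from upthread, the error itself is hypothetical, and
ALTER TABLE ... SET ACCESS METHOD is the form currently in HEAD):

ALTER TABLE mytest SET ACCESS METHOD heap;  -- would fail: unknown option
ALTER TABLE mytest RESET (squargle_size);   -- drop the AM-specific option
ALTER TABLE mytest SET ACCESS METHOD heap;  -- now succeeds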

I have a few comments on the patch, mostly cosmetics.

1.
+ Assert(routine->taboptions != NULL);

Why is the AM not allowed to register a NULL function? If NULL is registered,
that means the AM does not support any of the storage parameters.
2.
@@ -1358,6 +1358,7 @@ untransformRelOptions(Datum options)
  return result;
 }

+
 /*
  * Extract and parse reloptions from a pg_class tuple.
  *

Unwanted hunk (added extra line)

3.
+ * Parse options for heaps, views and toast tables. This is
+ * implementation of relOptions for access method heapam.
  */

Better to say access method heap instead of heapam.
4.
+ * Parse options for tables.
+ *
+ * taboptions tables AM's option parser function
+ *  reloptions options as text[] datum
+ *  validate error flag

The function header comment formatting is not proper; it also has uneven
spacing between words.
5.
-extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc)
+extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc,
+ reloptions_function taboptions)

The indentation is not proper; run pgindent on this.

6.
>Currently all the storage options for a table are very much specific to
the heap but a different AM might need some user defined AM specific
parameters to help tune the AM. So here is a patch which provides an AM
level routine so that instead of getting >parameters validated using
“heap_reloptions” it will call the registered AM routine.

Wrap these long commit message lines at 80 characters.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com


Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes

2021-12-29 Thread Dilip Kumar
On Thu, Dec 30, 2021 at 12:36 PM SATYANARAYANA NARLAPURAM <
satyanarlapu...@gmail.com> wrote:

>
>> Yeah, I think that would make sense, even though we will be allowing a
>> new backend to get connected, insert WAL, and get committed, but after
>> that it will be throttled.  However, if the number of max connections is
>> very high, then even after we detect a lag a significant amount of WAL
>> could be generated, even if we keep long-running transactions aside.  But
>> I think it will still serve the purpose of what Satya is trying to achieve.
>>
>
> I am afraid there are problems with making the RPO check after releasing
> the locks. By this time the transaction is committed and visible to the
> other backends (ProcArrayEndTransaction is already called), though the
> intention is to block committing transactions that violate the defined RPO.
> Even if we block existing connections from starting a new transaction, it is
> possible to do writes by opening a new connection / canceling the query. I
> am not too worried about the lock contention, as the system is already
> hosed because of the policy. This behavior is very similar to what
> happens when the sync standby is not responding. Thoughts?
>

Yeah, that's true, but even if we are blocking the transactions from
committing, it is still possible for a new connection to come in and generate
more WAL.  But I agree with the other part: if you throttle after committing,
then the user can cancel the queries and generate more WAL from those
sessions as well.  That is an extreme case, though, where application
developers want to bypass the throttling and generate more WAL.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com