Re: Foreign key joins revisited
On 28.12.21 20:45, Vik Fearing wrote:
> I don't particularly like this whole idea anyway, but if we're going to have it, I would suggest JOIN ... USING KEY ... Since USING currently requires a parenthesized list, that shouldn't create any ambiguity.

In the 1990s, there were some SQL drafts that included syntax like JOIN ... USING PRIMARY KEY | USING FOREIGN KEY | USING CONSTRAINT ... AFAICT, these ideas just faded away because of other priorities, so if someone wants to revive it, some work already exists.
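For illustration, a join written with the draft-era syntax might have looked like the first statement below, with the second being the spelled-out equivalent one writes today (table and column names are invented; the USING PRIMARY KEY form comes from the old drafts and is not accepted by any shipping implementation):

```sql
-- Draft-era sketch (hypothetical, not valid SQL in PostgreSQL today):
-- SELECT * FROM order_line JOIN orders USING PRIMARY KEY;

-- Today's spelled-out equivalent, naming the FK/PK columns explicitly:
SELECT *
FROM order_line
JOIN orders ON orders.order_id = order_line.order_id;
```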
Re: Converting WAL to SQL
On 29.12.21 07:18, rajesh singarapu wrote:
> I am wondering if we have a mechanism to convert WAL records to SQL statements. I am able to use logical decoders like wal2json or test_decoding for converting WAL to a readable format, but I am looking for a way to convert WAL to SQL statements.

pglogical in SPI mode has such logic.
Re: UNIQUE null treatment option
Here is a rebased version of this patch.

On 27.08.21 14:38, Peter Eisentraut wrote:
> The SQL standard has been ambiguous about whether null values in unique constraints should be considered equal or not. Different implementations have different behaviors. In the SQL:202x draft, this has been formalized by making this implementation-defined and adding an option on unique constraint definitions UNIQUE [ NULLS [NOT] DISTINCT ] to choose a behavior explicitly.
>
> This patch adds this option to PostgreSQL. The default behavior remains UNIQUE NULLS DISTINCT. Making this happen in the btree code is apparently pretty easy; most of the patch is just to carry the flag around to all the places that need it. The CREATE UNIQUE INDEX syntax extension is not from the standard, it's my own invention.
>
> (I named all the internal flags, catalog columns, etc. in the negative ("nulls not distinct") so that the default PostgreSQL behavior is the default if the flag is false. But perhaps the double negatives make some code harder to read.)

From ffd6c8e0ce24f3c56bd44e71588d4165e20e9157 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 29 Dec 2021 10:49:57 +0100
Subject: [PATCH v2] Add UNIQUE null treatment option

The SQL standard has been ambiguous about whether null values in
unique constraints should be considered equal or not.  Different
implementations have different behaviors.  In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.

This patch adds this option to PostgreSQL.  The default behavior
remains UNIQUE NULLS DISTINCT.  Making this happen in the btree code
is pretty easy; most of the patch is just to carry the flag around to
all the places that need it.

The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.

XXX I named all the internal flags, catalog columns, etc. in the
negative ("nulls not distinct") so that the default PostgreSQL
behavior is the default if the flag is false.  But perhaps the double
negatives make some code harder to read.

Discussion: https://www.postgresql.org/message-id/flat/84e5ee1b-387e-9a54-c326-9082674bd...@enterprisedb.com
---
 doc/src/sgml/catalogs.sgml                 |  13 +
 doc/src/sgml/ddl.sgml                      |  29 --
 doc/src/sgml/information_schema.sgml       |  12 +
 doc/src/sgml/ref/alter_table.sgml          |   4 +-
 doc/src/sgml/ref/create_index.sgml         |  13 +
 doc/src/sgml/ref/create_table.sgml         |  11 ++--
 src/backend/access/nbtree/nbtinsert.c      |  10 ++--
 src/backend/access/nbtree/nbtsort.c        |  15 +-
 src/backend/catalog/index.c                |   7 +++
 src/backend/catalog/information_schema.sql |   9 +++-
 src/backend/catalog/sql_features.txt       |   1 +
 src/backend/catalog/toasting.c             |   1 +
 src/backend/commands/indexcmds.c           |   3 +-
 src/backend/nodes/copyfuncs.c              |   2 +
 src/backend/nodes/equalfuncs.c             |   2 +
 src/backend/nodes/makefuncs.c              |   3 +-
 src/backend/nodes/outfuncs.c               |   2 +
 src/backend/parser/gram.y                  |  47 ++---
 src/backend/parser/parse_utilcmd.c         |   3 ++
 src/backend/utils/adt/ruleutils.c          |  23 +---
 src/backend/utils/cache/relcache.c         |   1 +
 src/backend/utils/sort/tuplesort.c         |   8 ++-
 src/bin/pg_dump/pg_dump.c                  |  19 +--
 src/bin/pg_dump/pg_dump.h                  |   1 +
 src/bin/psql/describe.c                    |  19 +--
 src/include/catalog/pg_index.h             |   1 +
 src/include/nodes/execnodes.h              |   1 +
 src/include/nodes/makefuncs.h              |   2 +-
 src/include/nodes/parsenodes.h             |   2 +
 src/include/utils/tuplesort.h              |   1 +
 src/test/regress/expected/constraints.out  |  23
 src/test/regress/expected/create_index.out |  61 ++
 src/test/regress/sql/constraints.sql       |  14 +
 src/test/regress/sql/create_index.sql      |  37 +
 34 files changed, 342 insertions(+), 58 deletions(-)

diff --git a/doc/src/sgml/catalogs.sgml b/doc/src/sgml/catalogs.sgml
index 03e2537b07..f0d49e4841 100644
--- a/doc/src/sgml/catalogs.sgml
+++ b/doc/src/sgml/catalogs.sgml
@@ -4275,6 +4275,19 @@ pg_index Columns
+
+      indnullsnotdistinct bool
+
+      This value is only used for unique indexes.  If false, this unique
+      index will consider null values distinct (so the index can contain
+      multiple null values in a column, the default PostgreSQL behavior).  If
+      it is true, it will consider null values to be equal (so the index can
+      only contain one null value in a column).
+
+
       indisprimary bool

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 64d9030652..f622285ba0 100644
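With the patch applied, the two behaviors could be exercised like this (table names invented for illustration; the first duplicate NULL insert succeeds under the default, while the second is rejected):

```sql
-- Default behavior: null values are treated as distinct,
-- so a unique column can hold multiple NULLs.
CREATE TABLE t_distinct (x int UNIQUE NULLS DISTINCT);
INSERT INTO t_distinct VALUES (NULL);
INSERT INTO t_distinct VALUES (NULL);  -- ok

-- New option: null values compare equal,
-- so at most one NULL fits in the column.
CREATE TABLE t_not_distinct (x int UNIQUE NULLS NOT DISTINCT);
INSERT INTO t_not_distinct VALUES (NULL);
INSERT INTO t_not_distinct VALUES (NULL);  -- ERROR: duplicate key value
```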
[PATCH] allow src/tools/msvc/*.bat files to be called from the root of the source tree
Hello,

currently, on Windows/MSVC, src\tools\msvc\*.bat files mostly require being in that src\tools\msvc directory first. I suggest an obvious fix:

diff --git a/src/tools/msvc/build.bat b/src/tools/msvc/build.bat
index 4001ac1d0d1..407b6559cfb 100755
--- a/src/tools/msvc/build.bat
+++ b/src/tools/msvc/build.bat
@@ -3,4 +3,4 @@ REM src/tools/msvc/build.bat
 REM all the logic for this now belongs in build.pl. This file really
 REM only exists so you don't have to type "perl build.pl"
 REM Resist any temptation to add any logic here.
-@perl build.pl %*
+@perl %~dp0\build.pl %*
diff --git a/src/tools/msvc/install.bat b/src/tools/msvc/install.bat
index d03277eff2b..98edf6bdffb 100644
--- a/src/tools/msvc/install.bat
+++ b/src/tools/msvc/install.bat
@@ -3,4 +3,4 @@ REM src/tools/msvc/install.bat
 REM all the logic for this now belongs in install.pl. This file really
 REM only exists so you don't have to type "perl install.pl"
 REM Resist any temptation to add any logic here.
-@perl install.pl %*
+@perl %~dp0\install.pl %*
diff --git a/src/tools/msvc/vcregress.bat b/src/tools/msvc/vcregress.bat
index a981d3a6aa1..0d65c823e13 100644
--- a/src/tools/msvc/vcregress.bat
+++ b/src/tools/msvc/vcregress.bat
@@ -3,4 +3,4 @@ REM src/tools/msvc/vcregress.bat
 REM all the logic for this now belongs in vcregress.pl. This file really
 REM only exists so you don't have to type "perl vcregress.pl"
 REM Resist any temptation to add any logic here.
-@perl vcregress.pl %*
+@perl %~dp0\vcregress.pl %*

This patch uses standard Windows cmd's %~dp0 to get the complete path (drive, "d", and path, "p") of the currently executing .bat file, in order to build the proper path of the .pl file to execute. I find the following link useful whenever I need to remember the details of cmd's %-substitution rules: https://ss64.com/nt/syntax-args.html

With this change, one can call those .bat files, e.g. src\tools\msvc\build.bat, without leaving the root of the source tree.
Not sure if similar change should be applied to pgflex.bat and pgbison.bat -- never used them on Windows and they seem to require being called from the root, but perhaps they deserve a similar change.

If accepted, do you think this change is worthy of back-porting?

Please advise if you think this change is a beneficial one.

P.S. Yes, I am aware of very probable upcoming move to meson, but until then this little patch really helps me whenever I have to deal with Windows and MSVC from the command line. Besides, it could help old branches as well.

--
Anton Voloshin
https://postgrespro.ru
Postgres Professional, The Russian Postgres Company
Re: automatically generating node support functions
On 12.10.21 15:52, Andrew Dunstan wrote:
> I haven't been through the whole thing, but I did notice this: the comment stripping code looks rather fragile. I think it would blow up if there were a continuation line not starting with qr/\s*\*/. It's a lot simpler and more robust to do this if you slurp the file in whole. Here's what we do in the buildfarm code:
>
>     my $src = file_contents($_);
>
>     # strip C comments
>     # We used to use the recipe in perlfaq6 but there is actually no point.
>     # We don't need to keep the quoted string values anyway, and
>     # on some platforms the complex regex causes perl to barf and crash.
>     $src =~ s{/\*.*?\*/}{}gs;
>
> After you've done that splitting it into lines is pretty simple.

Here is an updated patch, with some general rebasing, and the above improvement. It now also generates the #include lines necessary in copyfuncs etc. to pull in all the node types it operates on.

Further, I have looked more into the "metadata" approach discussed in [0]. It's pretty easy to generate that kind of output from the data structures my script produces. You just loop over all the node types and print stuff and keep a few counters. I don't plan to work on that at this time, but I just wanted to point out that if people wanted to move in that direction, my patch wouldn't be in the way.

[0]: https://www.postgresql.org/message-id/flat/20190828234136.fk2ndqtld3onfrrp%40alap3.anarazel.de

From e2c08d8b793200a07b8fe5ae85dd23f401ddcef1 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 29 Dec 2021 12:00:41 +0100
Subject: [PATCH v3] Automatically generate node support functions

Add a script to automatically generate the node support functions
(copy, equal, out, and read, as well as the node tags enum) from the
struct definitions.

For each of the four node support files, it creates two include files,
e.g., copyfuncs.inc1.c and copyfuncs.inc2.c, to include in the main
file.  All the scaffolding of the main file stays in place.
TODO: In this patch, I have only ifdef'ed out the code that could be
removed, mainly so that it won't constantly have merge conflicts.
Eventually, that should all be changed to delete the code.  When we do
that, some code comments should probably be preserved elsewhere, so
that will need another pass of consideration.

I have tried to mostly make the coverage of the output match what is
currently there.  For example, one could now do out/read coverage of
utility statement nodes, but I have manually excluded those for now.
The reason is mainly that it's easier to diff the before and after,
and adding a bunch of stuff like this might require a separate
analysis and review.

Subtyping (TidScan -> Scan) is supported.

For the hard cases, you can just write a manual function and exclude
generating one.  For the not so hard cases, there is a way of
annotating struct fields to get special behaviors.  For example,
pg_node_attr(equal_ignore) has the field ignored in equal functions.

Discussion: https://www.postgresql.org/message-id/flat/c1097590-a6a4-486a-64b1-e1f9cc0533ce%40enterprisedb.com
---
 src/backend/Makefile                  |   8 +-
 src/backend/nodes/.gitignore          |   3 +
 src/backend/nodes/Makefile            |  46 ++
 src/backend/nodes/copyfuncs.c         |  19 +-
 src/backend/nodes/equalfuncs.c        |  22 +-
 src/backend/nodes/gen_node_support.pl | 660 ++
 src/backend/nodes/outfuncs.c          |  30 +-
 src/backend/nodes/readfuncs.c         |  23 +-
 src/include/nodes/.gitignore          |   2 +
 src/include/nodes/nodes.h             |   8 +
 src/include/nodes/parsenodes.h        |   2 +-
 src/include/nodes/pathnodes.h         | 134 +++---
 src/include/nodes/plannodes.h         |  90 ++--
 src/include/nodes/primnodes.h         |  20 +-
 src/include/pg_config_manual.h        |   4 +-
 src/include/utils/rel.h               |   6 +-
 src/tools/msvc/Solution.pm            |  46 ++
 17 files changed, 976 insertions(+), 147 deletions(-)
 create mode 100644 src/backend/nodes/.gitignore
 create mode 100644 src/backend/nodes/gen_node_support.pl
 create mode 100644 src/include/nodes/.gitignore

diff --git a/src/backend/Makefile b/src/backend/Makefile
index 0da848b1fd..a33db1ae01 100644
--- a/src/backend/Makefile
+++ b/src/backend/Makefile
@@ -143,11 +143,15 @@ storage/lmgr/lwlocknames.h: storage/lmgr/generate-lwlocknames.pl storage/lmgr/lw
 submake-catalog-headers:
 	$(MAKE) -C catalog distprep generated-header-symlinks
 
+# run this unconditionally to avoid needing to know its dependencies here:
+submake-nodes-headers:
+	$(MAKE) -C nodes distprep generated-header-symlinks
+
 # run this unconditionally to avoid needing to know its dependencies here:
 submake-utils-headers:
 	$(MAKE) -C utils distprep generated-header-symlinks
 
-.PHONY: submake-catalog-headers submake-utils-headers
+.PHONY: submake-catalog-headers submake-nodes-headers submake-utils-headers
 
 # Make symlinks for these header
Re: Logical replication timeout problem
I put the instance in high-level debug mode. Let me attempt some log interpretation: after finishing writing the modifications generated by the insert to the snap files, those files are read back (restored). One minute after this work starts, the worker process exits with error code 1. I see that keepalive messages were sent before the worker process exited.

2021-12-28 10:50:01.894 CET [55792] LOCATION: WalSndKeepalive, walsender.c:3365
...
2021-12-28 10:50:31.854 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG: 0: StartTransaction(1) name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 0/1/0
2021-12-28 10:50:31.907 CET [55792] LOCATION: ShowTransactionStateRec, xact.c:5075
2021-12-28 10:50:31.907 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:31.907 CET [55792] DEBUG: 0: spill 2271 changes in XID 14312 to disk
2021-12-28 10:50:31.907 CET [55792] LOCATION: ReorderBufferSerializeTXN, reorderbuffer.c:2245
2021-12-28 10:50:31.907 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:32.110 CET [55792] DEBUG: 0: restored 4096/22603999 changes from disk
2021-12-28 10:50:32.110 CET [55792] LOCATION: ReorderBufferIterTXNNext, reorderbuffer.c:1156
2021-12-28 10:50:32.110 CET [55792] STATEMENT: START_REPLICATION SLOT "sub008_s012a00" LOGICAL 17/27240748 (proto_version '1', publication_names '"pub008_s012a00"')
2021-12-28 10:50:32.138 CET [55792] DEBUG: 0: restored 4096/22603999 changes from disk
...
2021-12-28 10:50:35.341 CET [55794] DEBUG: 0: sending replication keepalive
2021-12-28 10:50:35.341 CET [55794] LOCATION: WalSndKeepalive, walsender.c:3365
...
2021-12-28 10:51:31.995 CET [55791] ERROR: XX000: terminating logical replication worker due to timeout
2021-12-28 10:51:31.995 CET [55791] LOCATION: LogicalRepApplyLoop, worker.c:1267

Could this call in the apply main loop in worker.c help to find a solution?

rc = WaitLatchOrSocket(MyLatch,
                       WL_SOCKET_READABLE | WL_LATCH_SET |
                       WL_TIMEOUT | WL_POSTMASTER_DEATH,
                       fd, wait_time,
                       WAIT_EVENT_LOGICAL_APPLY_MAIN);

Thanks for your help

Fabrice

On Thu, Dec 23, 2021 at 11:52 AM Amit Kapila wrote:
> On Wed, Dec 22, 2021 at 8:50 PM Fabrice Chapuis wrote:
> >
> > Hello Amit,
> >
> > I was able to reproduce the timeout problem in the lab.
> > After loading more than 20 million rows in a table which is not
> > replicated (insert command ends without error), errors related to logical
> > replication processes appear in the postgres log.
> > Approximately every 5 minutes the worker process is restarted. The snap
> > files in the slot directory are still present. The replication system seems
> > to be blocked. Why are these snap files not removed? What do they contain?
>
> These contain the changes of the insert. I think these are not removed for
> your case as your long transaction is never finished. As mentioned
> earlier, for such cases, it is better to use the 'streaming' feature
> released as part of PG-14, but anyway here we are trying to debug the
> timeout problem.
>
> > I will recompile postgres with your patch to debug.
>
> Okay, that might help.
>
> --
> With Regards,
> Amit Kapila.
Re: Converting WAL to SQL
On Wed, 29 Dec 2021 at 03:18, rajesh singarapu wrote:
> Hi Hackers,
>
> I am wondering if we have a mechanism to convert WAL records to SQL
> statements.
>
> I am able to use logical decoders like wal2json or test_decoding for
> converting WAL to readable format, but I am looking for a way to convert
> WAL to sql statements.

Try this: https://github.com/michaelpq/pg_plugins/tree/main/decoder_raw

--
Fabrízio de Royes Mello
Timbira - http://www.timbira.com.br/
PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
Re: generalized conveyor belt storage
On Wed, Dec 15, 2021 at 9:04 PM Robert Haas wrote:
> On Wed, Dec 15, 2021 at 10:03 AM Matthias van de Meent wrote:
> [...]

Though the patch is WIP, here are a few comments that I found while reading the patch and thought might help:

+	{
+		if (meta->cbm_oldest_index_segment ==
+			meta->cbm_newest_index_segment)
+			elog(ERROR, "must remove last index segment when only one remains");
+		meta->cbm_oldest_index_segment = segno;
+	}

How about having an assert or elog to ensure that 'segno' is indeed the successor?
--
+	if (meta->cbm_index[offset] != offset)
+		elog(ERROR,
+			 "index entry at offset %u was expected to be %u but found %u",
+			 offset, segno, meta->cbm_index[offset]);

The if condition should be: meta->cbm_index[offset] != segno ?
--
+	if (segno >= CB_METAPAGE_FREESPACE_BYTES * BITS_PER_BYTE)
+		elog(ERROR, "segment %u out of range for metapage fsm", segno);

I think CB_FSM_SEGMENTS_FOR_METAPAGE should be used, like in cb_metapage_set_fsm_bit().
--
+/*
+ * Increment the count of segments allocated.
+ */
+void
+cb_metapage_increment_next_segment(CBMetapageData *meta, CBSegNo segno)
+{
+	if (segno != meta->cbm_next_segment)
+		elog(ERROR, "extending to create segment %u but next segment is %u",
+			 segno, meta->cbm_next_segment);

I didn't understand this error; what does it mean? It would be helpful to add a brief note about what it means and why we are throwing it, and/or to rephrase the error a bit.
--
+++ b/src/backend/access/conveyor/cbxlog.c
@@ -0,0 +1,442 @@
+/*-
+ *
+ * cbxlog.c
+ *	  XLOG support for conveyor belts.
+ *
+ * For each REDO function in this file, see cbmodify.c for the
+ * corresponding function that performs the modification during normal
+ * running and logs the record that we REDO here.
+ *
+ * Copyright (c) 2016-2021, PostgreSQL Global Development Group
+ *
+ * src/backend/access/conveyor/cbmodify.c

Incorrect file name: s/cbmodify.c/cbxlog.c/
--
+	can_allocate_segment =
+		(free_segno_first_blkno < possibly_not_on_disk_blkno)

The condition should be '<=' ?
--
+ * ConveyorBeltPhysicalTruncate.  For more aggressive cleanup options, see
+ * ConveyorBeltCompact or ConveyorBeltRewrite.

Didn't find ConveyorBeltCompact or ConveyorBeltRewrite code; might be yet to be implemented?
--
+ * the value computed here here if the entry at that offset is already

"here" twice.
--
And a few typos:

othr
semgents
fucntion
refrenced
initialied
remve
extrordinarily
implemenation

--
Regards,
Amul
Re: Documenting when to retry on serialization failure
On Wed, 29 Dec 2021 at 03:30, Thomas Munro wrote:
>
> On Fri, Dec 10, 2021 at 1:43 AM Simon Riggs wrote:
> > "Applications using this level must be prepared to retry transactions
> > due to serialization failures."
> > ...
> > "When an application receives this error message, it should abort the
> > current transaction and retry the whole transaction from the
> > beginning."
> >
> > I note that the specific error codes this applies to are not
> > documented, so let's discuss what the docs for that would look like.
>
> +1 for naming the error.

I've tried to sum up the various points from everybody into this doc patch. Thanks all for the replies.

--
Simon Riggs
http://www.EnterpriseDB.com/

retryable_error_docs.v1.patch
Description: Binary data
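The retry pattern under discussion can be sketched as follows (table name invented for illustration; SQLSTATE 40001 is serialization_failure, the code raised when a serializable transaction must be retried):

```sql
-- Each attempt issues this block; on SQLSTATE 40001 the client
-- should ROLLBACK and re-run the whole transaction from the top.
BEGIN ISOLATION LEVEL SERIALIZABLE;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
COMMIT;  -- may fail with serialization_failure (40001); retry if so
```

Note that the retry must restart the entire transaction, not just the failing statement, since the snapshot the earlier statements read under is no longer usable.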
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Greetings, * SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote: > On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar wrote: > > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM < > > satyanarlapu...@gmail.com> wrote: > >>> Actually all the WAL insertions are done under a critical section > >>> (except few exceptions), that means if you see all the references of > >>> XLogInsert(), it is always called under the critical section and that is > >>> my > >>> main worry about hooking at XLogInsert level. > >>> > >> > >> Got it, understood the concern. But can we document the limitations of > >> the hook and let the hook take care of it? I don't expect an error to be > >> thrown here since we are not planning to allocate memory or make file > >> system calls but instead look at the shared memory state and add delays > >> when required. > >> > >> > > Yet another problem is that if we are in XlogInsert() that means we are > > holding the buffer locks on all the pages we have modified, so if we add a > > hook at that level which can make it wait then we would also block any of > > the read operations needed to read from those buffers. I haven't thought > > what could be better way to do this but this is certainly not good. > > > > Yes, this is a problem. The other approach is adding a hook at > XLogWrite/XLogFlush? All the other backends will be waiting behind the > WALWriteLock. The process that is performing the write enters into a busy > loop with small delays until the criteria are met. Inability to process the > interrupts inside the critical section is a challenge in both approaches. > Any other thoughts? Why not have this work the exact same way sync replicas do, except that it's based off of some byte/time lag for some set of async replicas? That is, in RecordTransactionCommit(), perhaps right after the SyncRepWaitForLSN() call, or maybe even add this to that function? Sure seems like there's a lot of similarity. 
Thanks,

Stephen
Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
Greetings, * Euler Taveira (eu...@eulerto.com) wrote: > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > > pg_archivecleanup currently takes a WAL file name as input to delete > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to > > automatically detect the last checkpoint (from control file) LSN, > > calculate the lowest restart_lsn required by the replication slots, if > > any (by reading the replication slot info from pg_logical directory), > > archive the unneeded (an archive_command similar to that of the one > > provided in the server config can be provided as an input) WAL files > > before finally deleting them? Making pg_archivecleanup tool as an > > end-to-end solution will help greatly in disk full situations because > > of WAL files growth (inactive replication slots, archive command > > failures, infrequent checkpoint etc.). The overall idea of having a tool for this isn't a bad idea, but .. > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you > suggesting to use it for removing files from pg_wal directory too? No, thanks. We definitely shouldn't have it be part of pg_archivecleanup for the simple reason that it'll be really confusing and almost certainly will be mis-used. For my 2c, we should just remove pg_archivecleanup entirely. > WAL files are a key component for backup and replication. Hence, you cannot > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue > wouldn't occur if you have a monitoring system and alerts and someone to keep > an eye on it. If the disk full situation was caused by a failed archive > command > or a disconnected standby, it is easy to figure out; the fix is simple. This is perhaps a bit far- PG does, in fact, remove WAL files from PGDATA. 
Having a tool which will do this safely when the server isn't able to be brought online due to lack of disk space would certainly be helpful rather frequently. I agree that monitoring and alerting are things that everyone should implement and pay attention to, but that doesn't happen and instead people end up just blowing away pg_wal and corrupting their database when, had a tool existed, they could have avoided that happening and brought the system back online in relatively short order without any data loss.

Thanks,

Stephen
Re: Proposal: More structured logging
> Subject: [PATCH v3 2/3] Add test module for the new tag functionality.
...
> +test_logging(PG_FUNCTION_ARGS)
> +{
...
> +	(errmsg("%s", message),
> +	 ({
> +		forboth(lk, keys, lv, values)
> +		{
> +			(errtag(lfirst(lk), "%s", (char *) lfirst(lv)));
> +		}})
> +	));

The Windows build fails with that syntax.
http://cfbot.cputube.org/ronan-dunklau.html
https://ci.appveyor.com/project/postgresql-cfbot/postgresql/build/1.0.157923

> Subject: [PATCH v3 3/3] Output error tags in CSV logs

> +++ b/doc/src/sgml/config.sgml
> @@ -7370,6 +7371,7 @@ CREATE TABLE postgres_log
>    backend_type text,
>    leader_pid integer,
>    query_id bigint,
> +  tags jsonb
>    PRIMARY KEY (session_id, session_line_num)
> );

That's invalid SQL due to a missing trailing ",". You should also update file-fdw.sgml -- which I only think of since we forgot to update it before e568ed0eb and 0830d21f5. config.sgml should have a comment as a reminder to do that.

--
Justin
Report checkpoint progress in server logs
Hi,

At times, some of the checkpoint operations, such as removing old WAL files or dealing with replication snapshot or mapping files, may take a while, during which the server doesn't emit any logs or information; the only logs emitted are LogCheckpointStart and LogCheckpointEnd. Often this isn't a problem if the checkpoint is quick, but there can be extreme situations which require the users to know what's going on with the current checkpoint.

Given that commit 9ce346ea [1] introduced a nice mechanism to report the long-running operations of the startup process in the server logs, I'm thinking we can have a similar progress mechanism for the checkpoint as well. There's another idea suggested in a couple of other threads to have a pg_stat_progress_checkpoint view similar to pg_stat_progress_analyze/vacuum/etc. But the problem with this idea is that during the end-of-recovery or shutdown checkpoints, the pg_stat_progress_checkpoint view isn't accessible, as it requires a connection to the server, which isn't allowed.

Therefore, reporting the checkpoint progress in the server logs, much like [1], seems to be the best way IMO. We can 1) either make ereport_startup_progress and log_startup_progress_interval more generic (something like ereport_log_progress and log_progress_interval), move the code to elog.c, and use it for checkpoint progress and, if required, for other time-consuming operations, 2) or have an entirely different GUC and API for checkpoint progress.

IMO, option (1), i.e. ereport_log_progress and log_progress_interval (better names are welcome), seems the better idea.

Thoughts?

[1] commit 9ce346eabf350a130bba46be3f8c50ba28506969
    Author: Robert Haas
    Date: Mon Oct 25 11:51:57 2021 -0400

    Report progress of startup operations that take a long time.

Regards,
Bharath Rupireddy.
Re: Report checkpoint progress in server logs
On Wed, Dec 29, 2021 at 3:31 PM Bharath Rupireddy wrote: > > Hi, > > At times, some of the checkpoint operations such as removing old WAL > files, dealing with replication snapshot or mapping files etc. may > take a while during which the server doesn't emit any logs or > information, the only logs emitted are LogCheckpointStart and > LogCheckpointEnd. Many times this isn't a problem if the checkpoint is > quicker, but there can be extreme situations which require the users > to know what's going on with the current checkpoint. > > Given that the commit 9ce346ea [1] introduced a nice mechanism to > report the long running operations of the startup process in the > server logs, I'm thinking we can have a similar progress mechanism for > the checkpoint as well. There's another idea suggested in a couple of > other threads to have a pg_stat_progress_checkpoint similar to > pg_stat_progress_analyze/vacuum/etc. But the problem with this idea is > during the end-of-recovery or shutdown checkpoints, the > pg_stat_progress_checkpoint view isn't accessible as it requires a > connection to the server which isn't allowed. > > Therefore, reporting the checkpoint progress in the server logs, much > like [1], seems to be the best way IMO. We can 1) either make > ereport_startup_progress and log_startup_progress_interval more > generic (something like ereport_log_progress and > log_progress_interval), move the code to elog.c, use it for > checkpoint progress and if required for other time-consuming > operations 2) or have an entirely different GUC and API for checkpoint > progress. > > IMO, option (1) i.e. ereport_log_progress and log_progress_interval > (better names are welcome) seems a better idea. > > Thoughts? I find progress reporting in the logfile to generally be a terrible way of doing things, and the fact that we do it for the startup process is/should be only because we have no other choice, not because it's the right choice. 
I think the right choice to solve the *general* problem is the mentioned pg_stat_progress_checkpoints. We may want to *additionally* have the ability to log the progress specifically for the special cases when we're not able to use that view. And in those cases, we can perhaps just use the existing log_startup_progress_interval parameter for this as well -- at least for the startup checkpoint.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: pg_archivecleanup - add the ability to detect, archive and delete the unneeded wal files on the primary
On Wed, Dec 29, 2021 at 7:27 PM Stephen Frost wrote: > > On Thu, Dec 23, 2021, at 9:58 AM, Bharath Rupireddy wrote: > > > pg_archivecleanup currently takes a WAL file name as input to delete > > > the WAL files prior to it [1]. As suggested by Satya (cc-ed) in > > > pg_replslotdata thread [2], can we enhance the pg_archivecleanup to > > > automatically detect the last checkpoint (from control file) LSN, > > > calculate the lowest restart_lsn required by the replication slots, if > > > any (by reading the replication slot info from pg_logical directory), > > > archive the unneeded (an archive_command similar to that of the one > > > provided in the server config can be provided as an input) WAL files > > > before finally deleting them? Making pg_archivecleanup tool as an > > > end-to-end solution will help greatly in disk full situations because > > > of WAL files growth (inactive replication slots, archive command > > > failures, infrequent checkpoint etc.). > > The overall idea of having a tool for this isn't a bad idea, but .. > > > pg_archivecleanup is a tool to remove WAL files from the *archive*. Are you > > suggesting to use it for removing files from pg_wal directory too? No, > > thanks. > > We definitely shouldn't have it be part of pg_archivecleanup for the > simple reason that it'll be really confusing and almost certainly will > be mis-used. +1 > > WAL files are a key component for backup and replication. Hence, you cannot > > deliberately allow a tool to remove WAL files from PGDATA. IMO this issue > > wouldn't occur if you have a monitoring system and alerts and someone to > > keep > > an eye on it. If the disk full situation was caused by a failed archive > > command > > or a disconnected standby, it is easy to figure out; the fix is simple. > > This is perhaps a bit far- PG does, in fact, remove WAL files from > PGDATA. 
Having a tool which will do this safely when the server isn't > able to be brought online due to lack of disk space would certainly be > helpful rather frequently. I agree that monitoring and alerting are > things that everyone should implement and pay attention to, but that > doesn't happen and instead people end up just blowing away pg_wal and > corrupting their database when, had a tool existed, they could have > avoided that happening and brought the system back online in relatively > short order without any data loss. Thanks. Yes, an end-to-end tool is helpful in such emergency situations, and having it in core is more helpful than every Postgres vendor developing their own solution, which is often hard to get right. Also, I agree not to club this idea with pg_archivecleanup. How about having a new tool like pg_walcleanup/pg_xlogcleanup to help developers/admins/users in such situations? Regards, Bharath Rupireddy.
Re: [PATCH] allow src/tools/msvc/*.bat files to be called from the root of the source tree
On 12/29/21 05:16, Anton Voloshin wrote: > Hello, > > currently, on Windows/MSVC, src\tools\msvc\*.bat files mostly require > being in that src\tools\msvc directory first. > > I suggest an obvious fix: [...] > This patch uses standard windows cmd's %~dp0 to get the complete path > (drive, "d", and path, "p") of the currently executing .bat file to > get proper path of a .pl file to execute. I find the following link > useful whenever I need to remember details on cmd's %-substitution > rules: https://ss64.com/nt/syntax-args.html > > With this change, one can call those .bat files, e.g. > src\tools\msvc\build.bat, without leaving the root of the source tree. > > Not sure if similar change should be applied to pgflex.bat and > pgbison.bat -- never used them on Windows and they seem to require > being called from the root, but perhaps they deserve a similar change. > > If accepted, do you think this change is worthy of back-porting? > > Please advise if you think this change is a beneficial one. > > P.S. Yes, I am aware of very probable upcoming move to meson, but > until then this little patch really helps me whenever I have to deal > with Windows and MSVC from the command line. Besides, it could help > old branches as well. > Seems reasonable. I don't see any reason not to do it for pgbison.bat and pgflex.bat, just for the sake of consistency. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Foreign key joins revisited
On 12/28/21 15:10, Tom Lane wrote: > Vik Fearing writes: >> On 12/28/21 8:26 PM, Joel Jacobson wrote: >>> Can with think of some other suitable reserved keyword? >> I don't particularly like this whole idea anyway, but if we're going to >> have it, I would suggest >> JOIN ... USING KEY ... > That would read well, which is nice, but I wonder if it wouldn't induce > confusion. You'd have to explain that it didn't work like standard > USING in the sense of merging the join-key columns. > > ... unless, of course, we wanted to make it do so. Would that > be sane? Which name (referenced or referencing column) would > the merged column have? > > I agree this would cause confusion. I think your earlier suggestion of JOIN ... FOREIGN KEY ... seems reasonable. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Foreign key joins revisited
Peter Eisentraut writes: > In the 1990s, there were some SQL drafts that included syntax like > JOIN ... USING PRIMARY KEY | USING FOREIGN KEY | USING CONSTRAINT ... > AFAICT, these ideas just faded away because of other priorities, so if > someone wants to revive it, some work already exists. Interesting! One thing that bothered me about this whole line of discussion is that we could get blindsided in future by the SQL committee standardizing the same idea with slightly different syntax/semantics. I think borrowing this draft text would greatly improve the odds of that not happening. Do you have access to full details? regards, tom lane
Re: Report checkpoint progress in server logs
Magnus Hagander writes: >> Therefore, reporting the checkpoint progress in the server logs, much >> like [1], seems to be the best way IMO. > I find progress reporting in the logfile to generally be a terrible > way of doing things, and the fact that we do it for the startup > process is/should be only because we have no other choice, not because > it's the right choice. I'm already pretty seriously unhappy about the log-spamming effects of 64da07c41 (default to log_checkpoints=on), and am willing to lay a side bet that that gets reverted after we have some field experience with it. This proposal seems far worse from that standpoint. Keep in mind that our out-of-the-box logging configuration still doesn't have any log rotation ability, which means that the noisier the server is in normal operation, the sooner you fill your disk. > I think the right choice to solve the *general* problem is the > mentioned pg_stat_progress_checkpoints. +1 regards, tom lane
Re: PublicationActions - use bit flags.
On Mon, Dec 20, 2021 at 11:18:41AM +1100, Peter Smith wrote: > For some reason the current HEAD PublicationActions is a struct of > boolean representing combinations of the 4 different "publication > actions". > > I felt it is more natural to implement boolean flag combinations using > a bitmask instead of a struct of bools. IMO using the bitmask also > simplifies assignment and checking of said flags. Peter E made a suggestion to use a similar struct with bools last year for REINDEX. https://www.postgresql.org/message-id/7ec67c56-2377-cd05-51a0-691104404...@enterprisedb.com Alvaro pointed out that the integer flags are better for ABI compatibility - it would allow adding a new flag in backbranches, if that were needed for a bugfix. But that's not very compelling, since the bools have existed in v10... Also, the booleans directly correspond with the catalog. + if (pubform->pubinsert) pub->pubactions |= PUBACTION_INSERT; This is usually written like: pub->pubactions |= (pubform->pubinsert ? PUBACTION_INSERT : 0) -- Justin
Per-table storage parameters for TableAM/IndexAM extensions
Hi, Currently all the storage options for a table are specific to the heap, but a different AM might need some user-defined, AM-specific parameters to help tune it. So here is a patch which provides an AM-level routine: instead of getting parameters validated using “heap_reloptions”, the registered AM routine is called. e.g.: -- create a new access method and table using this access method CREATE ACCESS METHOD myam TYPE TABLE HANDLER ; CREATE TABLE mytest (a int) USING myam ; -- a new parameter sets a storage parameter for myam only, as below ALTER TABLE mytest SET (squargle_size = '100MB'); The user-defined parameters will have meaning only for "myam"; otherwise an error will be thrown. Our relcache already allows an AM-specific cache to be stored for each relation. Open question: when a user changes the AM, what should the behavior be for unsupported storage options? Should we drop the options and keep only the system storage options, or throw an error, in which case the user has to clean up the added parameters? Thanks & Regards SadhuPrasad http://www.EnterpriseDB.com/ v1-0001-PATCH-v1-Per-table-storage-parameters-for-TableAM.patch Description: Binary data
Re: Add index scan progress to pg_stat_progress_vacuum
http://cfbot.cputube.org/sami-imseih.html You should run "make check" and update rules.out. You should also use make check-world - usually something like: make check-world -j4 >check-world.out 2>&1 ; echo ret $? > indrelid: The relid of the index currently being vacuumed I think it should be called indexrelid not indrelid, for consistency with pg_index. > S.param10 vacuum_cycle_ordinal_position, > S.param13 index_rows_vacuumed These should both say "AS" for consistency. system_views.sql is using tabs, but should use spaces for consistency. > #include "commands/progress.h" The postgres convention is to alphabetize the includes. > /* VACCUM operation's longest index scan cycle */ VACCUM => VACUUM Ultimately you'll also need to update the docs.
Re: Report checkpoint progress in server logs
Coincidentally, I was thinking about the same yesterday after getting tired of waiting for checkpoint completion on a server. On Wed, Dec 29, 2021 at 7:41 AM Tom Lane wrote: > Magnus Hagander writes: > >> Therefore, reporting the checkpoint progress in the server logs, much > >> like [1], seems to be the best way IMO. > > > I find progress reporting in the logfile to generally be a terrible > > way of doing things, and the fact that we do it for the startup > > process is/should be only because we have no other choice, not because > > it's the right choice. > > I'm already pretty seriously unhappy about the log-spamming effects of > 64da07c41 (default to log_checkpoints=on), and am willing to lay a side > bet that that gets reverted after we have some field experience with it. > This proposal seems far worse from that standpoint. Keep in mind that > our out-of-the-box logging configuration still doesn't have any log > rotation ability, which means that the noisier the server is in normal > operation, the sooner you fill your disk. > The server is not open for queries while running the end-of-recovery checkpoint, and a catalog view may not help there, but a process title change or logging would be helpful in such cases. When the server is running recovery, anxious customers repeatedly ask for the ETA of recovery completion, and not having visibility into these operations makes life difficult for the DBA/operations. > > > I think the right choice to solve the *general* problem is the > > mentioned pg_stat_progress_checkpoints. > > +1 > +1 to this. We need at least a trace of the number of buffers to sync (num_to_scan) before the checkpoint starts, instead of just emitting the stats at the end. Bharath, it would be good to show the buffers-synced counter and the total buffers to sync, the checkpointer PID, the substep it is running, whether it is on target for completion, and the checkpoint reason (manual/timed/forced). 
BufferSync has several variables tracking the sync progress locally, and we may need some refactoring here. > > regards, tom lane > > >
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Stephen, thank you! On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost wrote: > Greetings, > > * SATYANARAYANA NARLAPURAM (satyanarlapu...@gmail.com) wrote: > > On Sat, Dec 25, 2021 at 9:25 PM Dilip Kumar > wrote: > > > On Sun, Dec 26, 2021 at 10:36 AM SATYANARAYANA NARLAPURAM < > > > satyanarlapu...@gmail.com> wrote: > > >>> Actually all the WAL insertions are done under a critical section > > >>> (except few exceptions), that means if you see all the references of > > >>> XLogInsert(), it is always called under the critical section and > that is my > > >>> main worry about hooking at XLogInsert level. > > >>> > > >> > > >> Got it, understood the concern. But can we document the limitations of > > >> the hook and let the hook take care of it? I don't expect an error to > be > > >> thrown here since we are not planning to allocate memory or make file > > >> system calls but instead look at the shared memory state and add > delays > > >> when required. > > >> > > >> > > > Yet another problem is that if we are in XlogInsert() that means we are > > > holding the buffer locks on all the pages we have modified, so if we > add a > > > hook at that level which can make it wait then we would also block any > of > > > the read operations needed to read from those buffers. I haven't > thought > > > what could be better way to do this but this is certainly not good. > > > > > > > Yes, this is a problem. The other approach is adding a hook at > > XLogWrite/XLogFlush? All the other backends will be waiting behind the > > WALWriteLock. The process that is performing the write enters into a busy > > loop with small delays until the criteria are met. Inability to process > the > > interrupts inside the critical section is a challenge in both approaches. > > Any other thoughts? > > Why not have this work the exact same way sync replicas do, except that > it's based off of some byte/time lag for some set of async replicas? 
> That is, in RecordTransactionCommit(), perhaps right after the > SyncRepWaitForLSN() call, or maybe even add this to that function? Sure > seems like there's a lot of similarity. > I was thinking of achieving log governance (throttling WAL MB/sec) and also providing RPO guarantees. In this model, it is hard to throttle WAL generation of a long running transaction (for example copy/select into). However, this meets my RPO needs. Are you in support of adding a hook or the actual change? IMHO, the hook allows more creative options. I can go ahead and make a patch accordingly. > Thanks, > > Stephen >
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Greetings, On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM < satyanarlapu...@gmail.com> wrote: > Stephen, thank you! > > On Wed, Dec 29, 2021 at 5:46 AM Stephen Frost wrote: >> [...] 
>> >> Why not have this work the exact same way sync replicas do, except that >> it's based off of some byte/time lag for some set of async replicas? >> That is, in RecordTransactionCommit(), perhaps right after the >> SyncRepWaitForLSN() call, or maybe even add this to that function? Sure >> seems like there's a lot of similarity. >> > > I was thinking of achieving log governance (throttling WAL MB/sec) and > also providing RPO guarantees. In this model, it is hard to throttle WAL > generation of a long running transaction (for example copy/select into). > Long running transactions have a lot of downsides and are best discouraged. I don’t know that we should be designing this for that case specifically, particularly given the complications it would introduce as discussed on this thread already. However, this meets my RPO needs. Are you in support of adding a hook or > the actual change? IMHO, the hook allows more creative options. I can go > ahead and make a patch accordingly. > I would think this would make more sense as part of core rather than a hook, as that then requires an extension and additional setup to get going, which raises the bar quite a bit when it comes to actually being used. Thanks, Stephen >
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Wed, Dec 29, 2021 at 11:16 AM Stephen Frost wrote: > Greetings, > > On Wed, Dec 29, 2021 at 14:04 SATYANARAYANA NARLAPURAM < > satyanarlapu...@gmail.com> wrote: > > [...] > >> I was thinking of achieving log governance (throttling WAL MB/sec) and >> also providing RPO guarantees. In this model, it is hard to throttle WAL >> generation of a long running transaction (for example copy/select into). >> > > Long running transactions have a lot of downsides and are best > discouraged. I don’t know that we should be designing this for that case > specifically, particularly given the complications it would introduce as > discussed on this thread already. > > However, this meets my RPO needs. Are you in support of adding a hook or >> the actual change? IMHO, the hook allows more creative options. I can go >> ahead and make a patch accordingly. >> > > I would think this would make more sense as part of core rather than a > hook, as that then requires an extension and additional setup to get going, > which raises the bar quite a bit when it comes to actually being used. > Sounds good, I will work on making the changes accordingly. > > Thanks, > > Stephen > >>
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Hi, On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote: > > Yet another problem is that if we are in XlogInsert() that means we are > > holding the buffer locks on all the pages we have modified, so if we add a > > hook at that level which can make it wait then we would also block any of > > the read operations needed to read from those buffers. I haven't thought > > what could be better way to do this but this is certainly not good. > > > > Yes, this is a problem. The other approach is adding a hook at > XLogWrite/XLogFlush? That's pretty much the same - XLogInsert() can trigger an XLogWrite()/Flush(). I think it's a complete no-go to add throttling to these places. It's quite possible that it'd cause new deadlocks, and it's almost guaranteed to have unintended consequences (e.g. replication falling behind further because XLogFlush() is being throttled). I also don't think it's a sane thing to add hooks to these places. It's complicated enough as-is, adding the chance for random other things to happen during such crucial operations will make it even harder to maintain. Greetings, Andres Freund
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Wed, Dec 29, 2021 at 11:31 AM Andres Freund wrote: > Hi, > > On 2021-12-27 16:40:28 -0800, SATYANARAYANA NARLAPURAM wrote: > > [...] > > That's pretty much the same - XLogInsert() can trigger an > XLogWrite()/Flush(). > > I think it's a complete no-go to add throttling to these places. [...] > > I also don't think it's a sane thing to add hooks to these places. [...] > Andres, thanks for the comments. Agreed on this based on the previous discussions on this thread. Could you please share your thoughts on adding it after SyncRepWaitForLSN()? > > Greetings, > > Andres Freund >
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
Hi, On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote: > On Wed, Dec 29, 2021 at 11:31 AM Andres Freund wrote: > Andres, thanks for the comments. Agreed on this based on the previous > discussions on this thread. Could you please share your thoughts on adding > it after SyncRepWaitForLSN()? I don't think that's good either - you're delaying transaction commit (i.e. xact becoming visible / locks being released). That also has the danger of increasing lock contention (albeit more likely to be heavyweight locks / serializable state). It'd have to be after the transaction actually committed. Greetings, Andres Freund
Re: WIP: WAL prefetch (another approach)
Hi, On 2021-12-29 17:29:52 +1300, Thomas Munro wrote: > > FWIW I don't think we include updates to typedefs.list in patches. > > Seems pretty harmless? And useful to keep around in development > branches because I like to pgindent stuff... I think it's even helpful. As long as it's done with a bit of manual oversight, I don't see a meaningful downside of doing so. One needs to be careful to not remove platform dependant typedefs, but that's it. And especially for long-lived feature branches it's much less work to keep the typedefs.list changes in the tree, rather than coming up with them locally over and over / across multiple people working on a branch. Greetings, Andres Freund
Re: Adding CI to our tree
Hi, On 2021-12-20 11:21:05 -0800, Andres Freund wrote: > Attached is v4 of the CI patch. I'd like to push this - any objections? It's not disruptive to anything but cfbot, so we can incrementally improve it further. I'll try to sync pushing with Thomas, so that he can adjust cfbot to not add the CI changes anymore / adjust the links to the CI status URLs etc. Greetings, Andres Freund
Strange path from pgarch_readyXlog()
Hi, Isn't this a corrupted pathname? 2021-12-29 03:39:55.708 CST [79851:1] WARNING: removal of orphan archive status file "pg_wal/archive_status/00010003.0028.backup00010004.ready" failed too many times, will try again later https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=peripatus&dt=2021-12-29%2009%3A20%3A54
Re: Add Boolean node
On 2021-12-27 09:53:32 -0500, Tom Lane wrote: > Didn't really read the patch in any detail, but I did have one idea: > I think that the different things-that-used-to-be-Value-nodes ought to > use different field names, say ival, rval, bval, sval not just "val". > That makes it more likely that you'd catch any code that is doing the > wrong thing and not going through one of the access macros. If we go around changing all these places, it might be worth to also change Integer to be a int64 instead of an int.
Re: Add Boolean node
Andres Freund writes: > If we go around changing all these places, it might be worth to also change > Integer to be a int64 instead of an int. Meh ... that would have some non-obvious consequences, I think, at least if you tried to make the grammar make use of the extra width (it'd change the type resolution behavior for integer-ish literals). I think it's better to keep it as plain int. regards, tom lane
Re: Add Boolean node
Hi, On 2021-12-27 10:02:14 +0100, Peter Eisentraut wrote: > This patch adds a new node type Boolean, to go alongside the "value" nodes > Integer, Float, String, etc. This seems appropriate given that Boolean > values are a fundamental part of the system and are used a lot. > > Before, SQL-level Boolean constants were represented by a string with > a cast, and internal Boolean values in DDL commands were usually represented > by Integer nodes. This takes the place of both of these uses, making the > intent clearer and having some amount of type safety. This annoyed me plenty of times before, plus many. > From 4e1ef56b5443fa11d981eb6e407dfc7c244dc60e Mon Sep 17 00:00:00 2001 > From: Peter Eisentraut > Date: Mon, 27 Dec 2021 09:52:05 +0100 > Subject: [PATCH v1] Add Boolean node > > Before, SQL-level boolean constants were represented by a string with > a cast, and internal Boolean values in DDL commands were usually > represented by Integer nodes. This takes the place of both of these > uses, making the intent clearer and having some amount of type safety. > --- > ... > 20 files changed, 210 insertions(+), 126 deletions(-) This might be easier to review if there were one patch adding the Boolean type, and then a separate one converting users? > diff --git a/src/backend/commands/tsearchcmds.c > b/src/backend/commands/tsearchcmds.c > index c47a05d10d..b7261a88d4 100644 > --- a/src/backend/commands/tsearchcmds.c > +++ b/src/backend/commands/tsearchcmds.c > @@ -1742,6 +1742,15 @@ buildDefItem(const char *name, const char *val, bool > was_quoted) > return makeDefElem(pstrdup(name), > (Node *) > makeFloat(pstrdup(val)), > -1); > + > + if (strcmp(val, "true") == 0) > + return makeDefElem(pstrdup(name), > +(Node *) > makeBoolean(true), > +-1); > + if (strcmp(val, "false") == 0) > + return makeDefElem(pstrdup(name), > +(Node *) > makeBoolean(false), > +-1); > } > /* Just make it a string */ > return makeDefElem(pstrdup(name), Hm. 
defGetBoolean() interprets "true", "false", "on", "off" as booleans. ISTM we shouldn't invent different behaviours for individual subsystems? > --- a/src/backend/nodes/outfuncs.c > +++ b/src/backend/nodes/outfuncs.c > @@ -3433,6 +3433,12 @@ _outFloat(StringInfo str, const Float *node) > appendStringInfoString(str, node->val); > } > > +static void > +_outBoolean(StringInfo str, const Boolean *node) > +{ > + appendStringInfoString(str, node->val ? "true" : "false"); > +} Any reason not to use 't' and 'f' instead? It seems unnecessary to bloat the node output by the longer strings, and it makes parsing more expensive too: > --- a/src/backend/nodes/read.c > +++ b/src/backend/nodes/read.c > @@ -283,6 +283,8 @@ nodeTokenType(const char *token, int length) > retval = RIGHT_PAREN; > else if (*token == '{') > retval = LEFT_BRACE; > + else if (strcmp(token, "true") == 0 || strcmp(token, "false") == 0) > + retval = T_Boolean; > else if (*token == '"' && length > 1 && token[length - 1] == '"') > retval = T_String; > else if (*token == 'b') Before, this could be implemented as a jump table; now it can't easily be. > diff --git a/src/test/isolation/expected/ri-trigger.out > b/src/test/isolation/expected/ri-trigger.out > index 842df80a90..db85618bef 100644 > --- a/src/test/isolation/expected/ri-trigger.out > +++ b/src/test/isolation/expected/ri-trigger.out > @@ -4,9 +4,9 @@ starting permutation: wxry1 c1 r2 wyrx2 c2 > step wxry1: INSERT INTO child (parent_id) VALUES (0); > step c1: COMMIT; > step r2: SELECT TRUE; > -bool > - > -t > +?column? > + > +t > (1 row) This doesn't seem great. It might be more consistent ("SELECT 1" doesn't end up with 'integer' as column name), but this still seems like an unnecessarily large user-visible change for an internal data-representation change? Greetings, Andres Freund
Re: Strange path from pgarch_readyXlog()
On 12/29/21, 12:22 PM, "Thomas Munro" wrote: > Isn't this a corrupted pathname? > > 2021-12-29 03:39:55.708 CST [79851:1] WARNING: removal of orphan > archive status file > "pg_wal/archive_status/00010003.0028.backup00010004.ready" > failed too many times, will try again later I bet this was a simple mistake in beb4e9b. Nathan diff --git a/src/backend/postmaster/pgarch.c b/src/backend/postmaster/pgarch.c index 434939be9b..b5b0d4e12f 100644 --- a/src/backend/postmaster/pgarch.c +++ b/src/backend/postmaster/pgarch.c @@ -113,7 +113,7 @@ static PgArchData *PgArch = NULL; * is empty, at which point another directory scan must be performed. */ static binaryheap *arch_heap = NULL; -static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS]; +static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS + 1]; static char *arch_files[NUM_FILES_PER_DIRECTORY_SCAN]; static int arch_files_size = 0;
Re: Strange path from pgarch_readyXlog()
"Bossart, Nathan" writes: > I bet this was a simple mistake in beb4e9b. > -static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS]; > +static char arch_filenames[NUM_FILES_PER_DIRECTORY_SCAN][MAX_XFN_CHARS + 1]; Hm, yeah, that looks like a pretty obvious bug. While we're here, I wonder if we ought to get rid of the static-ness of these arrays. I realize that they're only eating a few kB, but they're doing so in every postgres process, when they'll only be used in the archiver. regards, tom lane
Logging replication state changes
Hi hackers, I noticed that below critical replication state changes are DEBUG1 level logged. Any concern about changing the below two messages log level to LOG? If this is too verbose, we can introduce a new GUC, log_replication_state_changes that logs the replication state changes when enabled irrespective of the log level. 1/ /* * If we're in catchup state, move to streaming. This is an * important state change for users to know about, since before * this point data loss might occur if the primary dies and we * need to failover to the standby. The state change is also * important for synchronous replication, since commits that * started to wait at that point might wait for some time. */ if (MyWalSnd->state == WALSNDSTATE_CATCHUP) { ereport(DEBUG1, (errmsg_internal("\"%s\" has now caught up with upstream server", application_name))); WalSndSetState(WALSNDSTATE_STREAMING); } 2/ ereport(DEBUG1, (errmsg_internal("standby \"%s\" now has synchronous standby priority %u", application_name, priority))); Thanks, Satya
Re: Logging replication state changes
SATYANARAYANA NARLAPURAM writes: > I noticed that below critical replication state changes are DEBUG1 level > logged. Any concern about changing the below two messages log level to LOG? Why? These seem like perfectly routine messages. regards, tom lane
Re: Adding CI to our tree
> On 29 Dec 2021, at 21:17, Andres Freund wrote: > On 2021-12-20 11:21:05 -0800, Andres Freund wrote: >> Attached is v4 of the CI patch. > > I'd like to push this - any objections? It's not disruptive to anything but > cfbot, so we can incrementally improve it further. No objection, I'm +1 on getting this in. -- Daniel Gustafsson https://vmware.com/
Re: Logging replication state changes
On Wed, Dec 29, 2021 at 2:04 PM Tom Lane wrote:
> SATYANARAYANA NARLAPURAM writes:
> > I noticed that below critical replication state changes are DEBUG1 level
> > logged. Any concern about changing the below two messages log level to LOG?
>
> Why? These seem like perfectly routine messages.

Consider a scenario where we have a primary and two sync standbys (s1 and s2), where s1 is the preferred failover target and s2 is next, with synchronous_standby_names = 'FIRST 1 (s1, s2)'. If s1's streaming replication is broken and reestablished because of a planned or unplanned event, then s2 participates in the sync commits and makes sure the writes are not stalled on the primary. I would like to know the time window in which s1 is not actively acknowledging the commits and the writes depend on s2. Also, if the service layer decides to fail over to s2 instead of s1 because s1 is lagging, I need evidence in the log to explain that behavior.
Re: Strange path from pgarch_readyXlog()
"Bossart, Nathan" writes:
> On 12/29/21, 1:04 PM, "Tom Lane" wrote:
>> While we're here, I wonder if we ought to get rid of the static-ness of
>> these arrays.  I realize that they're only eating a few kB, but they're
>> doing so in every postgres process, when they'll only be used in the
>> archiver.

> This crossed my mind, too.  I also think one of the arrays can be
> eliminated in favor of just using the heap (after rebuilding with a
> reversed comparator).  Here is a minimally-tested patch that
> demonstrates what I'm thinking.

I already pushed a patch that de-static-izes those arrays, so this needs to be rebased, at least. However, now that you mention it, it does seem like maybe the intermediate arch_files[] array could be dropped in favor of just pulling the next file from the heap.

The need to reverse the heap's sort order seems like a problem, though. I really dislike the kluge you used here with a static flag that inverts the comparator's sort order behind the back of the binary-heap mechanism. It seems quite accidental that that doesn't fall foul of asserts or optimizations in binaryheap.c. For instance, if binaryheap_build decided it needn't do anything when bh_has_heap_property is already true, this code would fail. In any case, we'd need to spend O(n) time inverting the heap's sort order, so this'd likely be slower than the current code.

On the whole I'm inclined not to bother trying to optimize this further. The main thing that concerned me is that somebody would bump up NUM_FILES_PER_DIRECTORY_SCAN to the point where the static space consumption becomes really problematic, and we've fixed that.

			regards, tom lane
SELECT documentation
Hi,

The Examples section in the documentation for the SELECT command [1] only contains a single example of how to join two tables, and it is written in SQL-89 style:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d, films f
    WHERE f.did = d.did

I think it's good to keep this example query as it is, and suggest we add the following equivalent queries:

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d JOIN films f ON f.did = d.did

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d JOIN films f USING (did)

SELECT f.title, f.did, d.name, f.date_prod, f.kind
    FROM distributors d NATURAL JOIN films f

I also think it would be an improvement to break up the from_item below into three separate items, since the optional NATURAL cannot occur in combination with ON or USING.

from_item [ NATURAL ] join_type from_item [ ON join_condition | USING ( join_column [, ...] ) [ AS join_using_alias ] ]

Suggestion:

from_item join_type from_item ON join_condition
from_item join_type from_item USING ( join_column [, ...] ) [ AS join_using_alias ]
from_item NATURAL join_type from_item

This would be more readable, imo. I picked the order ON, USING, NATURAL to match the order in which they are described in the FROM Clause section.

/Joel

[1] https://www.postgresql.org/docs/current/sql-select.html
Re: UNIQUE null treatment option
Hi,

	bool		isunique;
+	bool		nulls_not_distinct;
 } BTSpool;

Looking at the other fields in BTSpool, there is no underscore in the field names. I think the new field can be named nullsdistinct. This way, the double negative is avoided. Similar comment for the new fields in BTShared and BTLeader.

And the naming would be consistent with information_schema.sql, where nulls_distinct is used:

+           CAST('YES' AS yes_or_no) AS enforced,
+           CAST(NULL AS yes_or_no) AS nulls_distinct

Cheers
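[Editor's note: for readers following the thread, a sketch of the behavior the patch adds, as described in the patch message above. This is a hypothetical session, not output from a running server:]

```sql
-- Default (and historical PostgreSQL) behavior: nulls are distinct,
-- so multiple null values can coexist under a unique constraint.
CREATE TABLE t1 (x int UNIQUE);                 -- i.e. UNIQUE NULLS DISTINCT
INSERT INTO t1 VALUES (NULL), (NULL);           -- ok: two nulls allowed

-- With the SQL:202x option, nulls are treated as not distinct from
-- each other, so a second null violates the constraint.
CREATE TABLE t2 (x int UNIQUE NULLS NOT DISTINCT);
INSERT INTO t2 VALUES (NULL);                   -- ok
INSERT INTO t2 VALUES (NULL);                   -- error: duplicate key

-- The (non-standard) index-level spelling the patch invents:
CREATE UNIQUE INDEX ON t1 (x) NULLS NOT DISTINCT;
```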
Re: Column Filtering in Logical Replication
On 2021-Dec-28, Alvaro Herrera wrote:

> There are still some XXX comments.  The one that bothers me most is the
> lack of an implementation that allows changing the column list in a
> publication without having to remove the table from the publication
> first.

OK, I made some progress on this front; I added new forms of ALTER PUBLICATION to support it:

  ALTER PUBLICATION pub1 ALTER TABLE tbl SET COLUMNS (a, b, c);
  ALTER PUBLICATION pub1 ALTER TABLE tbl SET COLUMNS ALL;

(not wedded to this syntax; other suggestions welcome)

In order to implement it I changed the haphazardly chosen use of DEFELEM actions to a new enum. I also noticed that the division of labor between pg_publication.c and publicationcmds.c is quite broken (code to translate column names to numbers is in the former, should be in the latter; some code that deals with pg_publication tuples is in the latter, should be in the former, such as CreatePublication, AlterPublicationOptions).

This new stuff is not yet finished. For example, I didn't refactor handling of REPLICA IDENTITY, so the new command does not correctly check everything, such as the REPLICA IDENTITY FULL stuff. Also, no tests have been added yet. In manual tests it seems to behave as expected.

I noticed that prattrs is inserted in user-specified order instead of catalog order, which is innocuous but quite weird.

-- 
Álvaro Herrera — PostgreSQL Developer — https://www.EnterpriseDB.com/
"No renuncies a nada. No te aferres a nada." ("Renounce nothing. Cling to nothing.")
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 34a7034282..5bc2e7a591 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -6877,7 +6877,9 @@ Relation
-Next, the following message part appears for each column (except generated columns):
+Next, the following message part appears for each column (except
+generated columns and other columns that don't appear in the column
+filter list, for tables that have one):

diff --git a/doc/src/sgml/ref/alter_publication.sgml b/doc/src/sgml/ref/alter_publication.sgml
index bb4ef5e5e2..4951343f6f 100644
--- a/doc/src/sgml/ref/alter_publication.sgml
+++ b/doc/src/sgml/ref/alter_publication.sgml
@@ -25,12 +25,13 @@
 ALTER PUBLICATION name ADD publication_object [, ...]
 ALTER PUBLICATION name SET publication_object [, ...]
 ALTER PUBLICATION name DROP publication_object [, ...]
 ALTER PUBLICATION name SET ( publication_parameter [= value] [, ... ] )
+ALTER PUBLICATION name ALTER TABLE publication_object SET COLUMNS { ( name [, ...] ) | ALL }
 ALTER PUBLICATION name OWNER TO { new_owner | CURRENT_ROLE | CURRENT_USER | SESSION_USER }
 ALTER PUBLICATION name RENAME TO new_name

 where publication_object is one of:

-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ] [ ( column_name, [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]

@@ -62,6 +63,11 @@ ALTER PUBLICATION name RENAME TO
+
+   The ALTER TABLE ... SET COLUMNS variant allows to change
+   the set of columns that are included in the publication.
+

@@ -110,6 +116,8 @@ ALTER PUBLICATION name RENAME TO
    * can be specified after the table name to explicitly indicate
    that descendant tables are included.
+   Optionally, a column list can be specified.  See for details.
@@ -164,9 +172,15 @@ ALTER PUBLICATION noinsert SET (publish = 'update, delete');
-   Add some tables to the publication:
+   Add tables to the publication:

-ALTER PUBLICATION mypublication ADD TABLE users, departments;
+ALTER PUBLICATION mypublication ADD TABLE users (user_id, firstname), departments;
+
+
+   Change the set of columns published for a table:
+
+ALTER PUBLICATION mypublication ALTER TABLE users SET COLUMNS (user_id, firstname, lastname);

diff --git a/doc/src/sgml/ref/create_publication.sgml b/doc/src/sgml/ref/create_publication.sgml
index d805e8e77a..73a23cbb02 100644
--- a/doc/src/sgml/ref/create_publication.sgml
+++ b/doc/src/sgml/ref/create_publication.sgml
@@ -28,7 +28,7 @@ CREATE PUBLICATION name

 where publication_object is one of:

-TABLE [ ONLY ] table_name [ * ] [, ... ]
+TABLE [ ONLY ] table_name [ * ] [ ( column_name, [, ... ] ) ] [, ... ]
 ALL TABLES IN SCHEMA { schema_name | CURRENT_SCHEMA } [, ... ]

@@ -78,6 +78,15 @@ CREATE PUBLICATION name
 publication, so they are never explicitly added to the publication.

+
+   When a column list is specified, only the listed columns are replicated;
+   any other columns are ignored for the purpose of replication through
+   this publication.  If no column list is specified, all columns of the
+   table are replicated through this publication, including any columns
+   added later.  If a column list is specified, it must include the replica
+   identity columns.
+

 Only persistent ba
Re: PublicationActions - use bit flags.
On Thu, Dec 30, 2021 at 3:30 AM Justin Pryzby wrote: > > On Mon, Dec 20, 2021 at 11:18:41AM +1100, Peter Smith wrote: > > For some reason the current HEAD PublicationActions is a struct of > > boolean representing combinations of the 4 different "publication > > actions". > > > > I felt it is more natural to implement boolean flag combinations using > > a bitmask instead of a struct of bools. IMO using the bitmask also > > simplifies assignment and checking of said flags. > > Peter E made a suggestion to use a similar struct with bools last year for > REINDEX. > https://www.postgresql.org/message-id/7ec67c56-2377-cd05-51a0-691104404...@enterprisedb.com > > Alvaro pointed out that the integer flags are better for ABI compatibility - > it > would allow adding a new flag in backbranches, if that were needed for a > bugfix. > > But that's not very compelling, since the bools have existed in v10... > Also, the booleans directly correspond with the catalog. > > + if (pubform->pubinsert) pub->pubactions |= PUBACTION_INSERT; > > This is usually written like: > > pub->pubactions |= (pubform->pubinsert ? PUBACTION_INSERT : 0) > Thanks for the info, I've modified those assignment styles as suggested. -- Kind Regards, Peter Smith. Fujitsu Australia. v3-0001-PublicationActions-use-bit-flags.patch Description: Binary data
Re: PublicationActions - use bit flags.
Peter Smith writes: > On Thu, Dec 30, 2021 at 3:30 AM Justin Pryzby wrote: >> + if (pubform->pubinsert) pub->pubactions |= PUBACTION_INSERT; >> This is usually written like: >> pub->pubactions |= (pubform->pubinsert ? PUBACTION_INSERT : 0) > Thanks for the info, I've modified those assignment styles as suggested. FWIW, I think it's utter nonsense to claim that the second way is preferred over the first. There may be some people who think the second way is more legible, but I don't; and I'm pretty sure that the first way is significantly more common in the PG codebase. regards, tom lane
Tests "with" and "alter_table" suffer from name clash
Hi,

With unlucky scheduling you can get a failure like this:

https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-12-22%2010%3A51%3A32

Suggested fix attached.

From 3991f040e9c9afc4d7cfd4980b5f27f4113dbd1f Mon Sep 17 00:00:00 2001
From: Thomas Munro
Date: Thu, 30 Dec 2021 14:43:57 +1300
Subject: [PATCH] Fix racy "with" test.

with.sql asserted that there was no table "test", but alter_table.sql
creates and drops such a table and might run at the same time.  Use a
unique name.

Per build farm.  Back-patch all the way.
---
 src/test/regress/expected/with.out | 16 ++++++++--------
 src/test/regress/sql/with.sql      | 10 +++++-----
 2 files changed, 13 insertions(+), 13 deletions(-)

diff --git a/src/test/regress/expected/with.out b/src/test/regress/expected/with.out
index 75e61460d9..f15ece3bd1 100644
--- a/src/test/regress/expected/with.out
+++ b/src/test/regress/expected/with.out
@@ -3149,19 +3149,19 @@ with ordinality as (select 1 as x) select * from ordinality;
 (1 row)
 
 -- check sane response to attempt to modify CTE relation
-WITH test AS (SELECT 42) INSERT INTO test VALUES (1);
-ERROR:  relation "test" does not exist
-LINE 1: WITH test AS (SELECT 42) INSERT INTO test VALUES (1);
-                                             ^
+WITH with_test AS (SELECT 42) INSERT INTO with_test VALUES (1);
+ERROR:  relation "with_test" does not exist
+LINE 1: WITH with_test AS (SELECT 42) INSERT INTO with_test VALUES (...
+                                                  ^
 -- check response to attempt to modify table with same name as a CTE (perhaps
 -- surprisingly it works, because CTEs don't hide tables from data-modifying
 -- statements)
-create temp table test (i int);
-with test as (select 42) insert into test select * from test;
-select * from test;
+create temp table with_test (i int);
+with with_test as (select 42) insert into with_test select * from with_test;
+select * from with_test;
  i  
----
 42
(1 row)
 
-drop table test;
+drop table with_test;
diff --git a/src/test/regress/sql/with.sql b/src/test/regress/sql/with.sql
index 46668a903e..7ff9de97a5 100644
--- a/src/test/regress/sql/with.sql
+++ b/src/test/regress/sql/with.sql
@@ -1459,12 +1459,12 @@ create table foo (with ordinality);  -- fail, WITH is a reserved word
 with ordinality as (select 1 as x) select * from ordinality;
 
 -- check sane response to attempt to modify CTE relation
-WITH test AS (SELECT 42) INSERT INTO test VALUES (1);
+WITH with_test AS (SELECT 42) INSERT INTO with_test VALUES (1);
 
 -- check response to attempt to modify table with same name as a CTE (perhaps
 -- surprisingly it works, because CTEs don't hide tables from data-modifying
 -- statements)
-create temp table test (i int);
-with test as (select 42) insert into test select * from test;
-select * from test;
-drop table test;
+create temp table with_test (i int);
+with with_test as (select 42) insert into with_test select * from with_test;
+select * from with_test;
+drop table with_test;
-- 
2.30.2
Re: Tests "with" and "alter_table" suffer from name clash
Thomas Munro writes: > With unlucky scheduling you can get a failure like this: > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=hoverfly&dt=2021-12-22%2010%3A51%3A32 > Suggested fix attached. Looks reasonable. We really should avoid using such common names for short-lived tables in any case --- it's an invitation to trouble. So I'd vote for changing the other use of "test", too. regards, tom lane
Re: Tests "with" and "alter_table" suffer from name clash
On Thu, Dec 30, 2021 at 3:27 PM Tom Lane wrote: > Looks reasonable. We really should avoid using such common > names for short-lived tables in any case --- it's an invitation > to trouble. So I'd vote for changing the other use of "test", too. In fact only REL_10_STABLE had the problem, because commit 2cf8c7aa already fixed the other instance in later branches. I'd entirely forgotten that earlier discussion, which apparently didn't quite go far enough. So I only needed to push the with.sql change. Done.
Re: Tests "with" and "alter_table" suffer from name clash
Thomas Munro writes: > In fact only REL_10_STABLE had the problem, because commit 2cf8c7aa > already fixed the other instance in later branches. I'd entirely > forgotten that earlier discussion, which apparently didn't quite go > far enough. So I only needed to push the with.sql change. Done. Hah, I thought this felt familiar. So the real problem is that my backpatch (b15a8c963) only fixed half of the hazard. Sigh. regards, tom lane
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Thu, Dec 30, 2021 at 1:09 AM Andres Freund wrote:
> Hi,
>
> On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote:
> > On Wed, Dec 29, 2021 at 11:31 AM Andres Freund wrote:
> > Andres, thanks for the comments. Agreed on this based on the previous
> > discussions on this thread. Could you please share your thoughts on adding
> > it after SyncRepWaitForLSN()?
>
> I don't think that's good either - you're delaying transaction commit
> (i.e. xact becoming visible / locks being released).

Agree with that.

> That also has the danger
> of increasing lock contention (albeit more likely to be heavyweight locks /
> serializable state). It'd have to be after the transaction actually
> committed.

Yeah, I think that would make sense, even though we would be allowing a new backend to get connected, insert WAL, and get committed; only after that would it be throttled. However, if max_connections is very high, then even after we detect a lag, a significant amount of WAL could still be generated, even leaving long-running transactions aside. But I think it would still serve the purpose of what Satya is trying to achieve.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
RE: Confused comment about drop replica identity index
On Tue, Dec 21, 2021 8:47 AM Michael Paquier wrote:
> On Mon, Dec 20, 2021 at 11:57:32AM -0300, Euler Taveira wrote:
> > What do you think about the attached patch? It forbids the DROP INDEX.
> > We might add a detail message but I didn't in this patch.
>
> Yeah. I'd agree about doing something like that on HEAD, and that would help
> with some of the logrep-related patches currently being worked on, as far as I
> understood.

Hi,

I think forbidding DROP INDEX might not completely solve this problem, because the user could still use other commands to delete the index, for example ALTER TABLE ... DROP COLUMN. After dropping the column, the index on it will also be dropped.

Besides, the user can also ALTER TABLE ... REPLICA IDENTITY USING INDEX "primary key", and in this case, when they ALTER TABLE ... DROP CONSTRAINT "primary key", the replica identity index will also be dropped.

Best regards,
Hou zj
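[Editor's note: a hypothetical session sketching the two escape hatches described above, where the replica identity index disappears without any explicit DROP INDEX. Table and index names are illustrative:]

```sql
-- Case 1: dropping a column takes its index, and the replica identity, with it.
CREATE TABLE t (a int NOT NULL, b int NOT NULL);
CREATE UNIQUE INDEX t_b_idx ON t (b);
ALTER TABLE t REPLICA IDENTITY USING INDEX t_b_idx;
ALTER TABLE t DROP COLUMN b;            -- t_b_idx is gone, identity is lost

-- Case 2: a primary key used as the identity index, removed via the constraint.
CREATE TABLE u (a int PRIMARY KEY);
ALTER TABLE u REPLICA IDENTITY USING INDEX u_pkey;
ALTER TABLE u DROP CONSTRAINT u_pkey;   -- identity index is gone again
```

So a check limited to DROP INDEX would only cover one of several paths to the same state.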
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Wed, Dec 29, 2021 at 10:38 PM Dilip Kumar wrote: > On Thu, Dec 30, 2021 at 1:09 AM Andres Freund wrote: > >> Hi, >> >> On 2021-12-29 11:34:53 -0800, SATYANARAYANA NARLAPURAM wrote: >> > On Wed, Dec 29, 2021 at 11:31 AM Andres Freund >> wrote: >> > Andres, thanks for the comments. Agreed on this based on the previous >> > discussions on this thread. Could you please share your thoughts on >> adding >> > it after SyncRepWaitForLSN()? >> >> I don't think that's good either - you're delaying transaction commit >> (i.e. xact becoming visible / locks being released). > > > Agree with that. > > >> That also has the danger >> of increasing lock contention (albeit more likely to be heavyweight locks >> / >> serializable state). It'd have to be after the transaction actually >> committed. >> > > Yeah, I think that would make sense, even though we will be allowing a new > backend to get connected insert WAL, and get committed but after that, it > will be throttled. However, if the number of max connections will be very > high then even after we detected a lag there a significant amount WAL could > be generated, even if we keep long-running transactions aside. But I think > still it will serve the purpose of what Satya is trying to achieve. > I am afraid there are problems with making the RPO check post releasing the locks. By this time the transaction is committed and visible to the other backends (ProcArrayEndTransaction is already called) though the intention is to block committing transactions that violate the defined RPO. Even though we block existing connections starting a new transaction, it is possible to do writes by opening a new connection / canceling the query. I am not too much worried about the lock contention as the system is already hosed because of the policy. This behavior is very similar to what happens when the Sync standby is not responding. Thoughts? > > -- > Regards, > Dilip Kumar > EnterpriseDB: http://www.enterprisedb.com >
Re: Per-table storage parameters for TableAM/IndexAM extensions
On Wed, Dec 29, 2021 at 10:38 PM Sadhuprasad Patro wrote:
> Hi,
>
> Currently all the storage options for a table are very much specific
> to the heap, but a different AM might need some user-defined,
> AM-specific parameters to help tune the AM. So here is a patch which
> provides an AM-level routine so that instead of getting parameters
> validated using "heap_reloptions" it will call the registered AM
> routine.

+1 for the idea.

> e.g.:
> -- create a new access method and table using this access method
> CREATE ACCESS METHOD myam TYPE TABLE HANDLER ;
>
> CREATE TABLE mytest (a int) USING myam;
>
> -- a new parameter to set a storage parameter for only myam, as below
> ALTER TABLE mytest(squargle_size = '100MB');

This will work for CREATE TABLE as well, I guess, the same way normal relation storage parameters work now, right?

> The user-defined parameters will have meaning only for "myam",
> otherwise an error will be thrown. Our relcache already allows the
> AM-specific cache to be stored for each relation.
>
> Open question: When a user changes the AM, what should the behavior be
> for storage options that are not supported? Should we drop the options
> and go with only system storage options? Or throw an error, in which
> case the user has to clean up the added parameters.

IMHO, if the user is changing the access method for the table then it should be fine to throw an error if there are some parameters which are not supported by the new AM. That way the user can make an informed call and first remove those storage options before changing the AM.

I have a few comments on the patch, mostly cosmetic.

1.
+	Assert(routine->taboptions != NULL);

Why is an AM not allowed to register a NULL function? If NULL is registered, that would just mean the AM does not support any storage parameters.

2.
@@ -1358,6 +1358,7 @@ untransformRelOptions(Datum options)
 	return result;
 }
 
+
 /*
  * Extract and parse reloptions from a pg_class tuple.
  *

Unwanted hunk (an extra blank line is added).

3.
+ * Parse options for heaps, views and toast tables.  This is
+ * implementation of relOptions for access method heapam.
 */

Better to say "access method heap" instead of "heapam".

4.
+ * Parse options for tables.
+ *
+ * taboptions	table AM's option parser function
+ * reloptions	options as text[] datum
+ * validate	error flag

The function header comment formatting is not proper; it also has uneven spacing between words.

5.
-extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc)
+extract_autovac_opts(HeapTuple tup, TupleDesc pg_class_desc,
+		reloptions_function taboptions)

Indentation is not proper; run pgindent on this.

6.
> Currently all the storage options for a table are very much specific to the heap but a different AM might need some user defined AM specific parameters to help tune the AM. So here is a patch which provides an AM level routine so that instead of getting parameters validated using "heap_reloptions" it will call the registered AM routine.

Wrap these long commit message lines at 80 characters.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Throttling WAL inserts when the standby falls behind more than the configured replica_lag_in_bytes
On Thu, Dec 30, 2021 at 12:36 PM SATYANARAYANA NARLAPURAM < satyanarlapu...@gmail.com> wrote: > >> Yeah, I think that would make sense, even though we will be allowing a >> new backend to get connected insert WAL, and get committed but after that, >> it will be throttled. However, if the number of max connections will be >> very high then even after we detected a lag there a significant amount WAL >> could be generated, even if we keep long-running transactions aside. But I >> think still it will serve the purpose of what Satya is trying to achieve. >> > > I am afraid there are problems with making the RPO check post releasing > the locks. By this time the transaction is committed and visible to the > other backends (ProcArrayEndTransaction is already called) though the > intention is to block committing transactions that violate the defined RPO. > Even though we block existing connections starting a new transaction, it is > possible to do writes by opening a new connection / canceling the query. I > am not too much worried about the lock contention as the system is already > hosed because of the policy. This behavior is very similar to what > happens when the Sync standby is not responding. Thoughts? > Yeah, that's true, but even if we are blocking the transactions from committing then also it is possible that a new connection can come and generate more WAL, yeah but I agree with the other part that if you throttle after committing then the user can cancel the queries and generate more WAL from those sessions as well. But that is an extreme case where application developers want to bypass the throttling and want to generate more WALs. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com