Re: psql: fix variable existence tab completion
Hi, Alexander!

On 06.05.2024 13:19, Alexander Korotkov wrote:
> The patch attached fixes the 010_tab_completion.pl test in the same way as [1].

Thank you for the fix. As I understand it, the fix teaches 010_tab_completion.pl to tolerate the invalid result of tab completion. Do you think we could fix it another way, to make the result of tab completion correct? Right now I don't see any straightforward way to fix this so that the tab completion itself is correct.

There are several similar cases in this test. E.g., for commands such as:

CREATE TABLE "mixedName" (f1 int, f2 text);
select * from "mi ;

tab completion with Debian 10 gives:

postgres=# select * from \"mixedName\" ;

resulting in an error. There is already a similar workaround in 010_tab_completion.pl using the regex:

qr/"mixedName\\?" /

I think that if there are, or will be, complaints from users about this behavior on Debian 10, then it makes sense to look for a more involved solution that fixes the backslash substitutions. If there are no such complaints, then a workaround in the test is the better choice.

With the best wishes,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Re: elog/ereport VS misleading backtrace_function function address
Hi Tom and -hackers!

On Thu, Mar 28, 2024 at 7:36 PM Tom Lane wrote:
> Jakub Wartak writes:
> > While chasing some other bug I've learned that backtrace_functions
> > might be misleading with top elog/ereport() address.
>
> That was understood from the beginning: this type of backtrace is
> inherently pretty imprecise, and I doubt there is much that can
> be done to make it better. IIRC the fundamental problem is it only
> looks at global symbols, so static functions inherently defeat it.
> It was argued that this is better than nothing, which is true, but
> you have to take the info with a mountain of salt.

OK, point taken, thanks for describing the limitations. Still, I find backtrace_functions often the best thing we have, primarily due to its simplicity and ease of use (for customers).

I found the simplest portable way to generate a NOP (or any other instruction that makes the problem go away). With the reproducer, psql returns:

psql:ri-collation-bug-example.sql:48: error: ERROR: XX000: cache lookup failed for collation 0
LOCATION: get_collation_isdeterministic, lsyscache.c:1062

logfile without the patch:

2024-05-07 09:05:43.043 CEST [44720] ERROR: cache lookup failed for collation 0
2024-05-07 09:05:43.043 CEST [44720] BACKTRACE:
postgres: postgres postgres [local] DELETE(+0x155883) [0x55ce5a86a883]
postgres: postgres postgres [local] DELETE(RI_FKey_cascade_del+0x2b0) [0x55ce5ac6dfd0]
postgres: postgres postgres [local] DELETE(+0x2d488b) [0x55ce5a9e988b]
[..]
$ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x155883
0x00155883
get_constraint_type.cold   << so it's wrong, as the real function should be get_collation_isdeterministic

logfile with the attached patch:

2024-05-07 09:11:06.596 CEST [51185] ERROR: cache lookup failed for collation 0
2024-05-07 09:11:06.596 CEST [51185] BACKTRACE:
postgres: postgres postgres [local] DELETE(+0x168bf0) [0x560e1cdfabf0]
postgres: postgres postgres [local] DELETE(RI_FKey_cascade_del+0x2b0) [0x560e1d200c00]
postgres: postgres postgres [local] DELETE(+0x2e90fb) [0x560e1cf7b0fb]
[..]

$ addr2line -e /usr/pgsql18/bin/postgres -a -f 0x168bf0
0x00168bf0
get_collation_isdeterministic.cold   <<< it's OK with the patch

NOTE: in case one will be testing this: you cannot ./configure with --enable-debug, as it prevents the compiler optimizations that actually produce the ".cold" branches that cause backtrace() to return the wrong address.

> I recall speculating about whether we could somehow invoke gdb
> to get a more comprehensive and accurate backtrace, but I don't
> really have a concrete idea how that could be made to work.

Oh no, I'm aiming for a micro-fix rather than some bigger revolution. The patch simply adds one instruction in the macro, so that backtrace_functions now behaves better:

0x00773d28 <+105>: call 0x79243f
0x00773d2d <+110>: movzbl -0x12(%rbp),%eax  << this ends up being added by the patch
0x00773d31 <+114>: call 0xdc1a0

I'll put this into the CF as for PG18, but maybe it could be backpatched too - no hard opinion on that (I don't see how that ERROR/FATAL path could cause any performance regressions).

-J.

v1-0001-Tweak-ereport-so-that-it-generates-proper-address.patch
Description: Binary data
Re: cataloguing NOT NULL constraints
Hello,

At Wed, 1 May 2024 19:49:35 +0200, Alvaro Herrera wrote in
> Here are two patches that I intend to push soon (hopefully tomorrow).

This commit added and edited two error messages, and ended up using slightly different wordings, "on" and "in", for relation constraints:

+ errmsg("cannot change NO INHERIT status of NOT NULL constraint \"%s\" on relation \"%s\"",
===
+ errmsg("cannot change NO INHERIT status of NOT NULL constraint \"%s\" in relation \"%s\"",

I think we usually use "on" in this case.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Tue, May 7, 2024 at 1:22 PM David Rowley wrote: > It would be good to get some build farm coverage of this so we don't > have to rely on manual testing. I wonder if it's a good idea to just > define REALLOCATE_BITMAPSETS when #ifdef CLOBBER_FREED_MEMORY... or if > we should ask on the buildfarm-members list if someone wouldn't mind > defining it? +1 to have build farm coverage of REALLOCATE_BITMAPSETS. This flag seems quite useful. Thanks Richard
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Tue, May 7, 2024 at 1:46 PM David Rowley wrote: > On Tue, 7 May 2024 at 17:28, Tom Lane wrote: > > What I'm trying to figure out here is whether we have a live bug > > in this area in released branches; and if so, why we've not seen > > reports of that. > > We could check what portions of REALLOCATE_BITMAPSETS are > backpatchable. It may not be applicable very far back because of v16's > 00b41463c. The bms_del_member() would have left a zero set rather than > doing bms_free() prior to that commit. There could be a bug in v16. I also think there might be a bug in v16, as long as 'sjinfo->syn_lefthand' and 'rinfo->outer_relids' are referencing the same bitmapset and the content of this bitmapset is altered through 'sjinfo->syn_lefthand' without 'rinfo->outer_relids' being aware of these changes. I tried to compose a query that can trigger this bug but failed though. Another thing that comes to my mind is that this issue shows that RestrictInfo.outer_relids could contain references to removed rels and joins, and RestrictInfo.outer_relids could be referenced after the removal of useless left joins. Currently we do not have a mechanism to clean out the bits in outer_relids during outer join removal. That is to say, RestrictInfo.outer_relids might be referenced while including rels that should have been removed. I'm not sure if this is a problem. Thanks Richard
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
> On 7 May 2024, at 01:31, Michael Paquier wrote: > > On Fri, May 03, 2024 at 10:39:15AM +0200, Daniel Gustafsson wrote: >> They are no-ops when linking against v18, but writing an extension which >> targets all supported versions of postgres along with their respective >> supported OpenSSL versions make them still required, or am I missing >> something? > > Yeah, that depends on how much version you expect your application to > work on. Still it seems to me that there's value in mentioning that > if your application does not care about anything older than OpenSSL > 1.1.0, like PG 18 assuming that this patch is merged, then these calls > are pointless for HEAD. The routine definitions would be around only > for the .so compatibility. Fair enough. I've taken a stab at documenting that the functions are deprecated, while at the same time documenting when and how they can be used (and be useful). The attached also removes one additional comment in the testcode which is now obsolete (since removing 1.0.1 support really), and fixes the spurious whitespace you detected upthread. -- Daniel Gustafsson v13-0002-Remove-pg_strong_random-initialization.patch Description: Binary data v13-0001-Remove-support-for-OpenSSL-1.0.2.patch Description: Binary data
Skip adding row-marks for non target tables when result relation is foreign table.
Hi PostgreSQL Community,

I would like to bring to your attention an observation regarding the planner's behavior for foreign table update/delete operations. It appears that the planner adds rowmarks (ROW_MARK_COPY) for non-target tables, which I believe is unnecessary when using postgres_fdw. This is because postgres_fdw performs early locking on tuples belonging to the target foreign table by utilizing the SELECT FOR UPDATE clause.

In an attempt to address this, I tried implementing late locking. However, this approach still doesn't work as intended, because the API assumes that foreign table rows can be re-fetched using TID (ctid). This assumption is invalid for partitioned tables on the foreign server. Additionally, commit afb9249d06f47d7a6d4a89fea0c3625fe43c5a5d, which introduced late locking for foreign tables, mentions that the benefits of late locking against a remote server are unclear, as the extra round trips required are likely to outweigh any potential concurrency improvements.

To address this issue, I have taken the initiative to create a patch that prevents the addition of rowmarks for non-target tables when the target table is using early locking. I would greatly appreciate it if you could review the patch and provide any feedback or insights I may be overlooking.

Example query plan with my change (the foreign scan doesn't fetch the whole row for bar):
postgres=# \d+ bar
                                        Foreign table "public.bar"
 Column |  Type   | Collation | Nullable | Default |    FDW options     | Storage | Stats target | Description
--------+---------+-----------+----------+---------+--------------------+---------+--------------+-------------
 b1     | integer |           |          |         | (column_name 'b1') | plain   |              |
 b2     | integer |           |          |         | (column_name 'b2') | plain   |              |
Server: postgres_1
FDW options: (schema_name 'public', table_name 'bar')

router=# \d+ foo
                                        Foreign table "public.foo"
 Column |  Type   | Collation | Nullable | Default |    FDW options     | Storage | Stats target | Description
--------+---------+-----------+----------+---------+--------------------+---------+--------------+-------------
 f1     | integer |           |          |         | (column_name 'f1') | plain   |              |
 f2     | integer |           |          |         | (column_name 'f2') | plain   |              |
Server: postgres_2
FDW options: (schema_name 'public', table_name 'foo')

postgres=# explain verbose update foo set f1 = b1 from bar where f2=b2;
                                       QUERY PLAN
----------------------------------------------------------------------------------------
 Update on public.foo  (cost=200.00..48713.72 rows=0 width=0)
   Remote SQL: UPDATE public.foo SET f1 = $2 WHERE ctid = $1
   ->  Nested Loop  (cost=200.00..48713.72 rows=15885 width=42)
         Output: bar.b1, foo.ctid, foo.*
         Join Filter: (foo.f2 = bar.b2)
         ->  Foreign Scan on public.bar  (cost=100.00..673.20 rows=2560 width=8)
               Output: bar.b1, bar.b2
               Remote SQL: SELECT b1, b2 FROM public.bar
         ->  Materialize  (cost=100.00..389.23 rows=1241 width=42)
               Output: foo.ctid, foo.*, foo.f2
               ->  Foreign Scan on public.foo  (cost=100.00..383.02 rows=1241 width=42)
                     Output: foo.ctid, foo.*, foo.f2
                     Remote SQL: SELECT f1, f2, ctid FROM public.foo FOR UPDATE
(13 rows)

Thank you for your time and consideration.

Regards,
Saikiran Avula
SDE, Amazon Web Services.

v1-0001-Skip-adding-row-marks-for-not-target-relations-wh.patch
Description: Binary data
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Tue, May 7, 2024 at 1:19 PM Richard Guo wrote:
> On Tue, May 7, 2024 at 1:46 PM David Rowley wrote:
>> On Tue, 7 May 2024 at 17:28, Tom Lane wrote:
>> > What I'm trying to figure out here is whether we have a live bug
>> > in this area in released branches; and if so, why we've not seen
>> > reports of that.
>>
>> We could check what portions of REALLOCATE_BITMAPSETS are
>> backpatchable. It may not be applicable very far back because of v16's
>> 00b41463c. The bms_del_member() would have left a zero set rather than
>> doing bms_free() prior to that commit. There could be a bug in v16.
>
> I also think there might be a bug in v16, as long as
> 'sjinfo->syn_lefthand' and 'rinfo->outer_relids' are referencing the
> same bitmapset and the content of this bitmapset is altered through
> 'sjinfo->syn_lefthand' without 'rinfo->outer_relids' being aware of
> these changes. I tried to compose a query that can trigger this bug but
> failed though.

Can sjinfo->syn_lefthand become an empty set after bms_del_member()? If so, rinfo->outer_relids will become an invalid pointer. That would obviously be a bug, though it still might be very hard to make it trigger a segfault.

--
Regards,
Alexander Korotkov
Supabase
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Tue, May 7, 2024 at 8:29 AM Tom Lane wrote:
> David Rowley writes:
> > Yeah, before the revert, that did:
> > - sjinf->syn_lefthand = replace_relid(sjinf->syn_lefthand, relid, subst);
> > That replace code seems to have always done a bms_copy()
>
> Hmm, not always; see e0477837c.
>
> What I'm trying to figure out here is whether we have a live bug
> in this area in released branches; and if so, why we've not seen
> reports of that.

I haven't yet spotted a particular bug. But this place looks dangerous, and it's very hard to prove there is no bug. Even if there is none, it seems very easy to unintentionally introduce one here. Should we just accept always doing bms_copy()?

--
Regards,
Alexander Korotkov
Supabase
Re: CREATE DATABASE with filesystem cloning
Hi,

On Wed, 6 Mar 2024 at 05:17, Thomas Munro wrote:
> The main thing that is missing is support for redo. It's mostly
> trivial I think, probably just a record type for "try cloning first"
> and then teaching that clone function to fall back to the regular copy
> path if it fails in recovery, do you agree with that idea? Another
> approach would be to let it fail if it doesn't work on the replica, so
> you don't finish up using dramatically different amounts of disk
> space, but that seems terrible because now your replica is broken. So
> probably fallback with logged warning (?), though I'm not sure exactly
> which errnos to give that treatment to.

Thomas and I had an off-list talk, and we thought that making this option a GUC instead of an SQL-command-level option could solve this problem.

I am posting a new rebased version of the patch with some important changes:

* A 'createdb_file_copy_method' GUC is created. Possible values are 'copy' and 'clone'. 'copy' is the default. 'clone' can be chosen only if the system supports it; otherwise it raises an error at startup.

* The #else part of the clone_file() function calls pg_unreachable() instead of ereport().

* Documentation updates.

Also, what should happen when the kernel has clone support but the file system does not?

- I tested this on Linux, and copy_file_range() silently falls back to a normal copy when this happens. I assume the same will happen on FreeBSD, because it uses copy_file_range() as well.

- I am not sure about macOS, since the force flag (COPYFILE_CLONE_FORCE) is used. I do not have macOS, so I cannot test it, but I assume it will error out in this case. If that is so, is it a problem? I ask because, since this is a GUC now, the user takes full responsibility.

Any kind of feedback would be appreciated.
--
Regards,
Nazir Bilal Yavuz
Microsoft

From 404e301dbdb252c23ea9d451b817cf6e372d0d9a Mon Sep 17 00:00:00 2001
From: Nazir Bilal Yavuz
Date: Tue, 7 May 2024 14:16:09 +0300
Subject: [PATCH v5] Use CLONE method with GUC on CREATE DATABASE ...
 STRATEGY=FILE_COPY.

createdb_file_copy_method GUC option is introduced. It can be set to
either COPY (default) or CLONE (if system supports it).

If CLONE option is chosen, similar to STRATEGY=FILE_COPY; but attempting
to use efficient file copying system calls. The kernel has the
opportunity to share block ranges in copy-on-write file systems, or
maybe push down the copy to network file systems and storage devices.

Currently works on Linux, FreeBSD and macOS. More systems could be
supported.

Author: Thomas Munro
Author: Nazir Bilal Yavuz
Discussion: https://postgr.es/m/CA%2BhUKGLM%2Bt%2BSwBU-cHeMUXJCOgBxSHLGZutV5zCwY4qrCcE02w%40mail.gmail.com
---
 src/include/commands/dbcommands.h             |  9 ++
 src/include/storage/copydir.h                 |  3 +-
 src/backend/commands/dbcommands.c             | 21 -
 src/backend/storage/file/copydir.c            | 83 ++-
 .../utils/activity/wait_event_names.txt       |  1 +
 src/backend/utils/misc/guc_tables.c           | 13 +++
 src/backend/utils/misc/postgresql.conf.sample |  5 ++
 doc/src/sgml/config.sgml                      | 17
 doc/src/sgml/ref/create_database.sgml         | 24 --
 src/tools/pgindent/typedefs.list              |  1 +
 10 files changed, 162 insertions(+), 15 deletions(-)

diff --git a/src/include/commands/dbcommands.h b/src/include/commands/dbcommands.h
index 92e17c71158..2e1d3565edc 100644
--- a/src/include/commands/dbcommands.h
+++ b/src/include/commands/dbcommands.h
@@ -14,6 +14,15 @@
 #ifndef DBCOMMANDS_H
 #define DBCOMMANDS_H
 
+typedef enum CreateDBFileCopyMethod
+{
+	CREATEDB_FILE_COPY_METHOD_COPY,
+	CREATEDB_FILE_COPY_METHOD_CLONE,
+} CreateDBFileCopyMethod;
+
+/* GUC parameters */
+extern PGDLLIMPORT int createdb_file_copy_method;
+
 #include "access/xlogreader.h"
 #include "catalog/objectaddress.h"
 #include "lib/stringinfo.h"
diff --git a/src/include/storage/copydir.h b/src/include/storage/copydir.h
index a25e258f479..9ff28f2eec9 100644
--- a/src/include/storage/copydir.h
+++ b/src/include/storage/copydir.h
@@ -13,7 +13,8 @@
 #ifndef COPYDIR_H
 #define COPYDIR_H
 
-extern void copydir(const char *fromdir, const char *todir, bool recurse);
+extern void copydir(const char *fromdir, const char *todir, bool recurse,
+					bool clone_files);
 extern void copy_file(const char *fromfile, const char *tofile);
 
 #endif							/* COPYDIR_H */
diff --git a/src/backend/commands/dbcommands.c b/src/backend/commands/dbcommands.c
index be629ea92cf..cf0dff65e69 100644
--- a/src/backend/commands/dbcommands.c
+++ b/src/backend/commands/dbcommands.c
@@ -69,6 +69,20 @@
 #include "utils/snapmgr.h"
 #include "utils/syscache.h"
 
+/* GUCs */
+int			createdb_file_copy_method = CREATEDB_FILE_COPY_METHOD_COPY;
+
+/*
+ * GUC support
+ */
+const struct config_enum_entry createdb_file_copy_method_options[] = {
+	{"copy", CREATEDB_FILE_COPY_METHOD_COPY, false},
+#if (defined(HAVE_COPYFILE) &&
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On 2024-05-07 Tu 06:05, Richard Guo wrote: On Tue, May 7, 2024 at 1:22 PM David Rowley wrote: It would be good to get some build farm coverage of this so we don't have to rely on manual testing. I wonder if it's a good idea to just define REALLOCATE_BITMAPSETS when #ifdef CLOBBER_FREED_MEMORY... or if we should ask on the buildfarm-members list if someone wouldn't mind defining it? +1 to have build farm coverage of REALLOCATE_BITMAPSETS. This flag seems quite useful. I have added it to the CPPFLAGS on prion. cheers andrew -- Andrew Dunstan EDB:https://www.enterprisedb.com
Re: Control flow in logical replication walsender
On Tue, Apr 30, 2024 at 11:28 PM Christophe Pettus wrote:
>
> I wanted to check my understanding of how control flows in a walsender doing
> logical replication. My understanding is that the (single) thread in each
> walsender process, in the simplest case, loops on:
>
> 1. Pull a record out of the WAL.
> 2. Pass it to the reorder buffer code, which,
> 3. Sorts it out into the appropriate in-memory structure for that transaction
> (spilling to disk as required), and then continues with #1, or,
> 4. If it's a commit record, it iteratively passes the transaction data one
> change at a time to,
> 5. The logical decoding plugin, which returns the output format of that
> plugin, and then,
> 6. The walsender sends the output from the plugin to the client. It cycles on
> passing the data to the plugin and sending it to the client until it runs out
> of changes in that transaction, and then resumes reading the WAL in #1.
>
> In particular, I wanted to confirm that while it is pulling the reordered
> transaction and sending it to the plugin (and thence to the client), that
> particular walsender is *not* reading new WAL records or putting them in the
> reorder buffer.
>
> The specific issue I'm trying to track down is an enormous pileup of spill
> files. This is in a non-supported version of PostgreSQL (v11), so an upgrade
> may fix it, but at the moment, I'm trying to find a cause and a mitigation.

In PG-14, we have added a feature in logical replication to stream long in-progress transactions, which should reduce spilling to a good extent. You might want to try that.

--
With Regards,
Amit Kapila.
Re: Control flow in logical replication walsender
On Tue, May 7, 2024 at 9:51 AM Ashutosh Bapat wrote:
>
> On Tue, May 7, 2024 at 12:00 AM Christophe Pettus wrote:
>>
>> Thank you for the reply!
>>
>> > On May 1, 2024, at 02:18, Ashutosh Bapat wrote:
>> > Is there a large transaction which is failing to be replicated repeatedly
>> > - timeouts, crashes on upstream or downstream?
>>
>> AFAIK, no, although I am doing this somewhat by remote control (I don't have
>> direct access to the failing system). This did bring up one other question,
>> though:
>>
>> Are subtransactions written to their own individual reorder buffers (and
>> thus potentially spill files), or are they appended to the topmost
>> transaction's reorder buffer?
>
> IIRC, they have their own RB,

Right.

> but once they commit, they are transferred to topmost transaction's RB.

I don't think they are transferred to the topmost transaction's RB. We perform a k-way merge between transactions/subtransactions to retrieve the changes. See the comments "Support for efficiently iterating over a transaction's and its subtransactions' changes..." in reorderbuffer.c.

--
With Regards,
Amit Kapila.
Re: bug: copy progress reporting of backends which run multiple COPYs
On Sat, Jan 21, 2023 at 02:45:40AM +0100, Matthias van de Meent wrote: > > Would you do anything different in the master branch, with no > > compatibility constraints ? I think the progress reporting would still > > be limited to one row per backend, not one per CopyFrom(). > > I think I would at least introduce another parameter to BeginCopyFrom > for progress reporting (instead of relying on pstate != NULL), like > how we have a bit in reindex_index's params->options that specifies > whether we want progress reporting (which is unset for parallel > workers iirc). This didn't get fixed for v16, and it seems unlikely that it'll be addressed in back branches. But while I was reviewing forgotten threads, it occurred to me to raise the issue in time to fix it for v17. -- Justin
pg_restore -N loses extension comment
pg_dump -Fc |pg_restore -l -N schema:
| 2; 3079 18187 EXTENSION - pg_buffercache

Without -N schema it also shows:
| 2562; 0 0 COMMENT - EXTENSION pg_buffercache

I mean literal s-c-h-e-m-a, but I suppose anything else will work the same.

BTW, I noticed that pg_restore -v shows that duplicate dependencies can be stored. We see things like this (and worse):

| 4284; 1259 191439414 VIEW public wmg_server_view telsasoft
| ; depends on: 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 612 23087 612

I see that's possible not only for views, but also tables. That's probably wasteful of CPU, at least.

--
Justin
Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Hello, Matthias and others!

Updated WIP attached. Changes are:

* Renaming; now it feels better to me
* A more reliable approach in `GlobalVisHorizonKindForRel` to make sure we have not missed `rd_safeindexconcurrentlybuilding`, by calling `RelationGetIndexList` if required
* An optimization to avoid any additional `RelationGetIndexList` call if zero concurrent indexes are being built
* TOAST moved to TODO, since it looks to be out of scope - but I'm not sure yet and need to dive deeper

TODO:
* TOAST
* docs and comments
* make sure non-data tables are not affected
* per-database scope of the optimization
* handle index building errors correctly in the optimization code
* more tests: create index, multiple re-indexes, multiple tables

Thanks,
Michail.

From 63677046efc9b6a1d93f9248c6d9dce14a945a42 Mon Sep 17 00:00:00 2001
From: nkey
Date: Tue, 7 May 2024 14:24:09 +0200
Subject: [PATCH v2] WIP: fix d9d076222f5b "VACUUM: ignore indexing operations
 with CONCURRENTLY" which was reverted by e28bb8851969.

The issue was caused by the absence of any snapshot that actually protects the data in the relation required to build the index correctly.

Introduce a new type of visibility horizon to be used for relations with concurrently built indexes (in the case of a "safe" index).

Now `GlobalVisHorizonKindForRel` may dynamically decide which horizon to use based on the data about safe indexes being built concurrently.

To reduce the performance impact, the counter of concurrently built indexes is updated in shared memory.
---
 src/backend/catalog/index.c              | 36 ++
 src/backend/commands/indexcmds.c         | 20 +++
 src/backend/storage/ipc/ipci.c           |  2 +
 src/backend/storage/ipc/procarray.c      | 88 -
 src/backend/utils/cache/relcache.c       | 11 ++
 src/bin/pg_amcheck/t/006_concurrently.pl | 155 +++
 src/include/catalog/index.h              |  5 +
 src/include/utils/rel.h                  |  1 +
 8 files changed, 311 insertions(+), 7 deletions(-)
 create mode 100644 src/bin/pg_amcheck/t/006_concurrently.pl

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c55c..3caa2bab12 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -97,6 +97,11 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;
 
+typedef struct {
+	pg_atomic_uint32 numSafeConcurrentlyBuiltIndexes;
+} SafeICSharedState;
+static SafeICSharedState *SafeICStateShmem;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -176,6 +181,37 @@ relationHasPrimaryKey(Relation rel)
 	return result;
 }
 
+
+void SafeICStateShmemInit(void)
+{
+	bool		found;
+
+	SafeICStateShmem = (SafeICSharedState *)
+		ShmemInitStruct("Safe Concurrently Build Indexes",
+						sizeof(SafeICSharedState),
+						&found);
+
+	if (!IsUnderPostmaster)
+	{
+		Assert(!found);
+		pg_atomic_init_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes, 0);
+	} else
+		Assert(found);
+}
+
+void UpdateNumSafeConcurrentlyBuiltIndexes(bool increment)
+{
+	if (increment)
+		pg_atomic_fetch_add_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes, 1);
+	else
+		pg_atomic_fetch_sub_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes, 1);
+}
+
+bool IsAnySafeIndexBuildsConcurrently()
+{
+	return pg_atomic_read_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes) > 0;
+}
+
 /*
  * index_check_primary_key
  *		Apply special checks needed before creating a PRIMARY KEY index
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d9016ef487..663450ba20 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1636,6 +1636,8 @@ DefineIndex(Oid tableId,
 	 * hold lock on the parent table.  This might need to change later.
 	 */
 	LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+	if (safe_index && concurrent)
+		UpdateNumSafeConcurrentlyBuiltIndexes(true);
 
 	PopActiveSnapshot();
 	CommitTransactionCommand();
@@ -1804,7 +1806,15 @@ DefineIndex(Oid tableId,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);
+	/* Commit index as valid before reducing counter of safe concurrently build indexes */
+	CommitTransactionCommand();
+	Assert(concurrent);
+	if (safe_index)
+		UpdateNumSafeConcurrentlyBuiltIndexes(false);
+
+	/* Start a new transaction to finish process properly */
+	StartTransactionCommand();
 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
 	 */
@@ -3902,6 +3912,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 							indexRel->rd_indpred == NIL);
 		idx->tableId = RelationGetRelid(heapRel);
 		idx->amId = indexRel->rd_rel->relam;
+		if (idx->safe)
+			UpdateNumSafeConcurrentlyBuiltIndexes(true);
 
 		/* This function shouldn't be called for tem
Re: CREATE DATABASE with filesystem cloning
Hi,

Nazir Bilal Yavuz wrote:
> Any kind of feedback would be appreciated.

I know it's coming from copy-and-paste, but I believe the flags could be:

- dstfd = OpenTransientFile(tofile, O_RDWR | O_CREAT | O_EXCL | PG_BINARY);
+ dstfd = OpenTransientFile(tofile, O_CREAT | O_WRONLY | O_TRUNC | O_EXCL | PG_BINARY);

The O_WRONLY | O_TRUNC flags allow the OS to make some optimizations, if you deem that possible. The O_RDWR flag indicates that the file can also be read, which is not true in this case: the destination file is only written.

best regards,
Ranier Vilela
Re: WHERE CURRENT OF with RLS quals that are ctid conditions
On Mon, May 6, 2024 at 7:31 PM Tom Lane wrote: > Robert pointed out [1] that the planner fails if we have $SUBJECT, > because tidpath.c can seize on the RLS-derived ctid constraint > instead of the CurrentOfExpr. Since the executor can only handle > CurrentOfExpr in a TidScan's tidquals, that leads to a confusing > runtime error. > > Here's a patch for that. > > However ... along the way to testing it, I found that you can only > get such an RLS qual to work if it accepts "(InvalidBlockNumber,0)", > because that's what the ctid field will look like in a > not-yet-stored-to-disk tuple. That's sufficiently weird, and so > unduly in bed with undocumented implementation details, that I can't > imagine anyone is actually using such an RLS condition or ever will. > So maybe this is not really worth fixing. Thoughts? Hmm, I thought the RLS condition needed to accept the old and new TIDs, but not (InvalidBlockNumber,0). I might well have misunderstood, though. As to whether this is worth fixing, I think it is, but it might not be worth back-patching the fix. Also, I'd really like to get disable_cost out of the picture here. That would require more code reorganization than you've done here, but I think it would be worthwhile. I suppose that could also be done as a separate patch, but I wonder if that doesn't just amount to changing approximately the same code twice. Or maybe it doesn't, and this is worth doing on its own. I'm not sure; I haven't coded what I have in mind yet. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_restore -N loses extension comment
Justin Pryzby writes: > pg_dump -Fc |pg_restore -l -N schema: > | 2; 3079 18187 EXTENSION - pg_buffercache > Without -N schema also shows: > | 2562; 0 0 COMMENT - EXTENSION pg_buffercache Hmm, but what happens if you actually do the restore? I think this may be a bug in -l mode: ProcessArchiveRestoreOptions saves the result of _tocEntryRequired in te->reqs, but PrintTOCSummary doesn't, and that will bollix its subsequent _tocEntryRequired checks for "dependent" TOC entries. regards, tom lane
Re: WHERE CURRENT OF with RLS quals that are ctid conditions
On Tue, May 7, 2024 at 9:47 AM Robert Haas wrote: > As to whether this is worth fixing, I think it is, but it might not be > worth back-patching the fix. Also, I'd really like to get disable_cost > out of the picture here. That would require more code reorganization > than you've done here, but I think it would be worthwhile. I suppose > that could also be done as a separate patch, but I wonder if that > doesn't just amount to changing approximately the same code twice. > > Or maybe it doesn't, and this is worth doing on its own. I'm not sure; > I haven't coded what I have in mind yet. Never mind all this. I think what I have in mind requires doing what you did first. So if you're happy with what you've got, I'd go for it. -- Robert Haas EDB: http://www.enterprisedb.com
Re: WHERE CURRENT OF with RLS quals that are ctid conditions
Robert Haas writes:
> On Mon, May 6, 2024 at 7:31 PM Tom Lane wrote:
>> So maybe this is not really worth fixing.  Thoughts?
>
> Hmm, I thought the RLS condition needed to accept the old and new
> TIDs, but not (InvalidBlockNumber,0).  I might well have misunderstood,
> though.

If you leave the (InvalidBlockNumber,0) alternative out of the RLS
condition, my patch's test case fails because the row "doesn't satisfy
the RLS condition" (I forget the exact error message, but it was more
or less that).

> As to whether this is worth fixing, I think it is, but it might not be
> worth back-patching the fix.  Also, I'd really like to get disable_cost
> out of the picture here.  That would require more code reorganization
> than you've done here, but I think it would be worthwhile.  I suppose
> that could also be done as a separate patch, but I wonder if that
> doesn't just amount to changing approximately the same code twice.

No, because the disable_cost stuff is nowhere near here.  In any case,
what we were talking about was suppressing creation of competing
non-TIDScan paths.  It's still going to be incumbent on tidpath.c to
create a correct path, and as things stand it won't for this case.

			regards, tom lane
Re: WHERE CURRENT OF with RLS quals that are ctid conditions
Robert Haas writes:
> Never mind all this.  I think what I have in mind requires doing what
> you did first.  So if you're happy with what you've got, I'd go for it.

OK.  HEAD-only sounds like a good compromise.  Barring objections,
I'll do that later today.

			regards, tom lane
Re: New GUC autovacuum_max_threshold ?
On Wed, May 1, 2024 at 10:03 PM David Rowley wrote:
> Here are some of the problems that I know about:
>
> 1. Autovacuum has exactly zero forward vision and operates reactively
> rather than proactively.  This "blind operating" causes tables to
> either not need vacuumed or suddenly need vacuumed without any
> consideration of how busy autovacuum is at that current moment.
> 2. There is no prioritisation for the order in which tables are
> autovacuumed.
> 3. With the default scale factor, the larger a table becomes, the more
> infrequent the autovacuums.
> 4. Autovacuum is more likely to trigger when the system is busy
> because more transaction IDs are being consumed and there is more DML
> occurring.  This results in autovacuum having less work to do during
> quiet periods when there are more free resources to be doing the
> vacuum work.

I agree with all of these points.

For a while now, I've been thinking that we really needed a
prioritization scheme, so that we don't waste our time on low-priority
tasks when there are high-priority tasks that need to be completed.  But
lately I've started to think that what matters most is the rate at which
autovacuum work is happening overall.  I feel like prioritization is
mostly going to matter when we're not keeping up, and I think the
primary goal should be to keep up.  I think we could use the same data
to make both decisions -- if autovacuum were proactive rather than
reactive, that would mean that we know something about what is going to
happen in the future, and I think that data could be used both to decide
whether we're keeping up, and also to prioritize.  But if I had to pick
a first target, I'd forget about trying to make things happen in the
right order and just try to make sure we get all the things done.

> I think we need at least 1a) before we can give autovacuum more work
> to do, especially if we do something like multiply its workload by
> 1024x, per your comment above.

I guess I view it differently.  It seems to me that right now, we're
not vacuuming large tables often enough.  We should fix that,
independently of anything else.  If the result is that small and medium
sized tables get vacuumed less often, then that just means there were
never enough resources to go around in the first place.  We haven't
taken a system that was working fine and broken it: we've just moved
the problem from one category of tables (the big ones) to a different
category of tables.  If the user wants to solve that problem, they need
to bump up the cost limit or add hardware.  I don't see that we have
any particular reason to believe such users will be worse off on
average than they are today.  On the other hand, users who do have a
sufficiently high cost limit and enough hardware will be better off,
because we'll start doing all the vacuuming work that needs to be done
instead of only some of it.

Now, if we start vacuuming any class of table whatsoever 1024x as often
as we do today, we are going to lose.  But that would still be true
even if we did everything on your list.  Large tables need to be
vacuumed more frequently than we now do, but not THAT much more
frequently.  Any system that produces that result is just using a wrong
algorithm, or wrong constants, or something.  Even if all the necessary
resources are available, nobody is going to thank us for vacuuming
gigantic tables in a tight loop.  The problem with such a large
increase is not that we don't have prioritization, but that such a
large increase is fundamentally the wrong thing to do.  On the other
hand, I think a more modest increase is the right thing to do, and I
think it's the right thing to do whether we have prioritization or not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com
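[Editor's note: point 3 of David's list can be made concrete with the default settings.  With autovacuum_vacuum_threshold = 50 and autovacuum_vacuum_scale_factor = 0.2 (the shipped defaults), the number of dead tuples needed to trigger a vacuum grows linearly with table size, so large tables are vacuumed proportionally less often.  A quick illustration — the row counts are invented, not from this thread:]

```sql
-- Dead tuples required before autovacuum triggers, per the default
-- formula: autovacuum_vacuum_threshold
--          + autovacuum_vacuum_scale_factor * reltuples
SELECT reltuples::bigint AS table_rows,
       (50 + 0.2 * reltuples)::bigint AS dead_tuples_to_trigger
FROM (VALUES (1e4), (1e6), (1e8)) AS t(reltuples);
```

A 10,000-row table needs about 2,050 dead tuples to trigger a vacuum; a 100-million-row table needs about 20 million.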
Re: pg_restore -N loses extension comment
I wrote:
> I think this may be a bug in -l mode: ProcessArchiveRestoreOptions
> saves the result of _tocEntryRequired in te->reqs, but PrintTOCSummary
> doesn't, and that will bollix its subsequent _tocEntryRequired checks
> for "dependent" TOC entries.

Yeah ... the attached seems to fix it.

			regards, tom lane

diff --git a/src/bin/pg_dump/pg_backup_archiver.c b/src/bin/pg_dump/pg_backup_archiver.c
index c6c101c118..56e0688154 100644
--- a/src/bin/pg_dump/pg_backup_archiver.c
+++ b/src/bin/pg_dump/pg_backup_archiver.c
@@ -1319,10 +1319,13 @@ PrintTOCSummary(Archive *AHX)
 	curSection = SECTION_PRE_DATA;
 	for (te = AH->toc->next; te != AH->toc; te = te->next)
 	{
+		/* This bit must match ProcessArchiveRestoreOptions' marking logic */
 		if (te->section != SECTION_NONE)
 			curSection = te->section;
+		te->reqs = _tocEntryRequired(te, curSection, AH);
+		/* Now, should we print it? */
 		if (ropt->verbose ||
-			(_tocEntryRequired(te, curSection, AH) & (REQ_SCHEMA | REQ_DATA)) != 0)
+			(te->reqs & (REQ_SCHEMA | REQ_DATA)) != 0)
 		{
 			char	   *sanitized_name;
 			char	   *sanitized_schema;
PERIOD foreign key feature
In this commit:

commit 34768ee3616
Author: Peter Eisentraut
Date:   Sun Mar 24 07:37:13 2024 +0100

    Add temporal FOREIGN KEY contraints

    Add PERIOD clause to foreign key constraint definitions.  This is
    supported for range and multirange types.  Temporal foreign keys
    check for range containment instead of equality.

    This feature matches the behavior of the SQL standard temporal
    foreign keys, but it works on PostgreSQL's native ranges instead
    of SQL's "periods", which don't exist in PostgreSQL (yet).

    Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT}
    are not supported yet.

    Author: Paul A. Jungwirth
    Reviewed-by: Peter Eisentraut
    Reviewed-by: jian he
    Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mdhcy4_qq0+noc...@mail.gmail.com

this text was added to create_table.sgml:

        In addition, the referenced table must have a primary
        key or unique constraint declared with WITHOUT
-->     OVERLAPS.  Finally, if one side of the foreign key
-->     uses PERIOD, the other side must too.  If the
        refcolumn list is omitted, the WITHOUT OVERLAPS part of the
        primary key is treated as if marked with PERIOD.

In the two marked lines, it says "if one side of the foreign key uses
PERIOD, the other side must too."  However, looking at the example
queries, it seems like if the foreign side has PERIOD, the primary side
must have WITHOUT OVERLAPS, not PERIOD.

Does this doc text need correcting?

-- 
Bruce Momjian  https://momjian.us
EDB            https://enterprisedb.com

Only you can decide what is important to you.
Re: PERIOD foreign key feature
On Tue, May 7, 2024 at 7:54 AM Bruce Momjian wrote:
> In this commit:
>
> commit 34768ee3616
> Author: Peter Eisentraut
> Date:   Sun Mar 24 07:37:13 2024 +0100
>
>     Add temporal FOREIGN KEY contraints
>
>     Add PERIOD clause to foreign key constraint definitions.  This is
>     supported for range and multirange types.  Temporal foreign keys
>     check for range containment instead of equality.
>
>     This feature matches the behavior of the SQL standard temporal
>     foreign keys, but it works on PostgreSQL's native ranges instead
>     of SQL's "periods", which don't exist in PostgreSQL (yet).
>
>     Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT}
>     are not supported yet.
>
>     Author: Paul A. Jungwirth
>     Reviewed-by: Peter Eisentraut
>     Reviewed-by: jian he
>     Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mdhcy4_qq0+noc...@mail.gmail.com
>
> this text was added to create_table.sgml:
>
>         In addition, the referenced table must have a primary
>         key or unique constraint declared with WITHOUT
> -->     OVERLAPS.  Finally, if one side of the foreign key
> -->     uses PERIOD, the other side must too.  If the
>         refcolumn list is omitted, the WITHOUT OVERLAPS part of the
>         primary key is treated as if marked with PERIOD.
>
> In the two marked lines, it says "if one side of the foreign key uses
> PERIOD, the other side must too."  However, looking at the example
> queries, it seems like if the foreign side has PERIOD, the primary side
> must have WITHOUT OVERLAPS, not PERIOD.
>
> Does this doc text need correcting?

The text is factually correct, though a bit hard to parse.  "the other
side" refers to the part after "REFERENCES":

FOREIGN KEY ( column_name [, ... ] [, PERIOD column_name ] )
  REFERENCES reftable [ ( refcolumn [, ... ] [, PERIOD column_name ] ) ]

***(shouldn't the second occurrence be [, PERIOD refcolumn ] ?)

The text is pointing out that since the refcolumn specification is
optional you may very well not see a second PERIOD keyword in the
clause.  Instead it will be inferred from the PK.

Maybe:

Finally, if the foreign key has a PERIOD column_name specification the
corresponding refcolumn, if present, must also be marked PERIOD.  If the
refcolumn clause is omitted, and thus the reftable's primary key
constraint chosen, the primary key must have its final column marked
WITHOUT OVERLAPS.

David J.
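[Editor's note: a minimal sketch of the syntax under discussion, assuming the feature as committed in 34768ee3616; the table and column names are invented.  The referenced side carries WITHOUT OVERLAPS in its primary key, while the referencing side uses PERIOD:]

```sql
CREATE TABLE parent (
    id int,
    valid_at daterange,
    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);

CREATE TABLE child (
    id int,
    valid_at daterange,
    parent_id int,
    -- With the refcolumn list omitted, the WITHOUT OVERLAPS part of
    -- parent's primary key is treated as if marked with PERIOD:
    FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES parent
);
```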
Re: allow changing autovacuum_max_workers without restarting
On Fri, May 03, 2024 at 12:57:18PM +0000, Imseih (AWS), Sami wrote:
>> That's true, but using a hard-coded limit means we no longer need to
>> add a new GUC.  Always allocating, say, 256 slots might require a few
>> additional kilobytes of shared memory, most of which will go unused,
>> but that seems unlikely to be a problem for the systems that will run
>> Postgres v18.
>
> I agree with this.

Here's what this might look like.  I chose an upper limit of 1024, which
seems like it "ought to be enough for anybody," at least for now.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 72e0496294ef0390c77cef8031ae51c1a44ebde8 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 7 May 2024 10:59:24 -0500
Subject: [PATCH v3 1/1] allow changing autovacuum_max_workers without
 restarting

---
 doc/src/sgml/config.sgml                      |  3 +-
 doc/src/sgml/runtime.sgml                     | 15 ---
 src/backend/access/transam/xlog.c             |  2 +-
 src/backend/postmaster/autovacuum.c           | 44 ---
 src/backend/postmaster/postmaster.c           |  2 +-
 src/backend/storage/lmgr/proc.c               |  9 ++--
 src/backend/utils/init/postinit.c             | 20 ++---
 src/backend/utils/misc/guc_tables.c           |  7 ++-
 src/backend/utils/misc/postgresql.conf.sample |  1 -
 src/include/postmaster/autovacuum.h           |  8 
 src/include/utils/guc_hooks.h                 |  2 -
 11 files changed, 58 insertions(+), 55 deletions(-)

diff --git a/doc/src/sgml/config.sgml b/doc/src/sgml/config.sgml
index e93208b2e6..8e2a1d6902 100644
--- a/doc/src/sgml/config.sgml
+++ b/doc/src/sgml/config.sgml
@@ -8528,7 +8528,8 @@ COPY postgres_log FROM '/full/path/to/logfile.csv' WITH csv;
         Specifies the maximum number of autovacuum processes (other than the
         autovacuum launcher) that may be running at any one time.  The default
-        is three.  This parameter can only be set at server start.
+        is three.  This parameter can only be set in the
+        postgresql.conf file or on the server command line.

diff --git a/doc/src/sgml/runtime.sgml b/doc/src/sgml/runtime.sgml
index 6047b8171d..8a672a8383 100644
--- a/doc/src/sgml/runtime.sgml
+++ b/doc/src/sgml/runtime.sgml
@@ -781,13 +781,13 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such
         SEMMNI
         Maximum number of semaphore identifiers (i.e., sets)
-        at least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) plus room for other applications
+        at least ceil((max_connections + max_wal_senders + max_worker_processes + 1029) / 16) plus room for other applications

         SEMMNS
         Maximum number of semaphores system-wide
-        ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16) * 17 plus room for other applications
+        ceil((max_connections + max_wal_senders + max_worker_processes + 1029) / 16) * 17 plus room for other applications

@@ -838,7 +838,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such
         When using System V semaphores,
         PostgreSQL uses one semaphore per allowed connection
         (), allowed autovacuum worker process
-        () and allowed background
+        (1024) and allowed background
         process (), in sets of 16.
         Each such set will also contain a 17th semaphore which contains a magic
@@ -846,13 +846,14 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such
         other applications.  The maximum number of semaphores in the system
         is set by SEMMNS, which consequently must be at least
         as high as max_connections plus
-        autovacuum_max_workers plus max_wal_senders,
-        plus max_worker_processes, plus one extra for each 16
+        max_wal_senders,
+        plus max_worker_processes, plus 1024 for autovacuum
+        worker processes, plus one extra for each 16
         allowed connections plus workers (see the formula in
         ).  The parameter SEMMNI
         determines the limit on the number of semaphore sets that can
         exist on the system at one time.  Hence this parameter must be at
-        least ceil((max_connections + autovacuum_max_workers + max_wal_senders + max_worker_processes + 5) / 16).
+        least ceil((max_connections + max_wal_senders + max_worker_processes + 1029) / 16).
         Lowering the number
         of allowed connections is a temporary workaround for failures,
         which are usually confusingly worded No space
@@ -883,7 +884,7 @@ psql: error: connection to server on socket "/tmp/.s.PGSQL.5432" failed: No such
         When using POSIX semaphores, the number of semaphores needed is the
         same as for System V, that is one semaphore per allowed connection
         (), allowed autovacuum worker process
-        () and allowed background
+        (1024) and allowed background
         pro
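[Editor's note: as a sanity check of the revised formula, with the shipped default settings — max_connections = 100, max_wal_senders = 10, max_worker_processes = 8, assumed here for illustration — the new SEMMNI lower bound works out to:]

```sql
-- ceil((max_connections + max_wal_senders + max_worker_processes + 1029) / 16)
SELECT ceil((100 + 10 + 8 + 1029) / 16.0) AS semmni_lower_bound;  -- 72
```

The fixed 1029 term is the 1024 reserved autovacuum worker slots plus the 5 auxiliary-process semaphores from the old formula.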
Re: PERIOD foreign key feature
On 5/7/24 08:23, David G. Johnston wrote:
> On Tue, May 7, 2024 at 7:54 AM Bruce Momjian <br...@momjian.us> wrote:
>
>     In the two marked lines, it says "if one side of the foreign key
>     uses PERIOD, the other side must too."  However, looking at the
>     example queries, it seems like if the foreign side has PERIOD, the
>     primary side must have WITHOUT OVERLAPS, not PERIOD.
>
>     Does this doc text need correcting?
>
> The text is factually correct, though a bit hard to parse.  "the other
> side" refers to the part after "REFERENCES":
>
> FOREIGN KEY ( column_name [, ... ] [, PERIOD column_name ] )
>   REFERENCES reftable [ ( refcolumn [, ... ] [, PERIOD column_name ] ) ]
>
> ***(shouldn't the second occurrence be [, PERIOD refcolumn ] ?)
>
> The text is pointing out that since the refcolumn specification is
> optional you may very well not see a second PERIOD keyword in the
> clause.  Instead it will be inferred from the PK.
>
> Maybe:
>
> Finally, if the foreign key has a PERIOD column_name specification the
> corresponding refcolumn, if present, must also be marked PERIOD.  If
> the refcolumn clause is omitted, and thus the reftable's primary key
> constraint chosen, the primary key must have its final column marked
> WITHOUT OVERLAPS.

Yes, David is correct here on all points.  I like his suggestion to
clarify the language here also.  If you need a patch from me let me
know, but I assume it's something a committer can just make happen?

Yours,

-- 
Paul ~{:-)
p...@illuminatedcomputing.com
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Sat, May 04, 2024 at 06:45:32PM +0900, Michael Paquier wrote:
> On Fri, May 03, 2024 at 05:22:06PM -0400, Tom Lane wrote:
>> Nathan Bossart writes:
>>> IIUC this would cause other sessions' temporary sequences to appear
>>> in the view.  Is that desirable?
>>
>> I assume Michael meant to move the test into the C code, not drop
>> it entirely --- I agree we don't want that.
>
> Yup.  I meant to remove it from the script and keep only something in
> the C code to avoid the duplication, but you're right that the temp
> sequences would create more noise than now.
>
>> Moving it has some attraction, but pg_is_other_temp_schema() is also
>> used in a lot of information_schema views, so we couldn't get rid of
>> it without a lot of further hacking.  Not sure we want to relocate
>> that filter responsibility in just one view.
>
> Okay.

Okay, so are we okay to back-patch something like v1?  Or should we also
return NULL for other sessions' temporary schemas on primaries?  That
would change the condition to something like

	char		relpersist = seqrel->rd_rel->relpersistence;

	if (relpersist == RELPERSISTENCE_PERMANENT ||
		(relpersist == RELPERSISTENCE_UNLOGGED && !RecoveryInProgress()) ||
		!RELATION_IS_OTHER_TEMP(seqrel))
	{
	...
	}

I personally think that would be fine to back-patch since pg_sequences
already filters it out anyway.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: pg_sequence_last_value() for unlogged sequences on standbys
Nathan Bossart writes:
> Okay, so are we okay to back-patch something like v1?  Or should we also
> return NULL for other sessions' temporary schemas on primaries?  That
> would change the condition to something like
>
> 	char		relpersist = seqrel->rd_rel->relpersistence;
>
> 	if (relpersist == RELPERSISTENCE_PERMANENT ||
> 		(relpersist == RELPERSISTENCE_UNLOGGED && !RecoveryInProgress()) ||
> 		!RELATION_IS_OTHER_TEMP(seqrel))
> 	{
> 	...
> 	}

Should be AND'ing not OR'ing the !TEMP condition, no?  Also I liked
your other formulation of the persistence check better.

> I personally think that would be fine to back-patch since pg_sequences
> already filters it out anyway.

+1 to include that, as it offers a defense if someone invokes this
function directly.  In HEAD we could then rip out the test in the view.

BTW, I think you also need something like

-	int64		result;
+	int64		result = 0;

Your compiler may not complain about result being possibly
uninitialized, but IME others will.

			regards, tom lane
Re: Use pgstat_kind_infos to read fixed shared stats structs
On Tue May 7, 2024 at 1:01 AM CDT, Michael Paquier wrote:
> On Tue, May 07, 2024 at 12:44:51AM -0500, Tristan Partin wrote:
>> Thanks for the feedback Michael.  Should I just throw the patch in the
>> next commitfest so it doesn't get left behind?
>
> Better to do so, yes.  I have noted this thread in my TODO list, but
> we're a couple of weeks away from the next CF and things tend to get
> easily forgotten.

Added here: https://commitfest.postgresql.org/48/4978/

-- 
Tristan Partin
Neon (https://neon.tech)
Re: Control flow in logical replication walsender
> On May 7, 2024, at 05:02, Amit Kapila wrote:
>
> In PG-14, we have added a feature in logical replication to stream
> long in-progress transactions which should reduce spilling to a good
> extent.  You might want to try that.

That's been my principal recommendation (since that would also allow
controlling the amount of logical replication working memory).

Thank you!
Re: 2024-05-09 release announcement draft
On Mon, May 06, 2024 at 11:09:24PM -0400, Jonathan S. Katz wrote:
> * Avoid leaking a query result from
> [`psql`](https://www.postgresql.org/docs/current/app-psql.html)
> after the query is cancelled.

I'd delete this item about a psql-lifespan memory leak, because (a)
it's so minor and (b) there are other reasonable readings of "leak"
that would make the change look more important.
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
Andrew Dunstan writes:
> On 2024-05-07 Tu 06:05, Richard Guo wrote:
>> +1 to have build farm coverage of REALLOCATE_BITMAPSETS.  This flag
>> seems quite useful.

> I have added it to the CPPFLAGS on prion.

... and as expected, prion fell over.  I find that Richard's proposed
fix makes the core regression tests pass, but we still fail
check-world.  So I'm afraid we need something more aggressive, like
the attached which makes make_restrictinfo copy all its input
bitmapsets.  Without that, we still have sharing of bitmapsets across
different RestrictInfos, which seems pretty scary given what we now
see about the effects of 00b41463c.  This seems annoyingly expensive,
but maybe there's little choice?

Given this, we could remove ad-hoc bms_copy calls from the callers of
make_restrictinfo, distribute_quals_to_rels, etc.  I didn't go looking
for possible wins of that sort; there's unlikely to be a lot of them.

			regards, tom lane

diff --git a/src/backend/optimizer/util/restrictinfo.c b/src/backend/optimizer/util/restrictinfo.c
index 0b406e9334..e81861bc8b 100644
--- a/src/backend/optimizer/util/restrictinfo.c
+++ b/src/backend/optimizer/util/restrictinfo.c
@@ -132,8 +132,8 @@ make_restrictinfo_internal(PlannerInfo *root,
 	restrictinfo->is_clone = is_clone;
 	restrictinfo->can_join = false; /* may get set below */
 	restrictinfo->security_level = security_level;
-	restrictinfo->incompatible_relids = incompatible_relids;
-	restrictinfo->outer_relids = outer_relids;
+	restrictinfo->incompatible_relids = bms_copy(incompatible_relids);
+	restrictinfo->outer_relids = bms_copy(outer_relids);
 
 	/*
 	 * If it's potentially delayable by lower-level security quals, figure out
@@ -191,7 +191,7 @@ make_restrictinfo_internal(PlannerInfo *root,
 
 	/* required_relids defaults to clause_relids */
 	if (required_relids != NULL)
-		restrictinfo->required_relids = required_relids;
+		restrictinfo->required_relids = bms_copy(required_relids);
 	else
 		restrictinfo->required_relids = restrictinfo->clause_relids;
Re: Use pgstat_kind_infos to read fixed shared stats structs
Hi,

On 2024-05-06 14:07:53 -0500, Tristan Partin wrote:
> Instead of needing to be explicit, we can just iterate the
> pgstat_kind_infos array to find the memory locations to read into.
> This was originally thought of by Andres in
> 5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd.
>
> Not a fix, per se, but it does remove a comment.  Perhaps the
> discussion will just lead to someone deleting the comment, and keeping
> the code as is.  Either way, a win :).

I think it's a good idea.  I'd really like to allow extensions to
register new types of stats eventually.  Stuff like pg_stat_statements
having its own, fairly ... mediocre, stats storage shouldn't be
necessary.

Do we need to increase the stats version?  I didn't check whether the
order we currently store things in and the numerical order of the stats
IDs are the same.

Greetings,

Andres Freund
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Tue, May 07, 2024 at 01:44:16PM -0400, Tom Lane wrote:
> Nathan Bossart writes:
>> 	char		relpersist = seqrel->rd_rel->relpersistence;
>
>> 	if (relpersist == RELPERSISTENCE_PERMANENT ||
>> 		(relpersist == RELPERSISTENCE_UNLOGGED && !RecoveryInProgress()) ||
>> 		!RELATION_IS_OTHER_TEMP(seqrel))
>> 	{
>> 	...
>> 	}
>
> Should be AND'ing not OR'ing the !TEMP condition, no?  Also I liked
> your other formulation of the persistence check better.

Yes, that's a silly mistake on my part.  I changed it to

	if ((RelationIsPermanent(seqrel) || !RecoveryInProgress()) &&
		!RELATION_IS_OTHER_TEMP(seqrel))
	{
	...
	}

in the attached v2.

>> I personally think that would be fine to back-patch since pg_sequences
>> already filters it out anyway.
>
> +1 to include that, as it offers a defense if someone invokes this
> function directly.  In HEAD we could then rip out the test in the view.

I apologize for belaboring this point, but I don't see how we would be
comfortable removing that check unless we are okay with other sessions'
temporary sequences appearing in the view, albeit with a NULL
last_value.  This check lives in the WHERE clause today, so if we remove
it, we'd no longer exclude those sequences.  Michael and you seem united
on this, so I have a sinking feeling that I'm missing something terribly
obvious.

> BTW, I think you also need something like
>
> -	int64		result;
> +	int64		result = 0;
>
> Your compiler may not complain about result being possibly
> uninitialized, but IME others will.

Good call.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

>From 974f56896add92983b664c11fd25010ef29ac42c Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 30 Apr 2024 20:54:51 -0500
Subject: [PATCH v2 1/1] Fix pg_sequence_last_value() for non-permanent
 sequences on standbys.

---
 src/backend/commands/sequence.c       | 22 ++++++++++++++++------
 src/test/recovery/t/001_stream_rep.pl |  8 ++++++++
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 46103561c3..9d7468d7bb 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1780,8 +1780,8 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 	Buffer		buf;
 	HeapTupleData seqtuple;
 	Form_pg_sequence_data seq;
-	bool		is_called;
-	int64		result;
+	bool		is_called = false;
+	int64		result = 0;
 
 	/* open and lock sequence */
 	init_sequence(relid, &elm, &seqrel);
@@ -1792,12 +1792,22 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 				 errmsg("permission denied for sequence %s",
 						RelationGetRelationName(seqrel))));
 
-	seq = read_seq_tuple(seqrel, &buf, &seqtuple);
+	/*
+	 * For the benefit of the pg_sequences system view, we return NULL for
+	 * temporary and unlogged sequences on standbys instead of throwing an
+	 * error.  We also always return NULL for other sessions' temporary
+	 * sequences.
+	 */
+	if ((RelationIsPermanent(seqrel) || !RecoveryInProgress()) &&
+		!RELATION_IS_OTHER_TEMP(seqrel))
+	{
+		seq = read_seq_tuple(seqrel, &buf, &seqtuple);
 
-	is_called = seq->is_called;
-	result = seq->last_value;
+		is_called = seq->is_called;
+		result = seq->last_value;
 
-	UnlockReleaseBuffer(buf);
+		UnlockReleaseBuffer(buf);
+	}
 
 	sequence_close(seqrel, NoLock);
 
 	if (is_called)
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 5311ade509..4c698b5ce1 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -95,6 +95,14 @@ $result = $node_standby_2->safe_psql('postgres', "SELECT * FROM seq1");
 print "standby 2: $result\n";
 is($result, qq(33|0|t), 'check streamed sequence content on standby 2');
 
+# Check pg_sequence_last_value() returns NULL for unlogged sequence on standby
+$node_primary->safe_psql('postgres',
+	"CREATE UNLOGGED SEQUENCE ulseq; SELECT nextval('ulseq')");
+$node_primary->wait_for_replay_catchup($node_standby_1);
+is($node_standby_1->safe_psql('postgres',
+	"SELECT pg_sequence_last_value('ulseq'::regclass) IS NULL"),
+	't', 'pg_sequence_last_value() on unlogged sequence on standby 1');
+
 # Check that only READ-only queries can run on standbys
 is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
-- 
2.25.1
Re: Weird test mixup
On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote:
> On Mon, May 06, 2024 at 02:23:24PM -0700, Noah Misch wrote:
>> Here's how I've patched it locally.  It does avoid changing the
>> backend-side, which has some attraction.  Shall I just push this?
>
> It looks like you did not rebase on top of HEAD

Yes, the base was 713cfaf (Sunday).

> A side effect is that this causes the conditions to pile up on a
> running server when running installcheck, and assuming that many test
> suites are run on a server left running this could cause spurious
> failures when failing to find a new slot.

Yes, we'd be raising INJ_MAX_CONDITION more often under this approach.

> Always resetting condition->name when detaching a point is a simpler
> flow and saner IMO.
>
> Overall, this switches from one detach behavior to a different one,

Can you say more about that?  The only behavior change known to me is
that a given injection point workload uses more of INJ_MAX_CONDITION.
If there's another behavior change, it was likely unintended.

> which may or may not be intuitive depending on what one is looking
> for.  FWIW, I see InjectionPointCondition as something that should be
> around as long as its injection point exists, with the condition
> entirely gone once the point is detached because it should not exist
> anymore on the server running, with no information left in shmem.
>
> Through your patch, you make conditions have a different meaning, with
> a mix of "local" definition, but it is kind of permanent as it keeps a
> trace of the point's name in shmem.  I find the behavior of the patch
> less intuitive.  Perhaps it would be interesting to see first the bug
> and/or problem you are trying to tackle with this different behavior
> as I feel like we could do something even with the module as-is.  As
> far as I understand, the implementation of the module on HEAD allows
> one to emulate a breakpoint with a wait/wake, which can avoid the
> window mentioned in step 2.  Even if a wait point is detached
> concurrently, it can be awaken with its traces in shmem removed.

The problem I'm trying to tackle in this thread is to make
src/test/modules/gin installcheck-safe.  $SUBJECT's commit 5105c90
started that work, having seen the intarray test suite break when run
concurrently with the injection_points test suite.  That combination
still does break at the exit-time race condition.  To reproduce, apply
this attachment to add sleeps, and run:

  make -C src/test/modules/gin installcheck USE_MODULE_DB=1 &
  sleep 2; make -C contrib/intarray installcheck USE_MODULE_DB=1

Separately, I see injection_points_attach() populates
InjectionPointCondition after InjectionPointAttach().  Shouldn't
InjectionPointAttach() come last, to avoid the same sort of race?  I've
not tried to reproduce that one.

diff --git a/src/test/modules/gin/expected/gin_incomplete_splits.out b/src/test/modules/gin/expected/gin_incomplete_splits.out
index 15574e5..4bc5ef1 100644
--- a/src/test/modules/gin/expected/gin_incomplete_splits.out
+++ b/src/test/modules/gin/expected/gin_incomplete_splits.out
@@ -126,6 +126,12 @@ SELECT injection_points_attach('gin-leave-leaf-split-incomplete', 'error');
 select insert_until_fail(:next_i) as next_i \gset
 NOTICE:  failed with: error triggered for injection point gin-leave-leaf-split-incomplete
+SELECT pg_sleep(10);
+ pg_sleep 
+----------
+ 
+(1 row)
+
 SELECT injection_points_detach('gin-leave-leaf-split-incomplete');
  injection_points_detach 
 -------------------------
diff --git a/src/test/modules/gin/sql/gin_incomplete_splits.sql b/src/test/modules/gin/sql/gin_incomplete_splits.sql
index ebf0f62..fb66bac 100644
--- a/src/test/modules/gin/sql/gin_incomplete_splits.sql
+++ b/src/test/modules/gin/sql/gin_incomplete_splits.sql
@@ -118,6 +118,7 @@ select verify(:next_i);
 SELECT injection_points_attach('gin-leave-leaf-split-incomplete', 'error');
 select insert_until_fail(:next_i) as next_i \gset
+SELECT pg_sleep(10);
 SELECT injection_points_detach('gin-leave-leaf-split-incomplete');
 
 -- Verify that a scan works even though there's an incomplete split
diff --git a/src/test/modules/injection_points/injection_points.c b/src/test/modules/injection_points/injection_points.c
index a74a4a2..f9e8a1f 100644
--- a/src/test/modules/injection_points/injection_points.c
+++ b/src/test/modules/injection_points/injection_points.c
@@ -18,6 +18,7 @@
 #include "postgres.h"
 
 #include "fmgr.h"
+#include "libpq/pqsignal.h"
 #include "miscadmin.h"
 #include "storage/condition_variable.h"
 #include "storage/dsm_registry.h"
@@ -213,6 +214,14 @@ injection_points_cleanup(int code, Datum arg)
 void
 injection_error(const char *name)
 {
+	if (strcmp(name, "gin-leave-leaf-split-incomplete") == 0 &&
+		!injection_point_allowed(name))
+	{
+		sigprocmask(SIG_SETMASK, &BlockSig, NULL);
+		pg_usleep(20 * USECS_PER_SEC);	/* let other suite detach */
+		sigprocmask
Re: backend stuck in DataFileExtend
On Tue, May 07, 2024 at 10:55:28AM +1200, Thomas Munro wrote:
> On Tue, May 7, 2024 at 6:21 AM Justin Pryzby wrote:
> > FWIW: both are running zfs-2.2.3 RPMs from zfsonlinux.org.
> ...
> > Yes, they're running centos7 with the indicated kernels.
>
> So far we've got:
>
> * spurious EIO when opening a file (your previous report)
> * hanging with CPU spinning (?) inside pwritev()
> * old kernel, bleeding edge ZFS
>
> From an (uninformed) peek at the ZFS code, if it really is spinning
> there it seems like a pretty low level problem: it's finished the write,
> and now is just trying to release (something like our unpin) and
> unlock the buffers, which involves various code paths that might touch
> various mutexes and spinlocks, and to get stuck like that I guess it's
> either corrupted itself or it is deadlocking against something else,
> but what?  Do you see any other processes (including kernel threads)
> with any stuck stacks that might be a deadlock partner?

Sorry, but even after forgetting several times, I finally remembered to go
back to this issue, and had already rebooted the VM as needed to kill the
stuck process.  But .. it seems to have recurred again this AM.  Note that
yesterday, I'd taken the opportunity to upgrade to zfs-2.2.4.

These two procs are the oldest active postgres procs, and also (now)
adjacent in ps -ef --sort start_time.

-[ RECORD 1 ]----+------------------------------
backend_start    | 2024-05-07 09:45:06.228528-06
application_name |
xact_start       | 2024-05-07 09:55:38.409549-06
query_start      | 2024-05-07 09:55:38.409549-06
state_change     | 2024-05-07 09:55:38.409549-06
pid              | 27449
backend_type     | autovacuum worker
wait_event_type  | BufferPin
wait_event       | BufferPin
state            | active
backend_xid      |
backend_xmin     | 4293757489
query_id         |
left             | autovacuum: VACUUM ANALYZE child.
-[ RECORD 2 ]----+------------------------------
backend_start    | 2024-05-07 09:49:24.686314-06
application_name | MasterMetricsLoader -n -m Xml
xact_start       | 2024-05-07 09:50:30.387156-06
query_start      | 2024-05-07 09:50:32.744435-06
state_change     | 2024-05-07 09:50:32.744436-06
pid              | 5051
backend_type     | client backend
wait_event_type  | IO
wait_event       | DataFileExtend
state            | active
backend_xid      | 4293757489
backend_xmin     | 4293757429
query_id         | 2230046159014513529
left             | PREPARE mml_0 AS INSERT INTO chil

The earlier proc is doing:

strace: Process 27449 attached
epoll_wait(11, ^Cstrace: Process 27449 detached

The later process is stuck, with:

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/5051/stack
[] 0x

For good measure:

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/27433/stack
[] taskq_thread+0x48e/0x4e0 [spl]
[] kthread+0xd1/0xe0
[] ret_from_fork_nospec_end+0x0/0x39
[] 0x

[pryzbyj@telsasoft-centos7 ~]$ sudo cat /proc/27434/stack
[] taskq_thread+0x48e/0x4e0 [spl]
[] kthread+0xd1/0xe0
[] ret_from_fork_nospec_end+0x0/0x39
[] 0x

[pryzbyj@telsasoft-centos7 ~]$ ps -O wchan 5051 27449
  PID ===            S TTY          TIME COMMAND
 5051 ?              R ?        02:14:27 postgres: telsasoft ts ::1(53708) INSERT
27449 ep_poll        S ?        00:05:16 postgres: autovacuum worker   ts

The interesting procs might be:

ps -eO wchan===,lstart --sort start_time
...
15632 worker_thread  Mon May  6 23:51:34 2024 S ?        00:00:00 [kworker/2:2H]
27433 taskq_thread   Tue May  7 09:35:59 2024 S ?        00:00:56 [z_wr_iss]
27434 taskq_thread   Tue May  7 09:35:59 2024 S ?        00:00:57 [z_wr_iss]
27449 ep_poll        Tue May  7 09:45:05 2024 S ?        00:05:16 postgres: autovacuum worker   ts
 5051 ?              Tue May  7 09:49:23 2024 R ?        02:23:04 postgres: telsasoft ts ::1(53708) INSERT
 7861 ep_poll        Tue May  7 09:51:25 2024 S ?        00:03:04 /usr/local/bin/python3.8 -u /home/telsasoft/server/alarms/core/pr...
 7912 ep_poll        Tue May  7 09:51:27 2024 S ?        00:00:00 postgres: telsasoft ts ::1(53794) idle
24518 futex_wait_que Tue May  7 10:42:56 2024 S ?        00:00:55 java -jar /home/telsasoft/server/alarms/alcatel_lucent/jms/jms2rm...
...
> While looking around for reported issues I found your abandoned report > against an older ZFS version from a few years ago, same old Linux > version: > > https://github.com/openzfs/zfs/issues/11641 > > I don't know enough to say anything useful about that but it certainly > smells similar... Wow - I'd completely forgotten about that problem report. The symptoms are the same, even with a zfs version 3+ years newer. I wish the ZFS people would do more with their problem reports. BTW, we'll be upgrading this VM to a newer kernel, if not a newer OS (for some reason, these projects take a very long time). With any luck, it'll either recur, or not. I'm
Re: pg_sequence_last_value() for unlogged sequences on standbys
Nathan Bossart writes: > On Tue, May 07, 2024 at 01:44:16PM -0400, Tom Lane wrote: >> +1 to include that, as it offers a defense if someone invokes this >> function directly. In HEAD we could then rip out the test in the >> view. > I apologize for belaboring this point, but I don't see how we would be > comfortable removing that check unless we are okay with other sessions' > temporary sequences appearing in the view, albeit with a NULL last_value. Oh! You're right, I'm wrong. I was looking at the CASE filter, which we could get rid of -- but the "WHERE NOT pg_is_other_temp_schema(N.oid)" part has to stay. regards, tom lane
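To see why the WHERE clause is load-bearing while the CASE is not, here is a small sketch (Python, with made-up rows standing in for the pg_sequences SQL view): a WHERE filter removes other sessions' temporary sequences from the result entirely, whereas a CASE expression alone would only null out one column while still exposing the row.

```python
# Hypothetical catalog rows: (sequence_name, is_other_temp, last_value).
# is_other_temp stands in for pg_is_other_temp_schema() on the namespace.
rows = [
    ("seq_public", False, 42),
    ("seq_other_temp", True, 7),   # another session's temp sequence
]

def pg_sequences_with_where(rows):
    # WHERE NOT pg_is_other_temp_schema(...): the row disappears entirely
    return [(name, lv) for name, other_temp, lv in rows if not other_temp]

def pg_sequences_case_only(rows):
    # CASE filter alone: the row stays visible, only last_value is NULLed
    return [(name, None if other_temp else lv) for name, other_temp, lv in rows]
```

This is just the shape of the argument above: dropping the WHERE clause would make other sessions' temporary sequences appear in the view, even if their last_value were NULL.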
Re: Use pgstat_kind_infos to read fixed shared stats structs
On Tue May 7, 2024 at 1:29 PM CDT, Andres Freund wrote:
> Hi,
>
> On 2024-05-06 14:07:53 -0500, Tristan Partin wrote:
> > Instead of needing to be explicit, we can just iterate the
> > pgstat_kind_infos array to find the memory locations to read into.
> > This was originally thought of by Andres in
> > 5891c7a8ed8f2d3d577e7eea34dacff12d7b6bbd.
> >
> > Not a fix, per se, but it does remove a comment. Perhaps the
> > discussion will just lead to someone deleting the comment, and keeping
> > the code as is. Either way, a win :).
>
> I think it's a good idea. I'd really like to allow extensions to
> register new types of stats eventually. Stuff like pg_stat_statements
> having its own, fairly ... mediocre, stats storage shouldn't be
> necessary.

Could be useful for Neon in the future too.

> Do we need to increase the stats version?  I didn't check whether the
> order we currently store things in and the numerical order of the stats
> IDs are the same.

I checked the orders, and they looked the same.

1. Archiver
2. BgWriter
3. Checkpointer
4. IO
5. SLRU
6. WAL

-- 
Tristan Partin
Neon (https://neon.tech)
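The idea under discussion, replacing one explicit read per stats kind with a loop over a descriptor array, can be sketched like this (a Python stand-in for the C code; the names and layout are invented, not the actual pgstat structures):

```python
from dataclasses import dataclass

@dataclass
class KindInfo:
    # Stand-in for an entry of pgstat_kind_infos: each fixed-numbered
    # stats kind knows its ID, and the loop below derives the read
    # order from those IDs instead of hand-written per-kind code.
    kind_id: int
    name: str

# The numerical kind-ID order matching the old explicit read order is
# what makes the on-disk stats format (and its version) unchanged.
KIND_INFOS = [
    KindInfo(1, "archiver"),
    KindInfo(2, "bgwriter"),
    KindInfo(3, "checkpointer"),
    KindInfo(4, "io"),
    KindInfo(5, "slru"),
    KindInfo(6, "wal"),
]

def read_fixed_stats(read_chunk):
    # One loop instead of six explicit reads; read_chunk(name) stands in
    # for reading that kind's fixed-size struct from the stats file.
    return {info.name: read_chunk(info.name)
            for info in sorted(KIND_INFOS, key=lambda i: i.kind_id)}
```

The point being checked in the thread is exactly the sorted() step: the loop only preserves the file format if iterating by kind ID visits the kinds in the same order the old explicit code did.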
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Tue, May 07, 2024 at 03:02:01PM -0400, Tom Lane wrote: > Nathan Bossart writes: >> On Tue, May 07, 2024 at 01:44:16PM -0400, Tom Lane wrote: >>> +1 to include that, as it offers a defense if someone invokes this >>> function directly. In HEAD we could then rip out the test in the >>> view. > >> I apologize for belaboring this point, but I don't see how we would be >> comfortable removing that check unless we are okay with other sessions' >> temporary sequences appearing in the view, albeit with a NULL last_value. > > Oh! You're right, I'm wrong. I was looking at the CASE filter, which > we could get rid of -- but the "WHERE NOT pg_is_other_temp_schema(N.oid)" > part has to stay. Okay, phew. We can still do something like v3-0002 for v18. I'll give Michael a chance to comment on 0001 before committing/back-patching that one. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com >From 2a37834699587eef18b50bf8d58723790bbcdde7 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 30 Apr 2024 20:54:51 -0500 Subject: [PATCH v3 1/2] Fix pg_sequence_last_value() for non-permanent sequences on standbys. 
---
 src/backend/commands/sequence.c       | 22 --
 src/test/recovery/t/001_stream_rep.pl |  8
 2 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 46103561c3..9d7468d7bb 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequence.c
@@ -1780,8 +1780,8 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 	Buffer		buf;
 	HeapTupleData seqtuple;
 	Form_pg_sequence_data seq;
-	bool		is_called;
-	int64		result;
+	bool		is_called = false;
+	int64		result = 0;

 	/* open and lock sequence */
 	init_sequence(relid, &elm, &seqrel);
@@ -1792,12 +1792,22 @@ pg_sequence_last_value(PG_FUNCTION_ARGS)
 				 errmsg("permission denied for sequence %s",
 						RelationGetRelationName(seqrel))));

-	seq = read_seq_tuple(seqrel, &buf, &seqtuple);
+	/*
+	 * For the benefit of the pg_sequences system view, we return NULL for
+	 * temporary and unlogged sequences on standbys instead of throwing an
+	 * error.  We also always return NULL for other sessions' temporary
+	 * sequences.
+	 */
+	if ((RelationIsPermanent(seqrel) || !RecoveryInProgress()) &&
+		!RELATION_IS_OTHER_TEMP(seqrel))
+	{
+		seq = read_seq_tuple(seqrel, &buf, &seqtuple);

-	is_called = seq->is_called;
-	result = seq->last_value;
+		is_called = seq->is_called;
+		result = seq->last_value;

-	UnlockReleaseBuffer(buf);
+		UnlockReleaseBuffer(buf);
+	}

 	sequence_close(seqrel, NoLock);

 	if (is_called)
diff --git a/src/test/recovery/t/001_stream_rep.pl b/src/test/recovery/t/001_stream_rep.pl
index 5311ade509..4c698b5ce1 100644
--- a/src/test/recovery/t/001_stream_rep.pl
+++ b/src/test/recovery/t/001_stream_rep.pl
@@ -95,6 +95,14 @@ $result = $node_standby_2->safe_psql('postgres', "SELECT * FROM seq1");
 print "standby 2: $result\n";
 is($result, qq(33|0|t), 'check streamed sequence content on standby 2');

+# Check pg_sequence_last_value() returns NULL for unlogged sequence on standby
+$node_primary->safe_psql('postgres',
+	"CREATE UNLOGGED SEQUENCE ulseq; SELECT nextval('ulseq')");
+$node_primary->wait_for_replay_catchup($node_standby_1);
+is($node_standby_1->safe_psql('postgres',
+	"SELECT pg_sequence_last_value('ulseq'::regclass) IS NULL"),
+	't', 'pg_sequence_last_value() on unlogged sequence on standby 1');
+
 # Check that only READ-only queries can run on standbys
 is($node_standby_1->psql('postgres', 'INSERT INTO tab_int VALUES (1)'),
 	3, 'read-only queries on standby 1');
-- 
2.25.1

>From b96d1f21f6144640561360c84b361f569a2edc48 Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Tue, 7 May 2024 14:35:34 -0500
Subject: [PATCH v3 2/2] Simplify pg_sequences a bit.
XXX: NEEDS CATVERSION BUMP
---
 src/backend/catalog/system_views.sql |  6 +-
 src/backend/commands/sequence.c      | 15 +--
 src/test/regress/expected/rules.out  |  5 +
 3 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/src/backend/catalog/system_views.sql b/src/backend/catalog/system_views.sql
index 53047cab5f..b32e5c3170 100644
--- a/src/backend/catalog/system_views.sql
+++ b/src/backend/catalog/system_views.sql
@@ -176,11 +176,7 @@ CREATE VIEW pg_sequences AS
         S.seqincrement AS increment_by,
         S.seqcycle AS cycle,
         S.seqcache AS cache_size,
-        CASE
-            WHEN has_sequence_privilege(C.oid, 'SELECT,USAGE'::text)
-            THEN pg_sequence_last_value(C.oid)
-            ELSE NULL
-        END AS last_value
+        pg_sequence_last_value(C.oid) AS last_value
     FROM pg_sequence S JOIN pg_class C ON (C.oid = S.seqrelid)
          LEFT JOIN pg_namespace N ON (N.oid = C.relnamespace)
     WHERE NOT pg_is_other_temp_schema(N.oid)
diff --git a/src/backend/commands/sequence.c b/src/backend/commands/sequence.c
index 9d7468d7bb..f129375915 100644
--- a/src/backend/commands/sequence.c
+++ b/src/backend/commands/sequ
Re: On disable_cost
On Mon, May 6, 2024 at 4:30 PM Peter Geoghegan wrote: > FWIW I always found those weird inconsistencies to be annoying at > best, and confusing at worst. I speak as somebody that uses > disable_cost a lot. > > I certainly wouldn't ask anybody to make it a priority for that reason > alone -- it's not *that* bad. I've given my opinion on this because > it's already under discussion. Thanks, it's good to have other perspectives. Here are some patches for discussion. 0001 gets rid of disable_cost as a mechanism for forcing a TID scan plan to be chosen when CurrentOfExpr is present. Instead, it arranges to generate only the valid path when that case occurs, and skip everything else. I think this is a good cleanup, and it doesn't seem totally impossible that it actually prevents a failure in some extreme case. 0002 cleans up the behavior of enable_indexscan and enable_indexonlyscan. Currently, setting enable_indexscan=false adds disable_cost to both the cost of index scans and the cost of index-only scans. I think that's indefensible and, in fact, a bug, although I believe David Rowley disagrees. With this patch, we simply don't generate index scans if enable_indexscan=false, and we don't generate index-only scans if enable_indexonlyscan=false, which seems a lot more consistent to me. However, I did revise one major thing from the patch I posted before, per feedback from David Rowley and also per my own observations: in this version, if enable_indexscan=true and enable_indexonlyscan=false, we'll generate index-scan paths for any cases where, with both set to true, we would have only generated index-only scan paths. That change makes the behavior of this patch a lot more comprehensible and intuitive: the only regression test changes are places where somebody expected that they could disable both index scans and index-only scans by setting enable_indexscan=false. 
0003 and 0004 extend the approach of "just don't generate the disabled path" to bitmap scans and gather merge, respectively. I think these are more debatable, mostly because it's not clear how far we can really take this approach. Neither breaks any test cases, and 0003 is closely related to the work done in 0002, which seems like a point in its favor. 0004 was simply the only other case where it was obvious to me that this kind of approach made sense. In my view, it makes most sense to use this kind of approach for planner behaviors that seem like they're sort of optional: like if you don't use gather merge, you can still use gather, and if you don't use index scans, you can still use sequential scans. With all these patches applied, the remaining cases where we rely on disable_cost are: sequential scans sorts hash aggregation all 3 join types hash joins where a bucket holding the inner MCV would exceed hash_mem Sequential scans are clearly a last-ditch method. I find it a bit hard to decide whether hashing or sorting is the default, especially giving the asymmetry between enable_sort - presumptively anywhere - and enable_hashagg - specific to aggregation. As for the join types, it's tempting to consider nested-loop the default type -- it's the only way to satisfy parameterizations, for instance -- but the fact that it's the only method that can't do a full join undermines that position in my book. But, I don't want to pretend like I have all the answers here, either; I'm just sharing some thoughts. -- Robert Haas EDB: http://www.enterprisedb.com 0004-When-enable_gathermerge-false-don-t-generate-gather-.patch Description: Binary data 0002-Don-t-generate-index-scan-paths-when-enable_indexsca.patch Description: Binary data 0003-When-enable_bitmapscan-false-just-don-t-generate-bit.patch Description: Binary data 0001-Remove-grotty-use-of-disable_cost-for-TID-scan-plans.patch Description: Binary data
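The difference between the two mechanisms being compared here boils down to a few lines (a Python sketch with toy numbers, not the planner's actual API; disable_cost is 1.0e10 in the real code): adding disable_cost still lets a "disabled" path win when it is the only candidate, while simply not generating the path guarantees it can never be chosen.

```python
DISABLE_COST = 1.0e10   # the constant the planner adds today

def cheapest_with_disable_cost(paths, enabled):
    # Current mechanism: a disabled path stays in the running, just
    # penalized -- so it can still win if it's the only path available.
    return min(paths,
               key=lambda p: p[1] + (0 if enabled.get(p[0], True)
                                     else DISABLE_COST))

def cheapest_skip_disabled(paths, enabled):
    # Proposed mechanism for "optional" path types: never generate them.
    candidates = [p for p in paths if enabled.get(p[0], True)]
    return min(candidates, key=lambda p: p[1]) if candidates else None
```

The last-ditch cases Robert lists (sequential scans, the join types) are exactly the ones where the second function's None result would be unacceptable: some path must always survive, which is why disable_cost can't simply be removed everywhere.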
Re: Revisiting {CREATE INDEX, REINDEX} CONCURRENTLY improvements
Hi again!

Made an error in `GlobalVisHorizonKindForRel` logic, and it was caught by a
new test. Fixed version attached.

>From 9a8ea366f6d2d144979e825c4ac0bdd2937bf7c1 Mon Sep 17 00:00:00 2001
From: nkey
Date: Tue, 7 May 2024 22:10:56 +0200
Subject: [PATCH v3] WIP: fix d9d076222f5b "VACUUM: ignore indexing operations
 with CONCURRENTLY" which was reverted by e28bb8851969.

The issue was caused by the absence of any snapshot that actually protects
the data in the relation, as required to build the index correctly.

Introduce a new type of visibility horizon to be used for relations with
concurrently built indexes (in the case of a "safe" index).

Now `GlobalVisHorizonKindForRel` may dynamically decide which horizon to
use based on the data about safe indexes being built concurrently.  To
reduce the performance impact, a counter of concurrently built indexes is
kept in shared memory.
---
 src/backend/catalog/index.c              |  36 ++
 src/backend/commands/indexcmds.c         |  20 +++
 src/backend/storage/ipc/ipci.c           |   2 +
 src/backend/storage/ipc/procarray.c      |  85 -
 src/backend/utils/cache/relcache.c       |  11 ++
 src/bin/pg_amcheck/t/006_concurrently.pl | 155 +++
 src/include/catalog/index.h              |   5 +
 src/include/utils/rel.h                  |   1 +
 8 files changed, 309 insertions(+), 6 deletions(-)
 create mode 100644 src/bin/pg_amcheck/t/006_concurrently.pl

diff --git a/src/backend/catalog/index.c b/src/backend/catalog/index.c
index 5a8568c55c..3caa2bab12 100644
--- a/src/backend/catalog/index.c
+++ b/src/backend/catalog/index.c
@@ -97,6 +97,11 @@ typedef struct
 	Oid			pendingReindexedIndexes[FLEXIBLE_ARRAY_MEMBER];
 } SerializedReindexState;

+typedef struct {
+	pg_atomic_uint32 numSafeConcurrentlyBuiltIndexes;
+} SafeICSharedState;
+static SafeICSharedState *SafeICStateShmem;
+
 /* non-export function prototypes */
 static bool relationHasPrimaryKey(Relation rel);
 static TupleDesc ConstructTupleDescriptor(Relation heapRelation,
@@ -176,6 +181,37 @@ relationHasPrimaryKey(Relation rel)
 	return result;
 }

+
+void SafeICStateShmemInit(void)
+{
+	bool		found;
+
+	SafeICStateShmem = (SafeICSharedState *)
+		ShmemInitStruct("Safe Concurrently Build Indexes",
+						sizeof(SafeICSharedState),
+						&found);
+
+	if (!IsUnderPostmaster)
+	{
+		Assert(!found);
+		pg_atomic_init_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes, 0);
+	} else
+		Assert(found);
+}
+
+void UpdateNumSafeConcurrentlyBuiltIndexes(bool increment)
+{
+	if (increment)
+		pg_atomic_fetch_add_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes, 1);
+	else
+		pg_atomic_fetch_sub_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes, 1);
+}
+
+bool IsAnySafeIndexBuildsConcurrently()
+{
+	return pg_atomic_read_u32(&SafeICStateShmem->numSafeConcurrentlyBuiltIndexes) > 0;
+}
+
 /*
  * index_check_primary_key
  *		Apply special checks needed before creating a PRIMARY KEY index
diff --git a/src/backend/commands/indexcmds.c b/src/backend/commands/indexcmds.c
index d9016ef487..663450ba20 100644
--- a/src/backend/commands/indexcmds.c
+++ b/src/backend/commands/indexcmds.c
@@ -1636,6 +1638,8 @@ DefineIndex(Oid tableId,
 	 * hold lock on the parent table.  This might need to change later.
 	 */
 	LockRelationIdForSession(&heaprelid, ShareUpdateExclusiveLock);
+	if (safe_index && concurrent)
+		UpdateNumSafeConcurrentlyBuiltIndexes(true);

 	PopActiveSnapshot();
 	CommitTransactionCommand();
@@ -1804,7 +1806,15 @@ DefineIndex(Oid tableId,
 	 * to replan; so relcache flush on the index itself was sufficient.)
 	 */
 	CacheInvalidateRelcacheByRelid(heaprelid.relId);

+	/* Commit index as valid before reducing counter of safe concurrently build indexes */
+	CommitTransactionCommand();
+	Assert(concurrent);
+	if (safe_index)
+		UpdateNumSafeConcurrentlyBuiltIndexes(false);
+
+	/* Start a new transaction to finish process properly */
+	StartTransactionCommand();

 	/*
 	 * Last thing to do is release the session-level lock on the parent table.
 	 */
@@ -3902,6 +3912,8 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 				indexRel->rd_indpred == NIL);
 		idx->tableId = RelationGetRelid(heapRel);
 		idx->amId = indexRel->rd_rel->relam;
+		if (idx->safe)
+			UpdateNumSafeConcurrentlyBuiltIndexes(true);

 		/* This function shouldn't be called for temporary relations. */
 		if (indexRel->rd_rel->relpersistence == RELPERSISTENCE_TEMP)
@@ -4345,6 +4357,14 @@ ReindexRelationConcurrently(const ReindexStmt *stmt, Oid relationOid, const Rein
 		UnlockRelationIdForSession(lockrelid, ShareUpdateExclusiveLock);
 	}

+	// now we may clear safe index building flags
+	foreach(lc, newIndexIds)
+	{
+		ReindexIndexInfo *newidx = lfirst(lc);
+		if (newidx->safe)
+			UpdateNumSafeConcurrentlyBuiltIndexes(false);
+	}
+
 	/* Start a new transaction to finish process properly */
 	StartTransactionCommand();

diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc
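The shared counter in the patch above follows a simple bracket pattern: increment before the concurrent build starts, decrement once the index is committed as valid, and let horizon selection consult the counter so the more conservative horizon is only paid for while a safe build is actually in flight. A Python sketch of that pattern (the names are hypothetical; the real code uses a pg_atomic_uint32 in shared memory):

```python
import threading

class SafeICCounter:
    """Count of in-flight 'safe' concurrent index builds."""
    def __init__(self):
        self._n = 0
        self._lock = threading.Lock()  # stands in for the atomic ops

    def update(self, increment):
        with self._lock:
            self._n += 1 if increment else -1

    def any_safe_builds(self):
        with self._lock:
            return self._n > 0

def horizon_kind(counter, default="VISHORIZON_DATA"):
    # Mirrors the GlobalVisHorizonKindForRel tweak in spirit (the kind
    # names here are invented): only choose the more conservative horizon
    # while at least one safe concurrent build is running.
    return "VISHORIZON_SAFE_IC" if counter.any_safe_builds() else default
```

One design question such a counter raises, visible in the patch's careful "commit index as valid before reducing counter" ordering, is making sure every increment is eventually matched by a decrement on all exit paths, or the horizon stays pessimistic forever.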
Re: partitioning and identity column
On 30.04.24 12:59, Ashutosh Bapat wrote:
> PFA patch which fixes all the three problems.

I have committed this patch.  Thanks all.
Re: [PATCH] json_lex_string: don't overread on bad UTF8
On Mon, May 6, 2024 at 8:43 PM Michael Paquier wrote: > On Fri, May 03, 2024 at 07:05:38AM -0700, Jacob Champion wrote: > > We could port something like that to src/common. IMO that'd be more > > suited for an actual conversion routine, though, as opposed to a > > parser that for the most part assumes you didn't lie about the input > > encoding and is just trying not to crash if you're wrong. Most of the > > time, the parser just copies bytes between delimiters around and it's > > up to the caller to handle encodings... the exceptions to that are the > > \u escapes and the error handling. > > Hmm. That would still leave the backpatch issue at hand, which is > kind of confusing to leave as it is. Would it be complicated to > truncate the entire byte sequence in the error message and just give > up because we cannot do better if the input byte sequence is > incomplete? Maybe I've misunderstood, but isn't that what's being done in v2? > > Maybe I'm missing > > code somewhere, but I don't see a conversion routine from > > json_errdetail() to the actual client/locale encoding. (And the parser > > does not support multibyte input_encodings that contain ASCII in trail > > bytes.) > > Referring to json_lex_string() that does UTF-8 -> ASCII -> give-up in > its conversion for FRONTEND, I guess? Yep. This limitation looks > like a problem, especially if plugging that to libpq. Okay. How we deal with that will likely guide the "optimal" fix to error reporting, I think... Thanks, --Jacob
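The "truncate and give up" option raised above could look roughly like this (a Python sketch of the idea, not the committed fix): before putting the offending bytes in an error message, chop off any trailing incomplete multibyte sequence so a partial sequence is never emitted.

```python
def truncate_incomplete_utf8(b: bytes) -> bytes:
    """Drop a trailing incomplete UTF-8 sequence, if any.

    Sketch of the 'truncate and give up' idea; it deliberately does no
    full validation of the rest of the string.
    """
    # A UTF-8 sequence is at most 4 bytes, so only the tail matters.
    for i in range(1, min(4, len(b)) + 1):
        byte = b[-i]
        if byte < 0x80:                  # ASCII tail: nothing to trim
            return b
        if byte >= 0xC0:                 # lead byte of the last sequence
            expected = 2 if byte < 0xE0 else 3 if byte < 0xF0 else 4
            # complete (or malformed-but-not-truncated): keep as is;
            # truncated: drop the partial sequence
            return b if expected <= i else b[:-i]
    return b                             # only continuation bytes: give up
```

This only addresses the "incomplete sequence at the end of the buffer" case; bytes that are invalid for the claimed encoding elsewhere in the message would still need the conversion-routine treatment discussed above.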
Re: New GUC autovacuum_max_threshold ?
On Tue, May 07, 2024 at 10:31:00AM -0400, Robert Haas wrote: > On Wed, May 1, 2024 at 10:03 PM David Rowley wrote: >> I think we need at least 1a) before we can give autovacuum more work >> to do, especially if we do something like multiply its workload by >> 1024x, per your comment above. > > I guess I view it differently. It seems to me that right now, we're > not vacuuming large tables often enough. We should fix that, > independently of anything else. If the result is that small and medium > sized tables get vacuumed less often, then that just means there were > never enough resources to go around in the first place. We haven't > taken a system that was working fine and broken it: we've just moved > the problem from one category of tables (the big ones) to a different > category of tables. If the user wants to solve that problem, they need > to bump up the cost limit or add hardware. I don't see that we have > any particular reason to believe such users will be worse off on > average than they are today. On the other hand, users who do have a > sufficiently high cost limit and enough hardware will be better off, > because we'll start doing all the vacuuming work that needs to be done > instead of only some of it. > > Now, if we start vacuuming any class of table whatsoever 1024x as > often as we do today, we are going to lose. But that would still be > true even if we did everything on your list. Large tables need to be > vacuumed more frequently than we now do, but not THAT much more > frequently. Any system that produces that result is just using a wrong > algorithm, or wrong constants, or something. Even if all the necessary > resources are available, nobody is going to thank us for vacuuming > gigantic tables in a tight loop. The problem with such a large > increase is not that we don't have prioritization, but that such a > large increase is fundamentally the wrong thing to do. 
> On the other hand, I think a more modest increase is the right thing to
> do, and I think it's the right thing to do whether we have
> prioritization or not.

This is about how I feel, too.  In any case, I +1'd a higher default
because I think we need to be pretty conservative with these changes, at
least until we have a better prioritization strategy.  While folks may opt
to set this value super low, I think that's more likely to lead to some
interesting secondary effects.  If the default is high, hopefully these
secondary effects will be minimized or avoided.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
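For reference, the proposal under discussion amounts to capping the existing autovacuum threshold formula. A sketch (parameter names other than the proposed autovacuum_max_threshold, and the example default, are assumptions for illustration):

```python
def vacuum_threshold(reltuples,
                     vac_base_thresh=50,      # autovacuum_vacuum_threshold
                     vac_scale_factor=0.2,    # autovacuum_vacuum_scale_factor
                     max_thresh=500_000):     # proposed autovacuum_max_threshold
    # Today's formula: base + scale_factor * reltuples, which grows
    # without bound as the table grows.
    uncapped = vac_base_thresh + vac_scale_factor * reltuples
    # Proposed: never require more than max_thresh dead tuples, so very
    # large tables get vacuumed more often than the pure scale factor
    # would dictate.
    return min(uncapped, max_thresh)
```

With a high cap, small and medium tables are unaffected (the uncapped value is already below the cap) and only the largest tables see more frequent vacuuming, which matches the conservative-default argument above.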
Re: Weird test mixup
On Tue, May 07, 2024 at 11:53:10AM -0700, Noah Misch wrote: > On Tue, May 07, 2024 at 10:17:49AM +0900, Michael Paquier wrote: > > Overall, this switches from one detach behavior to a different one, > > Can you say more about that? The only behavior change known to me is that a > given injection point workload uses more of INJ_MAX_CONDITION. If there's > another behavior change, it was likely unintended. I see patch inplace030-inj-exit-race-v1.patch does not fix the race seen with repro-inj-exit-race-v1.patch. I withdraw inplace030-inj-exit-race-v1.patch, and I withdraw the above question. > To reproduce, apply [repro-inj-exit-race-v1.patch] to add > sleeps, and run: > > make -C src/test/modules/gin installcheck USE_MODULE_DB=1 & sleep 2; make -C > contrib/intarray installcheck USE_MODULE_DB=1 > > Separately, I see injection_points_attach() populates InjectionPointCondition > after InjectionPointAttach(). Shouldn't InjectionPointAttach() come last, to > avoid the same sort of race? I've not tried to reproduce that one.
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Wed, 8 May 2024 at 06:20, Tom Lane wrote: > I find that Richard's proposed fix makes the core regression tests > pass, but we still fail check-world. So I'm afraid we need something > more aggressive, like the attached which makes make_restrictinfo > copy all its input bitmapsets. Without that, we still have sharing > of bitmapsets across different RestrictInfos, which seems pretty > scary given what we now see about the effects of 00b41463c. This > seems annoyingly expensive, but maybe there's little choice? We could make the policy copy-on-modify. If you put bms_copy around the bms_del_member() calls in remove_rel_from_query(), does it pass then? David
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Wed, 8 May 2024 at 10:35, David Rowley wrote: > > On Wed, 8 May 2024 at 06:20, Tom Lane wrote: > > I find that Richard's proposed fix makes the core regression tests > > pass, but we still fail check-world. So I'm afraid we need something > > more aggressive, like the attached which makes make_restrictinfo > > copy all its input bitmapsets. Without that, we still have sharing > > of bitmapsets across different RestrictInfos, which seems pretty > > scary given what we now see about the effects of 00b41463c. This > > seems annoyingly expensive, but maybe there's little choice? > > We could make the policy copy-on-modify. If you put bms_copy around > the bms_del_member() calls in remove_rel_from_query(), does it pass > then? err, I mean bms_copy() the set before passing to bms_del_member(). David
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
David Rowley writes: > On Wed, 8 May 2024 at 06:20, Tom Lane wrote: >> I find that Richard's proposed fix makes the core regression tests >> pass, but we still fail check-world. So I'm afraid we need something >> more aggressive, like the attached which makes make_restrictinfo >> copy all its input bitmapsets. Without that, we still have sharing >> of bitmapsets across different RestrictInfos, which seems pretty >> scary given what we now see about the effects of 00b41463c. This >> seems annoyingly expensive, but maybe there's little choice? > We could make the policy copy-on-modify. If you put bms_copy around > the bms_del_member() calls in remove_rel_from_query(), does it pass > then? Didn't test, but that route seems awfully invasive and fragile: how will we find all the places to modify, or ensure that the policy is followed by future patches? regards, tom lane
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Wed, 8 May 2024 at 10:40, Tom Lane wrote: > > David Rowley writes: > > We could make the policy copy-on-modify. If you put bms_copy around > > the bms_del_member() calls in remove_rel_from_query(), does it pass > > then? > > Didn't test, but that route seems awfully invasive and fragile: how > will we find all the places to modify, or ensure that the policy > is followed by future patches? REALLOCATE_BITMAPSETS was invented for this and IMO, it found exactly the problem it was invented to find. Copy-on-modify is our policy for node mutation. Why is it ok there but awfully fragile here? David
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
David Rowley writes: > On Wed, 8 May 2024 at 10:40, Tom Lane wrote: >> Didn't test, but that route seems awfully invasive and fragile: how >> will we find all the places to modify, or ensure that the policy >> is followed by future patches? > REALLOCATE_BITMAPSETS was invented for this and IMO, it found exactly > the problem it was invented to find. Not in a way that gives me any confidence that we found *all* the problems. If check-world finds a problem that the core tests did not, then there's no reason to think there aren't still more issues that check-world happened not to trip over either. > Copy-on-modify is our policy for node mutation. Why is it ok there but > awfully fragile here? It's only partly our policy: there are all those places where we don't do it that way. The main problem that I see for trying to be 100% consistent in that way is that once you modify a sub-node, full copy-on-modify dictates replacing every ancestor node all the way to the top of the tree. That's clearly impractical in the planner data structures. So where are we going to stop exactly? regards, tom lane
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
On Wed, 8 May 2024 at 10:55, Tom Lane wrote: > > David Rowley writes: > > REALLOCATE_BITMAPSETS was invented for this and IMO, it found exactly > > the problem it was invented to find. > > Not in a way that gives me any confidence that we found *all* the > problems. Here are some statements I believe to be true: 1. If REALLOCATE_BITMAPSETS is defined then modifications to a Bitmapset will make a copy and free the original. 2. If a query runs successfully without REALLOCATE_BITMAPSETS and Assert fails due to an invalid Bitmapset when REALLOCATE_BITMAPSETS is defined, then we have > 1 pointer pointing to the same set and not all of them are being updated when the members are added/removed. Given the above, I can't see what Bitmapset sharing problems we won't find with REALLOCATE_BITMAPSETS. Can you share the exact scenario you're worried that we won't find so I can understand your concern? David
Re: Revert: Remove useless self-joins *and* -DREALLOCATE_BITMAPSETS make server crash, regress test fail.
David Rowley writes: > On Wed, 8 May 2024 at 10:55, Tom Lane wrote: >> Not in a way that gives me any confidence that we found *all* the >> problems. > Here are some statements I believe to be true: > 1. If REALLOCATE_BITMAPSETS is defined then modifications to a > Bitmapset will make a copy and free the original. > 2. If a query runs successfully without REALLOCATE_BITMAPSETS and > Assert fails due to an invalid Bitmapset when REALLOCATE_BITMAPSETS is > defined, then we have > 1 pointer pointing to the same set and not all > of them are being updated when the members are added/removed. > Given the above, I can't see what Bitmapset sharing problems we won't > find with REALLOCATE_BITMAPSETS. Anything where the trouble spots are in a code path we fail to exercise with our available test suites. If you think there are no such code paths, I'm sorry to disillusion you. I spent a little bit of time wondering if we could find problems in a more static way by marking bitmapset fields as "const", but I fear that would create a huge number of false positives. regards, tom lane
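The hazard being debated in this thread, two RestrictInfos sharing one bitmapset so that an in-place delete silently mutates both, and the copy-on-modify discipline that REALLOCATE_BITMAPSETS enforces, can be shown in miniature (Python sets standing in for Bitmapsets; this is an illustration of the bug class, not PostgreSQL's actual bms_* code):

```python
def bms_del_member_inplace(bms, member):
    # What recycling the argument looks like: the caller's set object is
    # mutated, so every other holder of the same pointer sees the change.
    bms.discard(member)
    return bms

def bms_del_member_copy_on_modify(bms, member):
    # REALLOCATE_BITMAPSETS-style discipline: build and return a fresh
    # set, leaving the old object untouched.  A holder of a *stale*
    # pointer now points at a logically dead set -- which is exactly what
    # the debug build is designed to catch via Assert.
    return {m for m in bms if m != member}
```

The in-place variant is the "silent sharing corruption" case that only misbehaves when two places happen to alias the same set, which is why exercising it depends so heavily on which code paths the test suites reach.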
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
Hi, Here are some review comments for patch v6-0001

==
Commit message

1.
This patch allows user to alter two_phase option

/allows user/allows the user/
/to alter two_phase option/to alter the 'two_phase' option/

==
doc/src/sgml/ref/alter_subscription.sgml

2.
two_phase can be altered only for disabled subscription.

SUGGESTION
The two_phase parameter can only be altered when the subscription is
disabled.

==
src/backend/access/transam/twophase.c

3. checkGid

+
+/*
+ * checkGid
+ */
+static bool
+checkGid(char *gid, Oid subid)
+{
+	int			ret;
+	Oid			subid_written,
+				xid;
+
+	ret = sscanf(gid, "pg_gid_%u_%u", &subid_written, &xid);
+
+	if (ret != 2 || subid != subid_written)
+		return false;
+
+	return true;
+}

3a. The function comment should give more explanation of what it does. I
think this function is the counterpart of the TwoPhaseTransactionGid()
function of worker.c so the comment can say that too.

~

3b. Indeed, perhaps the function name should be similar to
TwoPhaseTransactionGid. e.g. call it like
IsTwoPhaseTransactionGidForSubid?

~

3c. Probably 'xid' should be TransactionId instead of Oid.

~

3d. Why not have a single return?

SUGGESTION
return (ret == 2 && subid == subid_written);

~

3e. I am wondering if the existing TwoPhaseTransactionGid function
currently in worker.c should be moved here because IMO these 2 functions
belong together and twophase.c seems the right place to put them.

~~~

4.
+LookupGXactBySubid(Oid subid)
+{
+	bool		found = false;
+
+	LWLockAcquire(TwoPhaseStateLock, LW_SHARED);
+	for (int i = 0; i < TwoPhaseState->numPrepXacts; i++)
+	{
+		GlobalTransaction gxact = TwoPhaseState->prepXacts[i];
+
+		/* Ignore not-yet-valid GIDs.
*/ + if (gxact->valid && checkGid(gxact->gid, subid)) + { + found = true; + break; + } + } + LWLockRelease(TwoPhaseStateLock); + return found; +} AFAIK the gxact also has the 'xid' available, so won't it be better to pass BOTH the 'xid' and 'subid' to the checkGid so you can do a full comparison instead of comparing only the subid part of the gid? == src/backend/commands/subscriptioncmds.c 5. AlterSubscription + /* XXX */ + if (IsSet(opts.specified_opts, SUBOPT_TWOPHASE_COMMIT)) + { The "XXX" comment looks like it is meant to say something more... ~~~ 6. + /* + * two_phase can be only changed for disabled + * subscriptions + */ + if (form->subenabled) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot set %s for enabled subscription", + "two_phase"))); 6a. Should this have a more comprehensive comment giving the reason like the 'failover' option has? ~~~ 6b. Maybe this should include a "translator" comment to say don't translate the option name. ~~~ 7. + /* Check whether the number of prepared transactions */ + if (!opts.twophase && + form->subtwophasestate == LOGICALREP_TWOPHASE_STATE_ENABLED && + LookupGXactBySubid(subid)) + ereport(ERROR, + (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE), + errmsg("cannot disable two_phase when uncommitted prepared transactions present"))); + 7a. The first comment seems to be an incomplete sentence. I think it should say something a bit like: two_phase cannot be disabled if there are any uncommitted prepared transactions present. ~ 7b. Also, if ereport occurs what is the user supposed to do about it? Shouldn't the ereport include some errhint with appropriate advice? ~~~ 8. + /* + * The changed failover option of the slot can't be rolled + * back. + */ + PreventInTransactionBlock(isTopLevel, "ALTER SUBSCRIPTION ... SET (two_phase)"); + + /* Change system catalog acoordingly */ + values[Anum_pg_subscription_subtwophasestate - 1] = + CharGetDatum(opts.twophase ? 
+ LOGICALREP_TWOPHASE_STATE_PENDING : + LOGICALREP_TWOPHASE_STATE_DISABLED); + replaces[Anum_pg_subscription_subtwophasestate - 1] = true; + } Typo I think: /failover option/two_phase option/ == .../libpqwalreceiver/libpqwalreceiver.c 9. static void libpqrcv_alter_slot(WalReceiverConn *conn, const char *slotname, - bool failover) + bool two_phase, bool failover) Same comment as mentioned elsewhere (#15), IMO the new 'two_phase' parameter should be last. == src/backend/replication/logical/launcher.c 10. +/* + * Stop all the subscription workers. + */ +void +logicalrep_workers_stop(Oid subid) +{ + List*subworkers; + ListCell *lc; + + LWLockAcquire(LogicalRepWorkerLock, LW_SHARED); + subworkers = logicalrep_workers_find(subid, false); + LWLockRelease(LogicalRepWorkerLock); + foreach(lc, subworkers) + { + LogicalRepWorker *w = (LogicalRepWorker *) lfirst(lc); + + logicalrep_worker_stop(w->subid, w->relid); + } + list_free(subworkers); +} I was confused by the logicalrep_workers_find(subid, false). IIUC the 'false' means everything (instead of 'only_running') but then I don't know why we want to "stop" anything that is NOT running. OTOH I see that this code was extracted from where it was previously inlined in subscriptioncmds.c, so maybe the 'false' is necessary for ano
Re: Cutting support for OpenSSL 1.0.1 and 1.0.2 in 17~?
On Tue, May 07, 2024 at 12:36:24PM +0200, Daniel Gustafsson wrote:
> Fair enough. I've taken a stab at documenting that the functions are
> deprecated, while at the same time documenting when and how they can be used
> (and be useful). The attached also removes one additional comment in the
> testcode which is now obsolete (since removing 1.0.1 support really), and
> fixes the spurious whitespace you detected upthread.

+     This function is deprecated and only present for backwards compatibility,
+     it does nothing.
[...]
+    and
+    are maintained for backwards compatibility, but are no longer required
+    since PostgreSQL 18.

LGTM, thanks for doing this!
--
Michael

signature.asc
Description: PGP signature
Re: bug: copy progress reporting of backends which run multiple COPYs
On Tue, May 07, 2024 at 07:27:54AM -0500, Justin Pryzby wrote:
> This didn't get fixed for v16, and it seems unlikely that it'll be
> addressed in back branches.
>
> But while I was reviewing forgotten threads, it occurred to me to raise
> the issue in time to fix it for v17.

Thanks for the reminder. FWIW, I'm rather annoyed by the fact that we rely on the ParseState to decide if reporting should happen or not. file_fdw tells us, even if that's accidental, that status reporting can be useful when working on a single table. So shutting down the whole reporting when a caller of BeginCopyFrom() does not give a ParseState is too heavy-handed, IMO.

The addition of report_progress in the COPY FROM state data is a good idea, though isn't that something we should give as an argument of BeginCopyFrom() instead of sticking this knowledge in COPY FROM?

Different idea: could it be something worth controlling with a query-level option? It would then be possible to provide the same level of control for COPY TO, which has reporting paths, give the application control over the reporting even with file_fdw, and store the value controlling the reporting in CopyFormatOptions. I am wondering if there would be a case where someone wants to do a COPY but hide the reporting entirely. The query-level option has some appeal.
--
Michael

signature.asc
Description: PGP signature
Re: backend stuck in DataFileExtend
On Wed, May 8, 2024 at 6:54 AM Justin Pryzby wrote:
> On Tue, May 07, 2024 at 10:55:28AM +1200, Thomas Munro wrote:
> > https://github.com/openzfs/zfs/issues/11641
> >
> > I don't know enough to say anything useful about that but it certainly
> > smells similar...
>
> Wow - I'd completely forgotten about that problem report.
> The symptoms are the same, even with a zfs version 3+ years newer.
> I wish the ZFS people would do more with their problem reports.

If I had to guess, my first idea would be that your 1MB or ginormous 16MB recordsize (a relatively new option) combined with PostgreSQL's 8KB block-at-a-time random order I/O patterns are tickling strange corners and finding a bug that no one has seen before. I would imagine that almost everyone in the galaxy who uses very large records does so with 'settled' data that gets streamed out once sequentially (for example, I think some of the OpenZFS maintainers are at Lawrence Livermore National Lab, where I guess they might pump around petabytes of data produced by particle physics research or whatever it might be, probably why they are also adding direct I/O to avoid caches completely...). But for us, if we have lots of backends reading, writing and extending random 8KB fragments of a 16MB record concurrently (2048 pages per record!), maybe we hit some broken edge... I'd be sure to include that sort of detail in any future reports. Normally I suppress urges to blame problems on kernels, file systems etc., and in the past accusations that ZFS was buggy turned out to be bugs in PostgreSQL IIRC, but user space sure seems to be off the hook for this one...
Re: Use pgstat_kind_infos to read fixed shared stats structs
On Tue, May 07, 2024 at 02:07:42PM -0500, Tristan Partin wrote:
> On Tue May 7, 2024 at 1:29 PM CDT, Andres Freund wrote:
>> I think it's a good idea. I'd really like to allow extensions to register new
>> types of stats eventually. Stuff like pg_stat_statements having its own,
>> fairly ... mediocre, stats storage shouldn't be necessary.
>
> Could be useful for Neon in the future too.

Interesting thing to consider. If you do that, I'm wondering if you could, actually, lift the restriction on pg_stat_statements.max and make it a SIGHUP so that it could be increased on the fly.. Hmm, just a thought in passing.

>> Do we need to increase the stats version? I didn't check if the order we
>> currently store things in and the numerical order of the stats IDs are the
>> same.
>
> I checked the orders, and they looked the same.
>
> 1. Archiver
> 2. BgWriter
> 3. Checkpointer
> 4. IO
> 5. SLRU
> 6. WAL

Yup, I've looked at that yesterday and the read order remains the same, so I don't see a need for a version bump.
--
Michael

signature.asc
Description: PGP signature
Re: pg_sequence_last_value() for unlogged sequences on standbys
On Tue, May 07, 2024 at 02:39:42PM -0500, Nathan Bossart wrote: > Okay, phew. We can still do something like v3-0002 for v18. I'll give > Michael a chance to comment on 0001 before committing/back-patching that > one. What you are doing in 0001, and 0002 for v18 sounds fine to me. -- Michael signature.asc Description: PGP signature
Re: [PoC] Improve dead tuple storage for lazy vacuum
On Wed, May 1, 2024 at 4:29 PM John Naylor wrote:
>
> On Thu, Apr 25, 2024 at 8:36 AM Masahiko Sawada wrote:
> >
> > On Mon, Apr 15, 2024 at 6:12 PM John Naylor wrote:
> > > - RT_KEY_GET_SHIFT is not covered for key=0:
> > >
> > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L803
> > >
> > > That should be fairly simple to add to the tests.
> >
> > There are two paths to call RT_KEY_GET_SHIFT():
> >
> > 1. RT_SET() -> RT_KEY_GET_SHIFT()
> > 2. RT_SET() -> RT_EXTEND_UP() -> RT_KEY_GET_SHIFT()
> >
> > In both cases, it's called when key > tree->ctl->max_val. Since the
> > minimum value of max_val is 255, RT_KEY_GET_SHIFT() is never called
> > when key=0.
>
> Ah, right, so it is dead code. Nothing to worry about, but it does
> point the way to some simplifications, which I've put together in the
> attached.

Thank you for the patch. It looks good to me.

+	/* compute the smallest shift that will allowing storing the key */
+	start_shift = pg_leftmost_one_pos64(key) / RT_SPAN * RT_SPAN;

The comment is moved from RT_KEY_GET_SHIFT() but I think s/will allowing storing/will allow storing/.

> > > - RT_DELETE: "if (key > tree->ctl->max_val)" is not covered:
> > >
> > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/include/lib/radixtree.h.gcov.html#L2644
> > >
> > > That should be easy to add.
> >
> > Agreed. The patch is attached.
>
> LGTM

> > > - TidStoreCreate* has some memory clamps that are not covered:
> > >
> > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/common/tidstore.c.gcov.html#L179
> > > https://anarazel.de/postgres/cov/16-vs-HEAD-2024-04-14/src/backend/access/common/tidstore.c.gcov.html#L234
> > >
> > > Maybe we could experiment with using 1MB for shared, and something
> > > smaller for local.
> >
> > I've confirmed that the local and shared tidstore with small max sizes
> > such as 4kB and 1MB worked.
> > Currently the max size is hard-coded in
> > test_tidstore.c but if we use work_mem as the max size, we can pass
> > different max sizes for local and shared in the test script.
>
> Seems okay, do you want to try that and see how it looks?

I've attached a simple patch for this. In test_tidstore.sql, we used to create two local tidstores and one shared tidstore. I thought of specifying small work_mem values for these three cases, but that would remove the normal test cases, so I created a separate tidstore for this test. Also, the new test only checks that a tidstore can be created with such a small size; it might be a good idea to add some TIDs to check that it really works fine.

Regards,

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com

use_work_mem_as_max_bytes.patch
Description: Binary data
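[Editor's note: the start_shift expression quoted in this thread, pg_leftmost_one_pos64(key) / RT_SPAN * RT_SPAN, can be sketched standalone. This is an illustration only — it assumes RT_SPAN is 8 bits per radix-tree level and uses a portable loop in place of pg_leftmost_one_pos64(); as noted above, key = 0 never reaches this computation, so the helper requires a non-zero key.]

```c
#include <assert.h>
#include <stdint.h>

#define RT_SPAN 8				/* bits consumed per tree level (assumption) */

/* Portable stand-in for pg_leftmost_one_pos64(); key must be non-zero. */
static int
leftmost_one_pos64(uint64_t key)
{
	int			pos = 0;

	while (key >>= 1)
		pos++;
	return pos;
}

/*
 * Smallest shift that still allows storing the key: the bit position of
 * the key's highest set bit, rounded down to a multiple of RT_SPAN.
 */
static int
key_get_shift(uint64_t key)
{
	return leftmost_one_pos64(key) / RT_SPAN * RT_SPAN;
}
```

For example, any key up to 255 fits at shift 0, while a key with its highest bit at position 16 needs a tree tall enough for shift 16.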
Re: Use WALReadFromBuffers in more places
Hi, Bharath.

I've been testing this. It's cool. Is there any way we could monitor the hit rate of reading directly from WAL buffers, by exporting it in some view?

---
Regards,
Jingtang
Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
On Mon, May 06, 2024 at 06:55:46PM +0300, m.litsa...@postgrespro.ru wrote:
> as a first step I have introduced the `SharedRecoveryDataFlags` bitmask
> instead of three boolean SharedHotStandbyActive, SharedPromoteIsTriggered
> and SharedStandbyModeRequested flags (the last one from my previous patch)
> and made minimal updates in corresponding code based on that change.

Thanks for the patch.

 /*
- * Local copy of SharedHotStandbyActive variable. False actually means "not
+ * Local copy of XLR_HOT_STANDBY_ACTIVE flag. False actually means "not
  * known, need to check the shared state".
  */
 static bool LocalHotStandbyActive = false;

 /*
- * Local copy of SharedPromoteIsTriggered variable. False actually means "not
+ * Local copy of XLR_PROMOTE_IS_TRIGGERED flag. False actually means "not
  * known, need to check the shared state".
  */
 static bool LocalPromoteIsTriggered = false;

It's a bit strange to have a bitwise set of flags in shmem while we keep these local copies as booleans. Perhaps it would be cleaner to merge both local variables into a single bits32 store?

+	uint32		SharedRecoveryDataFlags;

I'd switch to bits32 for flags here.

+bool
+StandbyModeIsRequested(void)
+{
+	/*
+	 * Spinlock is not needed here because XLR_STANDBY_MODE_REQUESTED flag
+	 * can only be read after startup process is done.
+	 */
+	return (XLogRecoveryCtl->SharedRecoveryDataFlags & XLR_STANDBY_MODE_REQUESTED) != 0;
+}

How about introducing a single wrapper function that returns the whole value SharedRecoveryDataFlags, with the flags published in a header? Sure, XLR_HOT_STANDBY_ACTIVE is not really exciting because being able to query a standby implies it, but XLR_PROMOTE_IS_TRIGGERED could be interesting? Then this could be used with a function that returns a text[] array with all the states retrieved?

The refactoring pieces and the function pieces should be split, for clarity.
--
Michael

signature.asc
Description: PGP signature
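[Editor's note: the single-bitmask approach discussed above can be sketched as follows. The flag names mirror the patch under discussion, but the numeric values, typedef, and helper functions here are illustrative only — the real code stores the mask in shared memory behind XLogRecoveryCtl, not in a file-scope variable.]

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t bits32;		/* stand-in for PostgreSQL's bits32 */

/* Hypothetical recovery-state flags, replacing three separate booleans. */
#define XLR_HOT_STANDBY_ACTIVE		(1 << 0)
#define XLR_PROMOTE_IS_TRIGGERED	(1 << 1)
#define XLR_STANDBY_MODE_REQUESTED	(1 << 2)

/* In the real patch this lives in shared memory, guarded as appropriate. */
static bits32 SharedRecoveryDataFlags = 0;

static void
set_flag(bits32 flag)
{
	SharedRecoveryDataFlags |= flag;
}

static void
clear_flag(bits32 flag)
{
	SharedRecoveryDataFlags &= ~flag;
}

static int
flag_is_set(bits32 flag)
{
	return (SharedRecoveryDataFlags & flag) != 0;
}
```

A single wrapper returning the whole mask, as suggested above, would simply return SharedRecoveryDataFlags and let callers test the published bits themselves.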
Re: [PATCH] json_lex_string: don't overread on bad UTF8
On Tue, May 07, 2024 at 02:06:10PM -0700, Jacob Champion wrote:
> Maybe I've misunderstood, but isn't that what's being done in v2?

Something a bit different.. I was wondering if it could be possible to tweak this code to truncate the data in the generated error string so that the incomplete multi-byte sequence is entirely cut out, which would come down to setting token_terminator to "s" (the last byte before the incomplete byte sequence) rather than "term" (the last byte available, even if incomplete):

#define FAIL_AT_CHAR_END(code) \
	do { \
		char	   *term = s + pg_encoding_mblen(lex->input_encoding, s); \
		lex->token_terminator = (term <= end) ? term : s; \
		return code; \
	} while (0)

But looking closer, I can see that in the JSON_INVALID_TOKEN case, when !tok_done, we set token_terminator to point to the end of the token, and that would include an incomplete byte sequence like in your case. :/

At the end of the day, I think that I'm OK with your patch and avoiding the overread for now in the back-branches. This situation makes me uncomfortable and we should put more effort into printing error messages in a readable format, but that could always be tackled later as a separate problem.. And I don't see something backpatchable at short sight for v16.

Thoughts and/or objections?
--
Michael

signature.asc
Description: PGP signature
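[Editor's note: the truncation idea in the FAIL_AT_CHAR_END macro above — end the reported token before an incomplete multi-byte sequence rather than include its partial bytes — can be sketched in isolation. This sketch substitutes a simple UTF-8 lead-byte length table for pg_encoding_mblen() and ignores the lexer state; it only illustrates the "(term <= end) ? term : s" choice.]

```c
#include <stddef.h>

/*
 * Stub for pg_encoding_mblen(): length of a UTF-8 sequence judged from
 * its first byte.  Assumes a valid lead byte; the real function handles
 * all server encodings.
 */
static int
utf8_mblen(const unsigned char *s)
{
	if (*s < 0x80)
		return 1;
	if (*s < 0xE0)
		return 2;
	if (*s < 0xF0)
		return 3;
	return 4;
}

/*
 * Decide where the reported error token should end.  If the character
 * starting at 's' would run past 'end' (an incomplete sequence at the
 * end of the input), terminate the token at 's' itself so the partial
 * bytes are cut out of the error message entirely.
 */
static const unsigned char *
token_terminator(const unsigned char *s, const unsigned char *end)
{
	const unsigned char *term = s + utf8_mblen(s);

	return (term <= end) ? term : s;
}
```

With input ending in the first two bytes of a three-byte sequence, the terminator falls back to the byte before the truncated character, so no partial sequence leaks into the error string.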