Re: POC: Parallel processing of indexes in autovacuum
On Thu, May 1, 2025 at 8:03 AM Masahiko Sawada wrote: > > As I understand it, we initially disabled parallel vacuum for > autovacuum because their objectives are somewhat contradictory. > Parallel vacuum aims to accelerate the process by utilizing additional > resources, while autovacuum is designed to perform cleaning operations > with minimal impact on foreground transaction processing (e.g., > through vacuum delay). > Yep, we also decided that we must not create more a/v workers for index processing. In the current implementation, the leader process sends a signal to the a/v launcher, and the launcher tries to launch all requested workers. But the number of workers never exceeds `autovacuum_max_workers`. Thus, we will never have more a/v workers than in the standard case (without this feature). > Nevertheless, I see your point about the potential benefits of using > parallel vacuum within autovacuum in specific scenarios. The crucial > consideration is determining appropriate criteria for triggering > parallel vacuum in autovacuum. Given that we currently support only > parallel index processing, suitable candidates might be autovacuum > operations on large tables that have a substantial number of > sufficiently large indexes and a high volume of garbage tuples. > > Although the actual number of parallel workers ultimately depends on > the number of eligible indexes, it might be beneficial to introduce a > storage parameter, say parallel_vacuum_workers, that allows control > over the number of parallel vacuum workers on a per-table basis. > For now, we have three GUC variables for this purpose: max_parallel_index_autovac_workers, autovac_idx_parallel_min_rows, autovac_idx_parallel_min_indexes. That is, everything is as you said. But we are still conducting research on this issue. I would like to get rid of some of these parameters. > Regarding implementation: I notice the WIP patch implements its own > parallel vacuum mechanism for autovacuum. Have you considered simply > setting at_params.nworkers to a value greater than zero? > About `at_params.nworkers = N` - that's exactly what we're doing (you can see it in the `vacuum_rel` function). But we cannot fully reuse the code of VACUUM PARALLEL, because it creates its own processes via the dynamic bgworker machinery. As I said above - we don't want to consume additional resources. Also we don't want to complicate communication between processes (the idea is that a/v workers can only send signals to the a/v launcher). As a result, we created our own implementation of parallel index processing control - see the changes in vacuumparallel.c and autovacuum.c. -- Best regards, Daniil Davydov
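As a rough illustration of the clamping rule described above (an editor's sketch, not code from the patch: the function name and the idx_count/active_av_workers inputs are invented, max_parallel_index_autovac_workers is the patch's GUC, autovacuum_max_workers is the existing one, and the usual PostgreSQL headers are assumed):

```c
/*
 * Sketch only: size a parallel-index-vacuum request so that the total
 * number of a/v workers can never exceed autovacuum_max_workers.
 */
static int
av_parallel_workers_to_request(int idx_count, int active_av_workers)
{
	/* at most one worker per index beyond the leader, capped by the new GUC */
	int			nworkers = Min(idx_count - 1, max_parallel_index_autovac_workers);

	/* stay within the global autovacuum worker budget */
	nworkers = Min(nworkers, autovacuum_max_workers - active_av_workers);

	return Max(nworkers, 0);
}
```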
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion writes: > On Fri, May 2, 2025 at 10:35 AM Tom Lane wrote: >> FWIW, on my Mac a meson build from HEAD goes through fine, with or >> without this patch. I'm getting openssl and libcurl from MacPorts >> not Homebrew, but I don't know why that would make any difference. > Do your <openssl/ssl.h> and <curl/curl.h> live in the same place? Mine do, > so I had to disable NLS to get this to break. Yeah, they are both under /opt/local/include in a MacPorts setup. But disabling NLS doesn't break it for me. I tried meson setup build --auto-features=disabled -Dlibcurl=enabled to make sure that /opt/local/include wasn't getting pulled in some other way, and it still builds. Apropos of that: our fine manual claims that option is spelled --auto_features, but that fails for me. Is that a typo in our manual, or do some meson versions accept the underscore? regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 02, 2025 at 10:42:25AM -0700, Jacob Champion wrote: > On Fri, May 2, 2025 at 10:35 AM Tom Lane wrote: >> FWIW, on my Mac a meson build from HEAD goes through fine, with or >> without this patch. I'm getting openssl and libcurl from MacPorts >> not Homebrew, but I don't know why that would make any difference. > > Do your <openssl/ssl.h> and <curl/curl.h> live in the same place? Mine do, > so I had to disable NLS to get this to break. I enabled NLS and the problem disappeared for me, but that seems to be a side effect of setting -Dextra_{include,lib}_dirs to point to my Homebrew directories, which I needed to do to get NLS to work. -- nathan
Assert("vacrel->eager_scan_remaining_successes > 0")
Hi, I hit the assertion failure in the subject line with the following script: create table t (a int) with (autovacuum_enabled = off); insert into t select generate_series(1, 1_000_000); vacuum t; insert into t select generate_series(1, 1_000_000); set vacuum_freeze_min_age to 0; vacuum t; When the success count reaches 0, we disable the eager scan by resetting related fields as follows: /* * If we hit our success cap, permanently disable eager * scanning by setting the other eager scan management * fields to their disabled values. */ vacrel->eager_scan_remaining_fails = 0; vacrel->next_eager_scan_region_start = InvalidBlockNumber; vacrel->eager_scan_max_fails_per_region = 0; However, there is a possibility that we have already eagerly scanned another page and returned it to the read stream before we freeze the eagerly-scanned page and disable the eager scan. In this case, the next block that we retrieve from the read stream could also be an eagerly-scanned page. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
On Fri, May 2, 2025 at 2:22 PM Peter Geoghegan wrote: > A slight variant of my fuzzing Python script did in fact go on to > detect a couple of bugs. > > I'm attaching a compressed SQL file with repros for 2 different bugs. > The first bug was independently detected by some kind of fuzzing > performed by Mark Dilger, reported elsewhere [1]. Picking up from the email with the big attachment... Both bugs are from commit 8a510275, "Further optimize nbtree search scan key comparisons" (not the main skip scan commit). I actually wrote code very much like the code from these patches that appeared in certain versions of the skip scan patch series -- it was originally supposed to be defensive hardening. This so-called hardening wasn't kept in the final committed version because I incorrectly believed that it wasn't necessary. I would like to commit the first patch later today, ahead of shipping beta1. But the second patch won't make it into beta1. In practice the bug fixed by the first patch is more likely to affect users, and (unlike the bug fixed by the second patch), it involves a hard crash. The first patch prevents us from dereferencing a NULL pointer (pstate) within _bt_advance_array_keys (unless on an assert-enabled build, where we get an assertion failure first). It would also be possible to fix the issue by testing if pstate itself is not a NULL pointer in the usual defensive style, but I find the approach taken in the first patch slightly more natural. The second patch is more complicated, and seems like something that I'll need to spend more time thinking about before proceeding with commit. It has subtle behavioral implications, in that it causes the pstate.forcenonrequired mechanism to influence when and how _bt_advance_array_keys schedules primitive index scans in a tiny minority of forward scan cases. I know of only 3 queries where this happens, 2 of which are from my repro -- it's actually really hard to find an example of this, even if you go out of your way to. Allowing pstate.forcenonrequired to affect primscan scheduling like this is something that I have been avoiding up until now, since that makes things cleaner -- but I'm starting to think that that goal isn't important enough to force the second patch to be significantly more complicated than what I came up with here. It's not like the behavioral differences represent a clear regression; they're just slightly different to what we see in cases where pstate.forcenonrequired/pstate.ikey is forcibly not applied (e.g., by commenting-out the calls to _bt_set_startikey made by _bt_readpage). My approach in the second patch is to simply call _bt_start_array_keys ahead of the finaltup call to _bt_checkkeys when pstate.forcenonrequired, which has the merit of being relatively simple (it's likely the simplest possible approach). I'm unwilling to pay too much of a cost in implementation complexity just to avoid side-effects in _bt_advance_array_keys/primscan scheduling, but maybe I'll find that the cost isn't too high. -- Peter Geoghegan v1-0001-Avoid-treating-nonrequired-nbtree-keys-as-require.patch Description: Binary data v1-0002-Prevent-prematurely-nbtree-array-advancement.patch Description: Binary data
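For readers who have not opened the patches: the "usual defensive style" alternative that Peter mentions (and decided against) would look roughly like the sketch below. This is schematic only, not the actual nbtree code; the real dereference in _bt_advance_array_keys() is embedded in far more logic, and the committed approach instead avoids treating nonrequired keys as required.

```c
/*
 * Schematic sketch of the rejected alternative: guard the pstate
 * dereference directly, in the usual defensive style.
 */
if (pstate != NULL)
{
	/* page-level state is available; proceed as before */
}
else
{
	/* no page-level state; take the path that does not need it */
}
```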
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion writes: > On Fri, May 2, 2025 at 11:26 AM Tom Lane wrote: >> Apropos of that: our fine manual claims that option is spelled >> --auto_features, but that fails for me. Is that a typo in our >> manual, or do some meson versions accept the underscore? > --auto_features doesn't work for me either. That looks like an > accidental mashup of --auto-features and -Dauto_features. Ah, I see somebody already complained of this [1], but apparently we did nothing about it. I shall go fix it now. regards, tom lane [1] https://www.postgresql.org/message-id/flat/172465652540.862882.17808523044292761256%40wrigleys.postgresql.org
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 11:56 AM Jacob Champion wrote: > -I/opt/homebrew/Cellar/openssl@3/3.5.0/include Except it _is_ right there. Oh, ha -- I'm not using Homebrew's Curl in this minimal build. Looks like it's coming from the sysroot. % ls -l /Library/Developer/CommandLineTools/SDKs/MacOSX15.2.sdk/usr/include/curl total 208 -rw-r--r-- 1 root wheel 129052 Nov 9 22:54 curl.h -rw-r--r-- 1 root wheel 3044 Nov 9 22:54 curlver.h ... Well, that was fun. --Jacob
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 10:31 AM Nathan Bossart wrote: > Yup, thanks! Great, thanks. I'll push it soon. --Jacob
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 10:35 AM Tom Lane wrote: > FWIW, on my Mac a meson build from HEAD goes through fine, with or > without this patch. I'm getting openssl and libcurl from MacPorts > not Homebrew, but I don't know why that would make any difference. Do your <openssl/ssl.h> and <curl/curl.h> live in the same place? Mine do, so I had to disable NLS to get this to break. --Jacob
Re: Slot's restart_lsn may point to removed WAL segment after hard restart unexpectedly
Dear All, Thank you for the attention to the patch. I updated a patch with a better solution for the master branch which can be easily backported to the other branches as we agree on the final solution. Two tests are introduced which are based on Tomas Vondra's test for logical slots with injection points from the discussion (patches: [1], [2], [3]). Tests are implemented as module tests in src/test/modules/test_replslot_required_lsn directory. I slightly modified the original test for logical slots and created a new simpler test for physical slots without any additional injection points. I prepared a new solution (patch [4]) which is also based on Tomas Vondra's proposal. With a fresh eye, I realized that it can fix the issue as well. It is easy and less invasive to implement. The new solution differs from my original solution: it is backward compatible (doesn't require any changes in ReplicationSlot structure). My original solution can be backward compatible as well if to allocate flushed_restart_lsn in a separate array in shmem, not in the ReplicationSlot structure, but I believe the new solution is the better one. If you still think that my previous solution is the better (I don't think so), I will prepare a backward compatible patch with my previous solution. I also proposed one more commit (patch [5]) which removes unnecessary calls of ReplicationSlotsComputeRequiredLSN function which seems to be redundant. This function updates the oldest required LSN for slots and it is called every time when slots' restart_lsn is changed. Once, we use the oldest required LSN in CreateCheckPoint/CreateRestartPoint to remove old WAL segments, I believe, there is no need to calculate the oldest value immediately when the slot is advancing and in other cases when restart_lsn is changed. It may affect on GetWALAvailability function because the oldest required LSN will be not up to date, but this function seems to be used in the system view pg_get_replication_slots and doesn't affect the logic of old WAL segments removal. I also have some doubts concerning advancing of logical replication slots: the call of ReplicationSlotsComputeRequiredLSN was removed. Not sure, how it can affect on ReplicationSlotsComputeRequiredXmin. This commit is not necessary for the fix but I think it is worth to consider. It may be dropped or applied only to the master branch. This patch can be easily backported to the major release branches. I will quickly prepare the patches for the major releases as we agree on the final solution. I apologize for such too late change in patch when it is already on commitfest. I'm not well experienced yet in the internals of PostgreSQL at the moment, sometimes the better solution needs some time to grow. In doing we learn :) [1] 0001-Add-injection-points-to-test-replication-slot-advanc.v2.patch [2] 0002-Add-TAP-test-to-check-logical-repl-slot-advance-duri.v2.patch [3] 0003-Add-TAP-test-to-check-physical-repl-slot-advance-dur.v2.patch [4] 0004-Keep-WAL-segments-by-slot-s-flushed-restart-LSN.v2.patch [5] 0005-Remove-redundant-ReplicationSlotsComputeRequiredLSN-.v2.patch With best regards, Vitaly From a8693c3003df7f9850af0be5284bb6f0e7a82fa6 Mon Sep 17 00:00:00 2001 From: Vitaly Davydov Date: Wed, 30 Apr 2025 12:48:27 +0300 Subject: [PATCH 3/5] Add TAP test to check physical repl slot advance during checkpoint The test verifies that the physical replication slot is still valid after immediate restart on checkpoint completion in case when the slot was advanced during checkpoint. 
Discussion: https://www.postgresql.org/message-id/flat/1d12d2-67235980-35-19a406a0%4063439497 --- .../test_replslot_required_lsn/meson.build| 3 +- .../t/002_physical_slot.pl| 126 ++ 2 files changed, 128 insertions(+), 1 deletion(-) create mode 100644 src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl diff --git a/src/test/modules/test_replslot_required_lsn/meson.build b/src/test/modules/test_replslot_required_lsn/meson.build index 999c16201fb..44d2546632b 100644 --- a/src/test/modules/test_replslot_required_lsn/meson.build +++ b/src/test/modules/test_replslot_required_lsn/meson.build @@ -9,7 +9,8 @@ tests += { 'enable_injection_points': get_option('injection_points') ? 'yes' : 'no', }, 'tests': [ - 't/001_logical_slot.pl' + 't/001_logical_slot.pl', + 't/002_physical_slot.pl' ], }, } diff --git a/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl new file mode 100644 index 000..f89aec1da32 --- /dev/null +++ b/src/test/modules/test_replslot_required_lsn/t/002_physical_slot.pl @@ -0,0 +1,126 @@ +# Copyright (c) 2025, PostgreSQL Global Development Group +# +# This test verifies the case when the physical slot is advanced during +# checkpoint. The test checks that the physical slot's restart_lsn still refers +# to an existed WAL segmen
Re: SQL functions: avoid making a tuplestore unnecessarily
I wrote: > Given the lack of field complaints over the many years it's been > like this, there doesn't seem to be a live problem. I think the > explanation for that is > (1) those mechanisms are never used to call set-returning functions, > (2) therefore, the tuplestore will not be called on to hold more > than one result row, > (3) therefore, it won't get large enough that it wants to spill > to disk, > (4) therefore, its potentially dangling resowner pointer is never > used. > However, this is an uncomfortably shaky chain of assumptions. > I want to cut it down by fixing things so that there is no > tuplestore, period, in a non-set-returning SQL function. Following up on this thread: Alexander Lakhin's report at [1] shows that point (3) above is wrong: the tuplestore code will spill to disk even when holding just one tuple, if that tuple is bigger than the tuplestore's allowed work_mem. (This seems kinda dubious to me, since no memory savings can ensue. But I have no desire to rejigger that right now.) So there may actually be a live bug associated with use of a deleted resource owner here. I've not tried to pursue that though. More immediately: Alexander's report also shows that there's an easily reached bug in HEAD when the tuplestore does spill to disk. When it reads that tuple back in, it allocates it in the caller's memory context, and fmgr_sql is now calling that in the short-lived context it was called in, not in its long-lived fcontext. The end result of that is that the long-lived result TupleTableSlot is now holding a should_free pointer to a short-lived tuple, which ends up in an attempt to pfree already-wiped storage during the next call of the SQL function. The patch I presented here eliminates that problem because with it, fmgr_sql no longer pulls tuples out of the tuplestore at all. So I want to apply this patch now instead of holding it for v19. regards, tom lane [1] https://www.postgresql.org/message-id/9f975803-1a1c-4f21-b987-f572e110e860%40gmail.com
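For readers less familiar with the APIs involved, the hazard boils down to which memory context is current when the tuple is read back out of the tuplestore. The sketch below is schematic only (the variable names are assumptions, and this is not the fix being applied here, which avoids creating the tuplestore at all); it merely illustrates where the context matters:

```c
/*
 * tuplestore_gettupleslot() with copy = true copies the tuple into
 * CurrentMemoryContext.  If that is the short-lived per-call context, the
 * long-lived result slot ends up holding a should_free pointer to memory
 * that is about to be reset -- the failure mode described above.  Pinning
 * the copy to the function's long-lived fcontext would look like this:
 */
oldcontext = MemoryContextSwitchTo(fcontext);
(void) tuplestore_gettupleslot(tstore, true /* forward */ , true /* copy */ ,
							   slot);
MemoryContextSwitchTo(oldcontext);
```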
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 11:26 AM Tom Lane wrote: > Yeah, they are both under /opt/local/include in a MacPorts setup. > But disabling NLS doesn't break it for me. I tried > > meson setup build --auto-features=disabled -Dlibcurl=enabled > > to make sure that /opt/local/include wasn't getting pulled in > some other way, and it still builds. Hm. If you clear out the build artifacts under src/interfaces/libpq-oauth, and then build with $ ninja -v src/interfaces/libpq-oauth/libpq-oauth.a does that help surface anything interesting? > Apropos of that: our fine manual claims that option is spelled > --auto_features, but that fails for me. Is that a typo in our > manual, or do some meson versions accept the underscore? --auto_features doesn't work for me either. That looks like an accidental mashup of --auto-features and -Dauto_features. --Jacob
Re: POC: Parallel processing of indexes in autovacuum
On Fri, May 2, 2025 at 11:58 PM Sami Imseih wrote: > > I am generally -1 on the idea of autovacuum performing parallel > index vacuum, because I always felt that the parallel option should > be employed in a targeted manner for a specific table. if you have a bunch > of large tables, some more important than others, a/c may end > up using parallel resources on the least important tables and you > will have to adjust a/v settings per table, etc to get the right table > to be parallel index vacuumed by a/v. Hm, this is a good point. I think I should clarify one moment - in practice, there is a common situation when users have one huge table among all databases (with 80+ indexes created on it). But, of course, in general there may be few such tables. But we can still adjust the autovac_idx_parallel_min_rows parameter. If a table has a lot of dead tuples => it is actively used => table is important (?). Also, if the user can really determine the "importance" of each of the tables - we can provide an appropriate table option. Tables with this option set will be processed in parallel in priority order. What do you think about such an idea? > > Also, with the TIDStore improvements for index cleanup, and the practical > elimination of multi-pass index vacuums, I see this being even less > convincing as something to add to a/v. If I understood correctly, then we are talking about the fact that TIDStore can store so many tuples that in fact a second pass is never needed. But the number of passes does not affect the presented optimization in any way. We must think about a large number of indexes that must be processed. Even within a single pass we can have a 40% increase in speed. > > Now, If I am going to allocate extra workers to run vacuum in parallel, why > not just provide more autovacuum workers instead so I can get more tables > vacuumed within a span of time? For now, only one process can clean up indexes, so I don't see how increasing the number of a/v workers will help in the situation that I mentioned above. Also, we don't consume additional resources during autovacuum in this patch - total number of a/v workers always <= autovacuum_max_workers. BTW, see v2 patch, attached to this letter (bug fixes) :-) -- Best regards, Daniil Davydov From 1c93a729b844a1dfe109e8d9e54d5cc0a941d061 Mon Sep 17 00:00:00 2001 From: Daniil Davidov Date: Sat, 3 May 2025 00:27:45 +0700 Subject: [PATCH v2] WIP Allow autovacuum to process indexes of single table in parallel --- src/backend/commands/vacuum.c | 27 + src/backend/commands/vacuumparallel.c | 289 +- src/backend/postmaster/autovacuum.c | 906 +- src/backend/utils/misc/guc_tables.c | 30 + src/backend/utils/misc/postgresql.conf.sample | 6 + src/include/postmaster/autovacuum.h | 23 + src/test/modules/autovacuum/.gitignore| 1 + src/test/modules/autovacuum/Makefile | 14 + .../autovacuum/t/001_autovac_parallel.pl | 137 +++ 9 files changed, 1387 insertions(+), 46 deletions(-) create mode 100644 src/test/modules/autovacuum/.gitignore create mode 100644 src/test/modules/autovacuum/Makefile create mode 100644 src/test/modules/autovacuum/t/001_autovac_parallel.pl diff --git a/src/backend/commands/vacuum.c b/src/backend/commands/vacuum.c index 33a33bf6b1c..a5ef5319ccc 100644 --- a/src/backend/commands/vacuum.c +++ b/src/backend/commands/vacuum.c @@ -2234,6 +2234,33 @@ vacuum_rel(Oid relid, RangeVar *relation, VacuumParams *params, else toast_relid = InvalidOid; + /* + * Decide whether we need to process table with given oid in parallel mode + * during autovacuum. 
+ */ + if (AmAutoVacuumWorkerProcess() && + params->index_cleanup != VACOPTVALUE_DISABLED) + { + PgStat_StatTabEntry *tabentry; + + /* fetch the pgstat table entry */ + tabentry = pgstat_fetch_stat_tabentry_ext(rel->rd_rel->relisshared, + rel->rd_id); + if (tabentry && tabentry->dead_tuples >= autovac_idx_parallel_min_rows) + { + List *indexes = RelationGetIndexList(rel); + int num_indexes = list_length(indexes); + + list_free(indexes); + + if (num_indexes >= autovac_idx_parallel_min_indexes && +max_parallel_index_autovac_workers > 0) + { +params->nworkers = max_parallel_index_autovac_workers; + } + } + } + /* * Switch to the table owner's userid, so that any index functions are run * as that user. Also lock down security-restricted operations and diff --git a/src/backend/commands/vacuumparallel.c b/src/backend/commands/vacuumparallel.c index 2b9d548cdeb..cb4b7c23010 100644 --- a/src/backend/commands/vacuumparallel.c +++ b/src/backend/commands/vacuumparallel.c @@ -1,20 +1,23 @@ /*- * * vacuumparallel.c - * Support routines for parallel vacuum execution. + * Support routines for parallel [auto]vacuum execution. * * This file contains routines that are inten
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion writes: > Hm. If you clear out the build artifacts under > src/interfaces/libpq-oauth, and then build with > $ ninja -v src/interfaces/libpq-oauth/libpq-oauth.a > does that help surface anything interesting? $ rm -rf src/interfaces/libpq-oauth $ ninja -v src/interfaces/libpq-oauth/libpq-oauth.a [1/2] ccache cc -Isrc/interfaces/libpq-oauth/libpq-oauth.a.p -Isrc/interfaces/libpq-oauth -I../src/interfaces/libpq-oauth -Isrc/interfaces/libpq -I../src/interfaces/libpq -Isrc/port -I../src/port -Isrc/include -I../src/include -I/opt/local/include -I/opt/local/libexec/openssl3/include -fdiagnostics-color=always -Wall -Winvalid-pch -O2 -g -isysroot /Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX15.4.sdk -fno-strict-aliasing -fwrapv -fexcess-precision=standard -Wmissing-prototypes -Wpointer-arith -Werror=vla -Werror=unguarded-availability-new -Wendif-labels -Wmissing-format-attribute -Wcast-function-type -Wformat-security -Wdeclaration-after-statement -Wmissing-variable-declarations -Wno-unused-command-line-argument -Wno-compound-token-split-by-macro -Wno-format-truncation -Wno-cast-function-type-strict -MD -MQ src/interfaces/libpq-oauth/libpq-oauth.a.p/oauth-curl.c.o -MF src/interfaces/libpq-oauth/libpq-oauth.a.p/oauth-curl.c.o.d -o src/interfaces/libpq-oauth/libpq-oauth.a.p/oauth-curl.c.o -c ../src/interfaces/libpq-oauth/oauth-curl.c [2/2] rm -f src/interfaces/libpq-oauth/libpq-oauth.a && ar csr src/interfaces/libpq-oauth/libpq-oauth.a src/interfaces/libpq-oauth/libpq-oauth.a.p/oauth-curl.c.o && ranlib -c src/interfaces/libpq-oauth/libpq-oauth.a So it's getting -I/opt/local/include and also -I/opt/local/libexec/openssl3/include from somewhere, which I guess must be libcurl's pkg-config data ... yup: $ pkg-config --cflags libcurl -I/opt/local/include -I/opt/local/libexec/openssl3/include -I/opt/local/include I bet Homebrew's libcurl packaging doesn't do that. regards, tom lane
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 11:52 AM Tom Lane wrote: > $ pkg-config --cflags libcurl > -I/opt/local/include -I/opt/local/libexec/openssl3/include > -I/opt/local/include > > I bet Homebrew's libcurl packaging doesn't do that. Nope, Homebrew breaks them out into smaller pieces: % PKG_CONFIG_PATH=/opt/homebrew/opt/curl/lib/pkgconfig pkg-config --cflags libcurl -I/opt/homebrew/Cellar/curl/8.13.0/include -I/opt/homebrew/Cellar/brotli/1.1.0/include -I/opt/homebrew/opt/zstd/include -I/opt/homebrew/Cellar/libssh2/1.11.1/include -I/opt/homebrew/Cellar/rtmpdump/2.4-20151223_3/include -I/opt/homebrew/Cellar/openssl@3/3.5.0/include -I/opt/homebrew/Cellar/libnghttp2/1.65.0/include --Jacob
Re: extension_control_path and "directory"
On 29.04.25 17:06, Matheus Alcantara wrote: On Tue, Apr 29, 2025 at 11:08 AM David E. Wheeler wrote: Right. My point is a minor one, but I thin you can use an if/ else there: ```c if (strcmp(piece, "$system") == 0) { /* Substitute the path macro if needed */ mangled = substitute_path_macro(piece, "$system", system_dir); } else { /* * Append "extension" suffix in case is a custom extension * control path. */ mangled = psprintf("%s/extension", mangled); } ``` The substitute_path_macro() already handles the if/else on "piece" but I think that this if/else version looks nicer. Fixed. I've also included some documentation changes for this v5 version to remove the "extension" from the examples and also mention the scenario when using the "directory" on the .control file. Thanks, I have committed this. I did a bit of code reformatting and adjusted the documentation a bit. It's good to get this in before beta1 so that we don't have to change the valid values of extension_control_path past beta1.
Re: queryId constant squashing does not support prepared statements
On 2025-May-02, Michael Paquier wrote: > That depends. If we conclude that tracking this information through > the parser based on the start and end positions in a query string > for a set of values is more relevant, then we would be redesigning the > facility from the ground, so the old approach would not be really > relevant.. I disagree that a revert is warranted for this reason. If you want to change the implementation later, that's fine, as long as the user interface doesn't change. -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/ "Doing what he did amounts to sticking his fingers under the hood of the implementation; if he gets his fingers burnt, it's his problem." (Tom Lane)
Small fixes needed by high-availability tools
Hi hackers! I want to revive attempts to fix some old edge cases of physical quorum replication. Please find attached draft patches that demonstrate the ideas. These patches are not actually proposed code changes; I rather want to have a design consensus first. 1. Allow checking standby sync before making data visible after crash recovery Problem: a Postgres instance must not allow data to be read if it is not yet known to be replicated. Immediately after a crash we do not know if we are still the cluster primary. We can disallow new connections until standby quorum is established. Of course, walsenders and superusers must be exempt from this restriction. The key change is the following: @@ -1214,6 +1215,16 @@ InitPostgres(const char *in_dbname, Oid dboid, if (PostAuthDelay > 0) pg_usleep(PostAuthDelay * 1000000L); + /* Check if we need to wait for startup synchronous replication */ + if (!am_walsender && + !superuser() && + !StartupSyncRepEstablished()) + { + ereport(FATAL, + (errcode(ERRCODE_CANNOT_CONNECT_NOW), + errmsg("cannot connect until synchronous replication is established with standbys according to startup_synchronous_standby_level"))); + } We might also want to have some kind of cache indicating that quorum was already established. Also, the place where the check is done might not be the most appropriate. 2. Do not allow to cancel locally written transaction The problem was discussed many times [0,1,2,3] with some agreement on the taken approach. But there were concerns that the solution is incomplete without the first patch in the current thread. Problem: a user might try to cancel a locally committed transaction, and if we do so we will show non-replicated data as committed. This leads to losing data with UPSERTs. The key change is how we process cancels in SyncRepWaitForLSN(). 3. Allow reading LSN written by walreceiver, but not flushed yet Problem: if we have synchronous_standby_names = ANY(node1,node2), node2 might be ahead of node1 by flush LSN, but behind by written LSN. If we do a failover we choose node2 instead of node1 and lose data recently committed with synchronous_commit=remote_write. Caveat: we already have a function pg_last_wal_receive_lsn(), which in fact returns the flushed LSN, not the written one. I propose to add a new function which returns the LSN actually written. The internals of this function are already implemented (GetWalRcvWriteRecPtr()), but unused. Currently we just use a separate program lwaldump [4] which simply reads WAL up to the last valid record. In case of failover pg_consul uses LSNs from lwaldump. This approach works well, but is cumbersome. There are other caveats of replication, but IMO these 3 problems are the most annoying in terms of data durability. I'd greatly appreciate any thoughts on this. Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/C1F7905E-5DB2-497D-ABCC-E14D4DEE506C%40yandex-team.ru [1] https://www.postgresql.org/message-id/flat/caeet0zhg5off7iecby6tzadh1moslmfz1hlm311p9vot7z+...@mail.gmail.com [2] https://www.postgresql.org/message-id/flat/6a052e81060824a8286148b1165bafedbd7c86cd.ca...@j-davis.com#415dc2f7d41b8a251b419256407bb64d [3] https://www.postgresql.org/message-id/flat/CALj2ACUrOB59QaE6%3DjF2cFAyv1MR7fzD8tr4YM5%2BOwEYG1SNzA%40mail.gmail.com [4] https://github.com/g0djan/lwaldump 0001-Allow-checking-standby-sync-before-making-data-visib.patch Description: Binary data 0002-Do-not-allow-to-cancel-locally-written-transaction.patch Description: Binary data 0003-Allow-reading-LSN-written-by-walreciever-but-not-flu.patch Description: Binary data
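To make proposal 3 concrete, here is a rough sketch of what the new SQL-callable function could look like. The function name pg_last_wal_write_lsn is invented here (the thread only proposes "a new function which returns the LSN actually written"); it simply exposes the existing GetWalRcvWriteRecPtr() and mirrors the shape of pg_last_wal_receive_lsn(). A real patch would also need the catalog entry for the function.

```c
#include "postgres.h"

#include "fmgr.h"
#include "replication/walreceiver.h"
#include "utils/pg_lsn.h"

/*
 * Editor's sketch: return the last WAL position written (but not
 * necessarily flushed) by the walreceiver.  Name is hypothetical.
 */
Datum
pg_last_wal_write_lsn(PG_FUNCTION_ARGS)
{
	XLogRecPtr	recptr = GetWalRcvWriteRecPtr();

	if (XLogRecPtrIsInvalid(recptr))
		PG_RETURN_NULL();

	PG_RETURN_LSN(recptr);
}
```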
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 8:11 AM Nathan Bossart wrote: > > After commit b0635bf, I'm seeing the following meson build failures on > macOS: Thanks for the report, and sorry for the breakage. > In file included from > ../postgresql/src/interfaces/libpq-oauth/oauth-curl.c:51: > ../postgresql/src/interfaces/libpq/libpq-int.h:70:10: fatal error: > 'openssl/ssl.h' file not found > 70 | #include <openssl/ssl.h> > | ^~~ > 1 error generated. Hm. My test setup here is Homebrew with -Dextra_include_dirs, which may explain why it's not failing for me. Looks like Cirrus also has -Dextra_include_dirs... > The following patch seems to resolve it. I'm curious if commit 4ea1254 > might apply to meson, too, but FWIW I haven't noticed any related failures > on my machine. Yeah, I wonder if libintl is being similarly "cheated" on the Meson side. > diff --git a/meson.build b/meson.build > index 29d46c8ad01..19ad03042d3 100644 > --- a/meson.build > +++ b/meson.build > @@ -3295,6 +3295,7 @@ libpq_deps += [ > > libpq_oauth_deps += [ > libcurl, > + ssl, > ] Thanks! I think the include directory is the only thing needed for the static library, not the full link dependency. But I'll try to build up a new Meson setup, with minimal added settings, to see if I can reproduce here. Can you share your Meson configuration? --Jacob
Re: queryId constant squashing does not support prepared statements
> On Fri, May 02, 2025 at 04:18:37PM GMT, Michael Paquier wrote: > On Fri, May 02, 2025 at 09:13:39AM +0200, Dmitry Dolgov wrote: > > Squashing constants was meant to be a first step towards doing the same > > for other types of queries (params, rte_values), reverting it to > > implement everything at once makes very little sense to me. > > That depends. If we conclude that tracking this information through > the parser based on the start and end positions in a query string > for a set of values is more relevant, then we would be redesigning the > facility from the ground up, so the old approach would not really be > relevant. If I understand you correctly, changing the way the element list is identified is not going to address the question of whether or not to squash parameters, right?
Re: Fix slot synchronization with two_phase decoding enabled
On Wed, Apr 30, 2025 at 5:45 AM Masahiko Sawada wrote: > > On Tue, Apr 29, 2025 at 5:00 AM Nisha Moond wrote: > > Thank you for updating the patch! Here are some comments on v10. > Thanks for reviewing the patch! > --- > + > +# Also wait for two-phase to be enabled > +$subscriber1->poll_query_until( > + 'postgres', qq[ > + SELECT count(1) = 0 FROM pg_subscription WHERE subname = > 'regress_mysub2' and subtwophasestate NOT IN ('e');] > +) or die "Timed out while waiting for subscriber to enable twophase"; > + > +# Try to enable the failover for the subscription, should give error > +($result, $stdout, $stderr) = $subscriber1->psql( > + 'postgres', > + "ALTER SUBSCRIPTION regress_mysub2 DISABLE; > +ALTER SUBSCRIPTION regress_mysub2 SET (failover = true);"); > +ok( $stderr =~ > + qr/ERROR: could not alter replication slot "lsub2_slot": ERROR: > cannot enable failover for a two-phase enabled replication slot/, > + "failover can't be enabled if restart_lsn < two_phase_at on a > two_phase subscription." > +); > > I think it's possible between two tests that the walsender consumes > some changes (e.g. generated by autovacuums) then the second check > fails (i.e., ALTER SUBSCRIPTION SET (failover = ture) succeeds). > Yes, this is a possibility. To account for this negative test case where restart_lsn < two_phase_at, I updated the test to attempt altering a two_phase-enabled slot without any associated subscription. > --- > +# Test that SQL API disallows slot creation with both two_phase and > failover enabled > +($result, $stdout, $stderr) = $publisher->psql('postgres', > + "SELECT pg_create_logical_replication_slot('regress_mysub3', > 'pgoutput', false, true, true);" > +); > +ok( $stderr =~ > + /ERROR: cannot enable both "failover" and "two_phase" options > during replication slot creation/, > + "cannot create slot with both two_phase and failover enabled"); > > Isn't this test already covered by test_decoding/sql/slot.sql? > Yes, it is covered in slot.sql, hence removed from here. > I've attached a patch that contains comment changes I mentioned above. > Please incorporate it if you agree with them. > I have incorporated the suggested changes and updated a couple more places accordingly. ~~~ Please find the v11 patch addressing the above points and all other comments. I have also optimized the test by reducing the number of subscriptions and slots. -- Thanks, Nisha v11-0001-PG17-Approach-3-Fix-slot-synchronization-for-two.patch Description: Binary data
Re: fixing CREATEROLE
On Thu, May 1, 2025 at 4:31 PM Tom Lane wrote: > In any case, I'd be happier about createrole_self_grant if it had > been a role property bit instead of a GUC. But we'd still need > to worry about whether it corrupts the results of dump/restore > (offhand I think it still would, if it results in GRANTs that > weren't there before). Hmm. That might have been a better design. :-( -- Robert Haas EDB: http://www.enterprisedb.com
Re: extension_control_path and "directory"
On Fri, May 2, 2025 at 11:51 AM Peter Eisentraut wrote: > > Thanks, I have committed this. I did a bit of code reformatting and > adjusted the documentation a bit. It's good to get this in before beta1 > so that we don't have to change the valid values of > extension_control_path past beta1. > Thanks Peter! -- Matheus Alcantara
Re: [PoC] Federated Authn/z with OAUTHBEARER
After commit b0635bf, I'm seeing the following meson build failures on macOS: In file included from ../postgresql/src/interfaces/libpq-oauth/oauth-curl.c:51: ../postgresql/src/interfaces/libpq/libpq-int.h:70:10: fatal error: 'openssl/ssl.h' file not found 70 | #include <openssl/ssl.h> | ^~~ 1 error generated. The following patch seems to resolve it. I'm curious if commit 4ea1254 might apply to meson, too, but FWIW I haven't noticed any related failures on my machine. diff --git a/meson.build b/meson.build index 29d46c8ad01..19ad03042d3 100644 --- a/meson.build +++ b/meson.build @@ -3295,6 +3295,7 @@ libpq_deps += [ libpq_oauth_deps += [ libcurl, + ssl, ] subdir('src/interfaces/libpq') -- nathan
Re: POC: Parallel processing of indexes in autovacuum
Thanks for raising this idea! I am generally -1 on the idea of autovacuum performing parallel index vacuum, because I always felt that the parallel option should be employed in a targeted manner for a specific table. if you have a bunch of large tables, some more important than others, a/c may end up using parallel resources on the least important tables and you will have to adjust a/v settings per table, etc to get the right table to be parallel index vacuumed by a/v. Also, with the TIDStore improvements for index cleanup, and the practical elimination of multi-pass index vacuums, I see this being even less convincing as something to add to a/v. Now, If I am going to allocate extra workers to run vacuum in parallel, why not just provide more autovacuum workers instead so I can get more tables vacuumed within a span of time? > Once we have parallel heap vacuum, as discussed in thread[1], it would > also likely be beneficial to incorporate it into autovacuum during > aggressive vacuum or failsafe mode. IIRC, index cleanup is disabled by failsafe. -- Sami Imseih Amazon Web Services (AWS)
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 8:59 AM Jacob Champion wrote: > libintl is already coming in via frontend_stlib_code, so that's fine. > So now I'm wondering if any other static clients of libpq-int.h (if > there are any) need the ssl dependency too, for correctness, or if > it's just me. Looks like it's just me. And using partial_dependency for the includes seems like overkill, so I've kept the full ssl dependency object, but moved it to the staticlib only, which is enough to solve the breakage on my machine. Nathan, if you get a chance, does the attached patch work for you? Thanks! --Jacob 0001-oauth-Correct-SSL-dependency-for-libpq-oauth.a.patch Description: Binary data
Re: POC: Parallel processing of indexes in autovacuum
> On Fri, May 2, 2025 at 11:58 PM Sami Imseih wrote: > > > > I am generally -1 on the idea of autovacuum performing parallel > > index vacuum, because I always felt that the parallel option should > > be employed in a targeted manner for a specific table. if you have a bunch > > of large tables, some more important than others, a/c may end > > up using parallel resources on the least important tables and you > > will have to adjust a/v settings per table, etc to get the right table > > to be parallel index vacuumed by a/v. > > Hm, this is a good point. I think I should clarify one moment - in > practice, there is a common situation when users have one huge table > among all databases (with 80+ indexes created on it). But, of course, > in general there may be few such tables. > But we can still adjust the autovac_idx_parallel_min_rows parameter. > If a table has a lot of dead tuples => it is actively used => table is > important (?). > Also, if the user can really determine the "importance" of each of the > tables - we can provide an appropriate table option. Tables with this > option set will be processed in parallel in priority order. What do > you think about such an idea? I think in most cases, the user will want to determine the priority of a table getting parallel vacuum cycles rather than having the autovacuum determine the priority. I also see users wanting to stagger vacuums of large tables with many indexes through some time period, and give the tables the full amount of parallel workers they can afford at these specific periods of time. A/V currently does not really allow for this type of scheduling, and if we give some kind of GUC to prioritize tables, I think users will constantly have to be modifying this priority. I am basing my comments on the scenarios I have seen on the field, and others may have a different opinion. > > Also, with the TIDStore improvements for index cleanup, and the practical > > elimination of multi-pass index vacuums, I see this being even less > > convincing as something to add to a/v. > > If I understood correctly, then we are talking about the fact that > TIDStore can store so many tuples that in fact a second pass is never > needed. > But the number of passes does not affect the presented optimization in > any way. We must think about a large number of indexes that must be > processed. Even within a single pass we can have a 40% increase in > speed. I am not discounting that a single table vacuum with many indexes will maybe perform better with parallel index scan, I am merely saying that the TIDStore optimization now makes index vacuums better and perhaps there is less of an incentive to use parallel. > > Now, If I am going to allocate extra workers to run vacuum in parallel, why > > not just provide more autovacuum workers instead so I can get more tables > > vacuumed within a span of time? > > For now, only one process can clean up indexes, so I don't see how > increasing the number of a/v workers will help in the situation that I > mentioned above. > Also, we don't consume additional resources during autovacuum in this > patch - total number of a/v workers always <= autovacuum_max_workers. Increasing a/v workers will not help speed up a specific table, what I am suggesting is that instead of speeding up one table, let's just allow other tables to not be starved of a/v cycles due to lack of a/v workers. -- Sami
Re: Parallel CREATE INDEX for GIN indexes
On 4/30/25 14:39, Tomas Vondra wrote: > > On 4/18/25 03:03, Vinod Sridharan wrote: >> ... >> > > The patch seems fine to me - I repeated the tests with mailing list > archives, with MemoryContextStats() in _gin_parallel_merge, and it > reliably minimizes the memory usage. So that's fine. > > I was also worried if this might have performance impact, but it > actually seems to make it a little bit faster. > > I'll get this pushed. > And pushed, so it'll be in beta1. Thanks! -- Tomas Vondra
Re: Improve explicit cursor handling in pg_stat_statements
> Hmm. What are the workloads that you have seen as problematic? Do > these involve cursor names generated randomly, where most of them are > similar with a random factor for the name? postgres_fdw, as an example, in which cursor name get reused for different queries. Notice below "c1" and "c2" is reused for different queries, so now what underlying sql is FETCH, i.e. FETCH 100 FROM c1 referring to? v2-0001 does not help us with the FETCH problem because as I mentioned we don't have access to the underlying sql ( and parsing is even too early to do a portal lookup to find the underlying sql to base the queryId on). What v2-0001 will do is at least group the DECLARE CURSOR statements together for cursors referencing the same query and reduce the # of entries. ``` create foreign table t2(id int) server r1; create foreign table t1(id int) server r1; postgres=# select * from t2, t ; id | id + 1 | 1 (1 row) postgres=# select * from t, t2 ; id | id + 1 | 1 (1 row) ``` on the remote side ``` postgres=# select calls, query from pg_stat_statements where query like '% c%'; calls | query ---+- 1 | DECLARE c2 CURSOR FOR + | SELECT id FROM public.t2 2 | DECLARE c1 CURSOR FOR + | SELECT id FROM public.t2 3 | CLOSE c2 3 | CLOSE c1 2 | DECLARE c2 CURSOR FOR + | SELECT id FROM public.t 3 | FETCH 100 FROM c1 3 | FETCH 100 FROM c2 1 | DECLARE c1 CURSOR FOR + | SELECT id FROM public.t 2 | select calls, query from pg_stat_statements where query like $1 (9 rows) ``` > Too much normalization > here would limit the amount of verbosity that we have for this area, > especially if we are dealing with query patterns that rely on few > CLOSE naming patterns spread across a couple of key queries, because > we would now know anymore about their distribution. The FETCH and CLOSE are already not clear to what underlying SQL they are referring to, and there is not much chance to actually improve that unless we track a cursor queryId in pg_stat_statements ( at that point we can show that FETCH or CLOSE refer to this specific cursor statement ). -- Sami
Re: queryId constant squashing does not support prepared statements
> On Fri, May 02, 2025 at 07:10:19AM GMT, Michael Paquier wrote: > > I am really leaning towards that we should revert this feature as the > > limitation we have now with parameters is a rather large one and I think > > we need to go back and address this issue. > > I am wondering if this would not be the best move to do on HEAD. > Let's see where the discussion drives us. Squashing constants was meant to be a first step towards doing the same for other types of queries (params, rte_values), reverting it to implement everything at once makes very little sense to me.
Re: queryId constant squashing does not support prepared statements
On Fri, May 02, 2025 at 09:13:39AM +0200, Dmitry Dolgov wrote: > Squashing constants was meant to be a first step towards doing the same > for other types of queries (params, rte_values), reverting it to > implement everything at once makes very little sense to me. That depends. If we conclude that tracking this information through the parser based on the start and end positions in a query string for a set of values is more relevant, then we would be redesigning the facility from the ground up, so the old approach would not really be relevant. -- Michael
Re: Fix slot synchronization with two_phase decoding enabled
On Fri, May 2, 2025 at 12:57 PM Nisha Moond wrote: > > > Please find the v11 patch addressing the above points and all other > comments. I have also optimized the test by reducing the number of > subscriptions and slots. > Thanks for the patch. Few comments: 1) pg_upgrade/t/003_logical_slots.pl: - "SELECT slot_name, two_phase, failover FROM pg_replication_slots"); -is($result, qq(regress_sub|t|t), 'check the slot exists on new cluster'); + "SELECT slot_name, two_phase FROM pg_replication_slots"); +is($result, qq(regress_sub|t), 'check the slot exists on new cluster'); With this change, the failover property test during upgrade is removed. Shall we add it again either using ALTER SUB to enable failover and two_phase together or a new subscription with failover alone? 2) + The default is false. This option cannot be set together with + failover when creating a logical replication slot. + However, once the slot is created with two_phase = true, + failover can be set to true after the slot has + consumed all transactions up to the point where two-phase decoding + was enabled. Suggestion: all transactions --> all the changes. thanks Shveta
Re: Add an option to skip loading missing publication to avoid logical replication failure
Yeah, thanks for your clarification. I have a basic understanding of it now. I mean, is this considered a bug or a design defect in the codebase? If so, should we prevent it from occurring in general, not just for this specific test? vignesh C > > We have three processes involved in this scenario: A walsender process on the publisher, responsible for decoding and sending WAL changes. An apply worker process on the subscriber, which applies the changes. A session executing the ALTER SUBSCRIPTION command. > > Due to the asynchronous nature of these processes, the ALTER > SUBSCRIPTION command may not be immediately observed by the apply > worker. Meanwhile, the walsender may process and decode an INSERT > statement. > If the insert targets a table (e.g., tab_3) that does not belong to > the current publication (pub1), the walsender silently skips > replicating the record and advances its decoding position. This > position is sent in a keepalive message to the subscriber, and since > there are no pending transactions to flush, the apply worker reports > it as the latest received LSN. > Later, when the apply worker eventually detects the subscription > change, it restarts—but by then, the insert has already been skipped > and is no longer eligible for replay, as the table was not part of the > publication (pub1) at the time of decoding. > This race condition arises because the three processes run > independently and may progress at different speeds due to CPU > scheduling or system load. > Thoughts? > > Regards, > Vignesh >
Re: Make COPY format extendable: Extract COPY TO format implementations
On Thu, May 1, 2025 at 4:04 PM Michael Paquier wrote: > > On Thu, May 01, 2025 at 12:15:30PM -0700, Masahiko Sawada wrote: > > In light of these concerns, I've been contemplating alternative > > interface designs. One promising approach would involve registering > > custom copy formats via a C function during module loading > > (specifically, in _PG_init()). This method would require extension > > authors to invoke a registration function, say > > RegisterCustomCopyFormat(), in _PG_init() as follows: > > > > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", > > &JsonLinesCopyToRoutine, > > &JsonLinesCopyFromRoutine); > > > > The registration function would validate the format name and store it > > in TopMemoryContext. It would then return a unique identifier that can > > be used subsequently to reference the custom copy format extension. > > Hmm. How much should we care about the observability of the COPY > format used by a given backend? Storing this information in a > backend's TopMemoryContext is OK to get the extensibility basics to > work, but could it make sense to use some shmem state to allocate a > uint32 ID that could be shared by all backends. Contrary to EXPLAIN, > COPY commands usually run for a very long time, so I am wondering if > these APIs should be designed so as it would be possible to monitor > the format used. One layer where the format information could be made > available is the progress reporting view for COPY, for example. I can > also imagine a pgstats kind where we do COPY stats aggregates, with a > per-format pgstats kind, and sharing a fixed ID across multiple > backends is relevant (when flushing the stats at shutdown, we would > use a name/ID mapping like replication slots). > > I don't think that this needs to be relevant for the option part, just > for the format where, I suspect, we should store in a shmem array > based on the ID allocated the name of the format, the library of the > callback and the function name fed to load_external_function(). > > Note that custom LWLock and wait events use a shmem state for > monitoring purposes, where we are able to do ID->format name lookups > as much as format->ID lookups. Perhaps it's OK not to do that for > COPY, but I am wondering if we'd better design things from scratch > with states in shmem state knowing that COPY is a long-running > operation, and that if one mixes multiple formats they would most > likely want to know which formats are bottlenecks, through SQL. Cloud > providers would love that. Good point. It would make sense to have such information as a map on shmem. It might be better to use dshash here since a custom copy format module can be loaded at runtime. Or we can use dynahash with large enough elements. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
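As a rough sketch of the "dynahash with large enough elements" option mentioned above (editor's illustration only: the struct layout, registry name, and table sizes are invented, and a real implementation would also need the usual shmem request/startup hooks plus locking). It records the pieces Michael lists: the format name, the ID, the library, and the function name fed to load_external_function().

```c
/* Editor's sketch of a shared-memory name<->ID registry for COPY formats. */
typedef struct CopyFormatEntry
{
	char		name[NAMEDATALEN];	/* hash key: format name */
	uint32		format_id;			/* ID reported in progress/stats views */
	char		library[MAXPGPATH];	/* library providing the callbacks */
	char		handler[NAMEDATALEN];	/* function given to load_external_function() */
} CopyFormatEntry;

static HTAB *CopyFormatRegistry = NULL;

static void
CopyFormatShmemInit(void)
{
	HASHCTL		ctl;

	ctl.keysize = NAMEDATALEN;
	ctl.entrysize = sizeof(CopyFormatEntry);

	/* table sizes are arbitrary placeholders */
	CopyFormatRegistry = ShmemInitHash("COPY format registry",
									   16, 128,
									   &ctl,
									   HASH_ELEM | HASH_STRINGS);
}
```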
Re: Make COPY format extendable: Extract COPY TO format implementations
On Thursday, May 1, 2025, Masahiko Sawada wrote: > > In light of these concerns, I've been contemplating alternative > interface designs. One promising approach would involve registering > custom copy formats via a C function during module loading > (specifically, in _PG_init()). This method would require extension > authors to invoke a registration function, say > RegisterCustomCopyFormat(), in _PG_init() as follows: > > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", > &JsonLinesCopyToRoutine, > &JsonLinesCopyFromRoutine); > > The registration function would validate the format name and store it > in TopMemoryContext. It would then return a unique identifier that can > be used subsequently to reference the custom copy format extension. > How does this fix the search_path concern? Are query writers supposed to put JsonLinesFormatId into their queries? Or are you just prohibiting a DBA from ever installing an extension that wants to register a format name that is already registered so that no namespace is ever required? ISTM accommodating a namespace for formats is required just like we do for virtually every other named object in the system. At least, if we want to play nice with extension authors. It doesn’t have to be within the existing pg_proc scope, we can create a new scope if desired, but abolishing it seems unwise. It would be more consistent with established policy if we didn’t make exceptions for text/csv/binary - if the DBA permits a text format to exist in a different schema and that schema appears first in the search_path, unqualified references to text would resolve to the non-core handler. We already protect ourselves with safe search_paths. This is really no different than if someone wanted to implement a now() function and people are putting pg_catalog in front of existing usage. It’s the DBA’s problem, not ours. David J.
Re: Make COPY format extendable: Extract COPY TO format implementations
On Fri, May 2, 2025 at 10:36 PM David G. Johnston wrote: > > On Thursday, May 1, 2025, Masahiko Sawada wrote: >> >> >> In light of these concerns, I've been contemplating alternative >> interface designs. One promising approach would involve registering >> custom copy formats via a C function during module loading >> (specifically, in _PG_init()). This method would require extension >> authors to invoke a registration function, say >> RegisterCustomCopyFormat(), in _PG_init() as follows: >> >> JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", >> &JsonLinesCopyToRoutine, >> &JsonLinesCopyFromRoutine); >> >> The registration function would validate the format name and store it >> in TopMemoryContext. It would then return a unique identifier that can >> be used subsequently to reference the custom copy format extension. > > > How does this fix the search_path concern? Are query writers supposed to put > JsonLinesFormatId into their queries? Or are you just prohibiting a DBA from > ever installing an extension that wants to register a format name that is > already registered so that no namespace is ever required? Users can specify "jsonlines", passed in the first argument to the register function, to the COPY FORMAT option in this case. While JsonLinesFormatId is reserved for internal operations such as module processing and monitoring, any attempt to load another custom COPY format module named 'jsonlines' will result in an error. > ISTM accommodating a namespace for formats is required just like we do for > virtually every other named object in the system. At least, if we want to > play nice with extension authors. It doesn’t have to be within the existing > pg_proc scope, we can create a new scope if desired, but abolishing it seems > unwise. > > It would be more consistent with established policy if we didn’t make > exceptions for text/csv/binary - if the DBA permits a text format to exist in > a different schema and that schema appears first in the search_path, > unqualified references to text would resolve to the non-core handler. We > already protect ourselves with safe search_paths. This is really no > different than if someone wanted to implement a now() function and people are > putting pg_catalog from of existing usage. It’s the DBAs problem, not ours. I'm concerned about allowing multiple 'text' format implementations with identical names within the database, as this could lead to considerable confusion. When users specify 'text', it would be more logical to guarantee that the built-in 'text' format is consistently used. This principle aligns with other customizable components, such as custom resource managers, wait events, lightweight locks, and custom scans. These components maintain their built-in data/types and explicitly prevent the registration of duplicate names. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
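A minimal sketch of how the registration function described up-thread could behave (editor's illustration only: the list-based storage, the NUM_BUILTIN_COPY_FORMATS offset, and the error wording are invented; CopyToRoutine/CopyFromRoutine are the routine types from the thread's proposed API):

```c
/*
 * Sketch of RegisterCustomCopyFormat(): validate the name, reject
 * duplicates (including built-in names), keep the entry for the life of
 * the backend in TopMemoryContext, and hand back a stable ID.
 */
typedef struct CustomCopyFormat
{
	const char *name;
	const CopyToRoutine *to_routine;
	const CopyFromRoutine *from_routine;
} CustomCopyFormat;

static List *RegisteredCopyFormats = NIL;	/* allocated in TopMemoryContext */

int
RegisterCustomCopyFormat(const char *name,
						 const CopyToRoutine *to_routine,
						 const CopyFromRoutine *from_routine)
{
	MemoryContext oldcxt;
	CustomCopyFormat *fmt;
	ListCell   *lc;

	if (strcmp(name, "text") == 0 ||
		strcmp(name, "csv") == 0 ||
		strcmp(name, "binary") == 0)
		ereport(ERROR,
				(errmsg("COPY format \"%s\" conflicts with a built-in format", name)));

	foreach(lc, RegisteredCopyFormats)
	{
		CustomCopyFormat *other = (CustomCopyFormat *) lfirst(lc);

		if (strcmp(other->name, name) == 0)
			ereport(ERROR,
					(errmsg("COPY format \"%s\" is already registered", name)));
	}

	oldcxt = MemoryContextSwitchTo(TopMemoryContext);
	fmt = palloc0(sizeof(CustomCopyFormat));
	fmt->name = pstrdup(name);
	fmt->to_routine = to_routine;
	fmt->from_routine = from_routine;
	RegisteredCopyFormats = lappend(RegisteredCopyFormats, fmt);
	MemoryContextSwitchTo(oldcxt);

	/* IDs for custom formats start after the built-in ones */
	return list_length(RegisteredCopyFormats) - 1 + NUM_BUILTIN_COPY_FORMATS;
}
```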
Re: Make COPY format extendable: Extract COPY TO format implementations
On Fri, May 2, 2025 at 9:56 PM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Fri, 2 May 2025 21:38:32 -0700, > Masahiko Sawada wrote: > > >> How about requiring schema for all custom formats? > >> > >> Valid: > >> > >> COPY ... TO ... (FORMAT 'text'); > >> COPY ... TO ... (FORMAT 'my_schema.jsonlines'); > >> > >> Invalid: > >> > >> COPY ... TO ... (FORMAT 'jsonlines'); -- no schema > >> COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema > >> > >> If we require "schema" for all custom formats, we don't need > >> to depend on search_path. > > > > I'm concerned that users cannot use the same format name in the FORMAT > > option depending on which schema the handler function is created. > > I'm not sure that it's a problem or not. If users want to > use the same format name, they can install the handler > function to the same schema. > > >> Why do we need to assign a unique ID? For performance? For > >> RegisterCustomCopyFormatOption()? > > > > I think it's required for monitoring purposes for example. For > > instance, we can set the format ID in the progress information and the > > progress view can fetch the format name by the ID so that users can > > see what format is being used in the COPY command. > > How about setting the format name instead of the format ID > in the progress information? The progress view can know only numbers. We need to extend the progress view infrastructure so that we can pass other data types. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: POC: Parallel processing of indexes in autovacuum
On Fri, May 2, 2025 at 11:13 AM Daniil Davydov <3daniss...@gmail.com> wrote: > > On Thu, May 1, 2025 at 8:03 AM Masahiko Sawada wrote: > > > > As I understand it, we initially disabled parallel vacuum for > > autovacuum because their objectives are somewhat contradictory. > > Parallel vacuum aims to accelerate the process by utilizing additional > > resources, while autovacuum is designed to perform cleaning operations > > with minimal impact on foreground transaction processing (e.g., > > through vacuum delay). > > > Yep, we also decided that we must not create more a/v workers for > index processing. > In current implementation, the leader process sends a signal to the > a/v launcher, and the launcher tries to launch all requested workers. > But the number of workers never exceeds `autovacuum_max_workers`. > Thus, we will never have more a/v workers than in the standard case > (without this feature). I have concerns about this design. When autovacuuming on a single table consumes all available autovacuum_max_workers slots with parallel vacuum workers, the system becomes incapable of processing other tables. This means that when determining the appropriate autovacuum_max_workers value, users must consider not only the number of tables to be processed concurrently but also the potential number of parallel workers that might be launched. I think it would make more sense to maintain the existing autovacuum_max_workers parameter while introducing a new parameter that would either control the maximum number of parallel vacuum workers per autovacuum worker or set a system-wide cap on the total number of parallel vacuum workers. > > > Regarding implementation: I notice the WIP patch implements its own > > parallel vacuum mechanism for autovacuum. Have you considered simply > > setting at_params.nworkers to a value greater than zero? > > > About `at_params.nworkers = N` - that's exactly what we're doing (you > can see it in the `vacuum_rel` function). But we cannot fully reuse > code of VACUUM PARALLEL, because it creates its own processes via > dynamic bgworkers machinery. > As I said above - we don't want to consume additional resources. Also > we don't want to complicate communication between processes (the idea > is that a/v workers can only send signals to the a/v launcher). Could you elaborate on the reasons why you don't want to use background workers and avoid complicated communication between processes? I'm not sure whether these concerns provide sufficient justification for implementing a separate parallel index processing mechanism. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: AIO v2.5
On Wed, Apr 30, 2025 at 04:00:35PM -0400, Andres Freund wrote: > pgaio_io_wait_for_free() does what it says on the tin. For that, after a bunch > of other things, finds the oldest in-flight IO and waits for it. > > PgAioHandle *ioh = dclist_head_element(PgAioHandle, node, > >&pgaio_my_backend->in_flight_ios); > > switch (ioh->state) > { > ... > case PGAIO_HS_COMPLETED_IO: > case PGAIO_HS_SUBMITTED: > pgaio_debug_io(DEBUG2, ioh, > "waiting for free io > with %d in flight", > > dclist_count(&pgaio_my_backend->in_flight_ios)); > ... > pgaio_io_wait(ioh, ioh->generation); > break; > > > The problem is that, if the log level is low enough, ereport() (which is > called by pgaio_debug_io()), processes interrupts. The interrupt processing > may end up execute ProcessBarrierSmgrRelease(), which in turn needs to wait > for all in-flight IOs before the IOs are closed. > > Which then leads to the > elog(PANIC, "waiting for own IO in wrong state: %d", >state); > > error. Printing state 0 (PGAIO_HS_IDLE), right? I think the chief problem is that pgaio_io_wait_for_free() is fetching ioh->state, then possibly processing interrupts in pgaio_debug_io(), then finally fetching ioh->generation. If it fetched ioh->generation to a local variable before pgaio_debug_io, I think that would resolve this one. Then the pgaio_io_was_recycled() would prevent the PANIC: if (pgaio_io_was_recycled(ioh, ref_generation, &state)) return; if (am_owner) { if (state != PGAIO_HS_SUBMITTED && state != PGAIO_HS_COMPLETED_IO && state != PGAIO_HS_COMPLETED_SHARED && state != PGAIO_HS_COMPLETED_LOCAL) { elog(PANIC, "waiting for own IO in wrong state: %d", state); } } Is that right? If that's the solution, pgaio_closing_fd() and pgaio_shutdown() would need similar care around fetching the generation before the pgaio_debug_io. Maybe there's an opportunity for a common inline function. Or at least a comment at the "generation" field on how to safely time a fetch thereof and any barrier required. > A similar set of steps can lead to the "no free IOs despite no in-flight IOs" > ERROR that Alexander also observed - if pgaio_submit_staged() triggers a debug > ereport that executes ProcessBarrierSmgrRelease() in an interrupt, we might > wait for all in-flight IOs during IO submission, triggering the error. That makes sense. > I'm not yet sure how to best fix it - locally I have done so by pgaio_debug() > do a HOLD_INTERRUPTS()/RESUME_INTERRUPTS() around the call to ereport. But > that doesn't really seem great - otoh requiring various pieces of code to know > that anything emitting debug messages needs to hold interrupts etc makes for > rare and hard to understand bugs. > > We could just make the relevant functions hold interrupts, and that might be > the best path forward, but we don't really need to hold all interrupts > (e.g. termination would be fine), so it's a bit coarse grained. It would need > to happen in a few places, which isn't great either. > > Other suggestions? For the "no free IOs despite no in-flight IOs" case, I'd replace the ereport(ERROR) with "return;" since we now know interrupt processing reclaimed an IO. Then decide what protection if any, we need against bugs causing an infinite loop in caller pgaio_io_acquire(). What's the case motivating the unbounded loop in pgaio_io_acquire(), as opposed to capping at two pgaio_io_acquire_nb() calls? If the theory is that pgaio_io_acquire() could be reentrant, what scenario would reach that reentrancy? > Thanks again for finding and reporting this Alexander! +1!
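For clarity, the reordering suggested above would look roughly like this inside pgaio_io_wait_for_free(). This is a sketch of the idea only, not a tested patch; whether an explicit read barrier is needed between fetching the state and the generation is exactly the open question raised above:

		case PGAIO_HS_COMPLETED_IO:
		case PGAIO_HS_SUBMITTED:
			{
				/*
				 * Fetch the generation *before* pgaio_debug_io(), which may
				 * process interrupts and allow the handle to be recycled.
				 * pgaio_io_wait() can then detect the recycle via
				 * pgaio_io_was_recycled() instead of hitting the PANIC.
				 */
				uint64		ref_generation = ioh->generation;

				pgaio_debug_io(DEBUG2, ioh,
							   "waiting for free io with %d in flight",
							   dclist_count(&pgaio_my_backend->in_flight_ios));

				pgaio_io_wait(ioh, ref_generation);
				break;
			}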
Re: PG 18 release notes draft committed
On Sat, May 3, 2025 at 01:46:29AM +0200, Jelte Fennema-Nio wrote: > On Fri, 2 May 2025 at 04:45, Bruce Momjian wrote: > > > > I have committd the first draft of the PG 18 release notes. The item > > count looks strong: > > Thanks for all the work. Some notes: > > 1. There's currently no mention that protocol version 3.2 was > introduced in this release. I'm not sure where/how this should be > mentioned, but I definitely think it should be somewhere. It's a > pretty major change. One option is to replace/amend the "Make cancel > request keys 256 bits" item. Maybe replace that with something like: > "Postgres 18 introduces protocol version 3.2. This is the first new > protocol version since 3.0, which was introduced in Postgres 7.4. This > new protocol version 3.2 allows a server to use longer cancel request > keys. When the client advertises support for protocol version 3.2 (or > higher) Postgres 18 will use a cancel key size of 256 bits." Okay, I added a mention next to the libpq version function entries. > 2. Obviously biased since it's my contribution, but I think d38bab5 > might deserve a mention. I disagree. pgbench limits like this are not something we give much detail around error avoidance to in the release notes. > 3. The "Add PQtrace() output..." commitlist should also contain 7adec2d5fc Added. Patch attached. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future. diff --git a/doc/src/sgml/release-18.sgml b/doc/src/sgml/release-18.sgml index 2654960eef8..ed7fa5d903f 100644 --- a/doc/src/sgml/release-18.sgml +++ b/doc/src/sgml/release-18.sgml @@ -2146,6 +2146,10 @@ Author: Robert Haas Add function PQfullProtocolVersion() to report the full, including minor, protocol version number (Jacob Champion, Jelte Fennema-Nio) § + + +This release introduces wire protocol version 3.2. +
Re: POC: Parallel processing of indexes in autovacuum
> I think it would make more > sense to maintain the existing autovacuum_max_workers parameter while > introducing a new parameter that would either control the maximum > number of parallel vacuum workers per autovacuum worker or set a > system-wide cap on the total number of parallel vacuum workers. +1, and would it make sense for parallel workers to come from max_parallel_maintenance_workers? This is capped by max_parallel_workers and max_worker_processes, so increasing the defaults for all 3 will be needed as well. -- Sami
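For reference, the existing caps referred to above relate as shown below; if autovacuum's parallel index workers were drawn from this pool, all three would likely need to be raised together. The values are arbitrary illustrations, not recommendations:

-- Illustrative only; max_worker_processes requires a server restart.
ALTER SYSTEM SET max_worker_processes = 16;            -- overall cap on background workers
ALTER SYSTEM SET max_parallel_workers = 8;             -- limited by max_worker_processes
ALTER SYSTEM SET max_parallel_maintenance_workers = 4; -- taken from the max_parallel_workers pool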
Re: PG 18 release notes draft committed
On Fri, 2 May 2025 at 04:45, Bruce Momjian wrote: > > I have committd the first draft of the PG 18 release notes. The item > count looks strong: Thanks for all the work. Some notes: 1. There's currently no mention that protocol version 3.2 was introduced in this release. I'm not sure where/how this should be mentioned, but I definitely think it should be somewhere. It's a pretty major change. One option is to replace/amend the "Make cancel request keys 256 bits" item. Maybe replace that with something like: "Postgres 18 introduces protocol version 3.2. This is the first new protocol version since 3.0, which was introduced in Postgres 7.4. This new protocol version 3.2 allows a server to use longer cancel request keys. When the client advertises support for protocol version 3.2 (or higher) Postgres 18 will use a cancel key size of 256 bits." 2. Obviously biased since it's my contribution, but I think d38bab5 might deserve a mention. 3. The "Add PQtrace() output..." commitlist should also contain 7adec2d5fc
Re: pgsql: Add function to log the memory contexts of specified backend pro
On 2025/05/02 14:58, torikoshia wrote: I confirmed that with this patch applied, the process no longer crashes even after applying the change Robert used to trigger the crash. a small nitpick: + * requested repeatedly and rapidly, potentially leading to infinite It looks like there are two spaces between "requested" and "repeatedly". Thanks for the review and testing! I've fixed the issue you pointed out. Updated patch attached. Since git cherry-pick didn't work cleanly for v16 and earlier, I've also prepared a separate patch for those versions. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION From c39b3b83f8a04e781812bd3b83acb235babe3116 Mon Sep 17 00:00:00 2001 From: Fujii Masao Date: Sat, 3 May 2025 10:21:01 +0900 Subject: [PATCH v2] Add guard to prevent recursive memory contexts logging. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Previously, if memory contexts logging was triggered repeatedly and rapidly while a previous request was still being processed, it could result in recursive calls to ProcessLogMemoryContextInterrupt(). This could lead to infinite recursion and potentially crash the process. This commit adds a guard to prevent such recursion. If ProcessLogMemoryContextInterrupt() is already in progress and logging memory contexts, subsequent calls will exit immediately, avoiding unintended recursive calls. While this scenario is unlikely in practice, it’s not impossible. This change adds a safety check to prevent such failures. Back-patch to v14, where memory contexts logging was introduced. Reported-by: Robert Haas Author: Fujii Masao Reviewed-by: Atsushi Torikoshi Discussion: https://postgr.es/m/ca+tgmozmrv32tbnrrftvf9iwlntgqbhyslvcrhguwzvctph...@mail.gmail.com Backpatch-through: 14 --- src/backend/utils/mmgr/mcxt.c | 60 --- 1 file changed, 41 insertions(+), 19 deletions(-) diff --git a/src/backend/utils/mmgr/mcxt.c b/src/backend/utils/mmgr/mcxt.c index 9fc83f11f6f..73dcd64c351 100644 --- a/src/backend/utils/mmgr/mcxt.c +++ b/src/backend/utils/mmgr/mcxt.c @@ -1198,28 +1198,50 @@ HandleLogMemoryContextInterrupt(void) void ProcessLogMemoryContextInterrupt(void) { + static bool in_progress = false; + LogMemoryContextPending = false; - /* -* Use LOG_SERVER_ONLY to prevent this message from being sent to the -* connected client. -*/ - ereport(LOG_SERVER_ONLY, - (errhidestmt(true), -errhidecontext(true), -errmsg("logging memory contexts of PID %d", MyProcPid))); + PG_TRY(); + { + /* +* Exit immediately if memory contexts logging is already in progress. +* This prevents recursive calls, which could occur if logging is +* requested repeatedly and rapidly, potentially leading to infinite +* recursion and a crash. +*/ + if (in_progress) + return; + in_progress = true; - /* -* When a backend process is consuming huge memory, logging all its memory -* contexts might overrun available disk space. To prevent this, we limit -* the number of child contexts to log per parent to 100. -* -* As with MemoryContextStats(), we suppose that practical cases where the -* dump gets long will typically be huge numbers of siblings under the -* same parent context; while the additional debugging value from seeing -* details about individual siblings beyond 100 will not be large. -*/ - MemoryContextStatsDetail(TopMemoryContext, 100, false); + /* +* Use LOG_SERVER_ONLY to prevent this message from being sent to the +* connected client. 
+*/ + ereport(LOG_SERVER_ONLY, + (errhidestmt(true), +errhidecontext(true), +errmsg("logging memory contexts of PID %d", MyProcPid))); + + /* +* When a backend process is consuming huge memory, logging all its +* memory contexts might overrun available disk space. To prevent +* this, we limit the number of child contexts to log per parent to +* 100. +* +* As with MemoryContextStats(), we suppose that practical cases where +* the dump gets long will typically be huge numbers of siblings under +* the same parent context; while the additional debugging value from +* seeing details about individual siblings beyond 100 will not be +* large. +*/ + MemoryContextStatsD
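The shape of the guard in the patch above, stripped of the memory-context specifics, is roughly the following. This is a simplified sketch of the general re-entrancy-guard pattern, not the patch itself (here the flag check is shown before entering PG_TRY):

/* Simplified sketch of a re-entrancy guard for an interrupt handler. */
void
ProcessSomeInterrupt(void)
{
	static bool in_progress = false;

	if (in_progress)
		return;					/* a previous invocation is still running */
	in_progress = true;

	PG_TRY();
	{
		/* work that may itself process interrupts, e.g. ereport() */
	}
	PG_FINALLY();
	{
		in_progress = false;	/* always reset, even if an error is thrown */
	}
	PG_END_TRY();
}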
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Thu, 1 May 2025 12:15:30 -0700, Masahiko Sawada wrote: > One of the primary considerations we need to address is the treatment > of the specified format name. The current patch set utilizes built-in > formats (namely 'csv', 'text', and 'binary') when the format name is > either unqualified or explicitly specified with 'pg_catalog' as the > schema. In all other cases, we search for custom format handler > functions based on the search_path. To be frank, I have reservations > about this interface design, as the dependence of the specified custom > format name on the search_path could potentially confuse users. How about requiring schema for all custom formats? Valid: COPY ... TO ... (FORMAT 'text'); COPY ... TO ... (FORMAT 'my_schema.jsonlines'); Invalid: COPY ... TO ... (FORMAT 'jsonlines'); -- no schema COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema If we require "schema" for all custom formats, we don't need to depend on search_path. > In light of these concerns, I've been contemplating alternative > interface designs. One promising approach would involve registering > custom copy formats via a C function during module loading > (specifically, in _PG_init()). This method would require extension > authors to invoke a registration function, say > RegisterCustomCopyFormat(), in _PG_init() as follows: > > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", > &JsonLinesCopyToRoutine, > &JsonLinesCopyFromRoutine); > > The registration function would validate the format name and store it > in TopMemoryContext. It would then return a unique identifier that can > be used subsequently to reference the custom copy format extension. I don't object the suggested interface because I don't have a strong opinion how to implement this feature. Why do we need to assign a unique ID? For performance? For RegisterCustomCopyFormatOption()? I think that we don't need to use it so much in COPY. We don't need to use format name and assigned ID after we retrieve a corresponding Copy{To,From}Routine. Because all needed information are in Copy{To,From}Routine. > Extensions could register their own options within this > framework, for example: > > RegisterCustomCopyFormatOption(JsonLinesFormatId, > "custom_option", > custom_option_handler); Can we defer to discuss how to add support for custom options while we focus on the first implementation? Earlier patch sets with the current approach had custom options support but it's removed in the first implementation. (BTW, I think that it's not a good API because we want COPY FROM only options and COPY TO only options something like "compression level".) > This approach offers several advantages: it would eliminate the > search_path issue, provide greater flexibility, and potentially > simplify the overall interface for users and developers alike. What contributes to the "flexibility"? Developers can call multiple Register* functions in _PG_Init(), right? Thanks, -- kou
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 15:52:49 -0700, Masahiko Sawada wrote: >> Hmm. How much should we care about the observability of the COPY >> format used by a given backend? Storing this information in a >> backend's TopMemoryContext is OK to get the extensibility basics to >> work, but could it make sense to use some shmem state to allocate a >> uint32 ID that could be shared by all backends. Contrary to EXPLAIN, >> COPY commands usually run for a very long time, so I am wondering if >> these APIs should be designed so as it would be possible to monitor >> the format used. One layer where the format information could be made >> available is the progress reporting view for COPY, for example. I can >> also imagine a pgstats kind where we do COPY stats aggregates, with a >> per-format pgstats kind, and sharing a fixed ID across multiple >> backends is relevant (when flushing the stats at shutdown, we would >> use a name/ID mapping like replication slots). >> >> I don't think that this needs to be relevant for the option part, just >> for the format where, I suspect, we should store in a shmem array >> based on the ID allocated the name of the format, the library of the >> callback and the function name fed to load_external_function(). >> >> Note that custom LWLock and wait events use a shmem state for >> monitoring purposes, where we are able to do ID->format name lookups >> as much as format->ID lookups. Perhaps it's OK not to do that for >> COPY, but I am wondering if we'd better design things from scratch >> with states in shmem state knowing that COPY is a long-running >> operation, and that if one mixes multiple formats they would most >> likely want to know which formats are bottlenecks, through SQL. Cloud >> providers would love that. > > Good point. It would make sense to have such information as a map on > shmem. It might be better to use dshash here since a custom copy > format module can be loaded at runtime. Or we can use dynahash with > large enough elements. If we don't need to assign an ID for each format, can we avoid it? If we implement it, is this approach more complex than the current table sampling method like approach? Thanks, -- kou
Re: PG 18 release notes draft committed
On Fri, May 2, 2025 at 12:18:06PM -0400, Bruce Momjian wrote: > Finally, I see the big increases in this release as being the optimizer, > monitoring, and constraints. Also, I am loving the chapter markers linking to gitweb. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: extension_control_path and "directory"
Re: Matheus Alcantara > > Thanks, I have committed this. I did a bit of code reformatting and > > adjusted the documentation a bit. It's good to get this in before beta1 > > so that we don't have to change the valid values of > > extension_control_path past beta1. > > Thanks Peter! And thanks everyone! Christoph
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 2, 2025 at 8:46 AM Jacob Champion wrote: > Yeah, I wonder if libintl is being similarly "cheated" on the Meson side. libintl is already coming in via frontend_stlib_code, so that's fine. So now I'm wondering if any other static clients of libpq-int.h (if there are any) need the ssl dependency too, for correctness, or if it's just me. > But I'll try to build up > a new Meson setup, with minimal added settings, to see if I can > reproduce here. Can you share your Meson configuration? (Never mind -- this was pretty easy to hit in a from-scratch configuration.) --Jacob
Re: PG 18 release notes draft committed
On Fri, May 2, 2025 at 01:00:57PM +0900, Amit Langote wrote: > 1. Speed up execution of cached plans by deferring locks on partitions > subject to pruning (Amit Langote) > (bb3ec16e1, d47cbf474, cbc127917, 525392d57) > > 2. Speed up child EquivalenceMember lookup in planner (Yuya Watari, > David Rowley) > (d69d45a5a) > > 3. Speed up derived clause lookup in EquivalenceClass (Ashutosh Bapat) > (88f55bc97) > > Alternatively, 2 and 3 can be combined as: > > 2. Speed up partition planning by improving EquivalenceClass lookups > (Yuya Watari, David Rowley, Ashutosh Bapat) > > I think 1 should go under Partitioning, which I see is currently missing. > > Any thoughts, David? > > Can work on a patch if you'd like. So, a few things. First, this set of commits was in a group of 10 that I added since there have been complaints in the past that optimizer improvements were not listed and therefore patch authors were not given sufficient credit. That means the 209 item count for PG 18 is 10 higher than my normal filtering would produce. Second, looking at the items, these are a case of "X is faster", which we don't normally mention in the release notes. We normally mention "faster" when it is so much faster that use cases which were not possible before might be possible now, so it is recommended to retest. That is what I saw this grouped item as, whereas I don't think the individual items meet that criterion. Also, I didn't see enough partition items to warrant a separate partition section, and we didn't have one in PG 17 either. We could pull all the partition items from the sections they are already in, but they seem more natural in the sections they are in. I don't think most people would know what EquivalenceMember is, and even if they did, would they be able to connect it to an SQL query? Finally, I see the big increases in this release as being the optimizer, monitoring, and constraints. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
First-draft back-branch release notes are up
Much less exciting than the v18 release notes, but we still gotta do 'em. See https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=176877f461a8b55e921f597fb217f6ab89ee019f As usual, please send corrections by Sunday. regards, tom lane
Re: RFC: Additional Directory for Extensions
On May 1, 2025, at 16:24, Peter Eisentraut wrote: > I see. I have committed it now describing the current state. Quick follow-up to tweak a couple of commas. --- a/doc/src/sgml/extend.sgml +++ b/doc/src/sgml/extend.sgml @@ -1813,8 +1813,8 @@ include $(PGXS) You can select a separate directory prefix in which to install your -extension's files, by setting the make variable -prefix when executing make install +extension's files by setting the make variable +prefix when executing make install, like so: make install prefix=/usr/local/postgresql Best, David signature.asc Description: Message signed with OpenPGP
Re: Add an option to skip loading missing publication to avoid logical replication failure
vignesh C writes: > Due to the asynchronous nature of these processes, the ALTER > SUBSCRIPTION command may not be immediately observed by the apply > worker. Meanwhile, the walsender may process and decode an INSERT > statement. > If the insert targets a table (e.g., tab_3) that does not belong to > the current publication (pub1), the walsender silently skips > replicating the record and advances its decoding position. This > position is sent in a keepalive message to the subscriber, and since > there are no pending transactions to flush, the apply worker reports > it as the latest received LSN. So this theory presumes that the apply worker receives and reacts to the keepalive message, yet it has not observed a relevant subscriber-side catalog update that surely committed before the keepalive was generated. It's fairly hard to see how that is okay, because it's at least adjacent to something that must be considered a bug: applying transmitted data without having observed DDL updates to the target table. Why is the processing of keepalives laxer than the processing of data messages? regards, tom lane
Re: PG 18 release notes draft committed
On Thu, May 1, 2025 at 7:44 PM Bruce Momjian wrote: > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) Thanks! > > Version 18 contains a number of changes that may affect compatibility > with previous releases. Observe the following incompatibilities: > >[...] > Rename server variable ssl_ecdh_curve to ssl_groups and allow multiple > colon-separated ECDH curves to be specified (Erica Zhang, Daniel Gustafsson) The previous setting name should continue to function correctly, since it's mapped as an alias, so this can probably be moved into the "standard" config features rather than a compatibility change. --Jacob
Re: PG 18 release notes draft committed
On Fri, May 2, 2025 at 08:24:42AM -0700, Jacob Champion wrote: > On Thu, May 1, 2025 at 7:44 PM Bruce Momjian wrote: > > I will probably add markup in 1-3 weeks. Let the feedback begin. ;-) > > Thanks! > > > > > Version 18 contains a number of changes that may affect compatibility > > with previous releases. Observe the following incompatibilities: > > > >[...] > > Rename server variable ssl_ecdh_curve to ssl_groups and allow multiple > > colon-separated ECDH curves to be specified (Erica Zhang, Daniel Gustafsson) > > The previous setting name should continue to function correctly, since > it's mapped as an alias, so this can probably be moved into the > "standard" config features rather than a compatibility change. Thanks, done. The commit message didn't indicate the old name would still work, and I didn't review the patch for that. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: First-draft back-branch release notes are up
On Fri, May 2, 2025 at 12:39:15PM -0400, Tom Lane wrote: > Much less exciting than the v18 release notes, but we > still gotta do 'em. See > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=176877f461a8b55e921f597fb217f6ab89ee019f > > As usual, please send corrections by Sunday. They look good to me, as does my entry: Avoid incorrect optimizations based on IS [NOT] NULL tests that are applied to composite values (Bruce Momjian) § My commit message erroneously said "domains" instead of "composite values". -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Re: [PoC] Federated Authn/z with OAUTHBEARER
On Fri, May 02, 2025 at 10:05:26AM -0700, Jacob Champion wrote: > Nathan, if you get a chance, does the attached patch work for you? Yup, thanks! -- nathan
Re: [PoC] Federated Authn/z with OAUTHBEARER
Jacob Champion writes: > Looks like it's just me. And using partial_dependency for the includes > seems like overkill, so I've kept the full ssl dependency object, but > moved it to the staticlib only, which is enough to solve the breakage > on my machine. > Nathan, if you get a chance, does the attached patch work for you? FWIW, on my Mac a meson build from HEAD goes through fine, with or without this patch. I'm getting openssl and libcurl from MacPorts not Homebrew, but I don't know why that would make any difference. regards, tom lane
Re: Issue attaching a table to a partitioned table with an auto-referenced foreign key
Alvaro Herrera 于2025年5月1日周四 20:17写道: > Hello, > > I've been looking at this bug once again and I think I finally > understood what's going on and how to fix it. > > Ref 1: https://postgr.es/m/20230707175859.17c91538@karst >Re: Issue attaching a table to a partitioned table with an >auto-referenced foreign key >(Guillaume Lelarge) > Ref 2: https://postgr.es/m/18156-a44bc7096f068...@postgresql.org >BUG #18156: Self-referential foreign key in partitioned table not >enforced on deletes >(Matthew Gabeler-Lee) > Ref 3: > https://postgr.es/m/myvsif-attja5dcwouwh21r12r-sfxecy2-3ynt8kao...@mail.gmail.com >Self referential foreign keys in partitioned table not working as >expected >(Luca Vallisa) > > First of all -- apparently we broke this in commit 5914a22f6ea5 (which > fixed the other problems that had been reported by G. Lelarge in Ref 1) > even worse than how it was before, by having the new functions just skip > processing the referenced side altogether. Previously we were at least > partially setting up the necessary triggers, at least some of the time. > So what the report by Luca is saying is, plain and simple, that the > referenced-side action triggers just do not exist, which is why no error > is thrown even on the most trivial cases, on the releases that contain > that commit (17.1, 16.5, 15.9). > Hmm. I didn't get the same conclusion. Before commit 5914a22f6ea5, the issue reported by Luca could have happened. Look at the test below on v17.0: psql (17.0) Type "help" for help. postgres=# create table test ( id_1 int4 not null, id_2 int4 not null, parent_id_2 int4 null, primary key (id_1, id_2), foreign key (id_1, parent_id_2) references test (id_1, id_2) ) partition by list (id_1); create table test_1 partition of test for values in (1); insert into test values (1, 1, null), (1, 2, 1); delete from test where (id_1, id_2) = (1, 1); CREATE TABLE CREATE TABLE INSERT 0 2 DELETE 1 You can see from the above test that no error was reported. But if I revert the commit 614a406b4ff1, above test would report error on v16devel: psql (16devel) Type "help" for help. postgres=# create table test ( id_1 int4 not null, id_2 int4 not null, parent_id_2 int4 null, primary key (id_1, id_2), foreign key (id_1, parent_id_2) references test (id_1, id_2) ) partition by list (id_1); create table test_1 partition of test for values in (1); insert into test values (1, 1, null), (1, 2, 1); delete from test where (id_1, id_2) = (1, 1); CREATE TABLE CREATE TABLE INSERT 0 2 ERROR: update or delete on table "test_1" violates foreign key constraint "test_id_1_parent_id_2_fkey1" on table "test" DETAIL: Key (id_1, id_2)=(1, 1) is still referenced from table "test". > Anyway, if people have a chance to give this a look, it would be > helpful. > It's midnight in my time zone. I will look at this tomorrow. -- Thanks, Tender Wang
Re: POC: Parallel processing of indexes in autovacuum
On Fri, May 2, 2025 at 9:58 AM Sami Imseih wrote: > > > Once we have parallel heap vacuum, as discussed in thread[1], it would > > also likely be beneficial to incorporate it into autovacuum during > > aggressive vacuum or failsafe mode. > > IIRC, index cleanup is disabled by failsafe. Yes. My idea is to use parallel *heap* vacuum in autovacuum during failsafe mode. I think it would make sense as users want to complete freezing tables as soon as possible in this situation. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Make COPY format extendable: Extract COPY TO format implementations
On Fri, May 2, 2025 at 7:20 PM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Thu, 1 May 2025 12:15:30 -0700, > Masahiko Sawada wrote: > > > One of the primary considerations we need to address is the treatment > > of the specified format name. The current patch set utilizes built-in > > formats (namely 'csv', 'text', and 'binary') when the format name is > > either unqualified or explicitly specified with 'pg_catalog' as the > > schema. In all other cases, we search for custom format handler > > functions based on the search_path. To be frank, I have reservations > > about this interface design, as the dependence of the specified custom > > format name on the search_path could potentially confuse users. > > How about requiring schema for all custom formats? > > Valid: > > COPY ... TO ... (FORMAT 'text'); > COPY ... TO ... (FORMAT 'my_schema.jsonlines'); > > Invalid: > > COPY ... TO ... (FORMAT 'jsonlines'); -- no schema > COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema > > If we require "schema" for all custom formats, we don't need > to depend on search_path. I'm concerned that users cannot use the same format name in the FORMAT option depending on which schema the handler function is created. > > > In light of these concerns, I've been contemplating alternative > > interface designs. One promising approach would involve registering > > custom copy formats via a C function during module loading > > (specifically, in _PG_init()). This method would require extension > > authors to invoke a registration function, say > > RegisterCustomCopyFormat(), in _PG_init() as follows: > > > > JsonLinesFormatId = RegisterCustomCopyFormat("jsonlines", > > &JsonLinesCopyToRoutine, > > &JsonLinesCopyFromRoutine); > > > > The registration function would validate the format name and store it > > in TopMemoryContext. It would then return a unique identifier that can > > be used subsequently to reference the custom copy format extension. > > I don't object the suggested interface because I don't have > a strong opinion how to implement this feature. > > Why do we need to assign a unique ID? For performance? For > RegisterCustomCopyFormatOption()? I think it's required for monitoring purposes for example. For instance, we can set the format ID in the progress information and the progress view can fetch the format name by the ID so that users can see what format is being used in the COPY command. > > > Extensions could register their own options within this > > framework, for example: > > > > RegisterCustomCopyFormatOption(JsonLinesFormatId, > > "custom_option", > > custom_option_handler); > > Can we defer to discuss how to add support for custom > options while we focus on the first implementation? Earlier > patch sets with the current approach had custom options > support but it's removed in the first implementation. I think we can skip the custom option patch for the first implementation but still need to discuss how we will be able to implement it to understand the big picture of this feature. Otherwise we could end up going the wrong direction. > > (BTW, I think that it's not a good API because we want COPY > FROM only options and COPY TO only options something like > "compression level".) Why does this matter in terms of API? I think that even with this API we can pass is_from to the option handler function so that it validates the option based on it. 
> > > This approach offers several advantages: it would eliminate the > > search_path issue, provide greater flexibility, and potentially > > simplify the overall interface for users and developers alike. > > What contributes to the "flexibility"? Developers can call > multiple Register* functions in _PG_Init(), right? I think that with a tablesample-like approach we need to do everything based on one handler function and callbacks returned from it whereas there is no such limitation with C API style. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 21:38:32 -0700, Masahiko Sawada wrote: >> How about requiring schema for all custom formats? >> >> Valid: >> >> COPY ... TO ... (FORMAT 'text'); >> COPY ... TO ... (FORMAT 'my_schema.jsonlines'); >> >> Invalid: >> >> COPY ... TO ... (FORMAT 'jsonlines'); -- no schema >> COPY ... TO ... (FORMAT 'pg_catalog.text'); -- needless schema >> >> If we require "schema" for all custom formats, we don't need >> to depend on search_path. > > I'm concerned that users cannot use the same format name in the FORMAT > option depending on which schema the handler function is created. I'm not sure that it's a problem or not. If users want to use the same format name, they can install the handler function to the same schema. >> Why do we need to assign a unique ID? For performance? For >> RegisterCustomCopyFormatOption()? > > I think it's required for monitoring purposes for example. For > instance, we can set the format ID in the progress information and the > progress view can fetch the format name by the ID so that users can > see what format is being used in the COPY command. How about setting the format name instead of the format ID in the progress information? > I think we can skip the custom option patch for the first > implementation but still need to discuss how we will be able to > implement it to understand the big picture of this feature. Otherwise > we could end up going the wrong direction. I think that we don't need to discuss it deeply because we have many options with this approach. We can call C functions in _PG_Init(). I think that this feature will not be a blocker of this approach. >> (BTW, I think that it's not a good API because we want COPY >> FROM only options and COPY TO only options something like >> "compression level".) > > Why does this matter in terms of API? I think that even with this API > we can pass is_from to the option handler function so that it > validates the option based on it. If we choose the API, each custom format developer needs to handle the case in handler function. For example, if we pass information whether this option is only for TO to PostgreSQL, ProcessCopyOptions() not handler functions can handle it. Anyway, I think that we don't need to discuss this deeply for now. >> What contributes to the "flexibility"? Developers can call >> multiple Register* functions in _PG_Init(), right? > > I think that with a tablesample-like approach we need to do everything > based on one handler function and callbacks returned from it whereas > there is no such limitation with C API style. Thanks for clarifying it. It seems that my understanding is correct. I hope that the flexibility is needed flexibility and too much flexibility doesn't introduce too much complexity. Thanks, -- kou
Re: Make COPY format extendable: Extract COPY TO format implementations
Hi, In "Re: Make COPY format extendable: Extract COPY TO format implementations" on Fri, 2 May 2025 23:02:25 -0700, Masahiko Sawada wrote: > The progress view can know only numbers. We need to extend the > progress view infrastructure so that we can pass other data types. Sorry. Could you tell me what APIs referred here? pgstat_progress_*() functions in src/include/utils/backend_progress.h? Thanks, -- kou
Re: Make COPY format extendable: Extract COPY TO format implementations
On Fri, May 2, 2025 at 11:20 PM Sutou Kouhei wrote: > > Hi, > > In > "Re: Make COPY format extendable: Extract COPY TO format implementations" > on Fri, 2 May 2025 23:02:25 -0700, > Masahiko Sawada wrote: > > > The progress view can know only numbers. We need to extend the > > progress view infrastructure so that we can pass other data types. > > Sorry. Could you tell me what APIs referred here? > pgstat_progress_*() functions in > src/include/utils/backend_progress.h? The progress information is stored in PgBackendStatus defined in backend_status.h: /* * Command progress reporting. Any command which wishes can advertise * that it is running by setting st_progress_command, * st_progress_command_target, and st_progress_param[]. * st_progress_command_target should be the OID of the relation which the * command targets (we assume there's just one, as this is meant for * utility commands), but the meaning of each element in the * st_progress_param array is command-specific. */ ProgressCommandType st_progress_command; Oid st_progress_command_target; int64 st_progress_param[PGSTAT_NUM_PROGRESS_PARAM]; Then the progress view maps the numbers to the corresponding strings: CREATE VIEW pg_stat_progress_copy AS SELECT S.pid AS pid, S.datid AS datid, D.datname AS datname, S.relid AS relid, CASE S.param5 WHEN 1 THEN 'COPY FROM' WHEN 2 THEN 'COPY TO' END AS command, CASE S.param6 WHEN 1 THEN 'FILE' WHEN 2 THEN 'PROGRAM' WHEN 3 THEN 'PIPE' WHEN 4 THEN 'CALLBACK' END AS "type", S.param1 AS bytes_processed, S.param2 AS bytes_total, S.param3 AS tuples_processed, S.param4 AS tuples_excluded, S.param7 AS tuples_skipped FROM pg_stat_get_progress_info('COPY') AS S LEFT JOIN pg_database D ON S.datid = D.oid; So the idea is that the backend process sets the format ID somewhere in st_progress_param, and then the progress view calls a SQL function, say pg_stat_get_copy_format_name(), with the format ID that returns the corresponding format name. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
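As a sketch of that idea: pg_stat_get_copy_format_name() is the hypothetical ID-to-name lookup function mentioned above, and param8 is an arbitrary choice of a free slot for the format ID, so neither is an existing interface:

-- Hypothetical sketch; only pg_stat_get_progress_info('COPY') exists today.
SELECT S.pid,
       pg_stat_get_copy_format_name(S.param8) AS format,
       S.param3 AS tuples_processed
  FROM pg_stat_get_progress_info('COPY') AS S;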
Re: Make COPY format extendable: Extract COPY TO format implementations
On Friday, May 2, 2025, Masahiko Sawada wrote: > > I'm concerned about allowing multiple 'text' format implementations > with identical names within the database, as this could lead to > considerable confusion. When users specify 'text', it would be more > logical to guarantee that the built-in 'text' format is consistently > used. Do you want to only give text/csv/binary this special treatment or also any future format name we ever decide to implement in core. If an extension takes up “xml” and we try to do that in core do we fail an upgrade because of the conflict, and make it impossible to actually use said extension? This principle aligns with other customizable components, such > as custom resource managers, wait events, lightweight locks, and > custom scans. These components maintain their built-in data/types and > explicitly prevent the registration of duplicate names. > I am totally lost on how any of those resemble this feature. I’m all for registration to enable additional options and features - but am against moving away from turning format into a namespaced identifier. This is a query-facing feature where namespaces are common and fundamentally required. I have some sympathy for the fact that until now one could not prefix text/binary/csv with pg_catalog to be fully safe, but in reality DBAs/query authors either put pg_catalog first in their search_path or make an informed decision when they deviate. That is the established precedent relevant to this feature. The power, and responsibility for education, lies with the user. David J.