Re: Memory context can be its own parent and child in replication command
I've updated the patch with another approach:

0001: This sets the MemoryContext type to T_Invalid just before adding it to the aset freelist, allowing us to distinguish between a MemoryContext that was placed on the freelist and shouldn't be used, and a valid one. The existing MemoryContextIsValid checks will be triggered if a MemoryContext on the freelist is used. I've also added additional MemoryContextIsValid checks to make sure the MemoryContext restored by a transaction is valid.

0002: Keep the replication command alive when there's an ongoing snapshot export. This way, the context that is restored is valid and can be reused. This requires the replication command context to be created under TopMemoryContext.

Regards,
Anthonin Bonnefoy

v02-0001-Add-additional-memory-context-checks.patch
Description: Binary data

v02-0002-Avoid-using-deleted-context-with-replication-com.patch
Description: Binary data
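To illustrate the freelist-poisoning idea in 0001, here is a minimal standalone C sketch of the general pattern. Everything below is invented for the example (Ctx, ctx_create, ctx_delete and the simplified CtxIsValid macro); it only mirrors the aset.c freelist and MemoryContextIsValid behavior described above and is not the patch itself.

#include <assert.h>
#include <stdlib.h>

typedef enum NodeTag { T_Invalid = 0, T_AllocSetContext } NodeTag;

typedef struct Ctx
{
	NodeTag		type;			/* tag doubles as a validity marker */
	struct Ctx *next;			/* freelist link */
} Ctx;

static Ctx *freelist;

#define CtxIsValid(c)	((c) != NULL && (c)->type == T_AllocSetContext)

static void
ctx_delete(Ctx *c)
{
	c->type = T_Invalid;		/* poison: any later use trips CtxIsValid */
	c->next = freelist;
	freelist = c;
}

static Ctx *
ctx_create(void)
{
	Ctx		   *c;

	if (freelist != NULL)
	{
		c = freelist;
		freelist = c->next;
	}
	else
		c = malloc(sizeof(Ctx));
	c->type = T_AllocSetContext;	/* becomes valid again only here */
	c->next = NULL;
	return c;
}

int
main(void)
{
	Ctx		   *stale = ctx_create();

	ctx_delete(stale);
	assert(!CtxIsValid(stale));		/* a dangling reference is now detectable */
	return 0;
}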
Re: DOCS - Generated Column Replication Examples
On Sat, 1 Mar 2025 at 19:01, Amit Kapila wrote: > > On Mon, Feb 3, 2025 at 4:23 AM Peter Smith wrote: > > > > A recent commit [1] added a new section "29.6. Generated Column > > Replication" to the documentation [2]. But, no "Examples" were > > included. > > > > I've created this new thread to make a case for adding an "Examples" > > subsection to that page. > > > > (the proposed examples patch is attached) > > > > == > > > > 1. Reasons AGAINST Adding Examples to this section (and why I disagree): > > > > 1a. The necessary information to describe the feature can already be > > found in the text. > > > > I still feel that the current text and example given on the page are > sufficient for now. This is a new feature, and I think it is better to > get some feedback from users. We can add more examples or information > based on real user feedback. Also, we haven't got any other response > on this thread yet. So, I think we can wait for a week or so more to > see if anyone else also thinks that the additional examples will be > useful; if not, then return this patch for now. It seems there is currently little interest in this, so I’ve closed the commitfest entry for now. We can revisit this later if we receive feedback from users indicating it's needed. Regards, Vignesh
Re: Non-text mode for pg_dumpall
On Wed, Mar 12, 2025 at 1:06 AM Álvaro Herrera wrote:
>
> Hello,
>
> On 2025-Mar-11, Mahendra Singh Thalor wrote:
>
> > In map.dat file, I tried to fix this issue by adding number of characters
> > in dbname but as per code comments, as of now, we are not supporting \n\r
> > in dbnames so i removed handling.
> > I will do some more study to fix this issue.
>
> Yeah, I think this is saying that you should not consider the contents
> of map.dat as a shell string. After all, you're not going to _execute_
> that file via the shell.
>
> Maybe for map.dat you need to escape such characters somehow, so that
> they don't appear as literal newlines/carriage returns.
>

I am confused. Currently, pg_dumpall in plain format will abort when it encounters a dbname containing a newline, and the plain-format file dumped up to that point does not contain data for all of the cluster's databases. If pg_dumpall in non-text format aborts earlier, isn't that aligned with the plain-format behavior? It's arguably also an improvement: since it aborts earlier, nothing will have been dumped. Am I missing something?
Re: Selectively invalidate caches in pgoutput module
On Tue, Mar 11, 2025 at 5:47 PM Hayato Kuroda (Fujitsu) wrote:
>
> Attached patch address comments by you and Amit [1]. What's new;
>

Thanks, the patch looks mostly good to me. I have made a few cosmetic changes in the attached and combined both patches.

The existing Alter Publication ... Rename tests don't test invalidations arising from that command. As this patch changes that code path, it would be good to add a few tests for the same. We can add one for individual relations and another for an ALL TABLES publication.

--
With Regards,
Amit Kapila.

v11-0001-Avoid-invalidating-all-RelationSyncCache-entries.patch
Description: Binary data
Re: Wrong results with subquery pullup and grouping sets
On Wed, 12 Mar 2025 at 07:45, Richard Guo wrote: > > I refined the comment in v2 and ended up with: > > * This analysis could be tighter: in particular, a non-strict > * construct hidden within a lower-level PlaceHolderVar is not > * reason to add another PHV. But for now it doesn't seem > - * worth the code to be more exact. > + * worth the code to be more exact. This is also why it's > + * preferable to handle bare PHVs in the above branch, rather > + * than this branch. We also prefer to handle bare Vars in a > + * separate branch, as it's cheaper this way and parallels the > + * handling of PHVs. > LGTM. > For backpatching, 0001 seems more like a cosmetic change, and I don't > think it needs to be backpatched. 0002 is a bug fix, but I'm not > confident enough to commit it to the back branches. Given that there > are other wrong result issues with grouping sets fixed in version 18 > but not in earlier versions, and that there are no field reports about > this issue, perhaps it's OK to refrain from backpatching this one? > Yes, I was thinking along the same lines. This particular issue feels a bit manufactured, and unlikely to occur in practice, but I'm happy to go with whatever you decide. Thanks for taking care of this. Regards, Dean
Re: pgsql: reindexdb: Add the index-level REINDEX with multiple jobs
On 2025-Mar-11, Alexander Korotkov wrote:

> Thank you for your feedback! I also think that pipelining would be a
> better option, but it's too invasive for backpatching. I've written
> comments for gen_reindex_command() and run_reindex_command(). I'm
> going to push this patch to master and 17 if no objections.

Looks good.

Actually, I don't understand why run_reindex_command() is calling PQfinish(). Hmm, also, why is the 'async' parameter always given as true? Is that PQfinish() actually dead code?

[... Looks at git history ...]

Ah! this is actually a problem older than your patch -- it appears that the last (only?) caller with async=false was removed by commit 2cbc3c17a5c1. I think it's worth removing that while you're at it. We no longer need executeMaintenanceCommand() in reindexdb.c either, I think.

--
Álvaro Herrera               Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: AIO v2.5
On Tue, Mar 11, 2025 at 07:55:35PM -0400, Andres Freund wrote:
> On 2025-03-11 12:41:08 -0700, Noah Misch wrote:
> > On Mon, Sep 16, 2024 at 01:51:42PM -0400, Andres Freund wrote:
> > > On 2024-09-16 07:43:49 -0700, Noah Misch wrote:
> What do we want to do for ConditionalLockBufferForCleanup() (I don't think
> IsBufferCleanupOK() can matter)? I suspect we should also make it wait for
> the IO. See below:

I agree IsBufferCleanupOK() can't matter. It asserts that the caller already holds the exclusive buffer lock, and it doesn't loop or wait.

> [...] leads me to think that causing
> the IO to complete is probably the safest bet. Triggering IO completion never
> requires acquiring new locks that could participate in a deadlock, so it'd be
> safe.

Yes, that decision looks right to me. I scanned the callers, and none of them clearly prefers a different choice. If we someday find one caller prefers a false return over blocking on I/O completion, we can always introduce a new ConditionalLock* variant for that.

> > > - To allow io_workers to be PGC_SIGHUP, and to eventually allow to
> > > automatically in/decrease active workers, the max number of workers
> > > (32) is
> > > always allocated. That means we use more semaphores than before. I think
> > > that's ok, it's not 1995 anymore. Alternatively we can add a
> > > "io_workers_max" GUC and probe for it in initdb.
> >
> > Let's start as you have it. If someone wants to make things perfect for
> > non-root BSD users, they can add the GUC later. io_method=sync is a
> > sufficient backup plan indefinitely.
>
> Cool.
>
> I think we'll really need to do something about this for BSD users regardless
> of AIO. Or maybe those OSs should fix something, but somehow I am not having
> high hopes for an OS that claims to have POSIX conforming unnamed semaphores
> due to having a syscall that always returns EPERM... [1].

I won't mind a project making things better for non-root BSD users. I do think such a project should not block other projects making things better for everything else (like $SUBJECT).

> > > - Check if documentation for track_io_timing needs to be adjusted, after
> > > the
> > > bufmgr.c changes we only track waiting for an IO.
> >
> > Yes.
>
> The relevant sentences seem to be:
>
> - "Enables timing of database I/O calls."
>
> s/calls/waits/
>
> - "Time spent in {read,write,writeback,extend,fsync} operations"
>
> s/in/waiting for/
>
> Even though not all of these will use AIO, the "waiting for" formulation
> seems just as accurate.
>
> - "Columns tracking I/O time will only be non-zero when <xref linkend="guc-track-io-timing"/> is enabled."
>
> s/time/wait time/

Sounds good.

> > On Mon, Mar 10, 2025 at 02:23:12PM -0400, Andres Freund wrote:
> > > Attached is v2.6 of the AIO patchset.
> >
> > > - 0005, 0006 - io_uring support - close, but we need to do something about
> > > set_max_fds(), which errors out spuriously in some cases
> >
> > What do we know about those cases? I don't see a set_max_fds(); is that
> > set_max_safe_fds(), or something else?
>
> Sorry, yes, set_max_safe_fds(). The problem basically is that with io_uring we
> will have a large number of FDs already allocated by the time
> set_max_safe_fds() is called. set_max_safe_fds() subtracts already_open from
> max_files_per_process allowing few, and even negative, IOs.
>
> I think we should redefine max_files_per_process to be about the number of
> files each *backend* will additionally open. Jelte was working on related
> patches, see [2]

Got it.
max_files_per_process is a quaint setting, documented as follows (I needed the reminder): If the kernel is enforcing a safe per-process limit, you don't need to worry about this setting. But on some platforms (notably, most BSD systems), the kernel will allow individual processes to open many more files than the system can actually support if many processes all try to open that many files. If you find yourself seeing Too many open files failures, try reducing this setting. I could live with v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but would lean against it since it feels unduly novel to have a setting where we use the postgresql.conf value to calculate a value that becomes the new SHOW-value of the same setting. Options I'd consider before that: - Like you say, "redefine max_files_per_process to be about the number of files each *backend* will additionally open". It will become normal that each backend's actual FD list length is max_files_per_process + MaxBackends if io_method=io_uring. Outcome is not unlike v6-0002-Bump-postmaster-soft-open-file-limit-RLIMIT_NOFIL.patch + v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but we don't mutate max_files_per_process. Benchmark results should not change beyond the inter-major-version noise level unless one sets io_method=io_uring. I'm fee
Re: Special-case executor expression steps for common combinations
> On 9 Mar 2025, at 23:35, Daniel Gustafsson wrote: >> On 8 Mar 2025, at 17:15, Andreas Karlsson wrote: >> +1 Still seems like a nice change to me too. > > Thanks, I actually had it staged to go in early next week. ..which was done yesterday, thanks for review! -- Daniel Gustafsson
Re: Change COPY ... ON_ERROR ignore to ON_ERROR ignore_row
On 12.03.25 09:00, jian he wrote:
>> 1) WARNING might be a better fit than NOTICE here.
>>
> but NOTICE, on_error set_to_null is aligned with on_error ignore.
>
>> I would still leave the extra messages from "log_verbosity verbose" as
>> NOTICE though. What do you think?
>>
>>
> When LOG_VERBOSITY option is set to verbose,
> for ignore option, a NOTICE message containing the line of the input
> file and the column name
> whose input conversion has failed is emitted for each discarded row;
> for set_to_null option, a NOTICE message containing the line of the
> input file and the column name
> where value was replaced with NULL for each input conversion failure.
>
> see the above description,
> on_error set_to_null is aligned with on_error ignore.
> it's just on_error ignore is per row, on_error set_to_null is per
> column/field.
> so NOTICE is aligned with other on_error option.

I considered using a WARNING due to the severity of the issue - the failure to import data - but either NOTICE or WARNING works for me.

>> 2) Inconsistent terminology. Invalid values in "on_error set_to_null"
>> mode are named "erroneous", but as "invalid" in "on_error stop" mode.
>> I don't want to get into the semantics of erroneous or invalid, but
>> sticking to one terminology would IMHO look better.
>>
> I am open to changing it.
> what do you think "invalid values in %llu row was replaced with null"?

LGTM: "invalid values in %llu rows were replaced with null"

Thanks for the patch!

Best, Jim
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Tue, Mar 11, 2025 at 3:46 PM Rushabh Lathia wrote: >> >> > JFYI, I can reproduce the failure Ashutosh Bapat reported, while >> > running the pg_upgrade test through meson commands. I am >> > investigating that further. >> >> Ah good, thanks. > > > Somehow, I am now not able to reproduce after the clean build. Yesterday > I was able to reproduce, so I was happy, but again trying to analyze the issue > when I start with the If the test passes for you, can you please try the patches at [1] on top of your patches? Please apply those, set and export environment variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? I intended to do this but can not do it since the test always fails with your patches applied. [1] https://www.postgresql.org/message-id/CAExHW5uQoyOddBKLBBJpfxXqqok%3DBTeMvt5OpnM6gw0SroiUUw%40mail.gmail.com -- Best Wishes, Ashutosh Bapat
Re: Wrong results with subquery pullup and grouping sets
On Wed, Mar 12, 2025 at 1:32 AM Dean Rasheed wrote: > On Mon, 10 Mar 2025 at 13:05, Richard Guo wrote: > > Attached are the patches. > These 2 comment changes from 0002 could be made part of 0001: > > 1). In pull_up_simple_subquery(), removing the word "Again" from the > comment following the deleted block, since this is now the only place > that sets wrap_non_vars there. > > 2). In pull_up_constant_function(), moving "(See comments in > pull_up_simple_subquery().)" to the next comment. Nice catch. > > Another thing is that when deciding whether to wrap the newnode in > > pullup_replace_vars_callback(), I initially thought that we didn't > > need to handle Vars/PHVs specifically, and could instead merge them > > into the branch for handling general expressions. However, doing so > > causes a plan diff in the regression tests. The reason is that a > > non-strict construct hidden within a lower-level PlaceHolderVar can > > lead the code for handling general expressions to mistakenly think > > that another PHV is needed, even when it isn't. Therefore, the > > branches for handling Vars/PHVs are retained in 0002. > Right. The comment addition in 0002, relating to that, confused me at first: > > * This analysis could be tighter: in particular, a non-strict > * construct hidden within a lower-level PlaceHolderVar is not > * reason to add another PHV. But for now it doesn't seem > -* worth the code to be more exact. > +* worth the code to be more exact. This is also why we need > +* to handle Vars and PHVs in the above branches, rather than > +* merging them into this one. > > AIUI, it's not really that it *needs* to handle Vars and PHVs > separately, it's just better if it does, since that avoids > unnecessarily wrapping a PHV again, if it contains non-strict > constructs. Also, AFAICS there's no technical reason why simple Vars > couldn't be handled here (all the tests pass if that branch is > removed), but as a comment higher up says, that would be more > expensive. So perhaps this new comment should say "This is why it's > preferable to handle simple PHVs in the branch above, rather than this > branch." Exactly. Technically, we could remove the branch for Vars. However, I chose to keep it because: 1) it's more efficient this way, and 2) it's better to keep the handling of Vars and PHVs aligned. I refined the comment in v2 and ended up with: * This analysis could be tighter: in particular, a non-strict * construct hidden within a lower-level PlaceHolderVar is not * reason to add another PHV. But for now it doesn't seem - * worth the code to be more exact. + * worth the code to be more exact. This is also why it's + * preferable to handle bare PHVs in the above branch, rather + * than this branch. We also prefer to handle bare Vars in a + * separate branch, as it's cheaper this way and parallels the + * handling of PHVs. > Finally, ReplaceWrapOption should be added to pgindent's typedefs.list. Nice catch. For backpatching, 0001 seems more like a cosmetic change, and I don't think it needs to be backpatched. 0002 is a bug fix, but I'm not confident enough to commit it to the back branches. Given that there are other wrong result issues with grouping sets fixed in version 18 but not in earlier versions, and that there are no field reports about this issue, perhaps it's OK to refrain from backpatching this one? 
Thanks
Richard

v2-0001-Remove-code-setting-wrap_non_vars-to-true-for-UNION-ALL-subqueries.patch
Description: Binary data

v2-0002-Fix-incorrect-handling-of-subquery-pullup.patch
Description: Binary data
Re: Conflict detection for update_deleted in logical replication
On Thu, 20 Feb 2025 at 12:50, Zhijie Hou (Fujitsu) wrote: > > > Here is the v28 patch set, which converts the subscription option > max_conflict_retention_duration into a GUC. Other logic remains unchanged. After discussing with Hou internally, I have moved this to the next CommitFest since it will not be committed in the current release. This also allows reviewers to focus on the remaining patches in the current CommitFest. Regards, Vignesh
Re: Question about duplicate JSONTYPE_JSON check
On 2025-Mar-12, Amit Langote wrote:

> I was able to construct a test case that crashes due to this bug:
>
> CREATE TYPE mood AS ENUM ('happy', 'sad', 'neutral');
> CREATE FUNCTION mood_to_json(mood) RETURNS json AS $$
> SELECT to_json($1::text);
> $$ LANGUAGE sql IMMUTABLE;
> CREATE CAST (mood AS json) WITH FUNCTION mood_to_json(mood) AS IMPLICIT;
>
> SELECT JSON_OBJECT('happy'::mood: '123'::jsonb);
> server closed the connection unexpectedly

Good reaction time :-)

I see that that line shows as not even uncovered in the report, but as non-existent (no background color as opposed to red):
https://coverage.postgresql.org/src/backend/utils/adt/jsonb.c.gcov.html#660

Evidently the compiler must be optimizing it out as dead code.

--
Álvaro Herrera               PostgreSQL Developer — https://www.EnterpriseDB.com/
"Update: super-fast reaction on the Postgres bugs mailing list. The report
was acknowledged [...], and a fix is under discussion. The wonders of
open-source !"
https://twitter.com/gunnarmorling/status/1596080409259003906
Re: Parallel heap vacuum
On Wed, Mar 12, 2025 at 3:40 PM Amit Kapila wrote: > > On Wed, Mar 12, 2025 at 11:24 AM Dilip Kumar wrote: > > > > On Wed, Mar 12, 2025 at 3:17 AM Masahiko Sawada > > wrote: > > > > > > On Tue, Mar 11, 2025 at 5:51 AM Amit Kapila > > > wrote: > > > > > > > > Some thoughts/questions on the idea > > > > I notice that we are always considering block-level parallelism for > > heaps and object-level parallelism for indexes. I'm wondering, when > > multiple tables are being vacuumed together—either because the user > > has provided a list of tables or has specified a partitioned table > > with multiple children—does it still make sense to default to > > block-level parallelism? Or could we consider table-level parallelism > > in such cases? For example, if there are 4 tables and 6 workers, with > > 2 tables being small and the other 2 being large, perhaps we could > > allocate 4 workers to vacuum all 4 tables in parallel. For the larger > > tables, we could apply block-level parallelism, using more workers for > > internal parallelism. On the other hand, if all tables are small, we > > could just apply table-level parallelism without needing block-level > > parallelism at all. This approach could offer more flexibility, isn't > > it? > > > > I have not thought from this angle, but it seems we can build this > even on top of block-level vacuum for large tables. Yes, that can be built on top of block-level vacuum. In that case, we can utilize the workers more efficiently, depending on how many workers we have and how many tables need to be vacuumed. And yes, that could also be discussed separately. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
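To make the split being discussed concrete, here is a toy standalone sketch of one way to allocate workers: one worker per table first (table-level parallelism), then leftover workers to the tables with the largest per-worker block counts (block-level parallelism). The table sizes, worker count and names are purely illustrative, not from the patch, and the sketch assumes there are at least as many workers as tables.

#include <stdio.h>

typedef struct
{
	const char *name;
	long		blocks;
	int			workers;
} TableInfo;

int
main(void)
{
	TableInfo	tabs[] = {
		{"small_1", 100, 0}, {"small_2", 200, 0},
		{"large_1", 500000, 0}, {"large_2", 900000, 0},
	};
	int			ntabs = 4;
	int			nworkers = 6;

	/* table-level parallelism: give every table one worker first */
	for (int i = 0; i < ntabs && nworkers > 0; i++, nworkers--)
		tabs[i].workers = 1;

	/* block-level parallelism: spend leftovers on the largest per-worker load */
	while (nworkers-- > 0)
	{
		int			big = 0;

		for (int i = 1; i < ntabs; i++)
			if (tabs[i].blocks / tabs[i].workers >
				tabs[big].blocks / tabs[big].workers)
				big = i;
		tabs[big].workers++;
	}

	for (int i = 0; i < ntabs; i++)
		printf("%s: %d worker(s)\n", tabs[i].name, tabs[i].workers);
	return 0;
}

With 4 tables and 6 workers as in the example above, the two small tables end up with one worker each and the two large tables with two workers each.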
Re: Test mail for pgsql-hackers
Hi Blessy,

On Tue, Mar 11, 2025 at 6:03 PM BharatDB wrote:
>>
>> Hi ,
>
>
>>
>> I've been exploring logical replication and noticed that if the column
>> datatypes don't match between the publisher and subscriber, PostgreSQL
>> doesn't give a warning. This can cause unexpected behavior, and I thought it
>> might be helpful to alert users when this happens.
>
>
>>
>> ### **What This Patch Does:**
>
>
>>
>> - Adds a warning when a column's datatype in the subscriber doesn't match
>> the publisher.
>>
>> - Helps users catch issues early instead of running into silent errors later.
>
>
>>
>> Why I Think It's Useful:- Avoids confusion when replication doesn't work as
>> expected. - Makes debugging easier by pointing out potential problems. I'd
>> love to get feedback on whether this is a good idea and if I've approached
>> it correctly. Since I'm still learning, any suggestions for improvement
>> would be really helpful. I've attached the patch—please let me know what you
>> think!

A large part of your patch is renaming files, which is not related to this patch. Can you please clean that up?

The code also looks problematic:

+Oid local_typid = TupleDescAttr(tupdesc, i)->atttypid;
+Oid remote_typid = lrel.atttyps[i];

AFAIK, the attribute positions on the publisher and subscriber need not match, but this code assumes that the attribute at the ith position on the publisher has to be mapped to the attribute at the same position on the subscriber.

+
+/* Get human-readable type names */
+char *local_typname = format_type_be(local_typid);
+char *remote_typname = format_type_be(remote_typid);

format_type_be() will give the name of the type on the subscriber, not on the publisher. You will need to look up the logical replication type cache for the name of the type on the remote. It further assumes that the OIDs of the same types on the publisher and the subscriber are the same, which may not be the case.

My memory is hazy as to whether we consider types with the same name on the publisher and subscriber to be equivalent. If not, then I see no way to make sure that the mapped columns are of the same type.

--
Best Wishes,
Ashutosh Bapat
Re: Index AM API cleanup
While studying the patch "[PATCH v21 08/12] Allow non-btree indexes to participate in mergejoin", I figured we could do the sortsupport.c piece much simpler. The underlying issue there is similar to commits 0d2aa4d4937 and c594f1ad2ba: We're just using the btree strategy numbers to pass information about the sort direction. We can do this simpler by just passing a bool. See attached patch.

From f1855169b20ecfa12016c89abac52913cedf6688 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut
Date: Wed, 12 Mar 2025 11:02:33 +0100
Subject: [PATCH v21.1] Simplify and generalize PrepareSortSupportFromIndexRel()

PrepareSortSupportFromIndexRel() was accepting btree strategy numbers
purely for the purpose of comparing it later against btree strategies
to determine if the sort direction was forward or reverse. Change
that. Instead, pass a bool directly, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy. (This is similar in spirit to commits 0d2aa4d4937 and
c594f1ad2ba.)

(This could arguably be simplfied further by having the callers fill
in ssup_reverse directly. But this way, it preserves consistency by
having all PrepareSortSupport*() variants be responsible for filling
in ssup_reverse.)

Moreover, remove the hardcoded check against BTREE_AM_OID, and check
against amcanorder instead, which is the actual requirement.

Discussion: https://www.postgresql.org/message-id/flat/e72eaa49-354d-4c2e-8eb9-255197f55...@enterprisedb.com
---
 src/backend/access/nbtree/nbtsort.c        |  7 +++
 src/backend/utils/sort/sortsupport.c       | 15 ++-
 src/backend/utils/sort/tuplesortvariants.c | 14 ++
 src/include/utils/sortsupport.h            |  2 +-
 4 files changed, 16 insertions(+), 22 deletions(-)

diff --git a/src/backend/access/nbtree/nbtsort.c b/src/backend/access/nbtree/nbtsort.c
index fa336ba0096..3794cc924ad 100644
--- a/src/backend/access/nbtree/nbtsort.c
+++ b/src/backend/access/nbtree/nbtsort.c
@@ -1171,7 +1171,7 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 		{
 			SortSupport sortKey = sortKeys + i;
 			ScanKey		scanKey = wstate->inskey->scankeys + i;
-			int16		strategy;
+			bool		reverse;
 
 			sortKey->ssup_cxt = CurrentMemoryContext;
 			sortKey->ssup_collation = scanKey->sk_collation;
@@ -1183,10 +1183,9 @@ _bt_load(BTWriteState *wstate, BTSpool *btspool, BTSpool *btspool2)
 
 			Assert(sortKey->ssup_attno != 0);
 
-			strategy = (scanKey->sk_flags & SK_BT_DESC) != 0 ?
-				BTGreaterStrategyNumber : BTLessStrategyNumber;
+			reverse = (scanKey->sk_flags & SK_BT_DESC) != 0;
 
-			PrepareSortSupportFromIndexRel(wstate->index, strategy, sortKey);
+			PrepareSortSupportFromIndexRel(wstate->index, reverse, sortKey);
 		}
 
 		for (;;)
diff --git a/src/backend/utils/sort/sortsupport.c b/src/backend/utils/sort/sortsupport.c
index 6037031eaa3..0b4f08ed2ec 100644
--- a/src/backend/utils/sort/sortsupport.c
+++ b/src/backend/utils/sort/sortsupport.c
@@ -150,15 +150,15 @@ PrepareSortSupportFromOrderingOp(Oid orderingOp, SortSupport ssup)
 }
 
 /*
- * Fill in SortSupport given an index relation, attribute, and strategy.
+ * Fill in SortSupport given an index relation and attribute.
  *
  * Caller must previously have zeroed the SortSupportData structure and then
 * filled in ssup_cxt, ssup_attno, ssup_collation, and ssup_nulls_first. This
- * will fill in ssup_reverse (based on the supplied strategy), as well as the
+ * will fill in ssup_reverse (based on the supplied argument), as well as the
 * comparator function pointer.
 */
 void
-PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
+PrepareSortSupportFromIndexRel(Relation indexRel, bool reverse,
 							   SortSupport ssup)
 {
 	Oid			opfamily = indexRel->rd_opfamily[ssup->ssup_attno - 1];
@@ -166,12 +166,9 @@ PrepareSortSupportFromIndexRel(Relation indexRel, int16 strategy,
 
 	Assert(ssup->comparator == NULL);
 
-	if (indexRel->rd_rel->relam != BTREE_AM_OID)
-		elog(ERROR, "unexpected non-btree AM: %u", indexRel->rd_rel->relam);
-	if (strategy != BTGreaterStrategyNumber &&
-		strategy != BTLessStrategyNumber)
-		elog(ERROR, "unexpected sort support strategy: %d", strategy);
-	ssup->ssup_reverse = (strategy == BTGreaterStrategyNumber);
+	if (!indexRel->rd_indam->amcanorder)
+		elog(ERROR, "unexpected non-amcanorder AM: %u", indexRel->rd_rel->relam);
+	ssup->ssup_reverse = reverse;
 
 	FinishSortSupportFunction(opfamily, opcintype, ssup);
 }
di
Re: meson vs. llvm bitcode files
Hi, On Tue, 11 Mar 2025 at 01:04, Diego Fronza wrote: > I did a full review on the provided patches plus some tests, I was able to > validate that the loading of bitcode modules is working also JIT works for > both backend and contrib modules. Thank you! > To test JIT on contrib modules I just lowered the costs for all jit settings > and used the intarray extension, using the data/test__int.data: > CREATE EXTENSION intarray; > CREATE TABLE test__int( a int[] );1 > \copy test__int from 'data/test__int.data' > > For queries any from line 98+ on contrib/intarray/sql/_int.sql will work. > > Then I added extra debug messages to llvmjit_inline.cpp on > add_module_to_inline_search_path() function, also on > llvm_build_inline_plan(), I was able to see many functions in this module > being successfully inlined. > > I'm attaching a new patch based on your original work which add further > support for generating bitcode from: Thanks for doing that! > - Generated backend sources: processed by flex, bison, etc. > - Generated contrib module sources, I think we do not need to separate these two. foreach srcfile : bitcode_module['srcfiles'] -if meson.version().version_compare('>=0.59') +srcfilename = '@0@'.format(srcfile) +if srcfilename.startswith('=0.59') Also, checking if the string starts with ' On this patch I just included fmgrtab.c and src/backend/parser for the > backend generated code. > For contrib generated sources I added contrib/cube as an example. I applied your contrib/cube example and did the same thing for the contrib/seg. > All relevant details about the changes are included in the patch itself. > > As you may know already I also created a PR focused on llvm bitcode emission > on meson, it generates bitcode for all backend and contribution modules, > currently under review by some colleagues at Percona: > https://github.com/percona/postgres/pull/103 > I'm curious if we should get all or some of the generated backend sources > compiled to bitcode, similar to contrib modules. I think we can do this. I added other backend sources like you did in the PR but attached it as another patch (0007) because I wanted to hear other people's opinions on that first. v3 is attached. -- Regards, Nazir Bilal Yavuz Microsoft From 62c7c05b19476939941b656a6eab43dc3661ef09 Mon Sep 17 00:00:00 2001 From: Andres Freund Date: Thu, 24 Oct 2024 07:23:05 -0400 Subject: [PATCH v3 1/7] meson: Add generated header stamps Otherwise build commands become too long and this has visible effect on creation time of meson build files. Author: Andres Freund Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/206b001d-1884-4081-bd02-bed5c92f02ba%40eisentraut.org --- src/include/meson.build | 18 ++ src/backend/meson.build | 2 +- src/fe_utils/meson.build | 2 +- meson.build | 16 +--- 4 files changed, 29 insertions(+), 9 deletions(-) diff --git a/src/include/meson.build b/src/include/meson.build index 2e4b7aa529e..c488a5dc4c9 100644 --- a/src/include/meson.build +++ b/src/include/meson.build @@ -177,3 +177,21 @@ install_subdir('catalog', # autoconf generates the file there, ensure we get a conflict generated_sources_ac += {'src/include': ['stamp-h']} + +# Instead of having targets depending directly on the generated headers, have +# them depend on a stamp files for all of them. Dependencies on headers are +# implemented as order-only dependencies in meson (later using compiler +# generated dependencies). The benefit of using a stamp file is that it makes +# ninja.build smaller and meson setup faster. 
+generated_headers_stamp = custom_target('generated-headers-stamp.h', + output: 'generated-headers-stamp.h', + input: generated_headers, + command: stamp_cmd, +) + +generated_backend_headers_stamp = custom_target('generated-backend-headers-stamp.h', + output: 'generated-backend-headers-stamp.h', + input: generated_backend_headers, + depends: generated_headers_stamp, + command: stamp_cmd, +) diff --git a/src/backend/meson.build b/src/backend/meson.build index 2b0db214804..7fc649c3ebd 100644 --- a/src/backend/meson.build +++ b/src/backend/meson.build @@ -169,7 +169,7 @@ backend_mod_code = declare_dependency( compile_args: pg_mod_c_args, include_directories: postgres_inc, link_args: pg_mod_link_args, - sources: generated_headers + generated_backend_headers, + sources: [generated_backend_headers_stamp], dependencies: backend_mod_deps, ) diff --git a/src/fe_utils/meson.build b/src/fe_utils/meson.build index a18cbc939e4..5a9ddb73463 100644 --- a/src/fe_utils/meson.build +++ b/src/fe_utils/meson.build @@ -29,7 +29,7 @@ generated_sources += psqlscan fe_utils_sources += psqlscan fe_utils = static_library('libpgfeutils', - fe_utils_sources + generated_headers, + fe_utils_sources, c_pch: pch_postgres_fe_h, include_directories: [postgres_inc, libpq_inc], c_args: host_system == 'windows' ? ['-DFD_SETSIZE=1024'] : [], di
Re: Add an option to skip loading missing publication to avoid logical replication failure
On Tue, Mar 11, 2025 at 4:01 PM vignesh C wrote: > > On Mon, 10 Mar 2025 at 09:33, Amit Kapila wrote: > > > > On Tue, Mar 4, 2025 at 6:54 PM vignesh C wrote: > > > > > > On further thinking, I felt the use of publications_updated variable > > > is not required we can use publications_valid itself which will be set > > > if the publication system table is invalidated. Here is a patch for > > > the same. > > > > > > > The patch relies on the fact that whenever a publication's data is > > invalidated, it will also invalidate all the RelSyncEntires as per > > publication_invalidation_cb. But note that we are discussing removing > > that inefficiency in the thread [1]. So, we should try to rebuild the > > entry when we have skipped the required publication previously. > > > > Apart from this, please consider updating the docs, as mentioned in my > > response to Sawada-San's email. > > The create subscription documentation already has "We allow > non-existent publications to be specified so that users can add those > later. This means pg_subscription can have non-existent publications." > and should be enough at [1]. Let me know if we need to add more > documentation. > > Apart from this I have changed the log level that logs "skipped > loading publication" to WARNING as we log a warning "WARNING: > publications "pub2", "pub3" do not exist on the publisher" in case of > CREATE SUBSCRIPTION and looked similar to this. I can change it to a > different log level in case you feel this is not the right level. > > Also I have added a test case for dilip's comment from [2]. > The attached v7 version patch has the changes for the same. > Thanks, Vignesh, for adding the test. I believe you've tested the effect of DROP PUBLICATION. However, I think we should also test the behavior of ALTER SUBSCRIPTION...SET PUBLICATION before creating the PUBLICATION, and then create the PUBLICATION at a later stage. -- Regards, Dilip Kumar EnterpriseDB: http://www.enterprisedb.com
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Wed, Mar 12, 2025 at 3:20 PM Alvaro Herrera wrote: > > On 2025-Mar-12, Ashutosh Bapat wrote: > > > If the test passes for you, can you please try the patches at [1] on > > top of your patches? Please apply those, set and export environment > > variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? > > I intended to do this but can not do it since the test always fails > > with your patches applied. > > Oh, I need to enable a PG_TEST_EXTRA option in order for the test to > run? FFS. That explains why the tests passed just fine for me. > I'll re-run. You need PG_TEST_EXTRA to run the testcase I am adding there. Rest of the testcases run without PG_TEST_EXTRA. -- Best Wishes, Ashutosh Bapat
Re: [PATCH] SVE popcount support
On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote:
> v5-no-sve is the result of using a function pointer, but pointing to the
> "slow" versions instead of the SVE version. v5-sve is the result of the
> latest patch in this thread on a machine with SVE support, and v5-4reg is
> the result of the latest patch in this thread modified to process 4
> register's worth of data at a time.

Nice, I wonder why I did not observe any performance gain in the 4reg version. Did you modify the 4reg version code? One possible explanation is that you used Graviton4 based instances whereas I used Graviton3 instances.

> For the latter point, I think we should consider trying to add a separate
> Neon implementation that we use as a fallback for machines that don't have
> SVE. My understanding is that Neon is virtually universally supported on
> 64-bit Arm gear, so that will not only help offset the function pointer
> overhead but may even improve performance for a much wider set of machines.

I have added the NEON implementation in the latest patch. Here are the numbers for drive_popcount(100, 1024) on m7g.8xlarge:

Scalar - 692ms
Neon - 298ms
SVE - 112ms

-Chiranmoy

v6-0001-SVE-and-NEON-support-for-popcount.patch
Description: v6-0001-SVE-and-NEON-support-for-popcount.patch
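For readers who haven't looked at the patch, a byte-wise NEON popcount loop generally looks like the standalone sketch below. This is an illustration only, not the code from the attached patch; it assumes an AArch64 compiler providing arm_neon.h and uses __builtin_popcountll for the scalar tail.

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Count set bits in an array of 64-bit words, 16 bytes per iteration. */
static uint64_t
neon_popcount(const uint64_t *words, size_t nwords)
{
	uint64_t	count = 0;
	size_t		i = 0;

	for (; i + 2 <= nwords; i += 2)
	{
		uint8x16_t	v = vld1q_u8((const uint8_t *) (words + i));

		/* vcntq_u8 gives per-byte popcounts; vaddlvq_u8 sums them across the vector */
		count += vaddlvq_u8(vcntq_u8(v));
	}
	for (; i < nwords; i++)
		count += __builtin_popcountll(words[i]);	/* scalar tail */
	return count;
}

int
main(void)
{
	uint64_t	sample[4] = {0xffULL, 0, ~0ULL, 1};

	printf("%llu\n", (unsigned long long) neon_popcount(sample, 4));	/* 8 + 0 + 64 + 1 = 73 */
	return 0;
}

In the scheme discussed above, the function-pointer dispatch would then choose between this kind of NEON fallback and the SVE version at runtime.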
Re: Add an option to skip loading missing publication to avoid logical replication failure
On Tue, Mar 11, 2025 at 9:48 AM Dilip Kumar wrote: > > On Mon, Mar 10, 2025 at 10:54 AM Amit Kapila wrote: > > > > > >> BTW, I am planning to commit this only on HEAD as this is a behavior > > >> change. Please let me know if you guys think otherwise. > > > > > > > > > Somehow this looks like a bug fix which should be backported no? Am I > > > missing something? > > > > > > > We can consider this a bug-fix and backpatch it, but I am afraid that > > because this is a behavior change, some users may not like it. Also, I > > don't remember seeing public reports for this behavior; that is > > probably because it is hard to hit. FYI, we found this via BF > > failures. So, I thought it would be better to get this field tested > > via HEAD only at this point in time. > > At the moment, I don't have a strong opinion on this. Since no one has > encountered or reported this issue, it might be the case that it's not > affecting anyone, and we could simply backpatch without causing any > dissatisfaction. However, I'm fine with whatever others decide. > Sawada-San, others, do you have an opinion on whether to backpatch this change? -- With Regards, Amit Kapila.
Re: SQL function which allows to distinguish a server being in point in time recovery mode and an ordinary replica
On Fri, 28 Feb 2025 at 19:37, wrote: > > Hi, > > Fix an error in the patch. I felt you might have missed attaching the test patches added at [1]. Also the test from [1] is failing with the latest v6 version patch. This change is not required: extern void XLogRecGetBlockRefInfo(XLogReaderState *record, bool pretty, bool detailed_format, StringInfo buf, uint32 *fpi_len); - /* [1] - https://www.postgresql.org/message-id/4ba66566b84df983c881b996eb8831f1%40postgrespro.ru Regards, Vignesh
Re: Vacuum timing in pg_stat_all_tables
Hi, On Tue, Mar 04, 2025 at 03:24:26PM +, Bertrand Drouvot wrote: > Like "more stats are always nice" I think that "more explanations in the doc" > are > always nice, so I don't see any reason why not to add this extra explanation. Attached an attempt to do so. Note that it does not add extra explanation to "cost-based delay". If we feel the need we could add a link to "" like it has been done for delay_time in bb8dff9995f. Regards, -- Bertrand Drouvot PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com >From 8c333bcb97541b1238602ae84ea85d4792a5577f Mon Sep 17 00:00:00 2001 From: Bertrand Drouvot Date: Wed, 12 Mar 2025 10:36:13 + Subject: [PATCH v1] Add more details for pg_stat_all_tables 30a6ed0ce4b added the [auto]vacuum_time and [auto]analyze _time fields to pg_stat_all_tables and bb8dff9995f added cost-based vacuum delay time to progress views. This commit highlights the fact that the fields added in 30a6ed0ce4b include the time spent sleeping due to cost-based delay. Suggested-by: Magnus Hagander --- doc/src/sgml/monitoring.sgml | 12 1 file changed, 8 insertions(+), 4 deletions(-) 100.0% doc/src/sgml/ diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index aaa6586d3a4..a37ae4d2d1d 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -4047,7 +4047,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage total_vacuum_time double precision - Total time this table has been manually vacuumed, in milliseconds + Total time this table has been manually vacuumed, in milliseconds. (This + includes the time spent sleeping due to cost-based delay.) @@ -4057,7 +4058,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage Total time this table has been vacuumed by the autovacuum daemon, - in milliseconds + in milliseconds. (This includes the time spent sleeping due to cost-based + delay.) @@ -4066,7 +4068,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage total_analyze_time double precision - Total time this table has been manually analyzed, in milliseconds + Total time this table has been manually analyzed, in milliseconds. (This + includes the time spent sleeping due to cost-based delay.) @@ -4076,7 +4079,8 @@ description | Waiting for a newly initialized WAL file to reach durable storage Total time this table has been analyzed by the autovacuum daemon, - in milliseconds + in milliseconds. (This includes the time spent sleeping due to cost-based + delay.) -- 2.34.1
Re: Improve monitoring of shared memory allocations
Hi Andres, >> > + if (hashp->isshared) >> > + { >> > + int nsegs; >> > + int nbuckets; >> > + nsegs = find_num_of_segs(nelem, &nbuckets, >> hctl->num_partitions, hctl->ssize); >> > + >> > + curr_offset = (((char *) hashp->hctl) + >> sizeof(HASHHDR) + (info->dsize * sizeof(HASHSEGMENT)) + >> > ++ (sizeof(HASHBUCKET) * hctl->ssize * nsegs)); >> > + } >> > + >> >> Why only do this for shared hashtables? Couldn't we allocate the elments >> together with the rest for non-share hashtables too? >> > > I have now made the changes uniformly across shared and non-shared hash tables, making the code fix look cleaner. I hope this aligns with your suggestions. Please find attached updated and rebased versions of both patches. Kindly let me know your views. Thank you, Rahila Syed v3-0001-Account-for-initial-shared-memory-allocated-by-hash_.patch Description: Binary data v3-0002-Replace-ShmemAlloc-calls-by-ShmemInitStruct.patch Description: Binary data
Re: Proposal: Limitations of palloc inside checkpointer
On Wed, 12 Mar 2025 at 10:27, Xuneng Zhou wrote: > Hi, > The patch itself looks ok to me. I'm curious about the trade-offs between > this incremental approach and the alternative of > using palloc_extended() with the MCXT_ALLOC_HUGE flag. The approach of > splitting the requests into fixed-size slices avoids OOM failures or > process termination by the OOM killer, which is good. However, it does add > some overhead with additional lock acquisition/release cycles and memory > movement operations via memmove(). The natural question is whether the > security justify the cost. Regarding the slice size of 1 GB, is this > derived from MaxAllocSize limit, or was it chosen for other performance > reasons? whether a different size might offer better performance under > typical workloads? > I think 1 GB is derived purely from MaxAllocSize. This "palloc" is a relatively old one, and no one expected the number of requests to exceed 1 GB. Now we have the ability to set the shared_buffers to a huge number (without discussing now whether this makes any real sense), thus this limit for palloc becomes a problem. -- Best regards, Maxim Orlov.
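A minimal standalone sketch of the slice-based draining being discussed, to make the extra lock acquisition/release cycles and the memmove() visible. The queue, the no-op lock()/unlock(), process() and the slice size are stand-ins invented for the example, not the actual checkpointer structures; in the real code the cap would follow from MaxAllocSize, which is where the 1 GB figure comes from.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct Request { int page; } Request;

#define SLICE_NELEMS	8192	/* in reality: MaxAllocSize / sizeof(Request) */

static Request shared_queue[100000];	/* stand-in for the shared-memory queue */
static size_t  shared_count = 100000;

static void lock(void)   { /* LWLockAcquire(...) in the real code */ }
static void unlock(void) { /* LWLockRelease(...) in the real code */ }
static void process(Request *r) { (void) r; /* absorb one request */ }

int
main(void)
{
	Request    *slice = malloc(SLICE_NELEMS * sizeof(Request));

	for (;;)
	{
		size_t		n;

		lock();
		n = shared_count < SLICE_NELEMS ? shared_count : SLICE_NELEMS;
		memcpy(slice, shared_queue, n * sizeof(Request));
		/* shift the remainder down so producers keep appending at the tail */
		memmove(shared_queue, shared_queue + n,
				(shared_count - n) * sizeof(Request));
		shared_count -= n;
		unlock();

		if (n == 0)
			break;
		for (size_t i = 0; i < n; i++)
			process(&slice[i]);
	}
	free(slice);
	printf("drained\n");
	return 0;
}

The trade-off mentioned above is visible here: each slice costs one lock/unlock pair and one memmove() of the remaining entries, in exchange for never allocating more than one slice's worth of local memory.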
Re: INSERT ... ON CONFLICT DO SELECT [FOR ...] take 2
> On 3/4/25 10:24 AM, Andreas Karlsson wrote:
> Rebased the patch to add support for OLD.* and NEW.*.

create table t (key int primary key, val text);
insert into t values (1, 'old');
insert into t values (1, 'new') on conflict (key) do select for update returning old.*, new.*;

 key | val | key | val
-----+-----+-----+-----
   1 | old |     |
(1 row)

IMO this should cause new.* to be the same as old.*. This is kind of like a "do update" that changes nothing, with the end result being that old and new are the same, because nothing was changed. Currently, the only command that can cause new to be NULL is a DELETE, because the new state of the table is that the row no longer exists, which isn't the case here.

On Mon, 10 Mar 2025 at 13:11, Kirill Reshke wrote:
>
> On Mon, 10 Mar 2025 at 18:05, Kirill Reshke wrote:
> >
> > 1) Should we answer INSERT 0 10 here?
>
> Sorry, i mean: INSERT 0 0

Hmm, I would say that the patch is correct -- the count should be the number of rows inserted, updated or selected for return (and the "Outputs" section of the doc page for INSERT should be updated to say that). That way, the count always matches the number of rows returned when there's a RETURNING clause, which I think is true of all other DML commands.

Regards,
Dean
Re: Implement waiting for wal lsn replay: reloaded
10.03.2025 14:30, Alexander Korotkov пишет: > On Fri, Feb 28, 2025 at 3:55 PM Yura Sokolov wrote: >> 28.02.2025 16:03, Yura Sokolov пишет: >>> 17.02.2025 00:27, Alexander Korotkov wrote: On Thu, Feb 6, 2025 at 10:31 AM Yura Sokolov wrote: > I briefly looked into patch and have couple of minor remarks: > > 1. I don't like `palloc` in the `WaitLSNWakeup`. I believe it wont issue > problems, but still don't like it. I'd prefer to see local fixed array, > say > of 16 elements, and loop around remaining function body acting in batch of > 16 wakeups. Doubtfully there will be more than 16 waiting clients often, > and even then it wont be much heavier than fetching all at once. OK, I've refactored this to use static array of 16 size. palloc() is used only if we don't fit static array. >>> >>> I've rebased patch and: >>> - fixed compiler warning in wait.c ("maybe uninitialized 'result'"). >>> - made a loop without call to palloc in WaitLSNWakeup. It is with "goto" to >>> keep indentation, perhaps `do {} while` would be better? >> >> And fixed: >>'WAIT' is marked as BARE_LABEL in kwlist.h, but it is missing from >> gram.y's bare_label_keyword rule > > Thank you, Yura. I've further revised the patch. Mostly added the > documentation including SQL command reference and few paragraphs in > the high availability chapter explaining the read-your-writes > consistency concept. Good day, Alexander. Looking "for the last time" to the patch I found there remains `pg_wal_replay_wait` function in documentation and one comment. So I fixed it in documentation, and removed sentence from comment. Otherwise v6 is just rebased v5. --- regards Yura Sokolov aka funny-falconFrom 80b4cb8c0ac75168ab1fce55feccc4f08f32ce34 Mon Sep 17 00:00:00 2001 From: Alexander Korotkov Date: Mon, 10 Mar 2025 12:59:38 +0200 Subject: [PATCH v6] Implement WAIT FOR command WAIT FOR is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. The queue of waiters is stored in the shared memory as an LSN-ordered pairing heap, where the waiter with the nearest LSN stays on the top. During the replay of WAL, waiters whose LSNs have already been replayed are deleted from the shared memory pairing heap and woken up by setting their latches. WAIT FOR needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why separate utility command seems appears to be the most robust way to implement this functionality. It's not possible to implement this as a function. Previous experience shows that stored procedures also have limitation in this aspect. 
Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan Author: Alexander Korotkov Reviewed-by: Michael Paquier Reviewed-by: Peter Eisentraut Reviewed-by: Dilip Kumar Reviewed-by: Amit Kapila Reviewed-by: Alexander Lakhin Reviewed-by: Bharath Rupireddy Reviewed-by: Euler Taveira Reviewed-by: Heikki Linnakangas Reviewed-by: Kyotaro Horiguchi Reviewed-by: Yura Sokolov --- doc/src/sgml/high-availability.sgml | 54 +++ doc/src/sgml/ref/allfiles.sgml| 1 + doc/src/sgml/ref/wait_for.sgml| 216 +++ doc/src/sgml/reference.sgml | 1 + src/backend/access/transam/Makefile | 3 +- src/backend/access/transam/meson.build| 1 + src/backend/access/transam/xact.c | 6 + src/backend/access/transam/xlog.c | 7 + src/backend/access/transam/xlogrecovery.c | 11 + src/backend/access/transam/xlogwait.c | 347 ++ src/backend/commands/Makefile | 3 +- src/backend/commands/meson.build | 1 + src/backend/commands/wait.c | 184 ++ src/backend/lib/pairingheap.c | 18 +- src/backend/parser/gram.y | 15 +- src/backend/storage/ipc/ipci.c| 3 + src/backend/storage/lmgr/proc.c | 6 + src/backend/tcop/pquery.c | 12 +- src/backend/tcop/utility.c| 22 ++ .../utils/activity/wait_event_names.txt | 2 + src/include/access/xlogwait.h | 89 + src/include/commands/wait.h | 21 ++ src/include/lib/pairingheap.h | 3 + src/include/nodes/parsenodes.h| 7 + src/include/parser/kwlist.h | 1 + src/include/storage/lwlocklist.h | 1 + src/include/tcop/cmdtaglist.h | 1 + src/test/recovery/meson.build | 1 + src/test/recovery/t/045_wait_for_lsn.pl | 217 +++ src/tools/pgindent/typedefs.list
Re: dblink: Add SCRAM pass-through authentication
On Mon, Mar 10, 2025 at 11:25 AM Jacob Champion wrote: > > On Fri, Mar 7, 2025 at 8:22 AM Peter Eisentraut wrote: > > Right. How about the attached? It checks as an alternative to a > > password whether the SCRAM keys were provided. That should get us back > > to the same level of checking? > > Yes, I think so. Attached is a set of tests to illustrate, mirroring > the dblink tests added upthread; they fail without this patch. In an offline discussion with Peter and Matheus, we figured out that this is still not enough. The latest patch checks that a password was used, but it doesn't ensure that the password material came from the SCRAM keys. Attached is an updated test to illustrate. Thanks, --Jacob commit 4a41754eaa41f2db285e68ff8140d6932c299358 Author: Jacob Champion Date: Mon Mar 10 11:18:27 2025 -0700 WIP diff --git a/contrib/postgres_fdw/t/001_auth_scram.pl b/contrib/postgres_fdw/t/001_auth_scram.pl index 047840cc914..464492948b4 100644 --- a/contrib/postgres_fdw/t/001_auth_scram.pl +++ b/contrib/postgres_fdw/t/001_auth_scram.pl @@ -68,6 +68,45 @@ test_fdw_auth($node1, $db0, "t2", $fdw_server2, test_auth($node2, $db2, "t2", "SCRAM auth directly on foreign server should still succeed"); +# Ensure that trust connections fail without superuser opt-in. +unlink($node1->data_dir . '/pg_hba.conf'); +unlink($node2->data_dir . '/pg_hba.conf'); + +$node1->append_conf( + 'pg_hba.conf', qq{ +local db0 all scram-sha-256 +local db1 all trust +}); +$node2->append_conf( + 'pg_hba.conf', qq{ +local all all password +}); + +$node1->restart; +$node2->restart; + +my ($ret, $stdout, $stderr) = $node1->psql( + $db0, + "SELECT count(1) FROM t", + connstr => $node1->connstr($db0) . " user=$user"); + +is($ret, 3, 'loopback trust fails on the same cluster'); +like( + $stderr, + qr/password or GSSAPI delegated credentials required/, + 'expected error from loopback trust (same cluster)'); + +($ret, $stdout, $stderr) = $node1->psql( + $db0, + "SELECT count(1) FROM t2", + connstr => $node1->connstr($db0) . " user=$user"); + +is($ret, 3, 'loopback password fails on a different cluster'); +like( + $stderr, + qr/password or GSSAPI delegated credentials required/, + 'expected error from loopback password (different cluster)'); + # Helper functions sub test_auth
remove open-coded popcount in acl.c
There's a count_one_bits() function in acl.c that can be replaced with a call to pg_popcount64(). This isn't performance-critical code, but IMHO we might as well use the centralized implementation.

--
nathan

>From 932fac13bf168571b145a54c29d9ac28ca2a070f Mon Sep 17 00:00:00 2001
From: Nathan Bossart
Date: Wed, 12 Mar 2025 10:45:12 -0500
Subject: [PATCH v1 1/1] Remove open-coded popcount in acl.c.

---
 src/backend/utils/adt/acl.c | 20 +---
 1 file changed, 1 insertion(+), 19 deletions(-)

diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c
index 6a76550a5e2..ba14713fef2 100644
--- a/src/backend/utils/adt/acl.c
+++ b/src/backend/utils/adt/acl.c
@@ -5432,24 +5432,6 @@ select_best_admin(Oid member, Oid role)
 	return admin_role;
 }
 
-
-/* does what it says ... */
-static int
-count_one_bits(AclMode mask)
-{
-	int			nbits = 0;
-
-	/* this code relies on AclMode being an unsigned type */
-	while (mask)
-	{
-		if (mask & 1)
-			nbits++;
-		mask >>= 1;
-	}
-	return nbits;
-}
-
-
 /*
  * Select the effective grantor ID for a GRANT or REVOKE operation.
  *
@@ -5532,7 +5514,7 @@ select_best_grantor(Oid roleId, AclMode privileges,
 	 */
 	if (otherprivs != ACL_NO_RIGHTS)
 	{
-		int			nnewrights = count_one_bits(otherprivs);
+		int			nnewrights = pg_popcount64(otherprivs);
 
 		if (nnewrights > nrights)
 		{
-- 
2.39.5 (Apple Git-154)
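As a quick sanity check that the replacement is behavior-preserving: the removed loop and a 64-bit popcount agree for any unsigned value. A tiny standalone test like the following (illustrative only, using the GCC/Clang builtin in place of pg_popcount64(), and uint64_t in place of AclMode) confirms it.

#include <assert.h>
#include <stdint.h>

/* the open-coded version, essentially as it appeared in acl.c */
static int
count_one_bits(uint64_t mask)
{
	int			nbits = 0;

	while (mask)
	{
		if (mask & 1)
			nbits++;
		mask >>= 1;
	}
	return nbits;
}

int
main(void)
{
	uint64_t	samples[] = {0, 1, 0xff, 0x8000000000000000ULL, ~0ULL};

	for (int i = 0; i < 5; i++)
		assert(count_one_bits(samples[i]) ==
			   __builtin_popcountll(samples[i]));
	return 0;
}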
Re: Orphaned users in PG16 and above can only be managed by Superusers
On Wed, Mar 12, 2025 at 12:10:30PM +0530, Ashutosh Sharma wrote:
> I think moving the check to the second pass won't work in this case.
> The reason is that we rely on entries in the pg_auth_members table. By
> the time the check occurs in the second pass, the first pass will have
> already removed all entries related to the roles being dropped from
> pg_auth_members and incremented the command counter. As a result, when
> check_drop_role_dependency() is called in the second pass, it won't
> find any entries in the table and won't be able to detect any ADMIN
> privilege-related dependencies.

Oh, good catch.

> So, we need to explore alternative approaches to handle this better.
> I'll spend some more time today to investigate other possibilities for
> addressing this issue.

I think this approach has other problems. For example, even if a role has admin directly on the dropped role, we'll block DROP ROLE if it also has admin indirectly:

postgres=# create role a createrole;
CREATE ROLE
postgres=# set role a;
SET
postgres=> create role b createrole;
CREATE ROLE
postgres=> set role b;
SET
postgres=> create role c admin a;
CREATE ROLE
postgres=> reset role;
RESET
postgres=# drop role b;
ERROR:  role "b" cannot be dropped because some objects depend on it
DETAIL:  role a inherits ADMIN privileges on role c through role b

There are also other ways besides DROP ROLE that roles can end up without any admins:

postgres=# create role a createrole;
CREATE ROLE
postgres=# set role a;
SET
postgres=> create role b createrole;
CREATE ROLE
postgres=> set role b;
SET
postgres=> create role c;
CREATE ROLE
postgres=> reset role;
RESET
postgres=# revoke c from b;
REVOKE ROLE

I wonder if we should adjust the requirements here to be "an operation cannot result in a role with 0 admins."  That does mean you might lose indirect admin privileges, but I'm not sure that's sustainable, anyway.  For example, if a role has 2 admins, should we block REVOKE from one of the admins because another role inherited its privileges?  If nothing else, it sounds like a complicated user experience.

If we just want to make sure that roles don't end up without admins, I think we could just collect the set of roles losing an admin during DROP ROLE, REVOKE, etc., and then once all removals have completed, we verify that each of the collected roles has an admin.

--
nathan
Re: Elimination of the repetitive code at the SLRU bootstrap functions.
> On 12 Mar 2025, at 20:02, Evgeny Voropaev wrote: > v6 looks good to me. I'll flip the CF entry. Thanks! Best regards, Andrey Borodin.
Re: Log connection establishment timings
On Tue, Mar 11, 2025 at 6:27 PM Melanie Plageman wrote: > > I did more manual testing of my patches, and I think they are mostly > ready for commit except for the IsConnectionBackend() macro (if we > have something to change it to). I've committed this and marked it as such in the CF app. Thanks to everyone for the review. - Melanie
Re: Introduce "log_connection_stages" setting.
On Mon, Mar 3, 2025 at 6:43 PM Melanie Plageman wrote: > > Okay, I got cold feet today working on this. I actually think we > probably want some kind of guc set (set like union/intersection/etc > not set like assign a value) infrastructure for gucs that can be equal > to any combination of predetermined values. I ended up with a patch set that I was happy with which used the list of string GUCs. I committed it in 9219093cab26. It changes log_connections to a list of strings for the different aspects of connection setup and supports the most common inputs for the boolean version of log_connections for backwards compatibility. I did not end up deprecating log_disconnections since it is rather late in the cycle and that change could use more visibility. I suggest proposing that at the beginning of the next release. Thanks for your interest in this topic. - Melanie
Re: Index AM API cleanup
On Wed, Mar 12, 2025 at 7:00 AM Peter Eisentraut wrote:
> And another small patch set extracted from the bigger one, to keep
> things moving along:
>
> 0001: Add get_opfamily_method() in lsyscache.c, from your patch set.
>

Right, this came from v21-0006-*, with a slight code comment change and one variable renaming. It is ready for commit.

> 0002: Add get_opfamily_member_for_cmptype(). This was called
> get_opmethod_member() in your patch set, but I think that name wasn't
> quite right. I also removed the opmethod argument, which was rarely
> used and is somewhat redundant.
>

This is also taken from v21-0006-*. The reason I had an opmethod argument was that some of the callers of this function already know the opmethod, and without the argument, this function has to look up the opmethod from the syscache again. Whether that makes a measurable performance difference is an open question.

Your version is ready for commit. If we want to reintroduce the opmethod argument for performance reasons, we can always circle back to that in a later commit.

> 0003 and 0004 are enabling non-btree unique indexes for partition keys
>

You refactored v21-0011-* into v21.2-0003-*, in which an error gets raised about a missing operator in a slightly different part of the logic. I am concerned that the new positioning of the check-and-error might allow the flow of execution to reach the Assert(idx_eqop) statement in situations where the user has defined an incomplete opfamily or opclass. Such a condition should raise an error about the missing operator rather than asserting. In particular, looking at the control logic:

if (stmt->unique && !stmt->iswithoutoverlaps)
{
}
else if (exclusion)
    ;
Assert(idx_eqop);

I cannot prove to myself that the assertion cannot trigger, because the upstream logic before we reach this point *might* be filtering out all cases where this could be a problem, but it is too complicated to prove. Even if it is impossible now, this is a pretty brittle piece of code after applying the patch. Any chance you'd like to keep the patch closer to how I had it in v21-0011-* ?

> and materialized views. These were v21-0011 and v21-0012, except that
> I'm combining the switch from strategies to compare types (which was in
> v21-0006 or so) with the removal of the btree requirements.
>

v21.2-0004-* is ready for commit.

— Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: Test to dump and restore objects left behind by regression
On Wed, Mar 12, 2025 at 5:35 PM Alvaro Herrera wrote: > > Hello > > When running these tests, I encounter this strange diff in the dumps, > which seems to be that the locale for type money does not match. I > imagine the problem is that the locale is not set correctly when > initdb'ing one of them? Grepping the regress_log for initdb, I see > this: > > $ grep -B1 'Running: initdb' tmp_check/log/regress_log_002_pg_upgrade > [13:00:57.580](0.003s) # initializing database system by running initdb > # Running: initdb -D > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_old_node_data/pgdata > -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 > --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale > C.UTF-8 -k > -- > [13:01:12.879](0.044s) # initializing database system by running initdb > # Running: initdb -D > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_dst_node_data/pgdata > -A trust -N --wal-segsize 1 --allow-group-access --encoding UTF-8 > --lc-collate C --lc-ctype C --locale-provider builtin --builtin-locale > C.UTF-8 -k > -- > [13:01:28.000](0.033s) # initializing database system by running initdb > # Running: initdb -D > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/t_002_pg_upgrade_new_node_data/pgdata > -A trust -N --wal-segsize 1 --allow-group-access --encoding SQL_ASCII > --locale-provider libc > The original node and the node where dump is restored have the same initdb commands. It's the upgraded node which has different initdb command. But that's how the test is written originally. > > > [12:50:31.838](0.102s) not ok 15 - dump outputs from original and restored > regression database (using plain format) match > [12:50:31.839](0.000s) > [12:50:31.839](0.000s) # Failed test 'dump outputs from original and > restored regression database (using plain format) match' > # at /pgsql/source/master/src/test/perl/PostgreSQL/Test/Utils.pm line 797. > [12:50:31.839](0.000s) # got: '1' > # expected: '0' > === diff of > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted > and > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted > === stdout === > --- > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/src_dump.sql_adjusted > 2025-03-12 12:50:27.674918597 +0100 > +++ > /home/alvherre/Code/pgsql-build/master/src/bin/pg_upgrade/tmp_check/tmp_test_vVew/dest_dump.plain.sql_adjusted > 2025-03-12 12:50:31.778840338 +0100 > @@ -208972,7 +208972,7 @@ > -- Data for Name: money_data; Type: TABLE DATA; Schema: public; Owner: > alvherre > -- > COPY public.money_data (m) FROM stdin; > -$123.46 > +$ 12.346,00 > \. 
> -- > -- Data for Name: mvtest_t; Type: TABLE DATA; Schema: public; Owner: alvherre > @@ -376231,7 +376231,7 @@ > -- Data for Name: tab_core_types; Type: TABLE DATA; Schema: public; Owner: > alvherre > -- > COPY public.tab_core_types (point, line, lseg, box, openedpath, closedpath, > polygon, circle, date, "time", "timestamp", timetz, timestamptz, "interval", > "json", jsonb, jsonpath, inet, cidr, macaddr8, macaddr, int2, int4, int8, > float4, float8, pi, "char", bpchar, "varchar", name, text, bool, bytea, > "bit", varbit, money, refcursor, int2vector, oidvector, aclitem, tsvector, > tsquery, uuid, xid8, regclass, type, regrole, oid, tid, xid, cid, > txid_snapshot, pg_snapshot, pg_lsn, cardinal_number, character_data, > sql_identifier, time_stamp, yes_or_no, int4range, int4multirange, int8range, > int8multirange, numrange, nummultirange, daterange, datemultirange, tsrange, > tsmultirange, tstzrange, tstzmultirange) FROM stdin; > -(11,12){1,-1,0}[(11,11),(12,12)] (13,13),(11,11) > ((11,12),(13,13),(14,14)) [(11,12),(13,13),(14,14)] > ((11,12),(13,13),(14,14)) <(1,1),1> 2025-03-12 > 04:50:14.125899 2025-03-12 04:50:14.125899 04:50:14.125899-07 > 2025-03-12 12:50:14.125899+01 00:00:12{"reason":"because"} > {"when": "now"} $."a"[*]?(@ > 2)127.0.0.1 127.0.0.0/8 > 00:01:03:ff:fe:86:1c:ba 00:01:03:86:1c:ba 2 4 8 4 > 8 3.14159265358979f c abc nametxt t > \\xdeadbeef 1 10001 $12.34 abc 1 2 1 2 > alvherre=UC/alvherre'a' 'and' 'ate' 'cat' 'fat' 'mat' 'on' 'rat' 'sat' > 'fat' & 'rat' a0eebc99-9c0b-4ef8-bb6d-6bb9bd380a1111 pg_class > regtype pg_monitor 1259(1,1) 2 3 10:20:10,14,15 > 10:20:10,14,15 16/B374D848 1 l n 2025-03-12 > 12:50:14.13+01 YES empty {} empty {} (3,4) {(3,4)} > [2020-01-03,2021-02-03) {[2020-01-03,2021-02-03)} ("2020-01-02 > 03:04:05","2021-02-03 06:07:08") {("
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Wed, Mar 12, 2025 at 3:20 PM Alvaro Herrera wrote: > On 2025-Mar-12, Ashutosh Bapat wrote: > > > If the test passes for you, can you please try the patches at [1] on > > top of your patches? Please apply those, set and export environment > > variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? > > I intended to do this but can not do it since the test always fails > > with your patches applied. > > Oh, I need to enable a PG_TEST_EXTRA option in order for the test to > run? FFS. That explains why the tests passed just fine for me. > I'll re-run. > > -- > Álvaro HerreraBreisgau, Deutschland — > https://www.EnterpriseDB.com/ > "La conclusión que podemos sacar de esos estudios es que > no podemos sacar ninguna conclusión de ellos" (Tanenbaum) > I can reproduce the issue and will be sending the patch soon. Thanks Alvaro, Ashutosh. Rushabh Lathia
Re: Elimination of the repetitive code at the SLRU bootstrap functions.
Hello Hackers! Andrey, thank you for your review and remarks. > Patch adds whitespace errors Cleaned. > if (writePage != 0) should be if (writePage) Done. > XLogSimpleInsert(int64 simpledata, RmgrId rmid, uint8 info) > I’d rename function XLogSimpleInsert() to something more descriptive > and changed arguments order from generic to specific. The function has been renamed and the parameters have been reordered. Now we have: XLogInsertInt64(RmgrId rmid, uint8 info, int64 simpledata) > Probably, committer has broader view on XLog routines and can decide > if this function would better belong to SLRU than common XLog stuff. In accordance with Álvaro's proposal, we want to enclose this function in the "xloginsert.c" module. Best regards, Evgeny Voropaev.From 8c9aa484dc96cdf23c7fa524d88d67ce3c4cc6fc Mon Sep 17 00:00:00 2001 From: Evgeny Voropaev Date: Wed, 12 Mar 2025 22:35:19 +0800 Subject: [PATCH v6] Elimination of the repetitive code at the SLRU bootstrapping and nullifying functions. MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The functions bootstrapping and nullifying SLRU pages used to have a lot of repetitive code. New functions realized by this commit (BootStrapSlruPage, XLogInsertInt64) have allowed to remove these repetitions. Author: Evgeny Voropaev Reviewed by: Álvaro Herrera Reviewed by: Andrey Borodin Reviewed by: Aleksander Alekseev Discussion: https://www.postgresql.org/message-id/flat/97820ce8-a1cd-407f-a02b-47368fadb14b%40tantorlabs.com --- src/backend/access/transam/clog.c | 72 ++- src/backend/access/transam/commit_ts.c | 68 ++- src/backend/access/transam/multixact.c | 155 ++-- src/backend/access/transam/slru.c | 45 ++- src/backend/access/transam/subtrans.c | 48 ++-- src/backend/access/transam/xloginsert.c | 14 +++ src/include/access/slru.h | 2 + src/include/access/xloginsert.h | 1 + 8 files changed, 137 insertions(+), 268 deletions(-) diff --git a/src/backend/access/transam/clog.c b/src/backend/access/transam/clog.c index 48f10bec91e..aca8c31d680 100644 --- a/src/backend/access/transam/clog.c +++ b/src/backend/access/transam/clog.c @@ -110,7 +110,6 @@ static SlruCtlData XactCtlData; #define XactCtl (&XactCtlData) -static int ZeroCLOGPage(int64 pageno, bool writeXlog); static bool CLOGPagePrecedes(int64 page1, int64 page2); static void WriteZeroPageXlogRec(int64 pageno); static void WriteTruncateXlogRec(int64 pageno, TransactionId oldestXact, @@ -832,41 +831,11 @@ check_transaction_buffers(int *newval, void **extra, GucSource source) void BootStrapCLOG(void) { - int slotno; - LWLock *lock = SimpleLruGetBankLock(XactCtl, 0); - - LWLockAcquire(lock, LW_EXCLUSIVE); - - /* Create and zero the first page of the commit log */ - slotno = ZeroCLOGPage(0, false); - - /* Make sure it's written out */ - SimpleLruWritePage(XactCtl, slotno); - Assert(!XactCtl->shared->page_dirty[slotno]); - - LWLockRelease(lock); -} - -/* - * Initialize (or reinitialize) a page of CLOG to zeroes. - * If writeXlog is true, also emit an XLOG record saying we did this. - * - * The page is not actually written, just set up in shared memory. - * The slot number of the new page is returned. - * - * Control lock must be held at entry, and will be held at exit. 
- */ -static int -ZeroCLOGPage(int64 pageno, bool writeXlog) -{ - int slotno; - - slotno = SimpleLruZeroPage(XactCtl, pageno); - - if (writeXlog) - WriteZeroPageXlogRec(pageno); - - return slotno; + /* + * Nullify the page (pageno = 0), don't insert an XLog record, + * save the page on a disk + */ + BootStrapSlruPage(XactCtl, 0, 0, true); } /* @@ -959,7 +928,6 @@ void ExtendCLOG(TransactionId newestXact) { int64 pageno; - LWLock *lock; /* * No work except at first XID of a page. But beware: just after @@ -970,14 +938,12 @@ ExtendCLOG(TransactionId newestXact) return; pageno = TransactionIdToPage(newestXact); - lock = SimpleLruGetBankLock(XactCtl, pageno); - LWLockAcquire(lock, LW_EXCLUSIVE); - - /* Zero the page and make an XLOG entry about it */ - ZeroCLOGPage(pageno, true); - - LWLockRelease(lock); + /* + * Zero the page, make an XLOG entry about it, + * don't write the page on a disk + */ + BootStrapSlruPage(XactCtl, pageno, WriteZeroPageXlogRec, false); } @@ -1073,9 +1039,7 @@ CLOGPagePrecedes(int64 page1, int64 page2) static void WriteZeroPageXlogRec(int64 pageno) { - XLogBeginInsert(); - XLogRegisterData(&pageno, sizeof(pageno)); - (void) XLogInsert(RM_CLOG_ID, CLOG_ZEROPAGE); + XLogInsertInt64(RM_CLOG_ID, CLOG_ZEROPAGE, pageno); } /* @@ -1114,19 +1078,9 @@ clog_redo(XLogReaderState *record) if (info == CLOG_ZEROPAGE) { int64 pageno; - int slotno; - LWLock *lock; - memcpy(&pageno, XLogRecGetData(record), sizeof(pageno)); - - lock = SimpleLruGetBankLock(XactCtl, pageno); - LWLockAcquire(lock, LW_EXCLUSIVE); - - sl
Re: Proposal: Progressive explain
On Fri, Mar 7, 2025, at 5:34 PM, Rafael Thofehrn Castro wrote: > Did additional benchmarks and found issues with the patch that doesn't do > execProcNode > wrapping. There are sporadic crashes with **double free or corruption (top)** > > So making the patch that uses the wrapper the current one. Again, giving the > credits to > torikoshia as being the owner of that section of the code. > Rafael, thanks for working on it. It is a step forward in observability. I started with some performance tests and the last improvements seem to fix the overhead imposed in the initial patch version. I didn't notice any of these new function in the perf report while executing fast queries. I found a crash. It is simple to reproduce. Session A: select * from pg_stat_progress_explain; \watch 2 Session B: explain select pg_sleep(30); \watch 2 8<8< Backtrace: Core was generated by `postgres: euler postgres [lo'. Program terminated with signal SIGSEGV, Segmentation fault. #0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401 5401 if (ps->ExecProcNodeOriginal != NULL) #0 WrapExecProcNodeWithExplain (ps=0x7f7f7f7f7f7f7f7f) at explain.c:5401 #1 0x5624173829aa in handle_sig_alarm (postgres_signal_arg=) at timeout.c:414 #2 0x5624173ba02c in wrapper_handler (postgres_signal_arg=14) at pqsignal.c:110 #3 #4 0x7f20fa529e63 in epoll_wait (epfd=6, events=0x56244ef37e58, maxevents=1, timeout=-1) at ../sysdeps/unix/sysv/linux/epoll_wait.c:30 #5 0x5624171fb02f in WaitEventSetWaitBlock (nevents=1, occurred_events=0x7ffdd9e62080, cur_timeout=-1, set=0x56244ef37dd8) at waiteventset.c:1190 #6 WaitEventSetWait (set=0x56244ef37dd8, timeout=timeout@entry=-1, occurred_events=occurred_events@entry=0x7ffdd9e62080, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=100663296) at waiteventset.c:1138 #7 0x56241709513c in secure_read (port=0x56244eeb90e0, ptr=0x56241775a9a0 , len=8192) at be-secure.c:218 #8 0x56241709bf2e in pq_recvbuf () at pqcomm.c:924 #9 0x56241709ceb5 in pq_getbyte () at pqcomm.c:970 #10 0x56241721b617 in SocketBackend (inBuf=0x7ffdd9e622a0) at postgres.c:361 #11 ReadCommand (inBuf=0x7ffdd9e622a0) at postgres.c:484 #12 PostgresMain (dbname=, username=) at postgres.c:4625 #13 0x5624172167ed in BackendMain (startup_data=, startup_data_len=) at backend_startup.c:107 #14 0x56241717519b in postmaster_child_launch (child_type=, child_slot=2, startup_data=startup_data@entry=0x7ffdd9e6253c, startup_data_len=startup_data_len@entry=4, client_sock=client_sock@entry=0x7ffdd9e62540) at launch_backend.c:274 #15 0x562417178c32 in BackendStartup (client_sock=0x7ffdd9e62540) at postmaster.c:3519 #16 ServerLoop () at postmaster.c:1688 #17 0x56241717a6da in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x56244eeb81b0) at postmaster.c:1386 #18 0x562416e64f9a in main (argc=1, argv=0x56244eeb81b0) at main.c:230 8<8< You call this feature "progressive explain". My first impression is that it will only provide the execution plans for EXPLAIN commands. Instead of "progressive explain", I would suggest "query progress" that is a general database terminology. It seems natural to use "progressive explain" since you are using the explain infrastructure (including the same options -- format, settings, wal, ...) -- to print the execution plan. +CREATE VIEW pg_stat_progress_explain AS +SELECT +* +FROM pg_stat_progress_explain(true); + There is no use for the function argument. If you decide to keep this function, remove it. Why don't you use the pgstat_progress_XXX() API? 
Since you are using a pg_stat_progress_XXX view name I would expect it to use the command progress reporting infrastructure (see backend_progress.c). Maybe you could include datid and datname as the other progress reporting views do. It would avoid a join to figure out what the database is. +static const struct config_enum_entry explain_format_options[] = { + {"text", EXPLAIN_FORMAT_TEXT, false}, + {"xml", EXPLAIN_FORMAT_XML, false}, + {"json", EXPLAIN_FORMAT_JSON, false}, + {"yaml", EXPLAIN_FORMAT_YAML, false}, + {NULL, 0, false} +}; Isn't it the same definition as in auto_explain.c? Use only one definition for auto_explain and this feature. You can put this struct into explain.c, use an extern definition for guc_tables.c and put an extern PGDLLIMPORT definition into guc.h. See wal_level_options for an example. +static const struct config_enum_entry progressive_explain_options[] = { + {"off", PROGRESSIVE_EXPLAIN_NONE, false}, + {"explain", PROGRESSIVE_EXPLAIN_EXPLAIN, false}, + {"analyze", PROGRESSIVE_EXPLAIN_ANALYZE, false}, + {"false", PROGRESSIVE_EXPLAIN_NONE, true}, + {NULL, 0, false} +}; The "analyze" is a separate option in auto_explain. Should we have 2 options? One that enable/disable
Re: Index AM API cleanup
Peter Eisentraut writes: > 0002: Add get_opfamily_member_for_cmptype(). This was called > get_opmethod_member() in your patch set, but I think that name wasn't > quite right. I also removed the opmethod argument, which was rarely > used and is somewhat redundant. Hm, that will throw an error if IndexAmTranslateCompareType fails. Shouldn't it be made to return InvalidOid instead? regards, tom lane
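A minimal sketch of that alternative, assuming the missing_ok form of IndexAmTranslateCompareType reports an unknown mapping as InvalidStrategy rather than raising an error:

    Oid
    get_opfamily_member_for_cmptype(Oid opfamily, Oid lefttype, Oid righttype,
                                    CompareType cmptype)
    {
        Oid             opmethod = get_opfamily_method(opfamily);
        StrategyNumber  strategy;

        /* missing_ok = true: soft failure instead of an error */
        strategy = IndexAmTranslateCompareType(cmptype, opmethod, opfamily, true);
        if (strategy == InvalidStrategy)
            return InvalidOid;

        return get_opfamily_member(opfamily, lefttype, righttype, strategy);
    }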
Re: Optimizing FastPathTransferRelationLocks()
On Tue, Mar 11, 2025 at 8:46 PM Fujii Masao wrote: > > > > On 2025/03/11 20:55, Ashutosh Bapat wrote: > > Hi Fujii-san, > > > > It seems that this was forgotten somehow. > > > > The patch still applies. > > > > Examining c4d5cb71d229095a39fda1121a75ee40e6069a2a, it seems that this patch > > could have been part of that commit as well. But may be it wasn't so > > apparent > > that time. I think it's a good improvement. > > Thanks for reviewing the patch! > > > > On Tue, Nov 19, 2024 at 10:14 PM Fujii Masao > > wrote: > >> > >> The latest version of the patch is attached. > > > > @@ -2773,6 +2773,10 @@ FastPathTransferRelationLocks(LockMethod > > lockMethodTable, const LOCKTAG *locktag > > LWLock *partitionLock = LockHashPartitionLock(hashcode); > > Oid relid = locktag->locktag_field2; > > uint32 i; > > + uint32 group; > > + > > + /* fast-path group the lock belongs to */ > > + group = FAST_PATH_REL_GROUP(relid); > > > > We could just combine variable declaration and initialization; similar > > to partitionLock. > > I’ve updated the patch as suggested. Updated patch is attached. > Thanks. > > > The performance gain from this patch might be tiny and may not be > > visible. Still, have you tried to measure the performance improvement? > > I haven’t measured the actual performance gain since the patch optimizes > the logic in a clear and logical way. While we might see some improvement > in artificial scenarios — like with a large max_connections and > all backends slots having their databaseIds set, I’m not sure > how meaningful that would be. Fair enough. The code is more readable this way. That itself is an improvement. I stared at c4d5cb71d229095a39fda1121a75ee40e6069a2a to see whether there's any reason this was left aside at that time. I am convinced that it was just missed. I think this patch is good to be committed. -- Best Wishes, Ashutosh Bapat
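For reference, the combined declaration-and-initialization suggested above amounts to something like this (a sketch based on the quoted hunk rather than the committed text):

    LWLock     *partitionLock = LockHashPartitionLock(hashcode);
    Oid         relid = locktag->locktag_field2;

    /* fast-path group the lock belongs to */
    uint32      group = FAST_PATH_REL_GROUP(relid);
    uint32      i;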
Re: Draft for basic NUMA observability
On Mon, Mar 10, 2025 at 11:14 AM Bertrand Drouvot wrote: > Thanks for the new version! v10 is attached with most fixes after review and one new thing introduced: pg_numa_available() for run-time decision inside tests which was needed after simplifying code a little bit as you wanted. I've also fixed -Dlibnuma=disabled as it was not properly implemented. There are couple open items (minor/decision things), but most is fixed or answered: > Some random comments on 0001: > > === 1 > > It does not compiles "alone". It's missing: > [..] > +extern PGDLLIMPORT int huge_pages_status; [..] > -static int huge_pages_status = HUGE_PAGES_UNKNOWN; > +inthuge_pages_status = HUGE_PAGES_UNKNOWN; > > That come with 0002. So it has to be in 0001 instead. Ugh, good catch, I haven't thought about it in isolation, they are separate to just ease review, but should be committed together. Fixed. > === 2 > > +else > + as_fn_error $? "header file is required for --with-libnuma" > "$LINENO" 5 > +fi > > Maybe we should add the same test (checking for numa.h) for meson? TBH, I have no idea, libnuma.h may exist but it may not link e.g. when cross-compiling 32-bit on 64-bit. Or is this more about keeping sync between meson and autoconf? > === 3 > > +# FIXME: filter-out / with/without with_libnuma? > +LIBS += $(LIBNUMA_LIBS) > > It looks to me that we can remove those 2 lines. Done. > === 4 > > + Only supported on Linux. > > s/on Linux/on platforms for which the libnuma library is implemented/? > > I did a quick grep on "Only supported on" and it looks like that could be > a more consistent wording. Fixed. > === 5 > > +#include "c.h" > +#include "postgres.h" > +#include "port/pg_numa.h" > +#include "storage/pg_shmem.h" > +#include > > I had a closer look to other header files and it looks like it "should" be: > > #include "c.h" > #include "postgres.h" > > #include > #ifdef WIN32 > #include > #endif > > #include "port/pg_numa.h" > #include "storage/pg_shmem.h" > > And is "#include "c.h"" really needed? Fixed both. It seems to compile without c.h. > === 6 > > +/* FIXME not tested, might crash */ > > That's a bit scary. When you are in support for long enough it is becoming the norm ;) But on serious note Andres wanted have numa error/warning handlers (used by libnuma), but current code has no realistic code-path to hit it from numa_available(3), numa_move_pages(3) or numa_max_node(3). The situation won't change until one day in future (I hope!) we start using some more advanced libnuma functionality for interleaving memory, please see my earlier reply: https://www.postgresql.org/message-id/CAKZiRmzpvBtqrz5Jr2DDcfk4Ar1ppsXkUhEX9RpA%2Bs%2B_5hcTOg%40mail.gmail.com E.g. numa_available(3) is tiny wrapper , see https://github.com/numactl/numactl/blob/master/libnuma.c#L872 For now, I've adjusted that FIXME to XXX, but still don't know we could inject failure to see this triggered... > === 7 > > +* XXX: for now we issue just WARNING, but long-term that might > depend on > +* numa_set_strict() here > > s/here/here./ Done. > Did not look carefully at all the comments in 0001, 0002 and 0003 though. 
> > A few random comments regarding 0002: > > === 8 > > # create extension pg_buffercache; > ERROR: could not load library > "/home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so": > /home/postgres/postgresql/pg_installed/pg18/lib/pg_buffercache.so: undefined > symbol: pg_numa_query_pages > CONTEXT: SQL statement "CREATE FUNCTION pg_buffercache_pages() > RETURNS SETOF RECORD > AS '$libdir/pg_buffercache', 'pg_buffercache_pages' > LANGUAGE C PARALLEL SAFE" > extension script file "pg_buffercache--1.2.sql", near line 7 > > While that's ok if 0003 is applied. I think that each individual patch should > "fully" work individually. STILL OPEN QUESTION: Not sure I understand: you need to have 0001 + 0002 or 0001 + 0003, but here 0002 is complaining about lack of pg_numa_query_pages() which is part of 0001 (?) Should I merge those patches or keep them separate? > === 9 > > + do > + { > + if (os_page_ptrs[blk2page + j] == 0) > > blk2page + j will be repeated multiple times, better to store it in a local > variable instead? Done. > === 10 > > + if (firstUseInBackend == true) > > > if (firstUseInBackend) instead? Done everywhere. > === 11 > > + int j = 0, > + blk2page = (int) i * pages_per_blk; > > I wonder if size_t is more appropriate for blk2page: > > size_t blk2page = (size_t)(i * pages_per_blk) maybe? Sure, done. > === 12 > > as we know that we'll iterate until pages_per_blk then would a for loop be > more > appropriate here, something like? > > " > for (size_t j = 0; j < pages_per_blk; j++) > { > if (os_page_ptrs[blk2page + j] == 0) > { > " Sure. > === 13 > > + if (buf_state & BM_DIRTY) > + fctx->record[i].isdirty = true; > + else > +
Re: Separate GUC for replication origins
On Fri, 7 Mar 2025 at 21:21, Euler Taveira wrote: > > On Fri, Mar 7, 2025, at 11:47 AM, Peter Eisentraut wrote: > > Is that maximum active for the whole system, or maximum active per > session, or maximum active per created origin, or some combination of these? > > > It is a cluster-wide setting. Similar to max_replication_slots. I will make > sure the GUC description is clear about it. It seems the Replication Progress > Tracking chapter needs an update to specify this information too. I reviewed the discussion on this thread and believe we now have an agreement on the design and GUC names. However, the patch still needs updates and extensive testing, especially considering its impact on backward compatibility. I'm unsure if this feature can be committed in the current CommitFest. If you're okay with it, we can move it to the next CommitFest. However, if you prefer to keep it, we can do our best to complete it and make a final decision at the end of the CommitFest. Regards, Vignesh
Re: Question about duplicate JSONTYPE_JSON check
On 2025-Mar-12, Amit Langote wrote: > Patch look good for committing? Ah sorry, I should have said so -- yes, it looks good to me. I feel a slight dislike for using URL-escaped characters in the mailing list link you added, because it means I cannot directly copy/paste the message-id string into my email client program. Not a huge issue for sure, and it seems a majority of links in the source tree are already like that anyway, but this seems an inclusive, safe, welcoming nitpicking space :-) -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
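For illustration (the message-id below is made up): a link written as https://www.postgresql.org/message-id/flat/abc123%40example.com has to have the %40 decoded back to @ before abc123@example.com can be pasted into a mail client's message-id search, whereas a link spelled with the literal @ can be copied directly.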
Re: Changing the state of data checksums in a running cluster
On 3/11/25 14:07, Dagfinn Ilmari Mannsåker wrote: > As the resident perl style pedant, I'd just like to complain about the > below: > > Tomas Vondra writes: > >> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm >> b/src/test/perl/PostgreSQL/Test/Cluster.pm >> index 666bd2a2d4c..1c66360c16c 100644 >> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm >> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm >> @@ -3761,7 +3761,8 @@ sub checksum_enable_offline >> my ($self) = @_; >> >> print "### Enabling checksums in \"$self->data_dir\"\n"; >> -PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >> $self->data_dir, '-e'); >> +PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >> +$self->data_dir, '-e'); >> return; >> } > > > This breaking between the command line options and its arguments is why > we're switching to using fat commas. We're also using long options for > improved self-documentation, so this should be written as: > > PostgreSQL::Test::Utils::system_or_bail('pg_checksums', > '--pgdata' => $self->data_dir, > '--enable'); > > And likewise below in the disable method. > I don't know what fat comma is, but that's simply what perltidy did. I don't mind formatting it differently, if there's a better way. thanks -- Tomas Vondra
Re: Index AM API cleanup
And another small patch set extracted from the bigger one, to keep things moving along: 0001: Add get_opfamily_method() in lsyscache.c, from your patch set. 0002: Add get_opfamily_member_for_cmptype(). This was called get_opmethod_member() in your patch set, but I think that name wasn't quite right. I also removed the opmethod argument, which was rarely used and is somewhat redundant. 0003 and 0004 are enabling non-btree unique indexes for partition keys and materialized views. These were v21-0011 and v21-0012, except that I'm combining the switch from strategies to compare types (which was in v21-0006 or so) with the removal of the btree requirements. From bca4cfd1b8837ced9b74a04267861cac8003fbc6 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Mar 2025 13:45:39 +0100 Subject: [PATCH v21.2 1/4] Add get_opfamily_method() --- src/backend/utils/cache/lsyscache.c | 22 ++ src/include/utils/lsyscache.h | 1 + 2 files changed, 23 insertions(+) diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 80c5a3fcfb7..9f7c0315052 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -1288,6 +1288,28 @@ get_opclass_method(Oid opclass) /* -- OPFAMILY CACHE -- */ +/* + * get_opfamily_method + * + * Returns the OID of the index access method the opfamily is for. + */ +Oid +get_opfamily_method(Oid opfid) +{ + HeapTuple tp; + Form_pg_opfamily opfform; + Oid result; + + tp = SearchSysCache1(OPFAMILYOID, ObjectIdGetDatum(opfid)); + if (!HeapTupleIsValid(tp)) + elog(ERROR, "cache lookup failed for operator family %u", opfid); + opfform = (Form_pg_opfamily) GETSTRUCT(tp); + + result = opfform->opfmethod; + ReleaseSysCache(tp); + return result; +} + char * get_opfamily_name(Oid opfid, bool missing_ok) { diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 6fab7aa6009..271eac2f0e1 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -108,6 +108,7 @@ extern Oid get_opclass_input_type(Oid opclass); extern bool get_opclass_opfamily_and_input_type(Oid opclass, Oid *opfamily, Oid *opcintype); extern Oid get_opclass_method(Oid opclass); +extern Oid get_opfamily_method(Oid opfid); extern char *get_opfamily_name(Oid opfid, bool missing_ok); extern RegProcedure get_opcode(Oid opno); extern char *get_opname(Oid opno); -- 2.48.1 From d6788595aa64dbb8fe55703cfec1b9368fe3c245 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 12 Mar 2025 14:07:34 +0100 Subject: [PATCH v21.2 2/4] Add get_opfamily_member_for_cmptype() --- src/backend/utils/cache/lsyscache.c | 20 src/include/utils/lsyscache.h | 3 +++ 2 files changed, 23 insertions(+) diff --git a/src/backend/utils/cache/lsyscache.c b/src/backend/utils/cache/lsyscache.c index 9f7c0315052..2151d8d0915 100644 --- a/src/backend/utils/cache/lsyscache.c +++ b/src/backend/utils/cache/lsyscache.c @@ -184,6 +184,26 @@ get_opfamily_member(Oid opfamily, Oid lefttype, Oid righttype, return result; } +/* + * get_opfamily_member_for_cmptype + * Get the OID of the operator that implements the specified comparison + * type with the specified datatypes for the specified opfamily. + * + * Returns InvalidOid if there is no pg_amop entry for the given keys. 
+ */ +Oid +get_opfamily_member_for_cmptype(Oid opfamily, Oid lefttype, Oid righttype, + CompareType cmptype) +{ + Oid opmethod; + StrategyNumber strategy; + + opmethod = get_opfamily_method(opfamily); + strategy = IndexAmTranslateCompareType(cmptype, opmethod, opfamily, false); + + return get_opfamily_member(opfamily, lefttype, righttype, strategy); +} + /* * get_ordering_op_properties * Given the OID of an ordering operator (a btree "<" or ">" operator), diff --git a/src/include/utils/lsyscache.h b/src/include/utils/lsyscache.h index 271eac2f0e1..d42380a0d46 100644 --- a/src/include/utils/lsyscache.h +++ b/src/include/utils/lsyscache.h @@ -14,6 +14,7 @@ #define LSYSCACHE_H #include "access/attnum.h" +#include "access/cmptype.h" #include "access/htup.h" #include "nodes/pg_list.h" @@ -74,6 +75,8 @@ extern void get_op_opfamily_properties(Oid opno, Oid opfamily, bool ordering_op, Oid *righttype); extern Oid get_opfamily_member(Oid opfamily, Oid lefttype, Oid righttype, int16 strategy); +extern Oid get_opfamily_member_fo
Re: AIO v2.5
Hi, On 2025-03-11 19:55:35 -0400, Andres Freund wrote: > On 2025-03-11 12:41:08 -0700, Noah Misch wrote: > > On Mon, Sep 16, 2024 at 01:51:42PM -0400, Andres Freund wrote: > > > On 2024-09-16 07:43:49 -0700, Noah Misch wrote: > > > > For non-sync IO methods, I gather it's essential that a process other > > > > than the > > > > IO definer be scanning for incomplete IOs and completing them. > > > > > > Otherwise, deadlocks like this would happen: > > > > > > > backend1 locks blk1 for non-IO reasons > > > > backend2 locks blk2, starts AIO write > > > > backend1 waits for lock on blk2 for non-IO reasons > > > > backend2 waits for lock on blk1 for non-IO reasons > > > > > > > > If that's right, in worker mode, the IO worker resolves that deadlock. > > > > What > > > > resolves it under io_uring? Another process that happens to do > > > > pgaio_io_ref_wait() would dislodge things, but I didn't locate the code > > > > to > > > > make that happen systematically. > > > > > > Yea, it's code that I haven't forward ported yet. I think basically > > > LockBuffer[ForCleanup] ought to call pgaio_io_ref_wait() when it can't > > > immediately acquire the lock and if the buffer has IO going on. > > > > I'm not finding that code in v2.6. What function has it? > > My local version now has it... Sorry, I was focusing on the earlier patches > until now. Looking more at my draft, I don't think it was race-free. I had a race-free way of doing it in the v1 patch (by making lwlocks extensible, so the check for IO could happen between enqueueing on the lwlock wait queue and sleeping on the semaphore), but that obviously requires that infrastructure. I want to focus on reads for now, so I'll add FIXMEs to the relevant places in the patch to support AIO writes and focus on the rest of the patch for now. Greetings, Andres Freund
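To make the idea concrete, here is a very rough sketch of the non-race-free shape described above; pgaio_io_ref_wait() comes from the patch series, while the PgAioIoRef type name and the buf_get_inflight_io() accessor are purely hypothetical placeholders:

    LWLock     *content_lock = BufferDescriptorGetContentLock(buf_hdr);

    while (!LWLockConditionalAcquire(content_lock, mode))
    {
        PgAioIoRef  io_ref;         /* hypothetical type name */

        /*
         * If the buffer has AIO in flight, complete that IO ourselves instead
         * of sleeping on the content lock, so we cannot deadlock against the
         * backend that issued it.  As noted above, this is racy: the IO could
         * be started after the check but before we block on the lock.
         */
        if (buf_get_inflight_io(buf_hdr, &io_ref))      /* hypothetical */
            pgaio_io_ref_wait(&io_ref);
        else
        {
            LWLockAcquire(content_lock, mode);
            break;
        }
    }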
Re: Add pg_buffercache_evict_all() and pg_buffercache_mark_dirty[_all]() functions
Hi, On Mon, 17 Feb 2025 at 19:59, Andres Freund wrote: > > Hi, Thanks for the review! And sorry for the late reply. > On 2024-12-25 15:57:34 +0300, Nazir Bilal Yavuz wrote: > > So, this patchset extends pg_buffercache with 3 functions: > > > > 1- pg_buffercache_evict_all(): This is very similar to the already > > existing pg_buffercache_evict() function. The difference is > > pg_buffercache_evict_all() does not take an argument. Instead it just > > loops over the shared buffers and tries to evict all of them. It > > returns the number of buffers evicted and flushed. > > I do think that'd be rather useful. Perhaps it's also worth having a version > that just evicts a specific relation? That makes sense. I will work on this. > > 2- pg_buffercache_mark_dirty(): This function takes a buffer id as an > > argument and tries to mark this buffer as dirty. Returns true on > > success. This returns false if the buffer is already dirty. Do you > > think this makes sense or do you prefer it to return true if the > > buffer is already dirty? > > I don't really have feelings about that ;) I feel the same. I just wanted to have a symmetry between dirty and evict functions. If you think this would be useless, I can remove it. > > From 813e5ec0da4c65970b4b1ce2ec2918e4652da9ab Mon Sep 17 00:00:00 2001 > > From: Nazir Bilal Yavuz > > Date: Fri, 20 Dec 2024 14:06:47 +0300 > > Subject: [PATCH v1 1/2] Add pg_buffercache_evict_all() function for testing > > > > This new function provides a mechanism to evict all shared buffers at > > once. It is designed to address the inefficiency of repeatedly calling > > pg_buffercache_evict() for each individual buffer, which can be > > time-consuming when dealing with large shared buffer pools > > (e.g., ~790ms vs. ~270ms for 16GB of shared buffers). > > > */ > > bool > > -EvictUnpinnedBuffer(Buffer buf) > > +EvictUnpinnedBuffer(Buffer buf, bool *flushed) > > { > > BufferDesc *desc; > > uint32 buf_state; > > boolresult; > > > > + *flushed = false; > > + > > /* Make sure we can pin the buffer. */ > > ResourceOwnerEnlarge(CurrentResourceOwner); > > ReservePrivateRefCountEntry(); > > @@ -6134,6 +6136,7 @@ EvictUnpinnedBuffer(Buffer buf) > > LWLockAcquire(BufferDescriptorGetContentLock(desc), > > LW_SHARED); > > FlushBuffer(desc, NULL, IOOBJECT_RELATION, IOCONTEXT_NORMAL); > > LWLockRelease(BufferDescriptorGetContentLock(desc)); > > + *flushed = true; > > } > > > > /* This will return false if it becomes dirty or someone else pins > > it. */ > > I don't think *flushed is necessarily accurate this way, as it might detect > that it doesn't need to flush the data (via StartBufferIO() returning false). You are right. It seems there is no way to get that information without editing the FlushBuffer(), right? And I think editing FlushBuffer() is not worth it. So, I can either remove it or explain in the docs that #buffers_flushed may not be completely accurate. 
> > + */ > > +Datum > > +pg_buffercache_evict_all(PG_FUNCTION_ARGS) > > +{ > > + Datum result; > > + TupleDesc tupledesc; > > + HeapTuple tuple; > > + Datum values[NUM_BUFFERCACHE_EVICT_ALL_ELEM]; > > + boolnulls[NUM_BUFFERCACHE_EVICT_ALL_ELEM] = {0}; > > + > > + int32 buffers_evicted = 0; > > + int32 buffers_flushed = 0; > > + boolflushed; > > + > > + if (get_call_result_type(fcinfo, NULL, &tupledesc) != > > TYPEFUNC_COMPOSITE) > > + elog(ERROR, "return type must be a row type"); > > + > > + if (!superuser()) > > + ereport(ERROR, > > + (errcode(ERRCODE_INSUFFICIENT_PRIVILEGE), > > + errmsg("must be superuser to use > > pg_buffercache_evict_all function"))); > > + > > + for (int buf = 1; buf < NBuffers; buf++) > > + { > > + if (EvictUnpinnedBuffer(buf, &flushed)) > > + buffers_evicted++; > > + if (flushed) > > + buffers_flushed++; > > I'd probably add a pre-check for the buffer not being in use. We don't need an > external function call, with an unconditional LockBufHdr() inside, for that > case. I did not understand why we would need this. Does not EvictUnpinnedBuffer() already check that the buffer is not in use? > > +/* > > + * MarkUnpinnedBufferDirty > > + * > > + * This function is intended for testing/development use only! > > + * > > + * To succeed, the buffer must not be pinned on entry, so if the caller > > had a > > + * particular block in mind, it might already have been replaced by some > > other > > + * block by the time this function runs. It's also unpinned on return, so > > the > > + * buffer might be occupied and flushed by the time control is returned. > > This > > + * inherent raciness without other inter
Re: Adding skip scan (including MDAM style range skip scan) to nbtree
On Wed, Mar 12, 2025 at 4:28 PM Alena Rybakina wrote: > Thank you for the explanation! > > Now I see why these changes were made. > > After your additional explanations, everything really became clear and I > fully agree with the current code regarding this part. Cool. > However I did not see an explanation to the commit regarding this place, > as well as a comment next to the assert and the parallel_aware check and > why BitmapIndexScanState was added in the ExecParallelReInitializeDSM. I added BitmapIndexScanState to the switch statement in ExecParallelReInitializeDSM because it is in the category of planstates that never need their shared memory reinitialized -- that's just how we represent such a plan state there. I think that this is supposed to serve as a kind of documentation, since it doesn't really affect how things behave. That is, it wouldn't actually affect anything if I had forgotten to add BitmapIndexScanState to the ExecParallelReInitializeDSM switch statement "case" that represents that it is in this "plan state category": the switch ends with catch-all "default: break;". > In my opinion, there is not enough additional explanation about this in > the form of comments, although I think that it has already been > explained here enough for someone who will look at this code. What can be done to improve the situation? For example, would adding a comment next to the new assertions recently added to ExecIndexScanReInitializeDSM and ExecIndexOnlyScanReInitializeDSM be an improvement? And if so, what would the comment say? -- Peter Geoghegan
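To spell out why the new entry is documentation-only, the relevant switch in ExecParallelReInitializeDSM() has roughly this shape (case list abbreviated from memory), so an explicit case whose body is just "break" behaves exactly like the catch-all default:

    switch (nodeTag(planstate))
    {
        case T_SeqScanState:
            if (planstate->plan->parallel_aware)
                ExecSeqScanReInitializeDSM((SeqScanState *) planstate, pcxt);
            break;

            /* ... other parallel-aware scan and join nodes ... */

        case T_HashState:
        case T_SortState:
        case T_BitmapIndexScanState:    /* the addition being discussed */
            /* these nodes have DSM state, but no reinitialization is required */
            break;

        default:
            break;
    }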
Re: PGSERVICEFILE as part of a normal connection string
On Mon, Jan 27, 2025 at 2:01 PM Michael Paquier wrote: > > On Wed, Nov 20, 2024 at 02:58:43AM -0500, Corey Huinker wrote: > > Currently, a lot of our utility scripts (anything that uses > > connectDatabase) don't support service=name params or PGSERVICE=name env > > vars, which is really too bad. I previously thought that this was because > > of a lack of interest, but perhaps people do want it? > > I'm all for more test coverage, FWIW. > > Torsten, the patch has been waiting on input from you based on my > latest review for some time, so I have marked it as returned with > feedback in the CP app. Feel free to resubmit a new version if you > are planning to work on that. TO: Torsten, CC: Michael and other hackers If you can't work on the patch for a while because you are busy or for some other reason, I can become an additional reviewer and apply Michael's review comments to the patch instead of you. If you don't want me to do that, please reply and let me know, if possible within a week :) Just to let you know, my intention is not to steal your contribution but to prevent your good idea from being lost. TO: Michael and other hackers, is there any problem with this in light of community customs? --- Great regards, Ryo Kanbayashi
RE: CRC32C Parallel Computation Optimization on ARM
> > Intel has contributed SSE4.2 CRC32C [1] and AVX-512 CRC32C [2] based on > similar techniques to postgres. > > ...this is a restatement of facts we already know. I'm guessing the intended > takeaway is "since Intel submitted an implementation to us based on paper A, > then we are free to separately also use a technique from paper B (which cites > patents)". Yes. > The original proposal that started this thread is below, and I'd like to give > that > author credit for initiating that work Yup, that should be fine. Raghuveer
Re: Separate GUC for replication origins
On Thu, 13 Mar 2025 at 05:59, Euler Taveira wrote: > > On Wed, Mar 12, 2025, at 8:47 AM, vignesh C wrote: > > I reviewed the discussion on this thread and believe we now have an > agreement on the design and GUC names. However, the patch still needs > updates and extensive testing, especially considering its impact on > backward compatibility. I'm unsure if this feature can be committed in > the current CommitFest. If you're okay with it, we can move it to the > next CommitFest. However, if you prefer to keep it, we can do our best > to complete it and make a final decision at the end of the CommitFest. > > > This is a mechanical patch. I was waiting if someone would object or suggest a > better GUC name. It seems to me it isn't. The pre existing GUC > (max_replication_slots) already has coverage. I don't know what additional > tests you have in mind. Could you elaborate? I was considering any potential impact on pg_upgrade and pg_createsubscriber. I will run tests with the latest posted patch to verify. > I'm biased to say that it is one of the easiest patches to commit because it > is > just assigning a new GUC name for a pre existing functionality. If there is no > interested in it, it will be moved to the next CF. Sounds like a plan! Let's verify it and work towards getting it committed. Regards, Vignesh
Re: DOCS: Make the Server Application docs synopses more consistent
On Wed, Mar 12, 2025 at 3:18 AM David G. Johnston wrote: > > On Thu, Dec 12, 2024 at 8:25 PM Peter Smith wrote: >> >> [1] initdb [option...] [ --pgdata | -D ] directory >> [2] pg_archivecleanup [option...] archivelocation oldestkeptwalfile >> [3] pg_checksums [option...] [[ -D | --pgdata ]datadir] >> [4] pg_controldata [option] [[ -D | --pgdata ]datadir] >> [5] pg_createsubscriber [option...] { -d | --database }dbname { -D | >> --pgdata }datadir { -P | --publisher-server }connstr >> [6] pg_ctl (the many synopses here don't give all the switch >> alternatives, it would be too much...) >> [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D | >> --pgdata ]datadir >> [8] pg_rewind [option...] { -D | --target-pgdata } directory { >> --source-pgdata=directory | --source-server=connstr } >> [9] pg_test_fsync [option...] >> [10] pg_test_timing [option...] >> [11] pg_upgrade -b oldbindir [-B newbindir] -d oldconfigdir -D >> newconfigdir [option...] >> [12] pg_waldump [option...] [startseg [endseg]] >> [13] pg_walsummary [option...] [file...] >> > > Here are some guidelines we should add to [1]. > > I don't feel listing both short and long form args is a good use of space - > and the { | } impairs readability. The short form combined with, usually, a > decent replaceable name, shows the reader the common interactive usage and > they can find the long forms in the Options section. Maybe use long forms > for value-less options since those don't get the argname hint. > > The non-space between option and its value reduces readability. IIUC a space > in between doesn't impact correctness so I say go for readability. This > becomes more pronounced with the first items since it is only marginally > readable now because there is always a } or ] before the replaceable. Though > this may involve fighting the markup a bit...I haven't experimented yet > (pg_rewind managed it...). > > The first listed command should probably only include mandatory options. > When there are multiple combinations of mandatory options show multiple > commands, one for each variant. Use examples to showcase idiomatic usage > patterns with descriptions. There is room to argue exceptions to be listed > also in Synopsis. > > Establish the convention of datadir as the replaceable name. Possibly do the > same for other common items. > > We should have a reference page (near [1] listing DocBook elements and our > guidelines for how/where to use them. > > In any case, details aside, modifying [1] with whatever is agreed to (and > making changes to conform) is something I agree should happen. > Hi David. Thanks for taking an interest in this thread. I've made some updates and attached the v2 patch. == CHANGE SUMMARY There are some discussion points among these changes as well as other TODO items to check I haven't broken anything. [1] initdb [option...] [ --pgdata | -D ] directory - removed the short/long option variations - cleanup the tags - changed replaceable 'directory' to 'datadir' in the synopsis and in the options list - TODO. Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" [2] pg_archivecleanup [option...] archivelocation oldestkeptwalfile - no changes [3] pg_checksums [option...] [[ -D | --pgdata ]datadir] - removed the short/long option variations - cleanup the tags - TODO. 
Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" (see also initdb etc) [4] pg_controldata [option] [[ -D | --pgdata ]datadir] - removed the short/long option variations - cleanup the tags - some should be - the "[option]" should be "[option...]" - TODO. Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" (see also initdb etc) - TODO. The page structure looks strange. There should be an "Options" section to describe -D, -V, and -? [5] pg_createsubscriber [option...] { -d | --database }dbname { -D | --pgdata }datadir { -P | --publisher-server }connstr - removed the short/long option variations - cleanup the tags - rearranged so -D comes first (same as most other commands) - make the "-D datadir" optional "[-D datadir]" - change 'directory' to 'datadir' in the options list so it matches the synopsis - change the order of the options list to match - TODO. Need to confirm if changing to "[-D datadir]" is correct, or should that be "[[-D] datadir]" (see also initdb etc) [6] pg_ctl (the many synopses here don't give all the switch alternatives, it would be too much...) - no changes [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D | --pgdata ]datadir - removed the short/long option variations - cleanup the tags - TODO. Looks a bit strange with "[options...]" not shown first. [8] pg_rewind [option...] { -D | --target-pgdata } directory { --source-pgdata=directory | --source-server=connstr } - removed the short/long option variations - cleanup the tags - make the "{-D datadir}
Re: Proposal: Filter irrelevant change before reassemble transactions during logical decoding
On Mon, Mar 3, 2025 at 9:10 PM Ajin Cherian wrote: > > On Thu, Feb 20, 2025 at 8:42 PM Hayato Kuroda (Fujitsu) > wrote: > > I confirmed with tests that with the patch, a lot less disk space is used when there are lots of unpublished changes and the subscription is not configured to be streaming. HEAD code: Initial disk usage in replication slot directory: 4.0K/home/ajin/dataoss/pg_replslot/mysub BEGIN INSERT 0 100 COMMIT Final disk usage in replication slot directory: 212M/home/ajin/dataoss/pg_replslot/mysub Patched code: Initial disk usage in replication slot directory: 4.0K/home/ajin/dataoss/pg_replslot/mysub BEGIN INSERT 0 100 COMMIT Final disk usage in replication slot directory: 4.0K/home/ajin/dataoss/pg_replslot/mysub Attaching the test script used. Regards, Ajin Cherian Fujitsu Australia test_disk_usage.sh Description: Binary data
Re: Vacuum statistics
On Wed, Mar 12, 2025 at 2:41 PM Alena Rybakina wrote: > Hi! > > On 10.03.2025 12:13, Ilia Evdokimov wrote: > > Hi, > > > > After commit eaf5027 we should add information about wal_buffers_full. > > > > Any thoughts? > > > > -- > > Best regards, > > Ilia Evdokimov, > > Tantor Labs LLC. > > > I think I can add it. To be honest, I haven't gotten to know this > statistic yet, haven't had time to get to know this commit yet. How will > this statistic help us analyze the work of the vacuum for relations? > The usecase I can see here is that we don't want autovac creating so much WAL traffic that it starts forcing other backends to have to write WAL out. But tracking how many times autovac writes WAL buffers won't help with that (though we also don't want any WAL buffers written by autovac to be counted in the system-wide wal_buffers_full: autovac is a BG process and it's fine if it's spending time writing WAL out). I think the same is true of a manual vacuum as well. What would be helpful would be a way to determine if autovac was causing enough traffic to force other backends to write WAL. Offhand I'm not sure how practical that actually is though. BTW, there's also an argument to be made that autovac should throttle itself if we're close to running out of available WAL buffers...
DOCS: Make the Server Application docs synopses more consistent
On Wednesday, March 12, 2025, Peter Smith wrote: > > > I've made some updates and attached the v2 patch. Thanks. Full review later. Pg_controldata - TODO. The page structure looks strange. There should be an "Options" > section to describe -D, -V, and -? Agreed. > > [7] pg_resetwal [ -f | --force ] [ -n | --dry-run ] [option...] [ -D | > > - TODO. Looks a bit strange with "[options...]" not shown first. Yeah, need to fix that. > > [8] pg_rewind [option...] { -D | --target-pgdata } directory > > - TODO. Was it OK to make the "[-D datadir]" optional like all others? > The page does not mention PGDATA. My take on this is that the presence of environment variables doesn’t impact whether a given CLI option is shown optional or mandatory. We are, in effect, communicating that “datadir” must be specified for this command. And then choosing one of the ways of specifying it. The reader can use the indicated short-form -D, the long-form —pgdata if it exists, or the DATADIR environment variable (this applies to any option, not just datadir) as they desire. In short, most/all of these should show “-D datadir” without [ ]. [11] pg_upgrade -b oldbindir [-B newbindir] -d oldconfigdir -D > newconfigdir [option...] > - made all the option to be optional because the page says they can > all use environment variable defaults. See note above regarding environment variables. > - TODO. Looks a bit strange with "[options...]" not shown first. Should be moved. > > OTHER QUESTIONS > - Should we use class="parameter" for all the tags? > Currently, some do, but most do not. > > I’d probably document that they should have that class but leave the bulk cleanup of existing content for a separate patch. ~~~ > > BTW, the scope of this thread is originally only the *server* > application pages, but all the *client* applications might be impacted > if we applied the same rules there. > Yeah, noticed that yesterday. I don’t see a reason why the same policy shouldn’t apply to both subsets. I wouldn’t require everything has to be changed right now though. Perfectly happy to ask for people specifically familiar with these applications to modify the pages under the new policy and try to divvy up the work. Then pickup what doesn’t get done. David J.
Re: DOCS: Make the Server Application docs synopses more consistent
On Wednesday, March 12, 2025, David G. Johnston wrote: > > > My take on this is that the presence of environment variables doesn’t > impact whether a given CLI option is shown optional or mandatory. We are, > in effect, communicating that “datadir” must be specified for this > command. And then choosing one of the ways of specifying it. The reader > can use the indicated short-form -D, the long-form —pgdata if it exists, or > the DATADIR environment variable (this applies to any option, not just > datadir) as they desire. In short, most/all of these should show “-D > datadir” without [ ]. > Expanding on this a bit. Write the synopsis as if no environment variables are set. Generally, if the application itself defines the default value do not list the option in the primary synopsis entry at all. If the application takes an option’s default value from the OS environment by a means other than an optional environment variable then show the option in the primary synopsis entry but mark it as optional. Connection options can have their own rules; which means we don’t need to be specifying -U and -d everywhere when we deal with client applications. Though maybe we should? David J.
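To make the policy concrete with one of the pages listed earlier: under these rules the primary pg_resetwal synopsis would read roughly "pg_resetwal [option...] -D datadir", with -f/--force and -n/--dry-run folded into [option...] and datadir shown as required even though the PGDATA environment variable can supply it (assuming pg_resetwal keeps honoring PGDATA, as it does today).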
Re: Add Postgres module info
On 7/3/2025 16:56, Andrei Lepikhov wrote: On 2/3/2025 20:35, Andrei Lepikhov wrote: On 17/2/2025 04:00, Michael Paquier wrote: No documentation provided. Ok, I haven't been sure this idea has a chance to be committed. I will introduce the docs in the next version. This is a new version with bug fixes. Also, use TAP tests instead of regression tests. Still, no documentation is included. v5 contains documentation entries for the pg_get_modules function and the PG_MODULE_MAGIC_EXT macro. Also, commit comment is smoothed a little. -- regards, Andrei LepikhovFrom b5072ce898e0760a717411535c0429303f4e5a45 Mon Sep 17 00:00:00 2001 From: "Andrei V. Lepikhov" Date: Tue, 19 Nov 2024 18:45:36 +0700 Subject: [PATCH v5] Introduce PG_MODULE_MAGIC_EXT macro. This macro provides dynamically loaded shared libraries (modules) with standard way to incorporate version (supposedly, defined according to semantic versioning specification) and name data. The introduced catalogue routine pg_get_modules can be used to find this module by name and check the version. It makes users independent from file naming conventions. With a growing number of Postgres core hooks and the introduction of named DSM segments, the number of modules that don't need to be loaded on startup may grow fast. Moreover, in many cases related to query tree transformation or extra path recommendation, such modules might not need database objects except GUCs - see auto_explain as an example. That means they don't need to execute the 'CREATE EXTENSION' statement at all and don't have a record in the pg_extension table. Such a trick provides much flexibility, including an online upgrade and may become widespread. In addition, it is also convenient in support to be sure that the installation (or at least the backend) includes a specific version of the module. Even if a module has an installation script, it is not rare that it provides an implementation for a range of UI routine versions. It makes sense to ensure which specific version of the code is used. Discussions [1,2] already mentioned module-info stuff, but at that time, extensibility techniques and extension popularity were low, and it wasn't necessary to provide that data. [1] https://www.postgresql.org/message-id/flat/20060507211705.GB3808%40svana.org [2] https://www.postgresql.org/message-id/flat/20051106162658.34c31d57%40thunder.logicalchaos.org --- doc/src/sgml/xfunc.sgml | 44 + src/backend/utils/fmgr/dfmgr.c| 66 ++- src/include/catalog/pg_proc.dat | 6 ++ src/include/fmgr.h| 28 +++- src/test/modules/test_misc/Makefile | 5 +- .../test_misc/t/008_check_pg_get_modules.pl | 65 ++ src/test/modules/test_shm_mq/test.c | 5 +- src/test/modules/test_slru/test_slru.c| 5 +- 8 files changed, 216 insertions(+), 8 deletions(-) create mode 100644 src/test/modules/test_misc/t/008_check_pg_get_modules.pl diff --git a/doc/src/sgml/xfunc.sgml b/doc/src/sgml/xfunc.sgml index 9f22dacac7d..22e26258b15 100644 --- a/doc/src/sgml/xfunc.sgml +++ b/doc/src/sgml/xfunc.sgml @@ -1967,6 +1967,28 @@ CREATE FUNCTION square_root(double precision) RETURNS double precision PG_MODULE_MAGIC; + +or + +PG_MODULE_MAGIC_EXT(name, version); + + + + +extended magic block + + +Extended magic block provides the same functionality like PG_MODULE_MAGIC, +allowing the extension provider to hard code into the binary file +information about the +version and module name that can be read and processed later by the +pg_get_modules function. 
+ + +PG_MODULE_MAGIC_EXT( +.name = "my_module_name", +.version = "1.2.3" +); @@ -1993,6 +2015,28 @@ PG_MODULE_MAGIC; return void. There is presently no way to unload a dynamically loaded file. + +pg_get_modules + + +List of loaded modules + + + +To see the status of loaded modules a function named +pg_get_modules is used. +The function receives no parameters and returns whole set of loaded modules +in current backend. The result can differ among backends because some +modules might be loaded dynamically by the LOAD command. +Returns a set of records describing the module. +The module_name column of text type +contains the name the maintainer provided at the compilation stage or NULL +if not defined. +The version column of text type contains +version info hard coded during the compilation or NULL if not defined. +The libname column of text type contains +full path to the file of loaded module. + diff --git a/src/backend/utils/fmgr/dfmgr.c b/src/backend/utils/fmgr/dfmgr.c index 87b233cb887..70d6ced4157 100644 --- a/src/backend/utils/fmgr/dfmgr.c +++ b/src/backend/utils/fmgr/dfmgr.c @@ -21,10 +21,12 @@ #endif /* !WIN32 */
Re: remove open-coded popcount in acl.c
On 2025-Mar-12, Nathan Bossart wrote: > On Wed, Mar 12, 2025 at 05:23:25PM +0100, Álvaro Herrera wrote: > > Strange: this code is not covered by any tests. > > > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5438 > > Huh. Well, it's easy enough to add some basic tests for the grantor > selection machinery. Here's a first try. Thanks :-) I confirm that this covers the code in select_best_grantor that you're modifying. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "Ed is the standard text editor." http://groups.google.com/group/alt.religion.emacs/msg/8d94ddab6a9b0ad3
Re: remove open-coded popcount in acl.c
On Wed, Mar 12, 2025 at 05:23:25PM +0100, Álvaro Herrera wrote: > Strange: this code is not covered by any tests. > > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 > https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5438 Huh. Well, it's easy enough to add some basic tests for the grantor selection machinery. Here's a first try. -- nathan >From d3cf9ca237f647ebcca20c55c8302f00f716c459 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Wed, 12 Mar 2025 10:45:12 -0500 Subject: [PATCH v2 1/1] Remove open-coded popcount in acl.c. --- src/backend/utils/adt/acl.c | 20 +- src/test/regress/expected/privileges.out | 27 src/test/regress/sql/privileges.sql | 22 +++ 3 files changed, 50 insertions(+), 19 deletions(-) diff --git a/src/backend/utils/adt/acl.c b/src/backend/utils/adt/acl.c index 6a76550a5e2..ba14713fef2 100644 --- a/src/backend/utils/adt/acl.c +++ b/src/backend/utils/adt/acl.c @@ -5432,24 +5432,6 @@ select_best_admin(Oid member, Oid role) return admin_role; } - -/* does what it says ... */ -static int -count_one_bits(AclMode mask) -{ - int nbits = 0; - - /* this code relies on AclMode being an unsigned type */ - while (mask) - { - if (mask & 1) - nbits++; - mask >>= 1; - } - return nbits; -} - - /* * Select the effective grantor ID for a GRANT or REVOKE operation. * @@ -5532,7 +5514,7 @@ select_best_grantor(Oid roleId, AclMode privileges, */ if (otherprivs != ACL_NO_RIGHTS) { - int nnewrights = count_one_bits(otherprivs); + int nnewrights = pg_popcount64(otherprivs); if (nnewrights > nrights) { diff --git a/src/test/regress/expected/privileges.out b/src/test/regress/expected/privileges.out index a76256405fe..954f549555e 100644 --- a/src/test/regress/expected/privileges.out +++ b/src/test/regress/expected/privileges.out @@ -3293,3 +3293,30 @@ DROP SCHEMA reindex_test; DROP ROLE regress_no_maintain; DROP ROLE regress_maintain; DROP ROLE regress_maintain_all; +-- grantor selection +CREATE ROLE regress_grantor1; +CREATE ROLE regress_grantor2 ROLE regress_grantor1; +CREATE ROLE regress_grantor3; +CREATE TABLE grantor_test1 (); +CREATE TABLE grantor_test2 (); +CREATE TABLE grantor_test3 (); +GRANT SELECT ON grantor_test2 TO regress_grantor1 WITH GRANT OPTION; +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor2 WITH GRANT OPTION; +SET ROLE regress_grantor1; +GRANT SELECT, UPDATE ON grantor_test1 TO regress_grantor3; +ERROR: permission denied for table grantor_test1 +GRANT SELECT, UPDATE ON grantor_test2 TO regress_grantor3; +WARNING: not all privileges were granted for "grantor_test2" +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor3; +RESET ROLE; +SELECT * FROM information_schema.table_privileges t + WHERE grantor LIKE 'regress_grantor%' ORDER BY ROW(t.*); + grantor | grantee | table_catalog | table_schema | table_name | privilege_type | is_grantable | with_hierarchy +--+--+---+--+---++--+ + regress_grantor1 | regress_grantor3 | regression| public | grantor_test2 | SELECT | NO | YES + regress_grantor2 | regress_grantor3 | regression| public | grantor_test3 | SELECT | NO | YES + regress_grantor2 | regress_grantor3 | regression| public | grantor_test3 | UPDATE | NO | NO +(3 rows) + +DROP TABLE grantor_test1, grantor_test2, grantor_test3; +DROP ROLE regress_grantor1, regress_grantor2, regress_grantor3; diff --git a/src/test/regress/sql/privileges.sql b/src/test/regress/sql/privileges.sql index d195aaf1377..b81694c24f2 100644 --- a/src/test/regress/sql/privileges.sql +++ b/src/test/regress/sql/privileges.sql @@ -2042,3 +2042,25 @@ 
DROP SCHEMA reindex_test; DROP ROLE regress_no_maintain; DROP ROLE regress_maintain; DROP ROLE regress_maintain_all; + +-- grantor selection +CREATE ROLE regress_grantor1; +CREATE ROLE regress_grantor2 ROLE regress_grantor1; +CREATE ROLE regress_grantor3; +CREATE TABLE grantor_test1 (); +CREATE TABLE grantor_test2 (); +CREATE TABLE grantor_test3 (); +GRANT SELECT ON grantor_test2 TO regress_grantor1 WITH GRANT OPTION; +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor2 WITH GRANT OPTION; + +SET ROLE regress_grantor1; +GRANT SELECT, UPDATE ON grantor_test1 TO regress_grantor3; +GRANT SELECT, UPDATE ON grantor_test2 TO regress_grantor3; +GRANT SELECT, UPDATE ON grantor_test3 TO regress_grantor3; +RESET ROLE; + +SELECT * FROM information_schema.table_privileges t + WHERE grantor LIKE 'regress_g
Re: remove open-coded popcount in acl.c
On Wed, Mar 12, 2025 at 07:34:16PM +0100, Álvaro Herrera wrote: > Thanks :-) I confirm that this covers the code in select_best_grantor > that you're modifying. Thanks for the quick review. I'll plan on committing this shortly if CI is happy. -- nathan
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On 2025-Mar-12, Rushabh Lathia wrote: > Hi Alvaro, > > Here are the latest patches, which includes the regression fix. Thank you. Taking a step back after discussing this with some colleagues, I need to contradict what I said at the start of this thread. There's a worry that changing pg_attribute.attnotnull in the way I initially suggested might not be such a great idea after all. I did a quick search using codesearch.debian.net for applications reading that column and thinking about how they would react to this change; I think in the end it's going to be quite disastrous. We would break a vast number of these apps, and there are probably countless other apps and frameworks that we would also break. Everybody would hate us forever. Upgrading to Postgres 18 would become as bad an experience as the drastic change of implicit casts to text in 8.3. Nothing else in the intervening 17 years strikes me as so problematic as this change would be. So I think we may need to go back and somehow leave pg_attribute alone, to avoid breaking the whole world. Maybe it would be sufficient to change the CompactAttribute->attnotnull to no longer be a boolean, but the new char representation instead. I'm not sure if this would actually work. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
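To illustrate the kind of breakage being worried about (example mine, not from the thread): drivers and ORMs commonly issue catalog queries that treat attnotnull as a boolean, and those would start failing if the column's type changed to "char".

-- 'my_table' is a placeholder
SELECT a.attname, a.attnotnull
FROM pg_catalog.pg_attribute a
WHERE a.attrelid = 'my_table'::regclass
  AND a.attnum > 0
  AND NOT a.attisdropped
  AND a.attnotnull;   -- boolean usage like this breaks if attnotnull becomes "char"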
Re: [PATCH] SVE popcount support
On Wed, Mar 12, 2025 at 10:34:46AM +, chiranmoy.bhattacha...@fujitsu.com wrote: > On Wed, Mar 12, 2025 at 02:41:18AM +, nathandboss...@gmail.com wrote: > >> v5-no-sve is the result of using a function pointer, but pointing to the >> "slow" versions instead of the SVE version. v5-sve is the result of the >> latest patch in this thread on a machine with SVE support, and v5-4reg is >> the result of the latest patch in this thread modified to process 4 >> register's worth of data at a time. > > Nice, I wonder why I did not observe any performance gain in the 4reg > version. Did you modify the 4reg version code? > > One possible explanation is that you used Graviton4 based instances > whereas I used Graviton3 instances. Yeah, it looks like the number of vector registers is different [0]. >> For the latter point, I think we should consider trying to add a separate >> Neon implementation that we use as a fallback for machines that don't have >> SVE. My understanding is that Neon is virtually universally supported on >> 64-bit Arm gear, so that will not only help offset the function pointer >> overhead but may even improve performance for a much wider set of machines. > > I have added the NEON implementation in the latest patch. > > Here are the numbers for drive_popcount(100, 1024) on m7g.8xlarge: > Scalar - 692ms > Neon - 298ms > SVE - 112ms Those are nice results. I'm a little worried about the Neon implementation for smaller inputs since it uses a per-byte loop for the remaining bytes, though. If we can ensure there's no regression there, I think this patch will be in decent shape. [0] https://github.com/aws/aws-graviton-getting-started?tab=readme-ov-file#building-for-graviton -- nathan
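For readers unfamiliar with Neon, below is a rough sketch of the shape of implementation being discussed (not the actual patch); it shows a vectorized main loop plus the per-byte tail loop that the small-input concern refers to. The tail uses a compiler builtin purely for illustration.

#include <arm_neon.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t
neon_popcount(const unsigned char *buf, size_t bytes)
{
	uint64_t	popcnt = 0;

	/* main loop: 16 bytes per iteration */
	while (bytes >= 16)
	{
		uint8x16_t	v = vld1q_u8((const uint8_t *) buf);

		/* per-byte popcount, then a horizontal add across the vector */
		popcnt += vaddvq_u8(vcntq_u8(v));
		buf += 16;
		bytes -= 16;
	}

	/* per-byte tail; this is the part that could regress small inputs */
	for (size_t i = 0; i < bytes; i++)
		popcnt += __builtin_popcount(buf[i]);

	return popcnt;
}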
Re: Handle interrupts while waiting on Append's async subplans
On 03/03/2025 19:44, Heikki Linnakangas wrote: In any case, the attached patch works and seems like an easy fix for stable branches at least. Committed that to master and all affected stable branches. -- Heikki Linnakangas Neon (https://neon.tech)
Re: remove open-coded popcount in acl.c
On 2025-Mar-12, Nathan Bossart wrote: > There's a count_one_bits() function in acl.c that can be replaced with a > call to pg_popcount64(). This isn't performance-critical code, but IMHO we > might as well use the centralized implementation. Makes sense. Patch looks good to me. > @@ -5532,7 +5514,7 @@ select_best_grantor(Oid roleId, AclMode privileges, >*/ > if (otherprivs != ACL_NO_RIGHTS) > { > - int nnewrights = > count_one_bits(otherprivs); > + int nnewrights = > pg_popcount64(otherprivs); Strange: this code is not covered by any tests. https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5533 https://coverage.postgresql.org/src/backend/utils/adt/acl.c.gcov.html#5438 -- Álvaro HerreraBreisgau, Deutschland — https://www.EnterpriseDB.com/
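As a stand-alone illustration (not PostgreSQL code; a compiler builtin stands in for pg_popcount64 here), the loop being removed and a hardware popcount agree on any mask:

#include <stdint.h>
#include <stdio.h>

/* the loop being removed from acl.c, transcribed for comparison */
static int
count_one_bits(uint64_t mask)
{
	int			nbits = 0;

	while (mask)
	{
		if (mask & 1)
			nbits++;
		mask >>= 1;
	}
	return nbits;
}

int
main(void)
{
	uint64_t	mask = 0x2A05;	/* arbitrary privilege bitmask */

	printf("%d %d\n", count_one_bits(mask), __builtin_popcountll(mask));
	return 0;
}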
Re: vacuumdb changes for stats import/export
On Wed, Mar 12, 2025 at 01:56:04PM +0700, John Naylor wrote: > The change here seems fine. My only quibble is that this sentence now > seems out of place: "Option --analyze-in-stages can be used to > generate minimal statistics quickly." I'm thinking we should make a > clearer separation for with and without the --no-statistics option > specified. I think the recommendation stays the same regardless of whether --no-statistics was used. --missing-only should do the right think either way. But I do agree that this paragraph feels a bit haphazard. I modified it to call out the full command for both --analyze-only and --analyze-in-stages, and I moved the note about --jobs to its own sentence and made it clear that it is applicable to both of the suggested vacuumdb commands. -- nathan >From 57329fe78042b12331852c4b8a121e73020e6a90 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Tue, 11 Mar 2025 11:18:36 -0500 Subject: [PATCH v6 1/3] vacuumdb: Teach vacuum_one_database() to reuse query results. Presently, each call to vacuum_one_database() performs a catalog query to retrieve the list of tables to process. A proposed follow-up commit would add a "missing only" feature to --analyze-in-stages, which requires us to save the results of the catalog query (since tables without statistics would have them after the first stage). This commit adds this ability via a new parameter for vacuum_one_database() that specifies either a previously-retrieved list to process or a place to store the results of the catalog query for later use. Note that this commit does not make use of this new parameter for anything. That is left as a future exercise. The trade-offs of this approach are increased memory usage and less responsiveness to concurrent catalog changes in later stages, neither of which is expected to bother anyone. Author: Corey Huinker Co-authored-by: Nathan Bossart Reviewed-by: John Naylor Discussion: https://postgr.es/m/Z5O1bpcwDrMgyrYy%40nathan --- src/bin/scripts/vacuumdb.c | 330 ++--- 1 file changed, 194 insertions(+), 136 deletions(-) diff --git a/src/bin/scripts/vacuumdb.c b/src/bin/scripts/vacuumdb.c index 982bf070be6..e28f82a0eba 100644 --- a/src/bin/scripts/vacuumdb.c +++ b/src/bin/scripts/vacuumdb.c @@ -62,10 +62,16 @@ typedef enum static VacObjFilter objfilter = OBJFILTER_NONE; +static SimpleStringList *retrieve_objects(PGconn *conn, + vacuumingOptions *vacopts, + SimpleStringList *objects, + bool echo); + static void vacuum_one_database(ConnParams *cparams, vacuumingOptions *vacopts, int stage, SimpleStringList *objects, + SimpleStringList **found_objs, int concurrentCons, const char *progname, bool echo, bool quiet); @@ -405,7 +411,7 @@ main(int argc, char *argv[]) { vacuum_one_database(&cparams, &vacopts, stage, - &objects, + &objects, NULL, concurrentCons, progname, echo, quiet); } @@ -413,7 +419,7 @@ main(int argc, char *argv[]) else vacuum_one_database(&cparams, &vacopts, ANALYZE_NO_STAGE, - &objects, + &objects, NULL, concurrentCons, progname, echo, quiet); } @@ -461,8 +467,36 @@ escape_quotes(const char *src) /* * vacuum_one_database * - * Process tables in the given database. If the 'objects' list is empty, - * process all tables in the database. + * Process tables in the given database. + * + * There are two ways to specify the list of objects to process: + * + * 1) The "found_objs" parameter is a double pointer to a fully qualified list + *of objects to process, as returned by a previous call to + *vacuum_one_dat
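The retrieve-or-reuse contract that the commit message describes can be sketched in isolation as follows; the types and helper names here (ObjectList, fetch_tables_from_catalog) are stand-ins for illustration, not the actual vacuumdb code.

#include <stdio.h>
#include <stdlib.h>

typedef struct ObjectList
{
	int			ntables;		/* stand-in for a list of table names */
} ObjectList;

static ObjectList *
fetch_tables_from_catalog(void)
{
	ObjectList *list = malloc(sizeof(ObjectList));

	list->ntables = 42;			/* pretend we ran the catalog query here */
	return list;
}

static void
vacuum_one_database(ObjectList **found_objs)
{
	ObjectList *objects;

	if (found_objs && *found_objs)
		objects = *found_objs;	/* later stage: reuse the saved list */
	else
	{
		objects = fetch_tables_from_catalog();	/* first stage: query once */
		if (found_objs)
			*found_objs = objects;	/* remember it for subsequent stages */
	}
	printf("processing %d tables\n", objects->ntables);
}

int
main(void)
{
	ObjectList *saved = NULL;

	/* e.g. the three --analyze-in-stages passes share one catalog query */
	for (int stage = 0; stage < 3; stage++)
		vacuum_one_database(&saved);
	free(saved);
	return 0;
}

In this sketch, passing NULL instead of &saved gives the old behaviour of running the catalog query on every call.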
Re: Vacuum statistics
Hi! On 10.03.2025 16:33, Kirill Reshke wrote: On Thu, 27 Feb 2025 at 23:00, Alena Rybakina wrote: Hi! On 17.02.2025 17:46, Alena Rybakina wrote: On 04.02.2025 18:22, Alena Rybakina wrote: Hi! Thank you for your review! On 02.02.2025 23:43, Alexander Korotkov wrote: On Mon, Jan 13, 2025 at 3:26 PM Alena Rybakina wrote: I noticed that the cfbot is bad, the reason seems to be related to the lack of a parameter in src/backend/utils/misc/postgresql.conf.sample. I added it, it should help. The patch doesn't apply cleanly. Please rebase. I rebased them. The patch needed a rebase again. There is nothing new since version 18, only a rebase. The patch needed a new rebase. Sorry, but due to feeling unwell I picked up a patch from another thread and squashed the patch for vacuum statistics for indexes and heaps in the previous version. Now I fixed everything together with the rebase. -- Regards, Alena Rybakina Postgres Professional Hi! CI fails on this one[0] Is it a flap or a real problem caused by v20? ``` SELECT relpages AS irp diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out --- /tmp/cirrus-ci-build/src/test/regress/expected/vacuum_tables_and_db_statistics.out 2025-03-10 09:36:10.274799176 + +++ /tmp/cirrus-ci-build/build-32/testrun/recovery/027_stream_regress/data/results/vacuum_tables_and_db_statistics.out 2025-03-10 09:49:35.796596462 + @@ -65,7 +65,7 @@ WHERE vt.relname = 'vestat' AND vt.relid = c.oid; relname | vm_new_frozen_pages | tuples_deleted | relpages | pages_scanned | pages_removed -+-++--+---+--- - vestat | 0 | 0 | 455 | 0 | 0 + vestat | 0 | 0 | 417 | 0 | 0 (1 row) SELECT relpages AS rp === EOF === ``` [0]https://api.cirrus-ci.com/v1/artifact/task/5336493629112320/testrun/build-32/testrun/recovery/027_stream_regress/log/regress_log_027_stream_regress Thank you for your help, I'll fix it soon. -- Regards, Alena Rybakina Postgres Professional
Re: Vacuum statistics
Hi! On 10.03.2025 12:13, Ilia Evdokimov wrote: Hi, After commit eaf5027 we should add information about wal_buffers_full. Any thoughts? -- Best regards, Ilia Evdokimov, Tantor Labs LLC. I think I can add it. To be honest, I haven't gotten to know this statistic yet, haven't had time to get to know this commit yet. How will this statistic help us analyze the work of the vacuum for relations? -- Regards, Alena Rybakina Postgres Professional
Re: Optimization for lower(), upper(), casefold() functions.
12.03.2025 19:55, Alexander Borisov wrote: [...] A couple questions: * Is there a reason the fast-path for codepoints < 0x80 is in unicode_case.c rather than unicode_case_func.h? Yes, this is an important optimization, below are benchmarks that [...] I forgot to add the benchmark: Benchmark Anatation: Patch v3j_0001: patch v3j_0001 without any changes. Patch v3j_0001 + static: patch v3j_0001 with adding static to casemap(). Patch v3j_0001 + static + fast path: patch v3j_0001 with adding static to casemap() and fast path for codepoint < 0x80. v4_0001: v3j_0001 + static + fast path v4_0001 + v3j_0002: v3j_0002 patch unchanged, but with static for casemap() (inherited from v4_0001 patch) v4_0001 + v4_0002: changed fast path for codepoint < 0x80. Made fast return to avoid unnecessary checks. Without: current version of the code without any patches. All results are in tps. * macOS 15.1 (Apple M3 Pro) (Apple clang version 16.0.0) ASCII: Repeated characters (700kb) in the range from 0x20 to 0x7E. Patch v3j_0001: 201.029609 Patch v3j_0001 + static: 247.155438 Patch v3j_0001 + static + fast path: 267.941047 Patch v4_0001 + v3j_0002: 260.737601 Patch v4_0001 + v4_0002: 268.913658 Without: 260.437508 Cyrillic: Repeated characters (1MB) in the range from 0x0410 to 0x042F. Patch v3j_0001: 48.130350 Patch v3j_0001 + static: 49.365897 Patch v3j_0001 + static + fast path: 48.891842 Patch v4_0001 + v3j_0002: 88.833061 Patch v4_0001 + v4_0002: 88.329603 Without: 47.869687 Unicode: A query consisting of all Unicode characters from 0xA0 to 0x2FA1D (excluding 0xD800..0xDFFF). Patch v3j_0001: 91.333557 Patch v3j_0001 + static: 92.464786 Patch v3j_0001 + static + fast path: 91.359428 Patch v4_0001 + v3j_0002: 103.390609 Patch v4_0001 + v4_0002: 102.819763 Without: 92.279652 Regards, Alexander Borisov
Re: Rename functions to alloc/free things in reorderbuffer.c
Heikki Linnakangas writes: > ReorderBufferGetRelids allocates an array with MemoryContextAlloc, and > ReorderBufferReturnRelids just calls pfree. The pools are long gone, and > now the naming looks weird. > Attached patch renames those functions and other such functions to use > the terms Alloc/Free. I actually wonder if we should go further and > remove these functions altogether, and change the callers to call > MemoryContextAlloc directly. But I didn't do that yet. Yeah, that is very confusing, especially since nearby code uses names like ReorderBufferGetFoo for functions that are lookups not allocations. +1 for Alloc/Free where that's an accurate description. Given that a lot of these are not just one-liners, I'm not sure that getting rid of the ones that are would help much. regards, tom lane
Re: making EXPLAIN extensible
Hi, >>> EXPLAIN output. It wouldn't make sense for core to have an EXPLAIN >>> option whose whole purpose is to cater to the needs of some extension, >>> so that made me think of providing some extensibility infrastructure. >> Making EXPLAIN extensible sounds like a good idea.. FWIW, There is a >> discussion [0] >> for showing FDW remote plans ( postgres_fdw specifically), and I think >> we will need to >> add some new options to EXPLAIN to make that possible. > Have not looked at your patches, but I will do so now. Over the past few days I have had a chance to experiment with these patches, and the good news is that it has allowed me to extend EXPLAIN for postgres_fdw to show remote plans. I will share the update and patch for this in [0], but thought it would be good to share here as well. postgres=# explain (remote_plans) select * from t_r1, t1_r1; QUERY PLAN  Nested Loop (cost=200.00..83272.80 rows=6553600 width=16) Plan Node ID: 0 -> Foreign Scan on t_r1 (cost=100.00..673.20 rows=2560 width=8) Plan Node ID: 1 -> Materialize (cost=100.00..686.00 rows=2560 width=8) Plan Node ID: 2 -> Foreign Scan on t1_r1 (cost=100.00..673.20 rows=2560 width=8) Plan Node ID: 3 Remote Plans: Seq Scan on t (cost=0.00..32.60 rows=2260 width=8) Statement Name: Plan Node ID = 1 Seq Scan on t1 (cost=0.00..32.60 rows=2260 width=8) Statement Name: Plan Node ID = 3 (14 rows) I do have some findings/suggestions: 1/ As you can see from the output above, I used explain_per_node_hook to append a "Plan Node ID" to the explain output. I really don't like having it there, and would prefer that it gets added to the top line of the node, i.e. -> Foreign Scan on t_r1 (cost=100.00..673.20 rows=2560 width=8) (node_id=1) -> Materialize (cost=100.00..686.00 rows=2560 width=8) (node_id=2) -> Foreign Scan on t1_r1 (cost=100.00..673.20 rows=2560 width=8) (node_id=3) Can we add a hook at that point [1] which will allow an extension to modify the first line of a node? I think this is not just useful for my case, but also for other use-cases in which some high-level node details could be placed. What do you think? 2/ I registered an options handler, and I wanted this options handler to validate that my new extension option is not used with the analyze option. The behavior is: if the core EXPLAIN option comes first in the list, the validation works, but if the extension option comes first, it does not. postgres=# explain (analyze, remote_plans) select from t_r1, t1_r1; ERROR: EXPLAIN options REMOTE_PLANS and ANALYZE cannot be used together postgres=# explain (remote_plans, analyze) select from t_r1, t1_r1; QUERY PLAN - Nested Loop (cost=200.00..147337.37 rows=11648569 width=0) (actual time=3.222..3.236 rows=0.00 loops=1) ... This is because ApplyExtensionExplainOption is called inside the main loop that parses the options, so when ApplyExtensionExplainOption runs, the core option may or may not have been set yet. I also think this will break if there are multiple extension options that need to be validated together. One way I thought to fix this is to allow the user to register another handler for validation, which can then be called after the parse-option loop, and after all the in-core options have been validated against each other. Right after this line. +/* if the summary was not set explicitly, set default value */ +es->summary = (summary_set) ? es->summary : es->analyze; What do you think?
[0] https://www.postgresql.org/message-id/flat/CAP%2BB4TD%3Diy-C2EnsrJgjpwSc7_4pd3Xh-gFzA0bwsw3q8u860g%40mail.gmail.com [1] https://github.com/postgres/postgres/blob/master/src/backend/commands/explain.c#L2013 Thanks -- Sami Imseih Amazon Web Services (AWS)
Re: making EXPLAIN extensible
Sami Imseih writes: > 1/ As you can see form the output above, I used explain_per_node_hook > to append a "Plan Node ID" to the explain output. I really don't like having > it > there, and prefer that it gets added to the top line of the node. > i.e. >-> Foreign Scan on t_r1 (cost=100.00..673.20 rows=2560 width=8) > (node_id=1) >-> Materialize (cost=100.00..686.00 rows=2560 width=8) (node_id=2) > -> Foreign Scan on t1_r1 (cost=100.00..673.20 rows=2560 > width=8) (node_id=3) > Can we add a hook at that point [1] which will allow an extension to modify > the first line of a node? I think this is not just useful for my case, but > also > for other use-cases in which some high level node details could be placed. > what do you think? I think this is a seriously bad idea. The first line is already overloaded; we don't need several different extensions adding more stuff to it. Plus, this doesn't consider what to do in non-text output formats. ISTM you should be good with adding a new "Plan Node ID" property to each node. No, you don't get to put it first. Too bad. (If we did try to cater to that, what shall we do with multiple extensions that all think they should be first?) The validation point is an interesting one. I agree that we don't want the behavior to depend on the order in which options are written. regards, tom lane
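A rough sketch of what Tom suggests (emitting the node ID as an ordinary per-node property from the extension's callback) might look like the fragment below. The hook name and signature are taken from the extensibility patchset under discussion, so treat them as assumptions rather than settled API; ExplainPropertyInteger is the existing explain.c helper, which is what makes the property come out sensibly in non-text formats too.

/* assumed callback shape, per the proposed patchset; not settled API */
static explain_per_node_hook_type prev_explain_per_node_hook = NULL;

static void
remote_plans_per_node_hook(PlanState *planstate, List *ancestors,
						   const char *relationship, const char *plan_name,
						   ExplainState *es)
{
	if (prev_explain_per_node_hook)
		prev_explain_per_node_hook(planstate, ancestors, relationship,
								   plan_name, es);

	/* remote_plans_enabled() is a hypothetical check for the custom option */
	if (remote_plans_enabled(es))
		ExplainPropertyInteger("Plan Node ID", NULL,
							   planstate->plan->plan_node_id, es);
}

/* in _PG_init(): prev_explain_per_node_hook = explain_per_node_hook;
 *                explain_per_node_hook = remote_plans_per_node_hook; */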
Re: Non-text mode for pg_dumpall
On 2025-03-12 We 3:03 AM, jian he wrote: On Wed, Mar 12, 2025 at 1:06 AM Álvaro Herrera wrote: Hello, On 2025-Mar-11, Mahendra Singh Thalor wrote: In map.dat file, I tried to fix this issue by adding number of characters in dbname but as per code comments, as of now, we are not supporting \n\r in dbnames so i removed handling. I will do some more study to fix this issue. Yeah, I think this is saying that you should not consider the contents of map.dat as a shell string. After all, you're not going to _execute_ that file via the shell. Maybe for map.dat you need to escape such characters somehow, so that they don't appear as literal newlines/carriage returns. I am confused. currently pg_dumpall plain format will abort when encountering dbname containing newline. the left dumped plain file does not contain all the cluster databases data. if pg_dumpall non-text format aborts earlier, it's aligned with pg_dumpall plain format? it's also an improvement since aborts earlier, nothing will be dumped? am i missing something? I think we should fix that. But for the current proposal, Álvaro and I were talking this morning, and we thought the simplest thing here would be to have the one line format and escape NL/CRs in the database name. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: Optimization for lower(), upper(), casefold() functions.
On Wed, 2025-03-12 at 19:55 +0300, Alexander Borisov wrote: > 1. Added static for casemap() function. Otherwise the compiler could > not > optimize the code and the performance dropped significantly. Oops, it was static, but I made it external just to see what code it generated. I didn't intend to publish it as an external function -- thank you for catching that! > 2. Added a fast path for codepoint < 0x80. > > v3j-0002: > In the fast path for codepoints < 0x80, I added a premature return. > This avoided additional insertions, which increased performance. What do you mean "additional insertions"? Also, should we just compute the results in the fast path? We don't even need a table. Rough patch attached to go on top of v4-0001. Should we properly return CASEMAP_SELF when *simple == u1, or is it ok to return CASEMAP_SIMPLE? It probably doesn't matter performance-wise, but it feels more correct to return CASEMAP_SELF. > > Perhaps for general > beauty it should be made static inline, I don't have a rigid position > here. We ordinarily use "static inline" if it's in a header file, and "static" if it's in a .c file, so I'll do it that way. > I was purely based on existing approaches in Postgres, the > Normalization Forms have them separated into different headers. Just > trying to be consistent with existing approaches. I think that was done for normalization primarily because it's not used #ifndef FRONTEND (see unicode_norm.c), and perhaps also because it's just a more complex function worthy of its own file. I looked into the history, and commit 783f0cc64d explains why perfect hashing is not used in the frontend: "The decomposition table remains the same, getting used for the binary search in the frontend code, where we care more about the size of the libraries like libpq over performance..." > Regards, Jeff Davis From ed4d2803aa32add7c05726286b94e78e49bb1257 Mon Sep 17 00:00:00 2001 From: Jeff Davis Date: Wed, 12 Mar 2025 11:56:59 -0700 Subject: [PATCH vtmp] fastpath --- src/common/unicode_case.c | 34 -- 1 file changed, 32 insertions(+), 2 deletions(-) diff --git a/src/common/unicode_case.c b/src/common/unicode_case.c index 2b3b4cdc2e7..b1fc1651043 100644 --- a/src/common/unicode_case.c +++ b/src/common/unicode_case.c @@ -390,11 +390,41 @@ casemap(pg_wchar u1, CaseKind casekind, bool full, { const pg_case_map *map; + /* + * Fast path for early codepoints. The only interesting characters are + * [a-zA-Z]. + */ if (u1 < 0x80) { - *simple = case_map[u1].simplemap[casekind]; + /* fast-path codepoints do not have special casing */ + Assert(find_case_map(u1)->special_case == NULL); - return CASEMAP_SIMPLE; + switch (casekind) + { + case CaseLower: + case CaseFold: +if (u1 >= 'A' && u1 <= 'Z') +{ + *simple = u1 + 0x20; + Assert(case_map[u1].simplemap[casekind] == *simple); + return CASEMAP_SIMPLE; +} +break; + case CaseTitle: + case CaseUpper: +if (u1 >= 'a' && u1 <= 'z') +{ + *simple = u1 - 0x20; + Assert(case_map[u1].simplemap[casekind] == *simple); + return CASEMAP_SIMPLE; +} +break; + default: +Assert(false); + } + + Assert(case_map[u1].simplemap[casekind] == u1); + return CASEMAP_SELF; } map = find_case_map(u1); -- 2.34.1
Re: Changing the state of data checksums in a running cluster
Tomas Vondra writes: > On 3/11/25 14:07, Dagfinn Ilmari Mannsåker wrote: >> As the resident perl style pedant, I'd just like to complain about the >> below: >> >> Tomas Vondra writes: >> >>> diff --git a/src/test/perl/PostgreSQL/Test/Cluster.pm >>> b/src/test/perl/PostgreSQL/Test/Cluster.pm >>> index 666bd2a2d4c..1c66360c16c 100644 >>> --- a/src/test/perl/PostgreSQL/Test/Cluster.pm >>> +++ b/src/test/perl/PostgreSQL/Test/Cluster.pm >>> @@ -3761,7 +3761,8 @@ sub checksum_enable_offline >>> my ($self) = @_; >>> >>> print "### Enabling checksums in \"$self->data_dir\"\n"; >>> - PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >>> $self->data_dir, '-e'); >>> + PostgreSQL::Test::Utils::system_or_bail('pg_checksums', '-D', >>> + $self->data_dir, '-e'); >>> return; >>> } >> >> >> This breaking between the command line options and its arguments is why >> we're switching to using fat commas. We're also using long options for >> improved self-documentation, so this should be written as: >> >> PostgreSQL::Test::Utils::system_or_bail('pg_checksums', >> '--pgdata' => $self->data_dir, >> '--enable'); >> >> And likewise below in the disable method. >> > > I don't know what fat comma is, but that's simply what perltidy did. I > don't mind formatting it differently, if there's a better way. Fat comma is the perlish name for the => arrow, which is semantically equivalent to a comma (except it auto-quotes any immediately preceding bareword), but looks fatter. Perltidy knows to not wrap lines around them, keeping the key and value (or option and argument in this case) together. See commit ce1b0f9da03 for a large (but not complete, I have more patches pending) conversion to this new style. - ilmari
Re: meson vs. llvm bitcode files
Hi, The v7 patch looks good to me, handling the bitcode modules in a uniform way and also avoiding the hacky code and warnings, much better now. A small note about the bitcode emission for generated sources in contrib, using cube as example, currently it creates two dict entries in a list: bc_seg_gen_sources = [{'srcfiles': [seg_scan]}] bc_seg_gen_sources += {'srcfiles': [seg_parse[0]]} Then pass it to the bitcode_modules: bitcode_modules += { ... 'gen_srcfiles': bc_seg_gen_sources, } It could be passed as a list with a single dict, since both generated sources share the same compilation flags: bitcode_modules += { ... 'gen_srcfiles': [ { 'srcfiles': [cube_scan, cube_parse[0]] }. ] } Both approaches work, the first one has the advantage of being able to pass separate additional_flags per generated source. Thanks for your reply Nazir, also waiting for more opinions on this. Regards, Diego On Wed, Mar 12, 2025 at 7:27 AM Nazir Bilal Yavuz wrote: > Hi, > > On Tue, 11 Mar 2025 at 01:04, Diego Fronza > wrote: > > I did a full review on the provided patches plus some tests, I was able > to validate that the loading of bitcode modules is working also JIT works > for both backend and contrib modules. > > Thank you! > > > To test JIT on contrib modules I just lowered the costs for all jit > settings and used the intarray extension, using the data/test__int.data: > > CREATE EXTENSION intarray; > > CREATE TABLE test__int( a int[] );1 > > \copy test__int from 'data/test__int.data' > > > > For queries any from line 98+ on contrib/intarray/sql/_int.sql will work. > > > > Then I added extra debug messages to llvmjit_inline.cpp on > add_module_to_inline_search_path() function, also on > llvm_build_inline_plan(), I was able to see many functions in this module > being successfully inlined. > > > > I'm attaching a new patch based on your original work which add further > support for generating bitcode from: > > Thanks for doing that! > > > - Generated backend sources: processed by flex, bison, etc. > > - Generated contrib module sources, > > I think we do not need to separate these two. > >foreach srcfile : bitcode_module['srcfiles'] > -if meson.version().version_compare('>=0.59') > +srcfilename = '@0@'.format(srcfile) > +if srcfilename.startswith(' + srcfilename = srcfile.full_path().split(meson.build_root() + '/')[1] > +elif meson.version().version_compare('>=0.59') > > Also, checking if the string starts with ' hacky, and 'srcfilename = '@0@'.format(srcfile)' causes a deprecation > warning. So, instead of this we can process all generated sources like > how generated backend sources are processed. I updated the patch with > that. > > > On this patch I just included fmgrtab.c and src/backend/parser for the > backend generated code. > > For contrib generated sources I added contrib/cube as an example. > > I applied your contrib/cube example and did the same thing for the > contrib/seg. > > > All relevant details about the changes are included in the patch itself. > > > > As you may know already I also created a PR focused on llvm bitcode > emission on meson, it generates bitcode for all backend and contribution > modules, currently under review by some colleagues at Percona: > https://github.com/percona/postgres/pull/103 > > I'm curious if we should get all or some of the generated backend > sources compiled to bitcode, similar to contrib modules. > > I think we can do this. 
I added other backend sources like you did in > the PR but attached it as another patch (0007) because I wanted to > hear other people's opinions on that first. > > v3 is attached. > > -- > Regards, > Nazir Bilal Yavuz > Microsoft >
Re: dblink query interruptibility
On 3/12/25 12:48 AM, Noah Misch wrote: The CREATE DATABASE hang is indeed new in v15. The general dblink missed interrupt processing (e.g. pg_cancel_backend response delay) is an old bug. Aha, that was what you were referring to! My apologies, was reading your mail a bit too quickly. :) Commit d3c5f37 used the new functions for postgres_fdw, not just dblink. That caused message changes detailed in postgr.es/m/CAHGQGwGpDTXeg8K1oTmDv9nankaKTrCD-XW-tnkzo6%3DE9p5%3Duw%40mail.gmail.com so I'm inclined to omit postgres_fdw changes in back branches. postgres_fdw was already interruptible, so the point of making postgres_fdw adopt the functions was to reduce code duplication. Makes sense. Overall, in the absence of objections, I will queue a task to back-patch the non-postgres_fdw portion of commit d3c5f37 to v13-v16. Thanks! Andreas
Re: Parallel heap vacuum
On Wed, Mar 12, 2025 at 11:24 AM Dilip Kumar wrote: > > On Wed, Mar 12, 2025 at 3:17 AM Masahiko Sawada wrote: > > > > On Tue, Mar 11, 2025 at 5:51 AM Amit Kapila wrote: > > > > > Some thoughts/questions on the idea > > I notice that we are always considering block-level parallelism for > heaps and object-level parallelism for indexes. I'm wondering, when > multiple tables are being vacuumed together—either because the user > has provided a list of tables or has specified a partitioned table > with multiple children—does it still make sense to default to > block-level parallelism? Or could we consider table-level parallelism > in such cases? For example, if there are 4 tables and 6 workers, with > 2 tables being small and the other 2 being large, perhaps we could > allocate 4 workers to vacuum all 4 tables in parallel. For the larger > tables, we could apply block-level parallelism, using more workers for > internal parallelism. On the other hand, if all tables are small, we > could just apply table-level parallelism without needing block-level > parallelism at all. This approach could offer more flexibility, isn't > it? > I have not thought from this angle, but it seems we can build this even on top of block-level vacuum for large tables. -- With Regards, Amit Kapila.
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On 2025-Mar-12, Ashutosh Bapat wrote: > The 002_pg_upgrade test passes with and without my patches now. But > then the tests added here do not leave behind any parent-child table. > Previously we have found problems in dumping and restoring constraints > in an inheritance hierarchy. I think the test should leave behind all > the combinations of parent and child NOT NULL constraints so that > 0002_pg_upgrade can test those. I agree. > Is it expected that a child may have VALID constraint but parent has > not valid constraint? Sure. Consider: if the parent has an unvalidated constraint, we cannot make any assertions about the state of its children. The children may have additional constraints of their own -- in this case, a child can have a validated constraint even though the parent has none, or only an unvalidated constraint. But the opposite is not allowed: if you know something to be true about a parent table (to wit: that no row in it is NULL), then this must necessarily apply to its children as well. Therefore, if there's a valid constraint in the parent, then all children must have the same constraint, and all such constraints must be known valid. > Same case with partitioned table. We should leave partitioned table > hierarchy behind for 002_pg_upgrade to test. And we need tests to test > scenarios where a partitioned table has valid constraint but we try to > change constraint on a partition to not valid and vice versa. I think > we shouldn't allow such assymetry in partitioned table hierarchy and > having a test would be better. Agreed. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
Re: [PoC] Reducing planning time when tables have many partitions
The following review has been posted through the commitfest application: make installcheck-world: tested, failed Implements feature: tested, failed Spec compliant: tested, failed Documentation: tested, failed Hi Yuya, I tested this patch and noted that it significantly improves query planning time, especially as the number of partitions increases. While the impact is minimal for small partition counts (2–8), the improvement becomes substantial from 16 partitions onward, reaching up to ~86.6% reduction at 768 partitions. Larger partition counts (512–1024) see a dramatic speedup, cutting planning time by over 2.7 seconds. The results confirm that the patch makes planning for partitioned tables markedly more efficient. This enhancement is crucial for databases handling large partitioned tables, leading to better performance and scalability. Regards, NewtGlobal PostgreSQL contributors
Re: Test mail for pgsql-hackers
On Wednesday, March 12, 2025, Ashutosh Bapat wrote: > Hi Blessy, > > On Tue, Mar 11, 2025 at 6:03 PM BharatDB wrote: > >> > >> Hi , > > > > > >> > >> I’ve been exploring logical replication and noticed that if the column > datatypes don’t match between the publisher and subscriber, PostgreSQL > doesn’t give a warning. This can cause unexpected behavior, and I thought > it might be helpful to alert users when this happens. > > > > > >> > >> ### **What This Patch Does:** > > > > > >> > >> - Adds a warning when a column's datatype in the subscriber doesn’t > match the publisher. > >> > >> - Helps users catch issues early instead of running into silent errors > later. > > > > > >> > >> Why I Think It’s Useful:- Avoids confusion when replication doesn’t > work as expected. - Makes debugging easier by pointing out potential > problems. I’d love to get feedback on whether this is a good idea and if > I’ve approached it correctly. Since I’m still learning, any suggestions for > improvement would be really helpful. I’ve attached the patch—please let me > know what you think! > > Large part of your patch is renaming files which are not related to > this patch. Can you please clean that up. > If there is really a patch submission happening here I suggest you start a new thread with a meaningful subject line and proper recipients (one list, not -owners) David J.
Re: Document NULL
On Tue, Mar 11, 2025 at 17:43, Tom Lane wrote: > I think that idea (changing all the docs) is a complete nonstarter > because people would not understand why the results they get don't > look like what it says in the docs. Of course, we could fix that > by changing the factory default for \pset null, but that seems like > even more of a nonstarter. And if we use a tag to display nulls, like NULL, then all null values would look the same, and the user would understand what they mean, because they are displayed differently. It would then be a single, consistent way of presenting nulls, on this page and all others. Regards Marcos
Re: per backend WAL statistics
Michael Paquier writes: > And I guess that we're OK here, so applied. That should be the last > one. Quite a few buildfarm members are not happy about the initialization that 9a8dd2c5a added to PendingBackendStats. For instance [1]: gcc -std=gnu99 -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Werror=vla -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -g -O2 -I. -I. -I../../../../src/include -D_GNU_SOURCE -I/usr/include/libxml2 -c -o pgstat_backend.o pgstat_backend.c pgstat_backend.c:39:1: warning: missing braces around initializer [-Wmissing-braces] static PgStat_BackendPending PendingBackendStats = {0}; ^ pgstat_backend.c:39:1: warning: (near initialization for \342\200\230PendingBackendStats.pending_io\342\200\231) [-Wmissing-braces] I guess that more than one level of braces is needed for this to be fully correct? Similar from ayu, batfish, boa, buri, demoiselle, dhole, rhinoceros, shelduck, siskin. regards, tom lane [1] https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=arowana&dt=2025-03-11%2004%3A59%3A16&stg=build
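For anyone who has not hit this warning before, here is a stand-alone reproduction (the struct names are stand-ins, not the real pgstat definitions): older GCC versions emit -Wmissing-braces for the single-brace initializer even though it is valid C, while a fully braced initializer keeps them quiet.

#include <stdint.h>

typedef struct PendingIO
{
	uint64_t	counts[8];
} PendingIO;

typedef struct BackendPending
{
	PendingIO	pending_io;
	uint64_t	wal_bytes;
} BackendPending;

/* old gcc: "warning: missing braces around initializer" */
static BackendPending a = {0};

/* fully braced; remaining members are still zero-initialized, no warning */
static BackendPending b = {{{0}}};

int
main(void)
{
	return (int) (a.wal_bytes + b.wal_bytes);
}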
Re: RFC: Additional Directory for Extensions
On Tue, Mar 11, 2025 at 12:59 PM Peter Eisentraut wrote: > Yes, that structure looks ok. But you can remove one level of block in > get_extension_control_directories(). > Sorry, missed during debugging. Fixed > I found a bug that was already present in my earlier patch versions: > > @@ -423,7 +424,7 @@ find_extension_control_filename(ExtensionControlFile > *control) > ecp = Extension_control_path; > if (strlen(ecp) == 0) > ecp = "$system"; > - result = find_in_path(basename, Extension_control_path, > "extension_control_path", "$system", system_dir); > + result = find_in_path(basename, ecp, "extension_control_path", > "$system", system_dir); > > Without this, it won't work if you set extension_control_path empty. > (Maybe add a test for that?) > Fixed, and also added a new test case for this. > I think this all works now, but I think the way > pg_available_extensions() works is a bit strange and inefficient. After > it finds a candidate control file, it calls > read_extension_control_file() with the extension name, that calls > parse_extension_control_file(), that calls > find_extension_control_filename(), and that calls find_in_path(), which > searches that path again! > > There should be a simpler way into this. Maybe > pg_available_extensions() should fill out the ExtensionControlFile > structure itself, set ->control_dir with where it found it, then call > directly to parse_extension_control_file(), and that should skip the > finding if the directory is already set. Or something similar. > Good catch. I fixed this by creating a new function to construct the ExtensionControlFile and changed the pg_available_extensions to set the control_dir. The read_extension_control_file was also changed to just call this new function constructor. I implemented the logic to check if the control_dir is already set on parse_extension_control_file because it seems to me that make more sense to not call find_extension_control_filename instead of putting this logic there since we already set the control_dir when we find the control file, and having the logic to set the control_dir or skip the find_in_path seems more confusing on this function instead of on parse_extension_control_file. Please let me know what you think. -- Matheus Alcantara v7-0001-extension_control_path.patch Description: Binary data
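For context, a hypothetical postgresql.conf fragment using the GUC under discussion might look like this; the list separator is assumed to follow the dynamic_library_path convention, and '$system' stands for the built-in control-file directory (as the fix above shows, an empty setting is treated the same as '$system').

# search a site-local directory first, then the built-in one
extension_control_path = '/opt/site/postgresql/extension:$system'

# equivalent to the default, and (per the fix above) to an empty setting
extension_control_path = '$system'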
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
On Wed, Mar 12, 2025 at 3:52 PM Rushabh Lathia wrote: > > > Hi Alvaro, > > Here are the latest patches, which includes the regression fix. > The 002_pg_upgrade test passes with and without my patches now. But then the tests added here do not leave behind any parent-child table. Previously we have found problems in dumping and restoring constraints in an inheritance hierarchy. I think the test should leave behind all the combinations of parent and child NOT NULL constraints so that 0002_pg_upgrade can test those. We should add more scenarios for constraint inheritance. E.g. #CREATE TABLE notnull_tbl1 (a int); #ALTER TABLE notnull_tbl1 ADD CONSTRAINT nn_parent not null a not valid; #CREATE TABLE notnull_chld (a int); #ALTER TABLE notnull_chld ADD CONSTRAINT nn_child not null a; #ALTER TABLE notnull_chld INHERIT notnull_tbl1; #SELECT conname, convalidated FROM pg_catalog.pg_constraint WHERE conrelid in ('notnull_tbl1'::regclass, 'notnull_chld'::regclass); conname | convalidated ---+-- nn_parent | f nn_child | t (2 rows) Is it expected that a child may have VALID constraint but parent has not valid constraint? Same case with partitioned table. We should leave partitioned table hierarchy behind for 002_pg_upgrade to test. And we need tests to test scenarios where a partitioned table has valid constraint but we try to change constraint on a partition to not valid and vice versa. I think we shouldn't allow such assymetry in partitioned table hierarchy and having a test would be better. -- Best Wishes, Ashutosh Bapat
RE: Selectively invalidate caches in pgoutput module
Dear Amit, > Thanks, the patch looks mostly good to me. I have made few cosmetic > changes in the attached and combined both the patches. Thanks, it looks good to me. > The existing > Alter Publication ... Rename tests don't test invalidations arising > from that command. As this patch changes that code path, it would be > good to add a few tests for the same. We can add one for individual > relations and another for ALL Tables publication. I created a new patch which adds a test for this. I added it to 007_ddl.pl, but I would also be fine with introducing a new test file. What do you think? Best regards, Hayato Kuroda FUJITSU LIMITED v12-0001-Avoid-invalidating-all-RelationSyncCache-entries.patch Description: v12-0001-Avoid-invalidating-all-RelationSyncCache-entries.patch v12-0002-Add-testcase.patch Description: v12-0002-Add-testcase.patch
Re: Question about duplicate JSONTYPE_JSON check
On Wed, Mar 12, 2025 at 7:09 PM Álvaro Herrera wrote: > On 2025-Mar-12, Amit Langote wrote: > > I was able to construct a test case that crashes due to this bug: > > > > CREATE TYPE mood AS ENUM ('happy', 'sad', 'neutral'); > > CREATE FUNCTION mood_to_json(mood) RETURNS json AS $$ > > SELECT to_json($1::text); > > $$ LANGUAGE sql IMMUTABLE; > > CREATE CAST (mood AS json) WITH FUNCTION mood_to_json(mood) AS IMPLICIT; > > > > SELECT JSON_OBJECT('happy'::mood: '123'::jsonb); > > server closed the connection unexpectedly > > Good reaction time :-) I see that that line shows as not even uncovered > in the report, but as non-existant (no background color as opposed to > red): > > https://coverage.postgresql.org/src/backend/utils/adt/jsonb.c.gcov.html#660 > > Evidently the compiler must be optimizing it out as dead code. Ah, I did wonder about the coverage, thanks for pointing it out. Does the patch look good for committing? -- Thanks, Amit Langote
Re: Change log level for notifying hot standby is waiting non-overflowed snapshot
Hi, After an off-list discussion with Fujii-san, I'm now trying to modify the following message that is output when a client attempts to connect instead of changing the log level as the original proposal: $ psql: error: connection to server at "localhost" (::1), port 5433 failed: FATAL: the database system is not yet accepting connections DETAIL: Consistent recovery state has not been yet reached. I have now 2 candidates to do this. The 1st one(v1-0001-Change-log-message-when-hot-standby-is-not-access.patch) is a simple update to the existing log messages, explicitly mentioning that snapshot overflow could be a possible cause. The 2nd(v1-0001-Make-it-clear-when-hot-standby-is-inaccessible-du.patch) one introduces new states for pmState and CAC_state (which manages whether connections can be accepted) to represent waiting for a non-overflowed snapshot. The advantage of the 2nd one is that it makes it clear whether the connection failure is due to not reaching a consistent recovery state or a snapshot overflow. However, I haven't found other significant benefits, and I feel it might be overkill. Personally, I feel 1st patch may be sufficient, but I would appreciate any feedback. -- Atsushi Torikoshi Seconded from NTT DATA GROUP CORPORATION to SRA OSS K.K.From 38a9ec23af2dc43ad24d939bb015d28d550d71fd Mon Sep 17 00:00:00 2001 From: Atsushi Torikoshi Date: Wed, 12 Mar 2025 21:47:22 +0900 Subject: [PATCH v1] Make it clear when hot standby is inaccessible due to subtransaction overflow Previously, the log message only assumed that the recovery process had not yet reached a consistent point. However, even after reaching the consistent point, if there is a transaction with an overflowed subtransaction, hot standby becomes inaccessible. Since there was no log message indicating this reason, it was difficult to identify the cause. This patch explicitly handles such cases, making the cause clearer in the logs. 
--- src/backend/postmaster/postmaster.c | 29 ++--- src/backend/storage/ipc/procarray.c | 17 + src/backend/tcop/backend_startup.c | 13 + src/include/storage/pmsignal.h | 2 ++ src/include/tcop/backend_startup.h | 1 + 5 files changed, 55 insertions(+), 7 deletions(-) diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c index d2a7a7add6..5c3de3f97d 100644 --- a/src/backend/postmaster/postmaster.c +++ b/src/backend/postmaster/postmaster.c @@ -333,6 +333,8 @@ typedef enum PM_INIT, /* postmaster starting */ PM_STARTUP, /* waiting for startup subprocess */ PM_RECOVERY,/* in archive recovery mode */ + PM_SNAPSHOT_PENDING, /* in snapshot pending because of an + * overflowed subtransaction */ PM_HOT_STANDBY,/* in hot standby mode */ PM_RUN, /* normal "database is alive" state */ PM_STOP_BACKENDS, /* need to stop remaining backends */ @@ -1814,6 +1816,9 @@ canAcceptConnections(BackendType backend_type) else if (!FatalError && pmState == PM_RECOVERY) return CAC_NOTCONSISTENT; /* not yet at consistent recovery * state */ + else if (!FatalError && pmState == PM_SNAPSHOT_PENDING) + return CAC_SNAPSHOT_PENDING; /* waiting for non-overflowed + * snapshot */ else return CAC_RECOVERY; /* else must be crash recovery */ } @@ -2111,7 +2116,7 @@ process_pm_shutdown_request(void) */ if (pmState == PM_RUN || pmState == PM_HOT_STANDBY) connsAllowed = false; - else if (pmState == PM_STARTUP || pmState == PM_RECOVERY) + else if (pmState == PM_STARTUP || pmState == PM_RECOVERY || pmState == PM_SNAPSHOT_PENDING) { /* There should be no clients, so proceed to stop children */ UpdatePMState(PM_STOP_BACKENDS); @@ -2145,7 +2150,7 @@ process_pm_shutdown_request(void) sd_notify(0, "STOPPING=1"); #endif - if (pmState == PM_STARTUP || pmState == PM_RECOVERY) + if (pmState == PM_STARTUP || pmState == PM_RECOVERY || pmState == PM_SNAPSHOT_PENDING) { /* Just shut down background processes silently */ UpdatePMState(PM_STOP_BACKENDS); @@ -2711,6 +2716,7 @@ HandleFatalError(QuitSignalReason reason, bool consider_sigabrt) /* wait for children to die */ case PM_RECOVERY: + case PM_SNAPSHOT_PENDING: case PM_HOT_STANDBY: case PM_RUN: case PM_STOP_BACKENDS: @@ -3193,6 +3199,7 @@ pmstate_name(PMState state) PM_TOSTR_CASE(PM_INIT); PM_TOSTR_CASE(PM_STARTUP); PM_TOSTR_CASE(PM_RECOVERY); + PM_TOSTR_CASE(PM_SNAPSHOT_PENDING); PM_TOSTR_CASE(PM_HOT_STANDBY); PM_TOSTR_CASE(PM_RUN); PM_TOSTR_CASE(PM_STOP_BACKENDS); @@ -3245,7 +3252,7 @@ LaunchMissingBackgroundProcesses(void) * the shutdown checkpoint. That's done in PostmasterStateMachine(), not * here.) */ - if (pmState == PM_RUN || pmState == PM_RECOVERY || + if (pmState == PM_RUN || pmState == PM_RECOVERY || pmState == PM_SNAPSHOT_PENDING || pmState == PM_HOT_STANDBY || pmState ==
Re: Logical Replication of sequences
On Wed, 12 Mar 2025 at 09:14, vignesh C wrote: > > The patch was not applying on top of HEAD because of recent commits, > here is a rebased version. I have moved this to the next CommitFest since it will not be committed in the current release. This also allows reviewers to focus on the remaining patches in the current CommitFest. Regards, Vignesh
Re: Bypassing cursors in postgres_fdw to enable parallel plans
Rafia Sabih writes: Hi, > > At present, in postgres_fdw, if a query which is using a parallel plan is > fired from the remote end fails to use the > parallel plan locally because of the presence of CURSORS. Consider the > following example, ... > > Now, to overcome this limitation, I have worked on this idea (suggested by my > colleague Bernd Helmle) of bypassing the > cursors. Do you know why we can't use parallel plan when cursor is used? Is It related to this code in ExecutePlan? /* * Set up parallel mode if appropriate. * * Parallel mode only supports complete execution of a plan. If we've * already partially executed it, or if the caller asks us to exit early, * we must force the plan to run without parallelism. */ if (queryDesc->already_executed || numberTuples != 0) use_parallel_mode = false; Actually I can't understand the comment as well and I had this confusion for a long time. -- Best Regards Andy Fan
Re: Document NULL
On Wednesday, March 12, 2025, Marcos Pegoraro wrote: > Em ter., 11 de mar. de 2025 às 17:43, Tom Lane > escreveu: > >> I think that idea (changing all the docs) is a complete nonstarter >> because people would not understand why the results they get don't >> look like what it says in the docs. Of course, we could fix that >> by changing the factory default for \pset null, but that seems like >> even more of a nonstarter. >> > > And if we use a tag to display nulls, like NULL > Then all null values would be the same, and the user would understand > what it means, because it's painted differently. > And then would be an unique way of appearance, on this page and all others > I’m not accepting this line of work as part of this patch. Please limit this to the merits of choosing \N as the particular value for \pset null on this page. You can start a new thread regarding a policy change for marking up null values in the whole of the documentation. But as Tom said, I don’t see that going anywhere. David J.
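For anyone following along, the mechanism under discussion is simply psql's display substitution for NULLs, for example as below; the thread's proposal is to use the string \N as that value on the page in question.

\pset null '(null)'
SELECT NULL AS value;   -- the cell now shows (null) instead of being empty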
Re: Support NOT VALID / VALIDATE constraint options for named NOT NULL constraints
Hi Alvaro, Here are the latest patches, which includes the regression fix. Thanks, On Wed, Mar 12, 2025 at 3:38 PM Ashutosh Bapat wrote: > On Wed, Mar 12, 2025 at 3:20 PM Alvaro Herrera > wrote: > > > > On 2025-Mar-12, Ashutosh Bapat wrote: > > > > > If the test passes for you, can you please try the patches at [1] on > > > top of your patches? Please apply those, set and export environment > > > variable PG_TEST_EXTRA=regress_dump_test, and run 002_pg_upgrade test? > > > I intended to do this but can not do it since the test always fails > > > with your patches applied. > > > > Oh, I need to enable a PG_TEST_EXTRA option in order for the test to > > run? FFS. That explains why the tests passed just fine for me. > > I'll re-run. > > You need PG_TEST_EXTRA to run the testcase I am adding there. Rest of > the testcases run without PG_TEST_EXTRA. > > -- > Best Wishes, > Ashutosh Bapat > -- Rushabh Lathia 0001-Convert-pg_attribut.attnotnull-to-char-type.patch Description: Binary data 0002-Support-NOT-VALID-and-VALIDATE-CONSTRAINT-for-named-.patch Description: Binary data 0003-Documentation-changes.patch Description: Binary data
Re: Parallel safety docs for CTEs
On Tue, Nov 19, 2024 at 2:16 PM James Coleman wrote: > > Hello, > > A colleague noticed today that the docs still say that "Scans of > common table expressions (CTEs)" are "always parallel restricted". > > While I think that strictly remains true at the implementation level, > from a user's perspective I think that's not been true since the > change to default to trying to inline CTEs rather than defaulting to > materializing them. > > Attached is a patch to slightly modify the language; would be happy to > hear suggestions on a better way to improve this. > > Regards, > James Coleman I'd forgotten to make a commit fest record for this and stumbled upon it today. See https://commitfest.postgresql.org/patch/5650/ Regards, James Coleman
Re: Statistics import and export: difference in statistics of materialized view dumped
On Wed, Mar 12, 2025 at 4:08 AM Jeff Davis wrote: > > On Tue, 2025-03-11 at 11:26 -0400, Tom Lane wrote: > > Right, that was what I was thinking, but hadn't had time to look in > > detail. The postDataBound dependency isn't real helpful here, we > > could lose that if we had the data dependency. > > Attached a patch. > > It's a bit messier than I expected, so I'm open to other suggestions. > The reason is because materialized view data is also pushed to > RESTORE_PASS_POST_ACL, so we need to do the same for the statistics > (otherwise the dependency is just ignored). I ran my test with this patch (we have to remove 0003 patch in my test which uses --no-statistics option). It failed with following differences @@ -452068,8 +452068,8 @@ SELECT * FROM pg_catalog.pg_restore_relation_stats( 'version', '18'::integer, 'relation', 'public.mvtest_aa'::regclass, - 'relpages', '2'::integer, - 'reltuples', '1'::real, + 'relpages', '0'::integer, + 'reltuples', '-1'::real, 'relallvisible', '0'::integer ); -- @@ -452097,8 +452097,8 @@ SELECT * FROM pg_catalog.pg_restore_relation_stats( 'version', '18'::integer, 'relation', 'public.mvtest_tm_type'::regclass, - 'relpages', '2'::integer, - 'reltuples', '3'::real, + 'relpages', '0'::integer, + 'reltuples', '-1'::real, 'relallvisible', '0'::integer ); -- @@ -452111,8 +452111,8 @@ SELECT * FROM pg_catalog.pg_restore_relation_stats( 'version', '18'::integer, 'relation', 'public.mvtest_tvmm_expr'::regclass, - 'relpages', '2'::integer, - 'reltuples', '1'::real, + 'relpages', '0'::integer, + 'reltuples', '-1'::real, 'relallvisible', '0'::integer ); --=== stderr === === EOF === The previous differences have disappeared but new differences have appeared. -- Best Wishes, Ashutosh Bapat
Re: AIO v2.5
Hi, On 2025-03-11 20:57:43 -0700, Noah Misch wrote: > > I think we'll really need to do something about this for BSD users > > regardless > > of AIO. Or maybe those OSs should fix something, but somehow I am not having > > high hopes for an OS that claims to have POSIX conforming unnamed semaphores > > due to having a syscall that always returns EPERM... [1]. > > I won't mind a project making things better for non-root BSD users. I do > think such a project should not block other projects making things better for > everything else (like $SUBJECT). Oh, I strongly agree. The main reason I would like it to be addressed is that I'm pretty tired of having to think about open/netbsd whenever we update some default setting. > > > On Mon, Mar 10, 2025 at 02:23:12PM -0400, Andres Freund wrote: > > > > Attached is v2.6 of the AIO patchset. > > > > > > > - 0005, 0006 - io_uring support - close, but we need to do something > > > > about > > > > set_max_fds(), which errors out spuriously in some cases > > > > > > What do we know about those cases? I don't see a set_max_fds(); is that > > > set_max_safe_fds(), or something else? > > > > Sorry, yes, set_max_safe_fds(). The problem basically is that with io_uring > > we > > will have a large number of FDs already allocated by the time > > set_max_safe_fds() is called. set_max_safe_fds() subtracts already_open from > > max_files_per_process allowing few, and even negative, IOs. > > > > I think we should redefine max_files_per_process to be about the number of > > files each *backend* will additionally open. Jelte was working on related > > patches, see [2] > > Got it. max_files_per_process is a quaint setting, documented as follows (I > needed the reminder): > > If the kernel is enforcing > a safe per-process limit, you don't need to worry about this setting. > But on some platforms (notably, most BSD systems), the kernel will > allow individual processes to open many more files than the system > can actually support if many processes all try to open > that many files. If you find yourself seeing Too many open > files failures, try reducing this setting. > > I could live with > v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but would lean > against it since it feels unduly novel to have a setting where we use the > postgresql.conf value to calculate a value that becomes the new SHOW-value of > the same setting. I think we may update some other GUCs, but not sure. > Options I'd consider before that: > - Like you say, "redefine max_files_per_process to be about the number of > files each *backend* will additionally open". It will become normal that > each backend's actual FD list length is max_files_per_process + MaxBackends > if io_method=io_uring. Outcome is not unlike > v6-0002-Bump-postmaster-soft-open-file-limit-RLIMIT_NOFIL.patch + > v6-0003-Reflect-the-value-of-max_safe_fds-in-max_files_pe.patch but we don't > mutate max_files_per_process. Benchmark results should not change beyond > the inter-major-version noise level unless one sets io_method=io_uring. I'm > feeling best about this one, but I've not been thinking about it long. Yea, I think that's something probably worth doing separately from Jelte's patch. I do think that it'd be rather helpful to have Jelte's patch to increase NOFILE in addition though. > > > > +static void > > > > +maybe_adjust_io_workers(void) > > > > > > This also restarts workers that exit, so perhaps name it > > > start_io_workers_if_missing(). > > > > But it also stops IO workers if necessary? > > Good point.
Maybe just add a comment like "start or stop IO workers to close > the gap between the running count and the configured count intent". It's now /* * Start or stop IO workers, to close the gap between the number of running * workers and the number of configured workers. Used to respond to change of * the io_workers GUC (by increasing and decreasing the number of workers), as * well as workers terminating in response to errors (by starting * "replacement" workers). */ > > > > +{ > > > ... > > > > + /* Try to launch one. */ > > > > + child = StartChildProcess(B_IO_WORKER); > > > > + if (child != NULL) > > > > + { > > > > + io_worker_children[id] = child; > > > > + ++io_worker_count; > > > > + } > > > > + else > > > > + break; /* XXX try > > > > again soon? */ > > > > > > Can LaunchMissingBackgroundProcesses() become the sole caller of this > > > function, replacing the current mix of callers? That would be more > > > conducive > > > to promptly doing the right thing after launch failure. > > > > I'm not sure that'd be a good idea - right now IO workers are started before > > the startup process, as the startup process might need to pe
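To make the set_max_safe_fds() problem described earlier in this message concrete, here is a small standalone sketch of the budget arithmetic: roughly min(usable_fds, max_files_per_process - already_open) minus a small reserve, so a pile of FDs preallocated by io_uring before the call drives the result to zero or below. This is an illustration with made-up numbers, not the actual fd.c code.

#include <stdio.h>

/* Illustrative only: mirrors the shape of the fd.c calculation, not its code. */
static int
fd_budget(int max_files_per_process, int usable_fds, int already_open, int reserve)
{
	int		remaining = max_files_per_process - already_open;
	int		safe = (usable_fds < remaining) ? usable_fds : remaining;

	return safe - reserve;
}

int
main(void)
{
	/* A typical backend has only a handful of FDs open before the call. */
	printf("without io_uring: %d\n", fd_budget(1000, 950, 30, 10));

	/* With io_uring, thousands of FDs can already be registered. */
	printf("with io_uring:    %d\n", fd_budget(1000, 950, 1500, 10));

	return 0;
}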
001_rep_changes.pl succeeds after a long time
Hello, I noticed that the test file 001_repo_changes.pl finished successfully after having taken 180s to run. This seems pretty suspicious -- normally that step takes around one second. The problem is seen in this step: [19:44:49.572](0.262s) ok 24 - update works with dropped subscriber column ### Stopping node "publisher" using mode fast # Running: pg_ctl -D /home/alvherre/Code/pgsql-build/master/src/test/subscription/tmp_check/t_001_rep_changes_publisher_data/pgdata -m fast stop waiting for server to shut down done server stopped # No postmaster PID for node "publisher" ### Starting node "publisher" # Running: pg_ctl -w -D /home/alvherre/Code/pgsql-build/master/src/test/subscription/tmp_check/t_001_rep_changes_publisher_data/pgdata -l /home/alvherre/Code/pgsql-build/master/src/test/subscription/tmp_check/log/001_rep_changes_publisher.log -o --cluster-name=publisher start waiting for server to start done server started # Postmaster PID for node "publisher" is 3502228 Waiting for replication conn tap_sub's replay_lsn to pass 0/17DEBD8 on publisher done [19:47:50.689](181.116s) ok 25 - check replicated inserts after subscription publication change Note that test 25 succeeded but took 3 minutes. If I look at the publisher logs, I see this: 2025-03-11 19:44:49.885 CET [3501892] 001_rep_changes.pl LOG: statement: DELETE FROM tab_rep 2025-03-11 19:44:50.052 CET [3494694] LOG: received fast shutdown request 2025-03-11 19:44:50.052 CET [3494694] LOG: aborting any active transactions 2025-03-11 19:44:50.053 CET [3494694] LOG: background worker "logical replication launcher" (PID 3494772) exited with exit code 1 2025-03-11 19:44:50.053 CET [3494755] LOG: shutting down 2025-03-11 19:44:50.480 CET [3501776] tap_sub LOG: released logical replication slot "tap_sub" 2025-03-11 19:44:50.484 CET [3494755] LOG: checkpoint starting: shutdown immediate 2025-03-11 19:44:50.485 CET [3494755] LOG: checkpoint complete: wrote 9 buffers (7.0%), wrote 3 SLRU buffers; 0 WAL file(s) added, 0 removed, 0 recycled; write=0.001 s, sync=0.001 s, total=0.002 s; sync files=0, longest=0.000 s, average=0.000 s; distance=579 kB, estimate=579 kB; lsn=0/17DEB60, redo lsn=0/17DEB60 2025-03-11 19:44:50.488 CET [3494694] LOG: database system is shut down 2025-03-11 19:44:50.884 CET [3502228] LOG: starting PostgreSQL 18devel on x86_64-pc-linux-gnu, compiled by gcc (Debian 12.2.0-14) 12.2.0, 64-bit 2025-03-11 19:44:50.884 CET [3502228] LOG: listening on Unix socket "/run/user/1000/alvherre-tmp/i4fQURzady/.s.PGSQL.26529" 2025-03-11 19:44:50.886 CET [3502349] LOG: database system was shut down at 2025-03-11 19:44:50 CET 2025-03-11 19:44:50.889 CET [3502228] LOG: database system is ready to accept connections 2025-03-11 19:44:50.930 CET [3502395] 001_rep_changes.pl LOG: statement: SELECT pg_is_in_recovery() 2025-03-11 19:44:50.945 CET [3502428] 001_rep_changes.pl LOG: statement: SELECT pg_current_wal_lsn() 2025-03-11 19:44:50.960 CET [3502440] 001_rep_changes.pl LOG: statement: SELECT '0/17DEBD8' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('tap_sub', 'walreceiver') The last query is repeated a number of times for 3 minutes, and finally we get this 2025-03-11 19:47:50.569 CET [3540762] 001_rep_changes.pl LOG: statement: SELECT '0/17DEBD8' <= replay_lsn AND state = 'streaming' FROM pg_catalog.pg_stat_replication WHERE application_name IN ('tap_sub', 'walreceiver') 2025-03-11 19:47:50.596 CET [3540764] tap_sub LOG: statement: SELECT pg_catalog.set_config('search_path', '', false); 
2025-03-11 19:47:50.597 CET [3540764] tap_sub LOG: received replication command: IDENTIFY_SYSTEM 2025-03-11 19:47:50.597 CET [3540764] tap_sub STATEMENT: IDENTIFY_SYSTEM Meanwhile, the subscriber has this: 2025-03-11 19:44:49.777 CET [3501647] LOG: logical replication worker for subscription "tap_sub" will restart because of a parameter change 2025-03-11 19:44:49.785 CET [3501757] LOG: logical replication apply worker for subscription "tap_sub" has started 2025-03-11 19:44:50.481 CET [3501757] LOG: data stream from publisher has ended 2025-03-11 19:44:50.481 CET [3501757] ERROR: could not send end-of-streaming message to primary: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. no COPY in progress 2025-03-11 19:44:50.484 CET [3495251] LOG: background worker "logical replication apply worker" (PID 3501757) exited with exit code 1 2025-03-11 19:44:50.490 CET [3502088] LOG: logical replication apply worker for subscription "tap_sub" has started 2025-03-11 19:44:50.490 CET [3502088] ERROR: apply worker for subscription "tap_sub" could not connect to the publisher: connection to server on socket "/run/user/1000/alvherre-tmp/i4fQURzady/.s.PGSQL.26529" failed: No such file or directory I
Rename functions to alloc/free things in reorderbuffer.c
I noticed some weird naming conventions in reorderbuffer.c which are leftovers from a long time ago when reorderbuffer.c maintained its own small memory pools to reduce palloc/pfree overhead. For example: extern Oid *ReorderBufferGetRelids(ReorderBuffer *rb, int nrelids); extern void ReorderBufferReturnRelids(ReorderBuffer *rb, Oid *relids); ReorderBufferGetRelids allocates an array with MemoryContextAlloc, and ReorderBufferReturnRelids just calls pfree. The pools are long gone, and now the naming looks weird. Attached patch renames those functions and other such functions to use the terms Alloc/Free. I actually wonder if we should go further and remove these functions altogether, and change the callers to call MemoryContextAlloc directly. But I didn't do that yet. Any objections? -- Heikki Linnakangas Neon (https://neon.tech) From 23b2bfbbbcccb79f586f5b719344fcf1e09325b0 Mon Sep 17 00:00:00 2001 From: Heikki Linnakangas Date: Wed, 12 Mar 2025 19:47:18 +0200 Subject: [PATCH 1/1] Rename alloc/free functions in reorderbuffer.c There used to be bespoken pools for these structs to reduce the palloc/pfree overhead, but that was ripped out a long time ago and replaced by the generic, cheaper generational memory allocator (commit a4ccc1cef5). The Get/Return terminology made sense with the pools, as you "got" an object from the pool and "returned" it later, but now it just looks weird. Rename to Alloc/Free. --- src/backend/replication/logical/decode.c | 26 +++--- .../replication/logical/reorderbuffer.c | 89 +-- src/include/replication/reorderbuffer.h | 15 ++-- 3 files changed, 64 insertions(+), 66 deletions(-) diff --git a/src/backend/replication/logical/decode.c b/src/backend/replication/logical/decode.c index 24d88f368d8..78f9a0a11c4 100644 --- a/src/backend/replication/logical/decode.c +++ b/src/backend/replication/logical/decode.c @@ -915,7 +915,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); if (!(xlrec->flags & XLH_INSERT_IS_SPECULATIVE)) change->action = REORDER_BUFFER_CHANGE_INSERT; else @@ -928,7 +928,7 @@ DecodeInsert(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) tuplelen = datalen - SizeOfHeapHeader; change->data.tp.newtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple(tupledata, datalen, change->data.tp.newtuple); @@ -965,7 +965,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_UPDATE; change->origin_id = XLogRecGetOrigin(r); memcpy(&change->data.tp.rlocator, &target_locator, sizeof(RelFileLocator)); @@ -980,7 +980,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) tuplelen = datalen - SizeOfHeapHeader; change->data.tp.newtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple(data, datalen, change->data.tp.newtuple); } @@ -996,7 +996,7 @@ DecodeUpdate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) tuplelen = datalen - SizeOfHeapHeader; change->data.tp.oldtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple(data, datalen, change->data.tp.oldtuple); 
} @@ -1031,7 +1031,7 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); if (xlrec->flags & XLH_DELETE_IS_SUPER) change->action = REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT; @@ -1051,7 +1051,7 @@ DecodeDelete(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) Assert(XLogRecGetDataLen(r) > (SizeOfHeapDelete + SizeOfHeapHeader)); change->data.tp.oldtuple = - ReorderBufferGetTupleBuf(ctx->reorder, tuplelen); + ReorderBufferAllocTupleBuf(ctx->reorder, tuplelen); DecodeXLogTuple((char *) xlrec + SizeOfHeapDelete, datalen, change->data.tp.oldtuple); @@ -1083,7 +1083,7 @@ DecodeTruncate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (FilterByOrigin(ctx, XLogRecGetOrigin(r))) return; - change = ReorderBufferGetChange(ctx->reorder); + change = ReorderBufferAllocChange(ctx->reorder); change->action = REORDER_BUFFER_CHANGE_TRUNCATE; change->origin_id = XLogRecGetOrigin(r); if (xlrec->flags & XLH_TRUNCATE_CASCADE) @@ -1091,8 +1091,8 @@ DecodeTruncate(LogicalDecodingContext *ctx, XLogRecordBuffer *buf) if (xlrec->flags & XLH_TRUNCATE_RESTART_SEQS) change->data.truncate.rest
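To make the "go further" idea above concrete, here is a hedged sketch of what a caller could look like with the wrappers removed entirely. The DecodeTruncate() field names follow the existing decode.c usage, but this fragment is an illustration, not part of the attached patch.

	/* today, via the wrapper (which simply allocates in rb->context) */
	change->data.truncate.relids =
		ReorderBufferGetRelids(ctx->reorder, xlrec->nrelids);
	/* and later, when the change is released */
	ReorderBufferReturnRelids(rb, change->data.truncate.relids);

	/* with the wrappers gone, callers would allocate and free directly */
	change->data.truncate.relids = (Oid *)
		MemoryContextAlloc(ctx->reorder->context,
						   xlrec->nrelids * sizeof(Oid));
	/* and later */
	pfree(change->data.truncate.relids);

Whether that is clearer than keeping thin ReorderBufferAlloc*/Free* wrappers is exactly the open question in the message above.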
Re: Test to dump and restore objects left behind by regression
On 2025-Mar-12, Ashutosh Bapat wrote: > Does the test pass for you if you don't apply my patches? Yes. It also passes if I keep PG_TEST_EXTRA empty. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/
Re: Index AM API cleanup
On Wed, Mar 12, 2025 at 7:25 AM Tom Lane wrote: > Peter Eisentraut writes: > > 0002: Add get_opfamily_member_for_cmptype(). This was called > > get_opmethod_member() in your patch set, but I think that name wasn't > > quite right. I also removed the opmethod argument, which was rarely > > used and is somewhat redundant. > > Hm, that will throw an error if IndexAmTranslateCompareType fails. > Shouldn't it be made to return InvalidOid instead? > There are two failure modes. In one mode, the AM has a concept of equality, but there is no operator for the given type. In the other mode, the AM simply has no concept of equality. In DefineIndex, the call: if (stmt->unique && !stmt->iswithoutoverlaps) { idx_eqop = get_opfamily_member_for_cmptype(idx_opfamily, idx_opcintype, idx_opcintype, COMPARE_EQ); if (!idx_eqop) ereport(ERROR, errcode(ERRCODE_UNDEFINED_OBJECT), errmsg("could not identify an equality operator for type %s", format_type_be(idx_opcintype)), errdetail("There is no suitable operator in operator family \"%s\" for access method \"%s\".", get_opfamily_name(idx_opfamily, false), get_am_name(get_opfamily_method(idx_opfamily)))); } probably should error about COMPARE_EQ not having a strategy in the AM, rather than complaining later that an equality operator cannot be found for the type. So that one seems fine as it is now. The other caller, refresh_by_match_merge, is similar: op = get_opfamily_member_for_cmptype(opfamily, opcintype, opcintype, COMPARE_EQ); if (!OidIsValid(op)) elog(ERROR, "missing equality operator for (%u,%u) in opfamily %u", opcintype, opcintype, opfamily); seems fine. There are no other callers as yet. You have made me a bit paranoid that, in future, we might add additional callers who are not anticipating the error. Should we solve that with a more complete code comment, or do you think refactoring is in order? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
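For readers following along, a hedged sketch of the lookup shape being discussed may help show where each failure mode surfaces; the exact name, argument order, and flags of IndexAmTranslateCompareType() here are assumptions rather than quotes from the patch.

Oid
get_opfamily_member_for_cmptype_sketch(Oid opfamily, Oid lefttype,
									   Oid righttype, CompareType cmptype)
{
	StrategyNumber strategy;

	/*
	 * Failure mode 1: the AM has no concept of this compare type at all.
	 * In the version under discussion this step reports an error rather
	 * than returning.  (Argument order and flags here are assumptions.)
	 */
	strategy = IndexAmTranslateCompareType(cmptype,
										   get_opfamily_method(opfamily),
										   opfamily, false);

	/*
	 * Failure mode 2: the AM knows the strategy, but the opfamily has no
	 * operator for this type pair; get_opfamily_member() returns
	 * InvalidOid, which both callers quoted above check for.
	 */
	return get_opfamily_member(opfamily, lefttype, righttype, strategy);
}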
Re: Primary and standby setting cross-checks
On Thu, 29 Aug 2024 at 23:52, Heikki Linnakangas wrote: > > Currently, if you configure a hot standby server with a smaller > max_connections setting than the primary, the server refuses to start up: > > LOG: entering standby mode > FATAL: recovery aborted because of insufficient parameter settings > DETAIL: max_connections = 10 is a lower setting than on the primary > server, where its value was 100. > HINT: You can restart the server after making the necessary > configuration changes. > > Or if you change the setting in the primary while the standby is > running, replay pauses: > > WARNING: hot standby is not possible because of insufficient parameter > settings > DETAIL: max_connections = 100 is a lower setting than on the primary > server, where its value was 200. > CONTEXT: WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: > max_connections=200 max_worker_processes=8 max_wal_senders=10 > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical > wal_log_hints=off track_commit_timestamp=off > LOG: recovery has paused > DETAIL: If recovery is unpaused, the server will shut down. > HINT: You can then restart the server after making the necessary > configuration changes. > CONTEXT: WAL redo at 2/E1D8 for XLOG/PARAMETER_CHANGE: > max_connections=200 max_worker_processes=8 max_wal_senders=10 > max_prepared_xacts=0 max_locks_per_xact=64 wal_level=logical > wal_log_hints=off track_commit_timestamp=off > > Both of these are rather unpleasant behavior. > > I thought I could get rid of that limitation with my CSN snapshot patch > [1], because it gets rid of the fixed-size known-assigned XIDs array, > but there's a second reason for these limitations. It's also used to > ensure that the standby has enough space in the lock manager to hold > possible AccessExclusiveLocks taken by transactions in the primary. > > So firstly, I think that's a bad tradeoff. In vast majority of cases, > you would not run out of lock space anyway, if you just started up the > system. Secondly, that cross-check of settings doesn't fully prevent the > problem. It ensures that the lock tables are large enough to accommodate > all the locks you could possibly hold in the primary, but that doesn't > take into account any additional locks held by read-only queries in the > hot standby. So if you have queries running in the standby that take a > lot of locks, this can happen anyway: > > 2024-08-29 21:44:32.634 EEST [668327] FATAL: out of shared memory > 2024-08-29 21:44:32.634 EEST [668327] HINT: You might need to increase > "max_locks_per_transaction". 
> 2024-08-29 21:44:32.634 EEST [668327] CONTEXT: WAL redo at 2/FD40FCC8 > for Standby/LOCK: xid 996 db 5 rel 154045 > 2024-08-29 21:44:32.634 EEST [668327] WARNING: you don't own a lock of > type AccessExclusiveLock > 2024-08-29 21:44:32.634 EEST [668327] LOG: RecoveryLockHash contains > entry for lock no longer recorded by lock manager: xid 996 database 5 > relation 154045 > TRAP: failed Assert("false"), File: > "../src/backend/storage/ipc/standby.c", Line: 1053, PID: 668327 > postgres: startup recovering > 0001000200FD(ExceptionalCondition+0x6e)[0x556a4588396e] > postgres: startup recovering > 0001000200FD(+0x44156e)[0x556a4571356e] > postgres: startup recovering > 0001000200FD(StandbyReleaseAllLocks+0x78)[0x556a45712738] > postgres: startup recovering > 0001000200FD(ShutdownRecoveryTransactionEnvironment+0x15)[0x556a45712685] > postgres: startup recovering > 0001000200FD(shmem_exit+0x111)[0x556a457062e1] > postgres: startup recovering > 0001000200FD(+0x434132)[0x556a45706132] > postgres: startup recovering > 0001000200FD(proc_exit+0x59)[0x556a45706079] > postgres: startup recovering > 0001000200FD(errfinish+0x278)[0x556a45884708] > postgres: startup recovering > 0001000200FD(LockAcquireExtended+0xa46)[0x556a45719386] > postgres: startup recovering > 0001000200FD(StandbyAcquireAccessExclusiveLock+0x11d)[0x556a4571330d] > postgres: startup recovering > 0001000200FD(standby_redo+0x70)[0x556a45713690] > postgres: startup recovering > 0001000200FD(PerformWalRecovery+0x7b3)[0x556a4547d313] > postgres: startup recovering > 0001000200FD(StartupXLOG+0xac3)[0x556a4546dae3] > postgres: startup recovering > 0001000200FD(StartupProcessMain+0xe8)[0x556a45693558] > postgres: startup recovering > 0001000200FD(+0x3ba95d)[0x556a4568c95d] > postgres: startup recovering > 0001000200FD(+0x3bce41)[0x556a4568ee41] > postgres: startup recovering > 0001000200FD(PostmasterMain+0x116e)[0x556a4568eaae] > postgres: startup recovering > 0001000200FD(+0x2f960e)[0x556a455cb60e] > /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f10ef042c8a] > /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f10ef042d45] > postgres: startup recovering > 0001000200FD(_start+0x21)[0x556a
Re: Replace IN VALUES with ANY in WHERE clauses during optimization
Hi, Alexander! On 06.03.2025 11:23, Alexander Korotkov wrote: Hi, Alena! On Sat, Mar 1, 2025 at 1:39 PM Alena Rybakina wrote: On 09.02.2025 18:38, Alexander Korotkov wrote: Also, aren't we too restrictive while requiring is_simple_values_sequence()? For instance, I believe cases like this (containing Var) could be transformed too. select * from t t1, lateral (select * from t t2 where t2.i in (values (t1.i), (1))); I added it and attached a patch with a diff file. To be honest, I didn't find any queries, except for Vars with volatile functions, where the transform can't be applied. I removed the function volatility check that I added in the previous version, since we already check it in is_simple_values_sequence. The only case I'm not sure about is where a Var can refer to something outside the available_rels list, but I couldn't come up with an example where that's possible; what do you think? Considering it again, I think we can't face problems like that because we don't work with joins. I attached a diff file showing the difference from the 3rd version of the patch, where we did not consider VALUES containing Vars for the transformation. I took a detailed look at the makeSAOPArrayExpr() function, which is much more complex than the corresponding fragment from match_orclause_to_indexcol(). And I found it to be mostly wrong. We are working at the post-parse-analysis stage. That means it's too late to do type coercion or to look up an operator by name. We already have all the catalog objects nailed down. In connection with that, the second argument of the OpExpr shouldn't be ignored, as it might contain a relevant type cast. I think I've fixed most of those problems in the attached patchset. I agree with your conclusion and changes. -- Regards, Alena Rybakina Postgres Professional