GetSubscriptionRelations declares too many scan keys
The function GetSubscriptionRelations was declaring ScanKeyData skey[2]; but it actually only uses 1 scan key. It seems like the code was cut/pasted from other nearby functions which really are using 2 keys.

PSA a trivial patch to declare the correct number of keys for this function.

--
Kind Regards,
Peter Smith.
Fujitsu Australia

v1-0001-GetSubscriptionRelations-declare-only-1-scan-key.patch
Description: Binary data
Re: Remove "FROM" in "DELETE FROM" when using tab-completion
On Mon, May 10, 2021 at 06:36:19AM +, tanghy.f...@fujitsu.com wrote:
> On Monday, May 10, 2021 2:48 PM, Julien Rouhaud wrote:
> > I think the behavior now is correct. The goal of autocompletion is to save
> > keystrokes and time. As the only valid keyword after a DELETE (at least in a
> > DeleteStmt) is FROM, it's a good thing that you get back "DELETE FROM"
> > directly rather than asking that to autocomplete in multiple steps.
> >
> > Now, the \help command is for commands, which is a different thing as the
> > command in that case is DELETE not DELETE FROM, even if you will have to
> > follow your DELETE with a FROM.
>
> Thanks for your reply. I totally agree with you on the convenience of "DELETE
> FROM" autocompletion.
> But I also noticed some autocompletion for "DELETE" in some cases is just
> "DELETE" already.
>
> =# EXPLAIN[TAB]
> ANALYZE DECLARE DELETE INSERT SELECT UPDATE VERBOSE
>
> =# COPY ([TAB]
> DELETE INSERT SELECT TABLE UPDATE VALUES WITH
>
> Maybe we should keep the behavior consistent?

Definitely.

> I mean we can change all "DELETE" to "DELETE FROM" or just remove "FROM" for
> consistency.

We should change all to DELETE FROM (apart from \help of course), and same for INSERT, change to INSERT INTO everywhere it makes sense.
Re: seawasp failing, maybe in glibc allocator
On Mon, May 10, 2021 at 6:59 PM Fabien COELHO wrote: > > Is gdb installed, and are core files being dumped by that SIGABRT, and > > are they using the default name (/proc/sys/kernel/core_pattern = core), > > which the BF can find with the value it's using, namely 'core_file_glob' > > => 'core*'? > > Nope: > >sh> cat /proc/sys/kernel/core_pattern >|/usr/share/apport/apport %p %s %c %d %P %E If you don't care about Ubuntu "apport" on this system (something for sending crash/bug reports to developers with a GUI), you could uninstall it (otherwise it overwrites the core_pattern every time it restarts, no matter what you write in your sysctl.conf, apparently), and then sudo sysctl -w kernel.core_pattern=core to undo the setting immediately (or reboot). Then hopefully the build farm would succeed in dumping a backtrace into the log.
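For reference, the suggestion above boils down to something like the following (a sketch only; apport and apt-get are Ubuntu-specific, and the sysctl change requires root):

```shell
# Inspect the current handler (a pipe into apport, as shown above):
cat /proc/sys/kernel/core_pattern

# Remove apport so it stops rewriting core_pattern on restart:
sudo apt-get remove apport

# Restore plain "core" files immediately, without waiting for a reboot:
sudo sysctl -w kernel.core_pattern=core
```

With `core_pattern=core`, core files match the buildfarm's `core_file_glob => 'core*'` setting.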
Re: GetSubscriptionRelations declares too many scan keys
On Mon, May 10, 2021 at 12:36 PM Peter Smith wrote:
>
> The function GetSubscriptionRelations was declaring ScanKeyData skey[2];
> but actually only uses 1 scan key. It seems like the code was cut/paste
> from other nearby functions which really are using 2 keys.
>
> PSA a trivial patch to declare the correct number of keys for this function.

+1 for the change. It looks like a cut/paste typo introduced by the commit 7c4f52409a.

A comment on the patch: why do we need to declare an array of 1 element, ScanKeyData skey[1];? Instead, can we just do ScanKeyData skey;?

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Re: row filtering for logical replication
On Wed, Mar 31, 2021 at 12:47 PM Euler Taveira wrote:
>
> Good question. IIRC the issue is that AlterPublicationStmt->tables has a list
> element that was a relation_expr_list and was converted to
> publication_table_list. If we share 'tables' with relation_expr_list (for ALTER
> PUBLICATION ... DROP TABLE) and publication_table_list (for the other ALTER
> PUBLICATION ... ADD|SET TABLE), the OpenTableList() has to know what list
> element it is dealing with. I think I came to the conclusion that it is less
> ugly to avoid changing OpenTableList() and CloseTableList().
>
> [Doing some experimentation...]
>
> Here is a patch that removes the referred code. It uses 2 distinct list
> elements: relation_expr_list for ALTER PUBLICATION ... DROP TABLE and
> publication_table_list for ALTER PUBLICATION ... ADD|SET TABLE. A new
> parameter was introduced to deal with the different elements of the list
> 'tables'.

AFAIK this is the latest patch available, but FYI it no longer applies cleanly on HEAD.

git apply ../patches_misc/0001-Row-filter-for-logical-replication.patch
../patches_misc/0001-Row-filter-for-logical-replication.patch:518: trailing whitespace.
error: patch failed: src/backend/parser/gram.y:426
error: src/backend/parser/gram.y: patch does not apply
error: patch failed: src/backend/replication/logical/worker.c:340
error: src/backend/replication/logical/worker.c: patch does not apply

Kind Regards,
Peter Smith.
Fujitsu Australia.
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 3:03 PM Bruce Momjian wrote:
>
> I have committed the first draft of the PG 14 release notes. You can
> see the most current build of them here:
>
> https://momjian.us/pgsql_docs/release-14.html

Thank you!

> Add system view pg_stat_replication_slots to report replication slot activity
> (Sawada Masahiko, Amit Kapila)
>
> Function pg_stat_reset_replication_slot() resets slot statistics. THIS IS
> LOGICAL ONLY, BUT NO "LOGICAL" IN THE NAME?

IIUC the pg_stat_replication_slots view supports only logical slots for now, but we might have it also show physical slots in the future. I'm fine with the current view name and description, but someone might want to use "logical replication slot" instead of just "replication slot".

> IS "ACTIVITY" THE RIGHT WORD?

The doc says "The pg_stat_replication_slots view will contain one row per logical replication slot, showing statistics about its usage.". So we can say "... to report replication slot statistics about its usage".

Regards,
--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
Re: Race condition in recovery?
At Fri, 7 May 2021 15:16:03 +0530, Dilip Kumar wrote in
> Okay, I got your point, now, consider the scenario that we are trying
> to get the checkpoint record in XLogFileReadAnyTLI, you are right that
> it returns history file 0002.history. I think I did not mention
> one point, basically, the tool while restarting node 3 after promoting
> node 2 is deleting all the local WAL of node3 (so that node 3 can
> follow node2). So now node3 doesn't have the checkpoint in the local
> segment. Suppose checkpoint record was in segment ...
> So now you come out of XLogFileReadAnyTLI, without reading checkpoint
> record and without setting expectedTLEs. Because expectedTLEs is only
> set if we are able to read the checkpoint record. Make sense?

Thanks. I understood the case and reproduced it. Although I don't think removing WAL files from a non-backup cluster is legit, I also think we can safely start archive recovery from a replicated segment.

> So now expectedTLEs is still NULL and you go to get the checkpoint
> record from primary and use checkPointCopy.ThisTimeLineID.

I don't think erasing expectedTLEs after it is once set is the right fix, because expectedTLEs is supposed to be set just once iff we are sure that we are going to follow that history, with rescan changing it as the only exception. It seems to me the issue here is not a race condition but WaitForWALToBecomeAvailable initializing expectedTLEs with the history of an improper timeline. So using recoveryTargetTLI instead of receiveTLI for that case fixes this issue.

- if (!expectedTLEs)
- expectedTLEs = readTimeLineHistory(receiveTLI);

I thought that the reason for using receiveTLI instead of recoveryTargetTLI here is that there's a case where receiveTLI is the future of recoveryTargetTLI, but I haven't successfully produced such a situation. If I set recoveryTargetTLI to a TLI that the standby doesn't know but the primary knows, validateRecoveryParameters immediately complains about that before reaching there.
Anyway the attached assumes receiveTLI may be the future of recoveryTargetTLI. Just inserting an if() into the existing code makes the added lines stick out past the right edge of 80 columns, so I refactored there a bit to lower the indentation.

I believe 004_timeline_switch.pl detects your issue, and the attached change fixes it. Any suggestions are welcome.

regards.

--
Kyotaro Horiguchi
NTT Open Source Software Center

>From 52eba6b6ef8d8fda078d3acd27b6ce7406078df8 Mon Sep 17 00:00:00 2001
From: Kyotaro Horiguchi
Date: Mon, 10 May 2021 16:57:24 +0900
Subject: [PATCH] Choose correct timeline when received historic timelines

When a standby starts streaming before determining expectedTLEs, it determines the expected timeline list based on the timeline of the WAL segment just streamed from the primary. If we fetched the last checkpoint in the older timeline, this behavior prevents recovery from proceeding. When the primary streamed over a WAL file in a historical timeline, use recoveryTargetTLI, which must be in the history of the primary.

There's one scenario where this makes a difference: suppose two standbys are connected to a primary both by archive and streaming. If one of the standbys is promoted, then the other reconnects to the promoted standby before it archives the first segment of the new timeline, and the content of pg_wal of the new standby is removed before reconnection, the standby fetches the history file for the new timeline but the first segment for the timeline is available only via streaming. In this case, the new standby assumed that the primary always sends a newer timeline than the current recovery target, but that is not the case in this scenario.
--- src/backend/access/transam/xlog.c | 45 ++--- src/test/recovery/t/004_timeline_switch.pl | 74 +- 2 files changed, 108 insertions(+), 11 deletions(-) diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c index c1d4415a43..1ca55b7ec0 100644 --- a/src/backend/access/transam/xlog.c +++ b/src/backend/access/transam/xlog.c @@ -12656,22 +12656,47 @@ WaitForWALToBecomeAvailable(XLogRecPtr RecPtr, bool randAccess, * info is set correctly and XLogReceiptTime isn't * changed. */ - if (readFile < 0) - { - if (!expectedTLEs) -expectedTLEs = readTimeLineHistory(receiveTLI); - readFile = XLogFileRead(readSegNo, PANIC, - receiveTLI, - XLOG_FROM_STREAM, false); - Assert(readFile >= 0); - } - else + + if (readFile >= 0) { /* just make sure source info is correct... */ readSource = XLOG_FROM_STREAM; XLogReceiptSource = XLOG_FROM_STREAM; return true; } + + readFile = XLogFileRead(readSegNo, PANIC, +receiveTLI, +XLOG_FROM_STREAM, false); + Assert(re
Re: Inaccurate error message when set fdw batch_size to 0
On Mon, May 10, 2021 at 12:00 PM Tom Lane wrote:
>
> Fujii Masao writes:
> > +1 for the change of the error messages.
>
> Yeah, this error message seems outright buggy. However, it's a minor
> matter. Also, some people think "positive" is the same thing as
> "non-negative", so maybe we need less ambiguous wording?

Since the value 0 can't be considered either a positive or a negative integer, I think we can do the following (roughly):

if (value < 0) "requires a zero or positive integer value"
if (value <= 0) "requires a positive integer value"

I'm not sure whether we should consider changing these messages:

remainder for hash partition must be a non-negative integer
parallel vacuum degree must be a non-negative integer
repeat count size must be a non-negative integer
number of workers must be a non-negative integer
%s requires a non-negative numeric value
distance in phrase operator should be non-negative and less than %d

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com
Re: PG 14 release notes, first draft
On Mon, 10 May 2021 at 18:03, Bruce Momjian wrote: > I need clarification on many items, and the document still needs its > items properly ordered, and markup added. I also expect a lot of > feedback. Thanks for drafting that up. > Add executor method to cache results from the inner-side of joins (David > Rowley) I think it would be more accurate to say "inner-side of nested loop joins". > Allow efficient retrieval of heap rows via tid (Edmund Horner, David Rowley) I'd say we already had that feature with TID Scan. Maybe it would be better to write: "Allow efficient heap scanning on ranges of tids (Edmund Horner, David Rowley)" > Improve the performance of parallel sequential scans (Thomas Munro, David > Rowley) I think it is worth mentioning "I/O" before "performance". This change won't really help cases if all the table's pages are already in shared buffers. David
Re: GetSubscriptionRelations declares too many scan keys
On Mon, May 10, 2021 at 01:39:31PM +0530, Bharath Rupireddy wrote:
> On Mon, May 10, 2021 at 12:36 PM Peter Smith wrote:
> >
> > The function GetSubscriptionRelations was declaring ScanKeyData skey[2];
> > but actually only uses 1 scan key. It seems like the code was cut/paste
> > from other nearby functions which really are using 2 keys.
> >
> > PSA a trivial patch to declare the correct number of keys for this
> > function.
>
> +1 for the change. It looks like a cut/paste typo introduced by the
> commit 7c4f52409a.
>
> A comment on the patch: why do we need to declare an array of 1
> element ScanKeyData skey[1];? Instead, can we just do ScanKeyData
> skey;?

+1, there are already many places where it's done this way if there's only 1 key.
Re: Race condition in recovery?
On Mon, May 10, 2021 at 2:05 PM Kyotaro Horiguchi wrote:
> I thought that the reason for using receiveTLI instead of
> recoveryTargetTLI here is that there's a case where receiveTLI is the
> future of recoveryTargetTLI, but I haven't successfully produced such a
> situation. If I set recoveryTargetTLI to a TLI that the standby doesn't
> know but the primary knows, validateRecoveryParameters immediately
> complains about that before reaching there. Anyway the attached
> assumes receiveTLI may be the future of recoveryTargetTLI.

If you look at the note in this commit, it says "without the timeline history file", so is it trying to say that although receiveTLI is the ancestor of recoveryTargetTLI, it can not detect that because of the absence of the timeline history file?

ee994272ca50f70b53074f0febaec97e28f83c4e
Author: Heikki Linnakangas 2013-01-03 14:11:58
Committer: Heikki Linnakangas 2013-01-03 14:11:58
. Without the timeline history file, recovering that file will fail as the older timeline ID is not recognized to be an ancestor of the target timeline. If you try to recover from such a backup, using only streaming replication to fetch the WAL, this patch is required for that to work.
=

> I believe the 004_timeline_switch.pl detects your issue. And the
> attached change fixes it.

I think this fix looks better to me, but I will think more about it and give my feedback. Thanks for quickly coming up with the reproducible test case.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: GetSubscriptionRelations declares too many scan keys
On Mon, May 10, 2021 at 6:09 PM Bharath Rupireddy wrote:
>
> On Mon, May 10, 2021 at 12:36 PM Peter Smith wrote:
> >
> > The function GetSubscriptionRelations was declaring ScanKeyData skey[2];
> > but actually only uses 1 scan key. It seems like the code was cut/paste
> > from other nearby functions which really are using 2 keys.
> >
> > PSA a trivial patch to declare the correct number of keys for this function.
>
> +1 for the change. It looks like a cut/paste typo introduced by the
> commit 7c4f52409a.
>
> A comment on the patch: why do we need to declare an array of 1
> element ScanKeyData skey[1];? Instead, can we just do ScanKeyData
> skey;?

IMO declaring skey[1] is better because then the code can share the same pattern as every other ScanKeyData skey[n] code.

Please search the PG source code for "ScanKeyData skey[1];" - there are dozens of precedents where other people felt the same as me about declaring single keys.

Kind Regards,
Peter Smith.
Fujitsu Australia
Executor code - found an instance of a WHILE that should just be an IF
Hi, During debugging I noticed some code in ExecResult() where a WHILE loop is being used with an unconditional RETURN at the end of the block (which is intentional, looking at the history of changes), but now there's no actual use of the loop in any way. The code should probably be changed to just use IF for clarity. I've attached a patch. Regards, Greg Nancarrow Fujitsu Australia v1-0001-Change-an-instance-of-WHILE-to-IF-in-executor-code.patch Description: Binary data
Re: GetSubscriptionRelations declares too many scan keys
On Mon, May 10, 2021 at 07:09:29PM +1000, Peter Smith wrote:
> On Mon, May 10, 2021 at 6:09 PM Bharath Rupireddy wrote:
> >
> > On Mon, May 10, 2021 at 12:36 PM Peter Smith wrote:
> > >
> > > The function GetSubscriptionRelations was declaring ScanKeyData skey[2];
> > > but actually only uses 1 scan key. It seems like the code was cut/paste
> > > from other nearby functions which really are using 2 keys.
> > >
> > > PSA a trivial patch to declare the correct number of keys for this
> > > function.
> >
> > +1 for the change. It looks like a cut/paste typo introduced by the
> > commit 7c4f52409a.
> >
> > A comment on the patch: why do we need to declare an array of 1
> > element ScanKeyData skey[1];? Instead, can we just do ScanKeyData
> > skey;?
>
> IMO declaring skey[1] is better because then the code can share the
> same pattern as every other ScanKeyData skey[n] code.
>
> Please search the PG source code for "ScanKeyData skey[1];" - there are
> dozens of precedents where other people felt the same as me about
> declaring single keys.

AFAICT there are 73 occurrences vs 62 of the "ScanKeyData skey;" form. I don't think there's a huge consensus for one over the other.
Re: seawasp failing, maybe in glibc allocator
> If you don't care about Ubuntu "apport" on this system (something for
> sending crash/bug reports to developers with a GUI), you could uninstall it
> (otherwise it overwrites the core_pattern every time it restarts, no matter
> what you write in your sysctl.conf, apparently), and then sudo sysctl -w
> kernel.core_pattern=core to undo the setting immediately (or reboot). Then
> hopefully the build farm would succeed in dumping a backtrace into the log.

I forced-removed apport (which meant removing xserver-xorg). Let's see whether the reports are better or whether I break something.

--
Fabien.
Re: GetSubscriptionRelations declares too many scan keys
On Mon, May 10, 2021 at 2:46 PM Julien Rouhaud wrote:
>
> On Mon, May 10, 2021 at 07:09:29PM +1000, Peter Smith wrote:
> > On Mon, May 10, 2021 at 6:09 PM Bharath Rupireddy wrote:
> > >
> > > On Mon, May 10, 2021 at 12:36 PM Peter Smith wrote:
> > > >
> > > > The function GetSubscriptionRelations was declaring ScanKeyData skey[2];
> > > > but actually only uses 1 scan key. It seems like the code was cut/paste
> > > > from other nearby functions which really are using 2 keys.
> > > >
> > > > PSA a trivial patch to declare the correct number of keys for this
> > > > function.
> > >
> > > +1 for the change. It looks like a cut/paste typo introduced by the
> > > commit 7c4f52409a.
> > >
> > > A comment on the patch: why do we need to declare an array of 1
> > > element ScanKeyData skey[1];? Instead, can we just do ScanKeyData
> > > skey;?
> >
> > IMO declaring skey[1] is better because then the code can share the
> > same pattern as every other ScanKeyData skey[n] code.
> >
> > Please search the PG source code for "ScanKeyData skey[1];" - there are
> > dozens of precedents where other people felt the same as me about
> > declaring single keys.
>
> AFAICT there are 73 occurrences vs 62 of the "ScanKeyData skey;" form. I
> don't think there's a huge consensus for one over the other.

I think both ScanKeyData skey[1]; and ScanKeyData skey; are used, but IMHO using ScanKeyData skey; looks better.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Why do we have perl and sed versions of Gen_dummy_probes?
On 07.05.21 20:31, Andrew Dunstan wrote:
> On 5/7/21 1:20 PM, Andres Freund wrote:
> > On 2021-05-07 11:19:02 -0400, Andrew Dunstan wrote:
> > > Here's a patch that adds the README and also adds a Makefile recipe for
> > > regenerating Gen_dummy_probes.pl after the sed script is changed. On my
> > > system at least the recipe is idempotent.
> >
> > Nice! Thanks for this work.
>
> de nada. pushed.

This recipe doesn't produce a Gen_dummy_probes.pl that matches exactly the one that is there now. If this is going to be the preferred method, then we should generate it once so that it matches going forward.
Fix of file path in the file comments
Hi, The file path mentioned in the file comments of 'src/backend/utils/activity/backend_status.c' was incorrect. Modified it to the correct path. Please find the patch attached. Thanks & Regards, Nitin Jadhav v1_0001-fix_file_path_in_comments.patch Description: Binary data
Re: [Proposal] Global temporary tables
On Thu, Apr 22, 2021 at 1:11 PM wenjing wrote:
>

I have briefly looked into the design comments added by the patch. I have a few questions.

+Feature description
+
+
+Previously, temporary tables are defined once and automatically
+created (starting with empty contents) in every session before using them.

I don't think this statement is correct; I mean, if we define a temp table in one session then it doesn't automatically get created in all the sessions.

+
+Like local temporary table, Global Temporary Table supports ON COMMIT PRESERVE ROWS
+or ON COMMIT DELETE ROWS clause, so that data in the temporary table can be
+cleaned up or reserved automatically when a session exits or a transaction COMMITs.

/reserved/preserved

I was trying to look into the "Main design idea" section.

+1) CATALOG
+GTTs store session-specific data. The storage information of GTTs' data, their
+transaction information, and their statistics are not stored in the catalog.

I did not understand what you mean by "transaction information" not being stored in the catalog. I mean, what transaction information is stored in the catalog for a normal table that is not stored for a GTT?

+Changes to the GTT's metadata affect all sessions.
+The operations making those changes include truncate GTT, Vacuum/Cluster GTT,
+and Lock GTT.

How do Truncate or Vacuum affect all the sessions? I mean, truncate should only truncate the data of the current session, and the same is true for vacuum, no?

I will try to do a more detailed review.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: seawasp failing, maybe in glibc allocator
On Mon, May 10, 2021 at 9:30 PM Fabien COELHO wrote: > I forced-removed apport (which meant removing xserver-xorg). Let's see > whether the reports are better or whether I break something. And of course this time it succeeded :-) Just by the way, I noticed it takes ~40 minutes to compile. Is there a reason you don't install ccache and set eg CC="ccache /path/to/clang", CXX="ccache /path/to/clang++", CLANG="ccache /path/to/clang"? That doesn't seem to conflict with your goal of testing LLVM/Clang's main branch, because ccache checks the compiler's mtime as part of its cache invalidation strategy.
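Concretely, the setup being suggested could look something like this (a sketch only; the LLVM install path is a placeholder for wherever the animal's freshly built Clang lives):

```shell
# Hypothetical configure invocation wrapping the custom Clang in ccache.
# CC/CXX/CLANG and --with-llvm/LLVM_CONFIG are standard PostgreSQL
# configure knobs; adjust LLVM_DIR to the local build.
LLVM_DIR=/path/to/llvm/install
./configure \
    CC="ccache $LLVM_DIR/bin/clang" \
    CXX="ccache $LLVM_DIR/bin/clang++" \
    CLANG="ccache $LLVM_DIR/bin/clang" \
    --with-llvm LLVM_CONFIG="$LLVM_DIR/bin/llvm-config"
```

Because ccache keys its cache on the compiler binary's mtime (among other inputs), rebuilding Clang from main invalidates the cache automatically, so this shouldn't mask compiler changes.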
Re: Why do we have perl and sed versions of Gen_dummy_probes?
Peter Eisentraut writes: > On 07.05.21 20:31, Andrew Dunstan wrote: >> On 5/7/21 1:20 PM, Andres Freund wrote: >>> On 2021-05-07 11:19:02 -0400, Andrew Dunstan wrote: Here's a patch that adds the README and also adds a Makefile recipe for regenerating Gen_dummy_probes.pl after the sed script is changed. On my system at least the recipe is idempotent. >>> Nice! Thanks for this work. >> >> de nada. pushed. > > This recipe doesn't produce a Gen_dummy_probes.pl that matches exactly > the one that is there now. If this is going to be the preferred method, > then we should generate it once so that it matches going forward. Which version of perltidy do you have installed? For me it generates identical versions using any of 20170521 (per src/tools/pgindent/README), 20201207 (what I happened to have installed before), and 20210402 (the latest). Also, what does the difference look like? - ilmari -- - Twitter seems more influential [than blogs] in the 'gets reported in the mainstream press' sense at least. - Matt McLeod - That'd be because the content of a tweet is easier to condense down to a mainstream media article. - Calle Dybedahl
Re: seawasp failing, maybe in glibc allocator
> And of course this time it succeeded :-)

Hmmm. ISTM that failures are on and off every few attempts.

> Just by the way, I noticed it takes ~40 minutes to compile. Is there a
> reason you don't install ccache and set eg CC="ccache /path/to/clang",
> CXX="ccache /path/to/clang++", CLANG="ccache /path/to/clang"? That doesn't
> seem to conflict with your goal of testing LLVM/Clang's main branch,
> because ccache checks the compiler's mtime as part of its cache
> invalidation strategy.

Yep. I remember that I deactivated it for some reason once, but I cannot remember why. I just reactivated it, will see what happens.

--
Fabien.
Re: PG 14 release notes, first draft
On Mon, 10 May 2021 at 08:03, Bruce Momjian wrote: > > I have committed the first draft of the PG 14 release notes. You can > see the most current build of them here: > https://momjian.us/pgsql_docs/release-14.html > > I need clarification on many items, and the document still needs its > items properly ordered, and markup added. I also expect a lot of > feedback. I noticed that the improvement in bloat control in the HeapAM that I know of (3c3b8a4b, 0ff8bbde) weren't documented here. Although each can be considered minor, they together can decrease the bloating behaviour of certain workloads significantly (and limit the total damage), and in my opinion this should be mentioned. 3c3b8a4b: Returns space claimed for the line pointer array back to the page's empty space, so that it can also be used for tuple data. 0ff8bbde: Allows large tuples to be inserted on pages which have only a small amount of data, regardless of fillfactor. Together they should be able to help significantly in both bloat prevention and bloat reduction. > I plan to work on completing this document this coming week in > preparation for beta next week. Thanks! With regards, Matthias van de Meent
Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Michail Nikolaev wrote: > Hello, Antonin. > > > I'm trying to review the patch, but not sure if I understand this problem, > > please see my comment below. > > Thanks a lot for your attention. It is strongly recommended to look at > version N3 (1) because it is a much more elegant, easy, and reliable > solution :) But the minRecoveryPoint-related issue affects it anyway. Indeed I'm reviewing (1), but I wanted to discuss this particular question in context, so I replied here. > > Why doesn't minRecoveryPoint get updated to 20? IMO that should happen by > > replaying the commit record. And if the standby happens to crash before the > > commit record could be replayed, no query should see the deletion and thus > > no > > hint bit should be set in the index. > > minRecoveryPoint is not affected by replaying the commit record in > most cases. It is updated in a lazy way, something like this: > minRecoveryPoint = max LSN of flushed page. Version 3 of a patch > contains a code_optional.patch to move minRecoveryPoint more > aggressively to get additional performance on standby (based on > Peter’s answer in (2). > So, “minRecoveryPoint will go here” is not because of “STANDBY INSERTS > NEW ROW IN INDEX” it is just a random event. > Michail. Sorry, I missed the fact that your example can be executed inside BEGIN - END block, in which case minRecoveryPoint won't advance after each command. 
I'll continue my review by replying to (1).

> [1]: https://www.postgresql.org/message-id/CANtu0ohHu1r1xQfTzEJuxeaOMYncG7xRxUQWdH%3DcMXZSf%2Bnzvg%40mail.gmail.com
> [2]: https://www.postgresql.org/message-id/CAH2-WzkSUcuFukhJdSxHFgtL6zEQgNhgOzNBiTbP_4u%3Dk6igAg%40mail.gmail.com
> ("Also, btw, do you know any reason to keep minRecoveryPoint at a low
> value?")

I'm not an expert in this area (I'm reviewing this patch also to learn more about recovery and replication), but after some brief research I think that postgres tries not to update the control file too frequently; see the comments in UpdateMinRecoveryPoint(). I don't know if what you do in code_optional.patch would be a problem. Actually I think that a commit record should be replayed more often than XLOG_RUNNING_XACTS, shouldn't it?

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Re: Executor code - found an instance of a WHILE that should just be an IF
On Mon, 10 May 2021 at 21:16, Greg Nancarrow wrote: > During debugging I noticed some code in ExecResult() where a WHILE > loop is being used with an unconditional RETURN at the end of the > block (which is intentional, looking at the history of changes), but > now there's no actual use of the loop in any way. The code should > probably be changed to just use IF for clarity. > I've attached a patch. Looks like leftovers from ea15e1867. I don't think this will affect any code generation but you are right, it should be an "if". David
Re: PG 14 release notes, first draft
On 5/10/21 2:03 AM, Bruce Momjian wrote: I have committed the first draft of the PG 14 release notes. You can see the most current build of them here: https://momjian.us/pgsql_docs/release-14.html I need clarification on many items, and the document still needs its items properly ordered, and markup added. I also expect a lot of feedback. I plan to work on completing this document this coming week in preparation for beta next week. While only a small change, this commit does affect user visible behavior and so should probably be noted: https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b12bd4869b5e Joe -- Crunchy Data - http://crunchydata.com PostgreSQL Support for Secure Enterprises Consulting, Training, & Open Source Development
Re: [PATCH] Identify LWLocks in tracepoints
On 05.05.21 06:20, Craig Ringer wrote:
> On Wed, 5 May 2021 at 09:15, Craig Ringer wrote:
> > warning: suggest braces around empty body in an 'if' statement [-Wempty-body]
> > 1322 | TRACE_POSTGRESQL_LWLOCK_WAIT_START(T_NAME(lock), mode);
> >      | ^
>
> Odd that I didn't get that.
>
> This compiler complaint is not due to the _ENABLED() test as such. It's due
> to the fact that we completely define out the TRACE_POSTGRESQL_ macros with
> src/backend/utils/Gen_dummy_probes.sed. While explicit braces could be added
> around each test, I suggest fixing Gen_dummy_probes.sed to emit the usual
> dummy statement instead. Patch attached.

Committed, with the Gen_dummy_probes.pl change added.
Re: Inherited UPDATE/DELETE vs async execution
Fujita-san, On Sat, May 8, 2021 at 1:21 AM Etsuro Fujita wrote: > I noticed this while working on the > EXPLAIN-ANALYZE-for-async-capable-nodes issue: > > EXPLAIN (VERBOSE, COSTS OFF) > DELETE FROM async_pt; >QUERY PLAN > > Delete on public.async_pt >Foreign Delete on public.async_p1 async_pt_1 >Foreign Delete on public.async_p2 async_pt_2 >Delete on public.async_p3 async_pt_3 >-> Append > -> Async Foreign Delete on public.async_p1 async_pt_1 >Remote SQL: DELETE FROM public.base_tbl1 > -> Async Foreign Delete on public.async_p2 async_pt_2 >Remote SQL: DELETE FROM public.base_tbl2 > -> Seq Scan on public.async_p3 async_pt_3 >Output: async_pt_3.tableoid, async_pt_3.ctid > (11 rows) > > DELETE FROM async_pt; > server closed the connection unexpectedly > This probably means the server terminated abnormally > before or while processing the request. > connection to server was lost > > The cause for this would be that direct-update plans are mistakenly > treated as async-capable ones, as shown in the EXPLAIN output. I guess that happens because the ForeignScan nodes responsible for scanning or direct-updating/deleting from child foreign tables appear under an Append as of 86dc90056, whereas before they would appear as child plans of a ModifyTable node. IIUC, it's the Append that causes the async_capable flag to be set in those ForeignScan nodes. > To > fix, I think we should modify postgresPlanDirectModify() so that it > clears the async-capable flag if it is set. Attached is a patch for > that. Maybe I am missing something, though. I see that your patch is to disable asynchronous execution in ForeignScan nodes responsible for direct update/delete, but why not do the same for other ForeignScan nodes too? Or the other way around -- is it because fixing the crash that occurs in the former's case would be a significant undertaking for little gain? -- Amit Langote EDB: http://www.enterprisedb.com
RE: Remove "FROM" in "DELETE FROM" when using tab-completion
On Monday, May 10, 2021 4:15 PM, Julien Rouhaud wrote
>We should change all to DELETE FROM (apart from \help of course), and same for
>INSERT, change to INSERT INTO everywhere it makes sense.

Thanks for the reply. Your advice sounds reasonable to me.
So I tried to change all "DELETE" to "DELETE FROM" and "INSERT" to "INSERT INTO" in the attached patch, except for the following cases, which I think are in accordance with what the PG docs say.
CREATE POLICY
CREATE [ OR REPLACE ] RULE
CREATE [ OR REPLACE ] TRIGGER
ALTER DEFAULT PRIVILEGES

After applying the patch, the TAP tests for psql pass.
Please feel free to tell me about anything insufficient you find in my fix. Thanks.

Regards,
Tang

0001-TAB-completion-modification-for-INSERT-INTO-and-DELE.patch Description: 0001-TAB-completion-modification-for-INSERT-INTO-and-DELE.patch
Re: Why do we have perl and sed versions of Gen_dummy_probes?
On 5/10/21 7:16 AM, Dagfinn Ilmari Mannsåker wrote: > Peter Eisentraut writes: > >> On 07.05.21 20:31, Andrew Dunstan wrote: >>> On 5/7/21 1:20 PM, Andres Freund wrote: On 2021-05-07 11:19:02 -0400, Andrew Dunstan wrote: > Here's a patch that adds the README and also adds a Makefile recipe for > regenerating Gen_dummy_probes.pl after the sed script is changed. On my > system at least the recipe is idempotent. Nice! Thanks for this work. >>> de nada. pushed. >> This recipe doesn't produce a Gen_dummy_probes.pl that matches exactly >> the one that is there now. If this is going to be the preferred method, >> then we should generate it once so that it matches going forward. > Which version of perltidy do you have installed? For me it generates > identical versions using any of 20170521 (per src/tools/pgindent/README), > 20201207 (what I happened to have installed before), and 20210402 (the > latest). > > Also, what does the difference look like? > Yep: andrew@emma:utils $ touch Gen_dummy_probes.sed andrew@emma:utils $ touch ../../../src/Makefile.global andrew@emma:utils $ make top_srcdir=../../.. Gen_dummy_probes.pl perl -ni -e ' print; exit if /^\$0/;' Gen_dummy_probes.pl s2p -f Gen_dummy_probes.sed | sed -e 1,4d -e '/# #/d' -e '$d' >> Gen_dummy_probes.pl perltidy --profile=../../tools/pgindent/perltidyrc Gen_dummy_probes.pl perl -pi -e '!$lb && ( /^\t+#/ || /^# prototypes/ ) && print qq{\n};'\ -e '$lb = m/^\n/; ' Gen_dummy_probes.pl andrew@emma:utils $ git diff andrew@emma:utils $ perltidy --version This is perltidy, v20170521 cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
postgres_fdw - make cached connection functions tests meaningful
Hi, While working on [1], I got to know that there is a new GUC debug_invalidate_system_caches_always that has been introduced in v14. It can be used to switch off cache invalidation in CLOBBER_CACHE_ALWAYS builds which makes cache sensitive tests stable. Using this GUC, it is quite possible to make cached connection management function tests more meaningful by returning original values(true/false, all the output columns) instead of SELECT 1. Note that the commit f77717b29 stabilized the tests for those functions - postgres_fdw_disconnect, postgres_fdw_disconnect_all and postgres_fdw_get_connections by masking actual return value of the functions. Attaching a patch to use the new GUC to make the functions return actual output. Thoughts? [1] - https://www.postgresql.org/message-id/CALj2ACVGSQsq68y-LmyXKZzbNVgSgsiVKSzsrVXzVgnsZTN26Q%40mail.gmail.com With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com From a5ff26edbf54fca4d42833ef4aedf79bec9445bd Mon Sep 17 00:00:00 2001 From: Bharath Rupireddy Date: Sat, 8 May 2021 19:51:58 +0530 Subject: [PATCH v1] postgres_fdw-make cached connection functions tests meaningful The new GUC debug_invalidate_system_caches_always which has been introduced in v14 can be used to switch off cache invalidation in CLOBBER_CACHE_ALWAYS builds which can make cache sensitive tests stable. Using this GUC, it is quite possible to make cached connection management function tests more meaningful by returning original values(true/false, all the output columns) instead of SELECT 1. Note that the commit f77717b29 stabilized the tests for those functions - postgres_fdw_disconnect, postgres_fdw_disconnect_all and postgres_fdw_get_connections by masking actual return value of the functions. 
--- .../postgres_fdw/expected/postgres_fdw.out| 135 ++ contrib/postgres_fdw/sql/postgres_fdw.sql | 58 2 files changed, 100 insertions(+), 93 deletions(-) diff --git a/contrib/postgres_fdw/expected/postgres_fdw.out b/contrib/postgres_fdw/expected/postgres_fdw.out index 6f533c745d..67d993edb8 100644 --- a/contrib/postgres_fdw/expected/postgres_fdw.out +++ b/contrib/postgres_fdw/expected/postgres_fdw.out @@ -9198,17 +9198,26 @@ PREPARE TRANSACTION 'fdw_tpc'; ERROR: cannot PREPARE a transaction that has operated on postgres_fdw foreign tables ROLLBACK; WARNING: there is no transaction in progress +-- Let's ensure to close all the existing cached connections so that they don't +-- make the following tests unstable. +SELECT 1 FROM postgres_fdw_disconnect_all(); + ?column? +-- +1 +(1 row) + +-- If debug_invalidate_system_caches_always is active, it results in dropping +-- remote connections after every transaction, making it impossible to test +-- connection retry use case meaningfully. And also, the new functions +-- introduced for cached connections management will be unstable with the flag +-- on. So turn it off for these tests. +SET debug_invalidate_system_caches_always = 0; -- === -- reestablish new connection -- === -- Change application_name of remote connection to special one -- so that we can easily terminate the connection later. ALTER SERVER loopback OPTIONS (application_name 'fdw_retry_check'); --- If debug_invalidate_system_caches_always is active, it results in --- dropping remote connections after every transaction, making it --- impossible to test termination meaningfully. So turn that off --- for this test. -SET debug_invalidate_system_caches_always = 0; -- Make sure we have a remote connection. SELECT 1 FROM ft1 LIMIT 1; ?column? 
@@ -9250,21 +9259,13 @@ SELECT 1 FROM ft1 LIMIT 1;-- should fail ERROR: 08006 \set VERBOSITY default COMMIT; -RESET debug_invalidate_system_caches_always; -- = -- test connection invalidation cases and postgres_fdw_get_connections function -- = --- Let's ensure to close all the existing cached connections. -SELECT 1 FROM postgres_fdw_disconnect_all(); - ?column? --- -1 -(1 row) - -- No cached connections, so no records should be output. -SELECT server_name FROM postgres_fdw_get_connections() ORDER BY 1; - server_name -- +SELECT * FROM postgres_fdw_get_connections() ORDER BY 1; + server_name | valid +-+--- (0 rows) -- This test case is for closing the connection in pgfdw_xact_callback @@ -9284,11 +9285,11 @@ SELECT 1 FROM ft7 LIMIT 1; -- List all the existing cached connections. loopback and loopback3 should be -- output. -SELECT server_name FROM postgres_fdw_get_connections() ORDER BY 1; - server_name -- - loopback - loopback3 +SELECT * FROM postgres_fdw_get_connections() ORDER BY 1; + server_name | valid +--
compatibility issue - problem with migrating from Postgres 11
Hi

my customer reported an issue related to the unsupported TABLESPACE pg_default for partitioned tables:

postgres=# CREATE TABLE IF NOT EXISTS foo2
(
    data bytea,
    guid character varying(255) COLLATE pg_catalog."default" NOT NULL,
    part date NOT NULL,
    retention_period integer,
    CONSTRAINT document_data_bytea_pkey1 PRIMARY KEY (guid, part)
) PARTITION BY RANGE (part)
WITH (
    OIDS = FALSE
)
TABLESPACE pg_default;
ERROR:  cannot specify default tablespace for partitioned relations

This check is two years old:
https://github.com/postgres/postgres/commit/87259588d0ab0b8e742e30596afa7ae25caadb18#diff-f2c91c95b7f2a84d916138e0af4338859803a03cee0d7e2e710fbcb869c59d0c

Are there any plans to fix this issue?

Regards

Pavel
Re: PG 14 release notes, first draft
Hi, Bruce!

On Mon, May 10, 2021 at 9:03 AM Bruce Momjian wrote:
> I have committed the first draft of the PG 14 release notes.  You can
> see the most current build of them here:
>
> https://momjian.us/pgsql_docs/release-14.html
>
> I need clarification on many items, and the document still needs its
> items properly ordered, and markup added.  I also expect a lot of
> feedback.
>
> I plan to work on completing this document this coming week in
> preparation for beta next week.

Thank you very much for your work! Let me provide the missing descriptions for the items related to me.

* Improve handling of compound words in to_tsquery() and websearch_to_tsquery() (Alexander Korotkov)

Compound words are now transformed into parts connected with phrase search operators. For example, to_tsquery('pg_class') becomes 'pg <-> class' instead of 'pg & class'. This eliminates a bug in the handling of compound words connected with the phrase operator, and makes searches for compound words stricter.

* Fix extra distance in phrase operators for quoted text in websearch_to_tsquery() (Alexander Korotkov)

For example, websearch_to_tsquery('english', '"aaa: bbb"') becomes 'aaa <-> bbb' instead of 'aaa <2> bbb'.

Feel free to make stylistic and other corrections if needed.

--
Regards,
Alexander Korotkov
Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Hello, Antonin.

> Sorry, I missed the fact that your example can be executed inside BEGIN - END
> block, in which case minRecoveryPoint won't advance after each command.

No, the block is not executed as a single transaction; all the commands are separate transactions (see below).

> Actually I think that a commit record should be replayed
> more often than XLOG_RUNNING_XACTS, shouldn't it?

Yes, but replaying commit records DOES NOT affect minRecoveryPoint in almost all cases.

UpdateMinRecoveryPoint is called by XLogFlush, but xact_redo_commit calls XLogFlush only in two cases:
* DropRelationFiles is called (some relations are dropped)
* ForceSyncCommit was used on the primary - a few “heavy” commands, like DropTableSpace, CreateTableSpace, movedb, etc.

But a “regular” commit record is replayed without XLogFlush and, as a result, without UpdateMinRecoveryPoint. So, in practice, minRecoveryPoint is updated in an “async” way by the checkpoint job. This is why it makes sense to call UpdateMinRecoveryPoint on XLOG_RUNNING_XACTS.

Thanks,
Michail.
Re: Enhanced error message to include hint messages for redundant options error
On Mon, May 10, 2021 at 6:00 AM houzj.f...@fujitsu.com wrote: > > > > > > > > > Thanks! The v5 patch looks good to me. Let's see if all > > > > > > > > agree on the goto duplicate_error approach which could reduce > > the LOC by ~80. > > > > > > > > > > > > > > I think the "goto duplicate_error" approach looks good, it > > > > > > > avoids duplicating the same error code multiple times. > > > > > > > > > > > > Thanks. I will mark the v5 patch "ready for committer" if no one has > > comments. > > > > > > > > > > Hi, > > > > > > > > > > I looked into the patch and noticed a minor thing. > > > > > > > > > > + return; /* keep compiler quiet */ > > > > > } > > > > > > > > > > I think we do not need the comment here. > > > > > The compiler seems not require "return" at the end of function > > > > > when function's return type is VOID. > > > > > > > > > > In addition, it seems better to remove these "return;" like what > > > > > commit "3974c4" did. > > > > > > > > It looks like that commit removed the plain return statements for a > > > > void returning functions. I see in the code that there are return > > > > statements that are there right after ereport(ERROR, just to keep > > > > the compiler quiet. Here in this patch also, we have return; > > > > statements after ereport(ERROR, for void returning functions. I'm > > > > not sure removing them would cause some compiler warnings on some > > > > platforms with some other compilers. If we're not sure, I'm okay to > > > > keep those return; statements. Thoughts? > > > > > > I felt we could retain the return statement and remove the comments. > > > If you are ok with that I will modify and post a patch for it. > > > Thoughts? > > > > I would like to keep it as is i.e. both return statement and /* keep > > compiler > > quiet */ comment. Having said that, it's better to leave it to the > > committer on > > whether to have the return statement at all. 
> > Yes, it's better to leave it to the committer on whether to have the > "return;". > But, I think at least removing "return;" which is at the *end* of the > function will not cause any warning. > Such as: > > + return; /* keep compiler quiet */ > } > > So, I'd vote for at least removing the comment " keep the compiler quiet ". That sounds fine to me, Attached v6 patch which has the changes for the same. Regards, Vignesh From e4db135ebba067c5ae0a53489acf687e4c6a6f33 Mon Sep 17 00:00:00 2001 From: vignesh Date: Mon, 26 Apr 2021 18:40:36 +0530 Subject: [PATCH v6] Enhance error message. Enhanced error message, so that the user can easily identify the error. --- contrib/file_fdw/file_fdw.c | 20 +-- src/backend/catalog/aclchk.c| 22 ++-- src/backend/commands/copy.c | 66 -- src/backend/commands/dbcommands.c | 90 +- src/backend/commands/extension.c| 27 ++-- src/backend/commands/foreigncmds.c | 30 +++-- src/backend/commands/functioncmds.c | 57 + src/backend/commands/publicationcmds.c | 39 +++--- src/backend/commands/sequence.c | 57 +++-- src/backend/commands/subscriptioncmds.c | 75 +-- src/backend/commands/tablecmds.c| 2 +- src/backend/commands/typecmds.c | 38 +++--- src/backend/commands/user.c | 131 +++- src/backend/parser/parse_utilcmd.c | 2 +- src/backend/replication/pgoutput/pgoutput.c | 31 +++-- src/backend/replication/walsender.c | 23 ++-- src/backend/tcop/utility.c | 20 +-- src/include/commands/defrem.h | 6 +- src/include/commands/publicationcmds.h | 4 +- src/include/commands/subscriptioncmds.h | 4 +- src/include/commands/typecmds.h | 2 +- src/include/commands/user.h | 2 +- src/test/regress/expected/copy2.out | 24 ++-- src/test/regress/expected/foreign_data.out | 8 +- src/test/regress/expected/publication.out | 4 +- 25 files changed, 353 insertions(+), 431 deletions(-) diff --git a/contrib/file_fdw/file_fdw.c b/contrib/file_fdw/file_fdw.c index 2c2f149fb0..32398c227f 100644 --- a/contrib/file_fdw/file_fdw.c +++ b/contrib/file_fdw/file_fdw.c @@ -200,6 +200,7 @@ 
file_fdw_validator(PG_FUNCTION_ARGS) char *filename = NULL; DefElem*force_not_null = NULL; DefElem*force_null = NULL; + DefElem*def; List *other_options = NIL; ListCell *cell; @@ -209,7 +210,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) */ foreach(cell, options_list) { - DefElem*def = (DefElem *) lfirst(cell); + def = (DefElem *) lfirst(cell); if (!is_valid_option(def->defname, catalog)) { @@ -290,10 +291,7 @@ file_fdw_validator(PG_FUNCTION_ARGS) else if (strcmp(def->defname, "force_not_null") == 0) { if (force_not_null) -ereport(ERROR, - (errcode(ERRCODE_SYNTAX_ERROR), -
Re: row filtering for logical replication
On Mon, May 10, 2021, at 5:19 AM, Peter Smith wrote:
> AFAIK this is the latest patch available, but FYI it no longer applies
> cleanly on HEAD.

Peter, the last patch has been broken since f3b141c4825. I'm still working on it for the next CF. I have already addressed the points suggested by Amit in his last review; however, I'm still working on a cache for evaluating expressions, as suggested by Andres. I hope to post a new patch soon.

--
Euler Taveira
EDB   https://www.enterprisedb.com/
Re: Remove "FROM" in "DELETE FROM" when using tab-completion
On Mon, May 10, 2021 at 5:57 PM tanghy.f...@fujitsu.com wrote:
>
> On Monday, May 10, 2021 4:15 PM, Julien Rouhaud wrote
> >We should change all to DELETE FROM (apart from \help of course), and same
> >for
> >INSERT, change to INSERT INTO everywhere it makes sense.
>
> Thanks for the reply. Your advice sounds reasonable to me.
> So I tried to change all "DELETE" to "DELETE FROM" and "INSERT" to "INSERT INTO" in the attached patch, except for
> the following cases, which I think are in accordance with what the PG docs say.
> CREATE POLICY
> CREATE [ OR REPLACE ] RULE
> CREATE [ OR REPLACE ] TRIGGER
> ALTER DEFAULT PRIVILEGES
>
> After applying the patch, the TAP tests for psql pass.
> Please feel free to tell me about anything insufficient you find in my fix. Thanks.

LGTM.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
Re: Corrected documentation of data type for the logical replication message formats.
On Sun, May 9, 2021 at 6:54 PM Euler Taveira wrote:
>
> On Sun, May 9, 2021, at 9:37 AM, vignesh C wrote:
>
> For some of the logical replication messages the data type documented
> was not correct, especially for lsn and xid. For lsn the actual datatype
> used is uint64 but it is documented as int64; similarly for xid, the datatype
> used is uint32 but it is documented as int32.
> Attached is a patch which has the fix for the same.
> Thoughts?
>
> There was a discussion [1] a few months ago about it. Read the Message Data
> Types definition [2]. It is confusing that an internal data type (int64) has a
> name similar to a generic data type in a protocol definition. As I said [1] we
> should probably inform that that piece of information (LSN) is a XLogRecPtr.
> Since this chapter is intended for developers, I think it is fine to include
> such internal detail.

I agree with specifying the actual datatypes, like XLogRecPtr for lsn, TimestampTz for timestamp, TransactionId for xid and Oid for the object id. Attached is a v2 patch changed along those lines. Thoughts?

Regards,
Vignesh

From e82378313f955699375f15f539a5267eb756b313 Mon Sep 17 00:00:00 2001 From: vignesh Date: Sun, 9 May 2021 17:57:08 +0530 Subject: [PATCH v2] Corrected data type for the logical replication message formats. Corrected data type for the logical replication message formats.
--- doc/src/sgml/protocol.sgml | 74 +++--- 1 file changed, 37 insertions(+), 37 deletions(-) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 2f4dde3beb..bfc29a25f5 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -6403,7 +6403,7 @@ Begin -Int64 +XLogRecPtr @@ -6413,7 +6413,7 @@ Begin -Int64 +TimestampTz @@ -6424,7 +6424,7 @@ Begin -Int32 +TransactionId @@ -6458,7 +6458,7 @@ Message -Int32 +TransactionId @@ -6469,7 +6469,7 @@ Message -Int8 +Uint8 @@ -6480,7 +6480,7 @@ Message -Int64 +XLogRecPtr @@ -6534,7 +6534,7 @@ Commit -Int8 +Uint8(0) @@ -6544,7 +6544,7 @@ Commit -Int64 +XLogRecPtr @@ -6554,7 +6554,7 @@ Commit -Int64 +XLogRecPtr @@ -6564,7 +6564,7 @@ Commit -Int64 +TimestampTz @@ -6599,7 +6599,7 @@ Origin -Int64 +XLogRecPtr @@ -6648,7 +6648,7 @@ Relation -Int32 +TransactionId @@ -6659,7 +6659,7 @@ Relation -Int32 +Oid @@ -6702,7 +6702,7 @@ Relation -Int16 +Uint16 @@ -6715,7 +6715,7 @@ Relation -Int8 +Uint8 @@ -6736,7 +6736,7 @@ Relation -Int32 +Oid @@ -6780,7 +6780,7 @@ Type -Int32 +TransactionId @@ -6791,7 +6791,7 @@ Type -Int32 +Oid @@ -6845,7 +6845,7 @@ Insert -Int32 +TransactionId @@ -6856,7 +6856,7 @@ Insert -Int32 +Oid @@ -6912,7 +6912,7 @@ Update -Int32 +TransactionId @@ -6923,7 +6923,7 @@ Update -Int32 +Oid @@ -7026,7 +7026,7 @@ Delete -Int32 +TransactionId @@ -7037,7 +7037,7 @@ Delete -Int32 +Oid @@ -7115,7 +7115,7 @@ Truncate -Int32 +TransactionId @@ -7136,7 +7136,7 @@ Truncate -Int8 +Uint8 @@ -7147,7 +7147,7 @@ Truncate -Int32 +Oid @@ -7193,7 +7193,7 @@ Stream Start -Int32 +TransactionId @@ -7203,7 +7203,7 @@ Stream Start -Int8 +Uint8 @@ -7262,7 +7262,7 @@ Stream Commit -Int32 +TransactionId @@ -7272,7 +7272,7 @@ Stream Commit -Int8 +Uint8(0) @@ -7282,7 +7282,7 @@ Stream Commit -Int64 +XLogRecPtr @@ -7292,7 +7292,7 @@ Stream Commit -Int64 +XLogRecPtr @@ -7302,7 +7302,7 @@ Stream Commit -Int64 +TimestampTz @@ -7337,7 +7337,7 @@ Stream Abort -Int32 +TransactionId @@ -7347,7 +7347,7 @@ Stream Abort 
-Int32 +TransactionId @@ -7382,7 +7382,7 @@ TupleData -Int16 +Uint16 -- 2.25.1
Re: Corrected documentation of data type for the logical replication message formats.
On Sun, May 9, 2021 at 6:44 PM Peter Smith wrote: > > On Sun, May 9, 2021 at 10:38 PM vignesh C wrote: > > > > Hi, > > > > For some of the logical replication messages the data type documented > > was not correct, especially for lsn and xid. For lsn actual datatype > > used is uint64 but is documented as int64, similarly for xid, datatype > > used is uint32 but documented as int32. > > Attached is a patch which has the fix for the same. > > Thoughts? > > If you want to do this then there are more - e.g. Flags should be > Uint8 instead of Int8. Thanks for the comments. I have made this change in v2 patch posted at [1]. This also includes the fix to specify uint8(0) at appropriate places. [1] - https://www.postgresql.org/message-id/CALDaNm2G_BJ9G%3DCxy9A6ht-TXPn4nB8W9_BcawuA1uxsNvoWfQ%40mail.gmail.com Regards, Vignesh
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 15:03 Bruce Momjian wrote:
>
> I have committed the first draft of the PG 14 release notes.  You can
> see the most current build of them here:
>
> https://momjian.us/pgsql_docs/release-14.html
>
> I need clarification on many items, and the document still needs its
> items properly ordered, and markup added.  I also expect a lot of
> feedback.
>
> I plan to work on completing this document this coming week in
> preparation for beta next week.

This misses the change of the default value, and is a bit unclear:

> Remove password_encryption's support for boolean values, e.g. true (Peter Eisentraut)
>
> Previous boolean values enabled md5. Now, only the md5 string does this.

I'd suggest something along these lines:

> The default for password_encryption is now "scram-sha-256" (Peter Eisentraut)
>
> The pseudo-boolean values "true", "on", "yes" and "1" are no longer accepted
> as an alias for "md5". (It hasn't been a true boolean setting since Pg 9.6.)

Regards

Ian Barwick

--
EnterpriseDB: https://www.enterprisedb.com
Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Michail Nikolaev wrote: > > Sorry, I missed the fact that your example can be executed inside BEGIN - > > END > > block, in which case minRecoveryPoint won't advance after each command. > > No, the block is not executed as a single transaction, all commands > are separated transactions (see below) > > > Actually I think that a commit record should be replayed > > more often than XLOG_RUNNING_XACTS, shouldn't it? > > Yes, but replaying commit records DOES NOT affect minRecoveryPoint in > almost all cases. > > UpdateMinRecoveryPoint is called by XLogFlush, but xact_redo_commit > calls XLogFlush only in two cases: > * DropRelationFiles is called (some relation are dropped) > * If ForceSyncCommit was used on primary - few “heavy” commands, like > DropTableSpace, CreateTableSpace, movedb, etc. > > But “regular” commit record is replayed without XLogFlush and, as > result, without UpdateMinRecoveryPoint. ok, I missed this. Thanks for explanation. -- Antonin Houska Web: https://www.cybertec-postgresql.com
Re: PG 14 release notes, first draft
Thanks for putting it together.

I think these two should be merged:
| Remove containment operators @ and ~ from contrib modules cube, hstore, intarray, and seg (Justin Pryzby)
| Remove deprecated containment operators for built-in geometry data types (Justin Pryzby)

| Improve autovacuum's analyze of partitioned tables (Yuzuko Hosoya)
| DETAILS?
Should say: Autovacuum now analyzes partitioned tables.

| The server variable check_client_connection_interval allows supporting operating systems, e.g., Linux, to automatically cancel queries by disconnected clients.
The GUC is actually called client_connection_check_interval - the commit message used the wrong name.

| This is particularly helpful for reducing index bloat on tables that frequently update indexed columns.
Does it mean "..where indexed columns are frequently updated"?

| Allow multiple foreign table scans to be run in parallel (Robert Haas, Kyotaro Horiguchi, Thomas Munro, Etsuro Fujita)
I think it means multiple foreign table scan *nodes*

| If server variable compute_query_id is enabled, display the hash in pg_stat_activity, EXPLAIN VERBOSE, csvlog, and optionally in log_line_prefix (Julien Rouhaud)
I think it needs details, like: "If disabled, then the hash might be computed by an extension, instead". Later, you say:
| Extension pg_stat_statements will need to enable hash computation via the compute_query_id server variable to function properly. pg_stat_statements can now use a custom hash computation method.
Maybe it should say "will need hash computation to be enabled".

| Allow more than the common name (CN) to be matched for client certificate authentication (Andrew Dunstan)
Your description makes it sound like arbitrary attributes can be compared. But the option just allows comparing CN or DN.
| Allow file system sync at the start of crash recovery on Linux (Thomas Munro)
I think this should describe the existing, default behavior: Allow the syncfs method to sync the data directory during recovery; the default behavior is to open and fsync every data file, and the new setting recovery_init_sync_method=syncfs instead syncs each filesystem in the data directory.

| Add date_bin function (John Naylor)
This truncates timestamps on an arbitrary interval. Like date_trunc(), but it also supports e.g. '15 minutes', and also uses an arbitrary "origin".

| Support negative indexes in split_part() (Nikhil Benesch)
| Negative values count from the last field going forward.
Should say "start from the last field and count backward"?

| Add configure option --with-openssl to behave like --with-ssl={openssl} (Daniel Gustafsson, Michael Paquier)
| The option --with-openssl is kept for compatibility.
I think this is backwards. The new option is with-ssl=openssl, and (as you said) with-openssl is kept.

Should these be in the "compatibility" section?
| Force custom server variable names to match the pattern used for unquoted SQL identifiers (Tom Lane)
| Change password_encryption's default to scram-sha-256 (Peter Eisentraut)
| Change checkpoint_completion_target default to 0.9 (Stephen Frost)
| Reduce the default value of vacuum_cost_page_miss (Peter Geoghegan)

Nitpicks to follow:

| Allow some GiST index to be built by presorting the data (Andrey Borodin)
indexes

| with --with-lz4 support to enable this feature
I would change to say "to support" rather than "support to enable"

| Speed truncation of small tables on large shared buffer servers (Kirk Jamison)
"on servers with large settings of shared_buffers"

| Allow windowing functions to perform incremental sorts (David Rowley)
Just "window" functions

| Improve pg_stat_activity reporting for walsenders processes (Tom Lane)
walsender

| Previously these functions could only be executed by super-users, and still defaults do that.
..which is still the default behavior.

| This allows multiple queries to be send and only wait for completion when a specific synchronization message is sent.
be sent

| Enhance libpq libpq's target_session_attrs parameter options (Haribabu Kommi, Greg Nancarrow, Vignesh C, Tom Lane)
remove first "libpq"

| With the removal of the ! operator in this release, factorial() is the only built-in way to computer a factorial.
compute

| For example, GROUP BY CUBE (a,b), CUBE (b,c) will generated duplicate grouping combinations without DISTINCT.
will generate

| Allow VACUUM VERBOSE to report page deletion counts for each scan of an index (Peter Geoghegan)
I think "Allow" is wrong - should just say that VACUUM VERBOSE reports..

|By default, only the root of partitioned tables are imported.
*is* imported

Can these be merged:
Allow logical replication to stream long transactions to standbys (Dilip Kumar, Tomas Vondra, Amit Kapila, Nikhil Sontakke)
Improve the logical replication API to allow streaming large in-progress transactions (Tomas Vondra, Dilip Kumar, Amit Kapila)
Re: Inaccurate error message when set fdw batch_size to 0
Bharath Rupireddy writes: > On Mon, May 10, 2021 at 12:00 PM Tom Lane wrote: >> Yeah, this error message seems outright buggy. However, it's a minor >> matter. Also, some people think "positive" is the same thing as >> "non-negative", so maybe we need less ambiguous wording? > Since value 0 can't be considered as either a positive or negative > integer, I think we can do as following(roughly): > if (value < 0) "requires a zero or positive integer value" > if (value <= 0) "requires a positive integer value" I was thinking of avoiding the passive voice and writing "foo must be greater than zero" which removes all doubt. It's not necessary to keep the "integer" aspect of the existing text, because if someone had supplied a non-integer value, that would not have gotten this far anyway. > I'm not sure whether we should consider changing these messages: > remainder for hash partition must be a non-negative integer > parallel vacuum degree must be a non-negative integer > repeat count size must be a non-negative integer > number of workers must be a non-negative integer > %s requires a non-negative numeric value > distance in phrase operator should be non-negative and less than %d I think for consistency it'd be good to change 'em all. I'm almost tempted to put this matter into our message style guide too. regards, tom lane
Re: GetSubscriptionRelations declares too many scan keys
Julien Rouhaud writes:
> On Mon, May 10, 2021 at 07:09:29PM +1000, Peter Smith wrote:
>> Please search PG source code for "ScanKeyData skey[1];" - there are
>> dozens of precedents where other people felt the same as me for
>> declaring single keys.
> AFAICT there are 73 occurrences vs 62 of the "ScanKeyData skey;". I don't think
> there's a huge consensus for one over the other.

Yeah, there's no real consensus about that. But in this case there's a strong reason to use skey[1]: it makes the patch a very safe one-liner. To convert to the other pattern would require touching more code.

regards, tom lane
pg_stat_statements requires compute_query_id
Hi

I tested features of Postgres 14. The extension pg_stat_statements didn't work for me until I enabled compute_query_id. Is this the expected behaviour? I expected just an empty query_id column and a working extension. This doesn't look right.

Moreover, it slightly increases the complexity of migration to Postgres 14.

Regards

Pavel
Re: PG 14 release notes, first draft
Same as the last couple years, I checked for missing items in the release notes, running something like this. git log --cherry-pick --oneline origin/REL_13_STABLE...origin/master Should any of these be included? f82de5c46b Do COPY FROM encoding conversion/verification in larger chunks. 9e596b65f4 Add "LP_DEAD item?" column to GiST pageinspect functions 10a5b35a00 Report resource usage at the end of recovery 7e453634bb Add additional information in the vacuum error context. 1ea396362b Improve logging of bad parameter values in BIND messages. 86dc90056d Rework planning and execution of UPDATE and DELETE. a1115fa078 Postpone some more stuff out of ExecInitModifyTable. c5b7ba4e67 Postpone some stuff out of ExecInitModifyTable. 7db0cd2145 Set PD_ALL_VISIBLE and visibility map bits in COPY FREEZE 01e658fa74 Hash support for row types a929e17e5a Allow run-time pruning on nested Append/MergeAppend nodes 8c15a29745 Allow ALTER TYPE to update an existing type's typsubscript value. c7aba7c14e Support subscripting of arbitrary types, not only arrays. 7b94e99960 Remove catalog function currtid() 926fa801ac Remove undocumented IS [NOT] OF syntax. cd9c1b3e19 Rename PGPROC->vacuumFlags to statusFlags a04daa97a4 Remove es_result_relation_info from EState. 3d351d916b Redefine pg_class.reltuples to be -1 before the first VACUUM or ANALYZE. fea10a6434 Rename VariableCacheData.nextFullXid to nextXid. 9de9294b0c Stop archive recovery if WAL generated with wal_level=minimal is found. (see also 15251c0a6) f40c6969d0 Routine usage information schema tables b4af70cb21 Simplify state managed by VACUUM. 4753ef37e0 Use a WaitLatch for vacuum/autovacuum sleeping 9dd963ae25 Recycle nbtree pages deleted during same VACUUM. 3c3b8a4b26 Truncate line pointer array during VACUUM. ad1c36b070 Fix foreign-key selectivity estimation in the presence of constants.
Re: SQL-standard function body
On 27.04.21 18:16, Tom Lane wrote:
> That's kind of a lot of complication, and inefficiency, for a corner case
> that may never arise in practice. We've ignored the risk for default
> expressions, and AFAIR have yet to receive any field complaints about it.
> So maybe it's okay to do the same for SQL-style function bodies, at least
> for now.
>> Another option would be that we disallow this at creation time.
> Don't like that one much. The backend shouldn't be in the business of
> rejecting valid commands just because pg_dump might be unable to cope later.

Since this is listed as an open item, I want to clarify that I'm currently not planning to work on this, based on this discussion. Certainly something to look into sometime later, but it's not in my plans right now.
Re: pg_stat_statements requires compute_query_id
Hi Pavel, On Mon, May 10, 2021 at 04:36:16PM +0200, Pavel Stehule wrote: > > I tested features of Postgres 14. The extension pg_stat_statements didn't > work to me until I enabled compute_query_id. Is it expected behaviour? Yes. > I expected just an empty column query_id and workable extension. This > doesn't look well. > > More, it increases the (little bit) complexity of migration to Postgres 14. This was already raised multiple times, and the latest discussion can be found at [1]. Multiple options have been suggested, but AFAICT there isn't a clear consensus on what we should do exactly, so I've not been able to send a fix yet. [1]: https://www.postgresql.org/message-id/flat/35457b09-36f8-add3-1d07-6034fa585ca8%40oss.nttdata.com
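To make the required steps concrete for anyone testing the betas: at this point in the v14 cycle, getting pg_stat_statements to collect statistics needs both of the settings sketched below (setting names as in the v14 documentation; shared_preload_libraries still requires a server restart):

```ini
# postgresql.conf -- what pg_stat_statements needs on the 14 betas
shared_preload_libraries = 'pg_stat_statements'  # same as previous releases; needs a restart
compute_query_id = on                            # new in 14; without it the extension
                                                 # collects nothing, which is the
                                                 # behaviour Pavel reports
```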
Re: PG 14 release notes, first draft
On Sun, May 9, 2021 at 11:03 PM Bruce Momjian wrote:
> I have committed the first draft of the PG 14 release notes.

This definitely isn't necessary, since the commit in question was a totally mechanical thing that cleaned up a minor inconsistency:

  Initialize work_mem and maintenance_work_mem using current guc.c default (Peter Geoghegan)

  Oversight in commit 848ae330a49, which increased the previous defaults for work_mem and maintenance_work_mem by 4X.

  IS THIS A BEHAVIORAL CHANGE?

-- Peter Geoghegan
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 4:44 AM Matthias van de Meent wrote: > I noticed that the improvement in bloat control in the HeapAM that I > know of (3c3b8a4b, 0ff8bbde) weren't documented here. Although each > can be considered minor, they together can decrease the bloating > behaviour of certain workloads significantly (and limit the total > damage), and in my opinion this should be mentioned. > > 3c3b8a4b: Returns space claimed for the line pointer array back to the > page's empty space, so that it can also be used for tuple data. > > 0ff8bbde: Allows large tuples to be inserted on pages which have only > a small amount of data, regardless of fillfactor. +1 on mentioning both things. -- Peter Geoghegan
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 7:00 AM Justin Pryzby wrote: > | Allow VACUUM VERBOSE to report page deletion counts for each scan of an > index (Peter Geoghegan) > > I think "Allow" is wrong - should just say that VACUUM VERBOSE reports.. It's also not accurate, since the count of deleted pages was always shown by VACUUM VERBOSE (once per index scan). The new feature has us show pages deleted by the VACUUM that actually ran (not some earlier VACUUM) -- these are "newly deleted pages". I don't think that this item is worth mentioning, though -- it's just a nice to have. If Bruce removes it from the release notes entirely I won't object. In addition to the items that I commented on in my response to Matthias just now, I should point out the following item as worthy of inclusion: 9dd963ae25 Recycle nbtree pages deleted during same VACUUM. I suggest that this item be phrased more or less as follows: "Allow VACUUM to eagerly place newly deleted B-Tree pages in the Free Space Map. Previously VACUUM could only place preexisting deleted pages in the Free Space Map for recycling." -- Peter Geoghegan
Re: SQL-standard function body
Peter Eisentraut writes: > On 27.04.21 18:16, Tom Lane wrote: >> That's kind of a lot of complication, and inefficiency, for a corner case >> that may never arise in practice. We've ignored the risk for default >> expressions, and AFAIR have yet to receive any field complaints about it. >> So maybe it's okay to do the same for SQL-style function bodies, at least >> for now. >>> Another option would be that we disallow this at creation time. >> Don't like that one much. The backend shouldn't be in the business >> of rejecting valid commands just because pg_dump might be unable >> to cope later. > Since this is listed as an open item, I want to clarify that I'm > currently not planning to work on this, based on this discussion. > Certainly something to look into sometime later, but it's not in my > plans right now. Right, I concur with moving it to the "won't fix" category. regards, tom lane
Re: [PATCH] Full support for index LP_DEAD hint bits on standby
Michail Nikolaev wrote:
> After some correspondence with Peter Geoghegan (1) and his ideas, I
> have reworked the patch a lot and now it is much more simple with even
> better performance (no new WAL or conflict resolution, hot standby
> feedback is unrelated).

My review that started in [1] continues here. (Please note that code.patch does not apply to the current master branch.)

I think I understand your approach now and couldn't find a problem by reading the code. What I consider worth improving is the documentation, both the code comments and nbtree/README. Especially for the problem discussed in [1], it should be explained what would happen if kill_prior_tuple_min_lsn were not checked.

Also, in IsIndexLpDeadAllowed() you say that an invalid deadness->latest_removed_xid means the following:

/*
 * Looks like it is tuple cleared by heap_page_prune_execute,
 * we must be sure if LSN of XLOG_HEAP2_CLEAN (or any subsequent
 * updates) less than minRecoveryPoint to avoid MVCC failure
 * after crash recovery.
 */

However I think there's one more case: heap_hot_search_buffer() considers all tuples in the chain to be "surely dead", but HeapTupleHeaderAdvanceLatestRemovedXid() skips them all for this reason:

/*
 * Ignore tuples inserted by an aborted transaction or if the tuple was
 * updated/deleted by the inserting transaction.
 *
 * Look for a committed hint bit, or if no xmin bit is set, check clog.
 */

I think that the dead tuples produced this way should never be visible on the standby (and even if they were, they would change the page LSN, so your algorithm would treat them correctly), so I see no correctness problem. But it might be worth explaining better the meaning of an invalid "latest_removed_xid" in the comments.

In the nbtree/README, you say "... if the commit record of latestRemovedXid is more ..." but it's not clear to me what "latestRemovedXid" is. If you mean the scan->kill_prior_tuple_min_lsn field, you probably need more words to explain it.
* IsIndexLpDeadAllowed()

  /* It all always allowed on primary if *all_dead. */

  should probably be

  /* It is always allowed on primary if *all_dead. */

* gistkillitems()

  As the function is only called if (so->numKilled > 0), I think both "killedsomething" and "dirty" variables should always have the same value, so one variable should be enough. Assert(so->numKilled) would be appropriate in that case. The situation is similar for btree and hash indexes.

doc.patch:

"+applying the fill page write."

[1] https://www.postgresql.org/message-id/61470.1620647290%40antos

--
Antonin Houska
Web: https://www.cybertec-postgresql.com
Re: [Patch] ALTER SYSTEM READ ONLY
On Sun, May 9, 2021 at 1:26 AM Amul Sul wrote: > The state in the control file also gets cleared. Though, after > clearing in memory the state patch doesn't really do the immediate > change to the control file, it relies on the next UpdateControlFile() > to do that. But when will that happen? If you're relying on some very nearby code, that might be OK, but perhaps a comment is in order. If you're just thinking it's going to happen eventually, I think that's not good enough. > Regarding log message I think I have skipped that intentionally, to > avoid confusing log as "system is now read write" when we do start as > hot-standby which is not really read-write. I think the message should not be phrased that way. In fact, I think now that we've moved to calling this pg_prohibit_wal() rather than ALTER SYSTEM READ ONLY, a lot of messages need to be rethought, and maybe some comments and function names as well. Perhaps something like: system is read only -> WAL is now prohibited system is read write -> WAL is no longer prohibited And then for this particular case, maybe something like: clearing WAL prohibition because the system is in archive recovery > > The second part of this proposal was: > > > > "2. Create a new function with a name like XLogAcceptWrites(). Move the > > following things from StartupXLOG() into that function: (1) the call > > to UpdateFullPageWrites(), (2) the following block of code that does > > either CreateEndOfRecoveryRecord() or RequestCheckpoint() or > > CreateCheckPoint(), (3) the next block of code that runs > > recovery_end_command, (4) the call to XLogReportParameters(), and (5) > > the call to CompleteCommitTsInitialization(). Call the new function > > from the place where we now call XLogReportParameters(). This would > > mean that (1)-(3) happen later than they do now, which might require > > some adjustments." 
> > Now you moved that code, but you also moved (6)
> > CompleteCommitTsInitialization(), (7) setting the control file to
> > DB_IN_PRODUCTION, (8) setting the state to RECOVERY_STATE_DONE, and
> > (9) requesting a checkpoint if we were just promoted. That's not what
> > was proposed. One result of this is that the server now thinks it's in
> > recovery even after the startup process has exited.
> > RecoveryInProgress() is still returning true everywhere. But that is
> > inconsistent with what Andres and I were recommending in
> > http://postgr.es/m/CA+TgmoZYQN=rcye-ixwnjdvmaoh+7jaqsif3u2k8xqxipba...@mail.gmail.com
>
> Regarding the modified approach, I tried to explain why I did this in
> http://postgr.es/m/CAAJ_b96Yb4jaW6oU1bVYEBaf=tq-ql+mmt1exfwvnzer7xr...@mail.gmail.com

I am not able to understand what problem you are seeing there. If we're in crash recovery, then nobody can connect to the database, so there can't be any concurrent activity. If we're in archive recovery, we now clear the WAL-is-prohibited flag so that we will go read-write directly at the end of recovery. We can and should refuse any effort to call pg_prohibit_wal() during recovery. If we reached the end of crash recovery and are now permitting read-only connections, why would anyone be able to write WAL before the system has been changed to read-write? If that can happen, it's a bug, not a reason to change the design.

Maybe your concern here is about ordering: the process that is going to run XLogAcceptWrites() needs to allow xlog writes locally before we tell other backends that they can also write xlog; otherwise, some other records could slip in before UpdateFullPageWrites() and similar have run, which we probably don't want. But that's why LocalSetXLogInsertAllowed() was invented, and if it doesn't quite do what we need in this situation, we should be able to tweak it so it does. If your concern is something else, can you spell it out for me again, because I'm not getting it?
-- Robert Haas EDB: http://www.enterprisedb.com
Re: Why do we have perl and sed versions of Gen_dummy_probes?
Andrew Dunstan writes: > On 5/10/21 7:16 AM, Dagfinn Ilmari Mannsåker wrote: >> Peter Eisentraut writes: >>> This recipe doesn't produce a Gen_dummy_probes.pl that matches exactly >>> the one that is there now. If this is going to be the preferred method, >>> then we should generate it once so that it matches going forward. >> Which version of perltidy do you have installed? For me it generates >> identical versions using any of 20170521 (per src/tools/pgindent/README), >> 20201207 (what I happened to have installed before), and 20210402 (the >> latest). > Yep: For me, using App-s2p-1.003 and perltidy v20170521, it works as long as I start with the previous version of Gen_dummy_probes.pl in place. I first tried to test this by "rm Gen_dummy_probes.pl; make Gen_dummy_probes.pl", and what I got was a script without all the initial commentary nor the first line of actual Perl code. I don't think this is good practice; it implies that any accidental corruption of the commentary would be carried forward. I think we should be extracting the commentary from Gen_dummy_probes.sed. regards, tom lane
Re: [PATCH] Identify LWLocks in tracepoints
On 2021-05-09 19:51:13 -0400, Tom Lane wrote:
> Andres Freund writes:
> > On 2021-05-08 13:13:47 -0400, Tom Lane wrote:
> >> (I wonder why flaviventris and serinus are still using an "experimental"
> >> compiler version that is now behind mainstream.)
> > The upgrade script didn't install the newer version because it had to
> > remove some conflicting packages... Should be fixed for runs starting
> > now.
> Looks like that didn't work ...

Looks like it did, but turned out to have some unintended side-effects :(. The snapshot builds are now new:

https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure

configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]

But the aforementioned dependencies that needed to be removed broke the installed old versions of gcc/clang. I started to build the old versions of llvm manually, but that then hits the issue that at least 3.9 doesn't build with halfway modern versions of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm n-2 with n-1), will take a bit of time.

Greetings, Andres Freund
Re: [PATCH] Identify LWLocks in tracepoints
Andres Freund writes: > Looks like it did, but turned out to have some unintended side-effects > :(. > The snapshot builds are now new: > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure > configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5 > gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision > fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a] > But the aforementioned dependencies that needed to remove broke the > installed old versions of gcc/clang. > I started to build the old versions of llvm manually, but that then hits > the issue that at least 3.9 doesn't build with halfway modern versions > of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm > n-2 with n-1), will take a bit of time. Ugh. Memo to self: don't rag on other peoples' buildfarm configurations right before a release deadline :-(. Sorry to cause you trouble. regards, tom lane
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 08:16:16AM +0200, Laurenz Albe wrote:
> On Mon, 2021-05-10 at 02:03 -0400, Bruce Momjian wrote:
> > When using \e in psql, if the buffer is not modified by the editor, ignore
> > the editor contents and leave the buffer unchanged (Laurenz Albe)
> > The \ef and \ev commands also now have this behavior. DOCS SAY BUFFER IS
> > CLEARED.
>
> It's a bit more complicated: If you edit the current buffer with \e, the buffer is
> unchanged if you quit the editor.
> However, if you edit the previous statement, a file or the definition of a function
> or view, the query buffer is cleared if you quit the editor without saving.
>
> Suggested wording:
>
> When editing anything else than the current query buffer with \e, and you quit
> the editor, the query buffer is cleared. This makes the behavior less surprising
> and prevents the unintended re-execution of the previous statement.

OK, I figured it out. I was confused by \p because \? says:

test=> \?
Query Buffer
      \e [FILE] [LINE]       edit the query buffer (or file) with external editor
      \ef [FUNCNAME [LINE]]  edit function definition with external editor
      \ev [VIEWNAME [LINE]]  edit view definition with external editor
  --> \p                     show the contents of the query buffer
      \r                     reset (clear) the query buffer

... but the documentation says:

    \p or \print
        Print the current query buffer to the standard output. If
    --> the current query buffer is empty, the most recently executed
    --> query is printed instead.

I wasn't aware that \e loads the previous query if the buffer is empty. I came up with this release note text:

When editing the previous query or a file with psql's \e, ignore the contents if the editor exits without saving (Laurenz Albe)

Previously, editing the previous query or a file and not saving the editor contents would still execute the editor contents. The \ef and \ev commands also now have this behavior.
-- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: [PATCH] Identify LWLocks in tracepoints
Hi,

On 2021-05-10 12:14:46 -0400, Tom Lane wrote:
> Andres Freund writes:
> > Looks like it did, but turned out to have some unintended side-effects :(.
> > The snapshot builds are now new:
> > https://buildfarm.postgresql.org/cgi-bin/show_stage_log.pl?nm=flaviventris&dt=2021-05-10%2015%3A43%3A56&stg=configure
> > configure:3966: ccache /usr/lib/gcc-snapshot/bin/gcc --version >&5
> > gcc (Debian 20210421-1) 11.0.1 20210421 (prerelease) [gcc-11 revision
> > fbb7739892e:d13ce34bd01:3756d99dab6a268d0d8a17583980a86f23f0595a]
> > But the aforementioned dependencies that needed to be removed broke the
> > installed old versions of gcc/clang.
> > I started to build the old versions of llvm manually, but that then hits
> > the issue that at least 3.9 doesn't build with halfway modern versions
> > of gcc/clang. So I gotta do it stepwise (i.e. go backwards, build llvm
> > n-2 with n-1), will take a bit of time.
> Ugh. Memo to self: don't rag on other peoples' buildfarm configurations
> right before a release deadline :-(. Sorry to cause you trouble.

No worries - I knew that I'd have to do this at some point, even though I hadn't planned to do that today... I should have all of them green before the end of today.

I found that I can actually build LLVM 3.9 directly, as clang-6 can still build it (whereas the oldest gcc still installed can't build it directly). So it's a bit less painful than I thought at first. The 3.9 instances (phycodurus, dragonet) tests are running right now, and I'm fairly sure they'll pass (most of a --noreport --nostatus run passed). Going forward, building LLVM 4, 5 and 6 now - the later versions take longer...

Greetings, Andres Freund
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 02:51:28PM +0800, Julien Rouhaud wrote: > On Mon, May 10, 2021 at 02:03:08AM -0400, Bruce Momjian wrote: > > I have committed the first draft of the PG 14 release notes. You can > > see the most current build of them here: > > > > https://momjian.us/pgsql_docs/release-14.html > > > > I need clarification on many items, and the document still needs its > > items properly ordered, and markup added. I also expect a lot of > > feedback. > > There's a small typo: > > +Improve tab completion (Vignesh C,, Michael [...] > > (duplicated comma) Fixed. > Also > > + > +Avoid retrieval of CHECK constraints and DEFAULT exprs in data-only dump > (Julien Rouhaud) > + > + > + > +IS THIS BACKWARD INCOMPATIBLE? > + > + > > The new behavior doesn't have any impact on the generated dump, as the > modification is to avoid retrieving data that won't be used. > > For users, it only means maybe slight faster pg_dump execution, or slightly > better change to be able to run a pg_dump --data-only if pg_constraint is > corrupted but not the rest of the user data, so maybe it's not necessary to > mention that in the release notes? Thanks, removed. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: [Patch] ALTER SYSTEM READ ONLY
On Mon, May 10, 2021 at 9:21 PM Robert Haas wrote: > > On Sun, May 9, 2021 at 1:26 AM Amul Sul wrote: > > The state in the control file also gets cleared. Though, after > > clearing in memory the state patch doesn't really do the immediate > > change to the control file, it relies on the next UpdateControlFile() > > to do that. > > But when will that happen? If you're relying on some very nearby code, > that might be OK, but perhaps a comment is in order. If you're just > thinking it's going to happen eventually, I think that's not good > enough. > Ok. > > Regarding log message I think I have skipped that intentionally, to > > avoid confusing log as "system is now read write" when we do start as > > hot-standby which is not really read-write. > > I think the message should not be phrased that way. In fact, I think > now that we've moved to calling this pg_prohibit_wal() rather than > ALTER SYSTEM READ ONLY, a lot of messages need to be rethought, and > maybe some comments and function names as well. Perhaps something > like: > > system is read only -> WAL is now prohibited > system is read write -> WAL is no longer prohibited > > And then for this particular case, maybe something like: > > clearing WAL prohibition because the system is in archive recovery > Ok, thanks for the suggestions. > > > The second part of this proposal was: > > > > > > "2. Create a new function with a name like XLogAcceptWrites(). Move the > > > following things from StartupXLOG() into that function: (1) the call > > > to UpdateFullPageWrites(), (2) the following block of code that does > > > either CreateEndOfRecoveryRecord() or RequestCheckpoint() or > > > CreateCheckPoint(), (3) the next block of code that runs > > > recovery_end_command, (4) the call to XLogReportParameters(), and (5) > > > the call to CompleteCommitTsInitialization(). Call the new function > > > from the place where we now call XLogReportParameters(). 
This would > > > mean that (1)-(3) happen later than they do now, which might require > > > some adjustments." > > > > > > Now you moved that code, but you also moved (6) > > > CompleteCommitTsInitialization(), (7) setting the control file to > > > DB_IN_PRODUCTION, (8) setting the state to RECOVERY_STATE_DONE, and > > > (9) requesting a checkpoint if we were just promoted. That's not what > > > was proposed. One result of this is that the server now thinks it's in > > > recovery even after the startup process has exited. > > > RecoveryInProgress() is still returning true everywhere. But that is > > > inconsistent with what Andres and I were recommending in > > > http://postgr.es/m/CA+TgmoZYQN=rcye-ixwnjdvmaoh+7jaqsif3u2k8xqxipba...@mail.gmail.com > > > > Regarding modified approach, I tried to explain that why I did > > this in > > http://postgr.es/m/CAAJ_b96Yb4jaW6oU1bVYEBaf=tq-ql+mmt1exfwvnzer7xr...@mail.gmail.com > > I am not able to understand what problem you are seeing there. If > we're in crash recovery, then nobody can connect to the database, so > there can't be any concurrent activity. If we're in archive recovery, > we now clear the WAL-is-prohibited flag so that we will go read-write > directly at the end of recovery. We can and should refuse any effort > to call pg_prohibit_wal() during recovery. If we reached the end of > crash recovery and are now permitting read-only connections, why would > anyone be able to write WAL before the system has been changed to > read-write? If that can happen, it's a bug, not a reason to change the > design. > > Maybe your concern here is about ordering: the process that is going > to run XLogAcceptWrites() needs to allow xlog writes locally before we > tell other backends that they also can xlog writes; otherwise, some > other records could slip in before UpdateFullPageWrites() and similar > have run, which we probably don't want. 
> But that's why LocalSetXLogInsertAllowed() was invented, and if it doesn't
> quite do what we need in this situation, we should be able to tweak it so it
> does.

Yes, we don't want any write to slip in before UpdateFullPageWrites(). Recently[1], we have decided to let the Checkpointer process call XLogAcceptWrites() unconditionally. Here the problem is that when a backend executes the pg_prohibit_wal(false) function to make the system read-write, the WAL prohibited state is set to in-progress (i.e. WALPROHIBIT_STATE_GOING_READ_WRITE) and then the Checkpointer is signaled. Next, the Checkpointer will convey this system change to all existing backends using a global barrier, and after that the final WAL prohibited state is set to read-write (i.e. WALPROHIBIT_STATE_READ_WRITE). While the Checkpointer is in the process of conveying this global barrier, any new backend can connect at that time and write a new record, because the in-progress read-write state is equivalent to the final read-write state iff LocalXLogInsertAllowed != 0 for that backend. And that new record could slip in before, or in between, the records to be written by XLogAcceptWrites().
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 05:28:24PM +0900, Masahiko Sawada wrote: > On Mon, May 10, 2021 at 3:03 PM Bruce Momjian wrote: > > > > I have committed the first draft of the PG 14 release notes. You can > > see the most current build of them here: > > > > https://momjian.us/pgsql_docs/release-14.html > > Thank you! > > > Add system view pg_stat_replication_slots to report replication slot > > activity (Sawada Masahiko, Amit Kapila) > > > > Function pg_stat_reset_replication_slot() resets slot statistics. THIS IS > > LOGICAL ONLY, BUT NO "LOGICAL" IN THE NAME? > > IIUC pg_stat_replication_slots view supports only logical slot for > now. But we might have it show also physical slot in the future. I'm > fine with the current view name and description but someone might want > to use "logical replication slot" instead of just "replication slot". OK, I was just confirming we are happy with the name. > > > IS "ACTIVITY" THE RIGHT WORD? > > The doc says "The pg_stat_replication_slots view will contain one row > per logical replication slot, showing statistics about its usage.". So > we can say "... to report replication slot statistics about its > usage". OK, I think I prefer "activity" so will just keep that. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: pg_stat_statements requires compute_query_id
On Mon, May 10, 2021 at 7:43 AM Julien Rouhaud wrote: > On Mon, May 10, 2021 at 04:36:16PM +0200, Pavel Stehule wrote: > > I expected just an empty column query_id and workable extension. This > > doesn't look well. > > > > More, it increases the (little bit) complexity of migration to Postgres 14. > > This was already raised multiple times, and the latest discussion can be found > at [1]. > > Multiple options have been suggested, but AFAICT there isn't a clear consensus > on what we should do exactly, so I've not been able to send a fix yet. > > [1]: > https://www.postgresql.org/message-id/flat/35457b09-36f8-add3-1d07-6034fa585ca8%40oss.nttdata.com Before it petered out, the thread seemed to be converging toward consensus that the situation could be improved. I work on pganalyze, and our product requires pg_stat_statements to be enabled for a lot of its functionality. We guide our users through enabling it, but if additional steps are required in 14, that may be confusing. I don't have any strong feelings on the particular mechanism that would work best here, but it would be nice if enabling pg_stat_statements in 14 did not require more work than in 13. Even if it's just one extra setting, it's another potential thing to get wrong and have to troubleshoot, plus it means all existing pg_stat_statements guides out there would become out of date. Thanks, Maciek
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 08:52:44PM +1200, David Rowley wrote: > On Mon, 10 May 2021 at 18:03, Bruce Momjian wrote: > > I need clarification on many items, and the document still needs its > > items properly ordered, and markup added. I also expect a lot of > > feedback. > > Thanks for drafting that up. > > > Add executor method to cache results from the inner-side of joins (David > > Rowley) > > I think it would be more accurate to say "inner-side of nested loop joins". OK, thanks. I suspected that was true. > > Allow efficient retrieval of heap rows via tid (Edmund Horner, David Rowley) > > I'd say we already had that feature with TID Scan. Maybe it would be > better to write: > > "Allow efficient heap scanning on ranges of tids (Edmund Horner, David > Rowley)" I went with: Allow efficient heap scanning of a range of tids (Edmund Horner, David Rowley) > > Improve the performance of parallel sequential scans (Thomas Munro, David > > Rowley) > > I think it is worth mentioning "I/O" before "performance". This > change won't really help cases if all the table's pages are already in > shared buffers. I went with: Improve the performance of parallel sequential I/O scans (Thomas Munro, David Rowley) -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: pg_stat_statements requires compute_query_id
po 10. 5. 2021 v 19:03 odesílatel Maciek Sakrejda napsal:
> On Mon, May 10, 2021 at 7:43 AM Julien Rouhaud wrote:
> > On Mon, May 10, 2021 at 04:36:16PM +0200, Pavel Stehule wrote:
> > > I expected just an empty column query_id and workable extension. This
> > > doesn't look well.
> > > More, it increases the (little bit) complexity of migration to Postgres 14.
> >
> > This was already raised multiple times, and the latest discussion can be found
> > at [1].
> >
> > Multiple options have been suggested, but AFAICT there isn't a clear consensus
> > on what we should do exactly, so I've not been able to send a fix yet.
> >
> > [1]: https://www.postgresql.org/message-id/flat/35457b09-36f8-add3-1d07-6034fa585ca8%40oss.nttdata.com
>
> Before it petered out, the thread seemed to be converging toward
> consensus that the situation could be improved. I work on pganalyze,
> and our product requires pg_stat_statements to be enabled for a lot of
> its functionality. We guide our users through enabling it, but if
> additional steps are required in 14, that may be confusing. I don't
> have any strong feelings on the particular mechanism that would work
> best here, but it would be nice if enabling pg_stat_statements in 14
> did not require more work than in 13. Even if it's just one extra
> setting, it's another potential thing to get wrong and have to
> troubleshoot, plus it means all existing pg_stat_statements guides out
> there would become out of date.

+1

At a minimum it requires extra notes in some migration guide. I understand that queryid is one of the key values, so it is not possible to merge data with and without a queryid.

My idea of the best solution is something like this: pg_stat_statements can work without a queryid, and when compute_query_id is changed, the values stored in pg_stat_statements are reset. I have no idea if it can be implemented.

The current state is not user friendly. People who know the previous behaviour can be very confused.
Regards

Pavel

> Thanks,
> Maciek
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 01:44:12PM +0200, Matthias van de Meent wrote:
> On Mon, 10 May 2021 at 08:03, Bruce Momjian wrote:
> > I have committed the first draft of the PG 14 release notes. You can
> > see the most current build of them here:
> > https://momjian.us/pgsql_docs/release-14.html
> >
> > I need clarification on many items, and the document still needs its
> > items properly ordered, and markup added. I also expect a lot of
> > feedback.
>
> I noticed that the improvement in bloat control in the HeapAM that I
> know of (3c3b8a4b, 0ff8bbde) weren't documented here. Although each
> can be considered minor, they together can decrease the bloating
> behaviour of certain workloads significantly (and limit the total
> damage), and in my opinion this should be mentioned.
>
> 3c3b8a4b: Returns space claimed for the line pointer array back to the
> page's empty space, so that it can also be used for tuple data.
>
> 0ff8bbde: Allows large tuples to be inserted on pages which have only
> a small amount of data, regardless of fillfactor.
>
> Together they should be able to help significantly in both bloat
> prevention and bloat reduction.

I looked at those items. I try to mention performance items that enable new workloads or require some user action to benefit from them. I am not sure these two qualify, but can others comment? Thanks.

-- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: PG 14 release notes, first draft
On Mon, 2021-05-10 at 12:38 -0400, Bruce Momjian wrote: > I came up with this release note text: > > > > > > When editing the previous query or a file with psql's \e, ignore the > contents if the editor exits without saving (Laurenz Albe) > > > > Previously, editing the previous query or a file and not saving the > editor contents would still execute the editor contents. The \ef and > \ev commands also now have this behavior. > > Thanks, that looks much better. The second paragraph starts by describing the previous behavior, but its second sentence details the changes. Perhaps it would be better to put that into the first paragraph: When editing the previous query or a file with psql's \e, or when a view or function definition is edited with \ev or \ef, ignore the contents if the editor exits without saving (Laurenz Albe) Previously, editing the previous query or a file and not saving the editor contents would still execute the editor contents. Yours, Laurenz Albe
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Hi, On 2021-05-08 15:44:57 -0400, Tom Lane wrote: > In a nearby thread I bemoaned the fact that the core regression tests > seem to have gotten significantly slower in the last couple of months, > at least with CCA enabled: hyrax reports completing them in 12:52:44 > on 18 March, while its most recent run on 1 May took 14:08:18. > > Trying to diagnose the cause overall seemed a bit daunting, but > I thought I'd dig into the opr_sanity test in particular, as it > is one of the slower tests under CCA to start with and had also > slowed down noticeably (from 3701581 ms to 4761183 ms, or 28%). > I was able to complete a bisection using just that test, and > got an unexpected result: most of the slowdown appeared at > ab596105b (BRIN minmax-multi indexes). Apparently the additional > time is simply from having to check the additional pg_amop and > pg_amproc entries, which that patch added quite a few of. I suspect that it might not be just that. From a quick profile it looks like debug_invalidate_system_caches_always spends a good chunk of its time in ResetCatalogCache() and hash_seq_search(). Their cost is linear in the size of the underlying hash tables. So what might be happening is that the additional catalog entries pushed some of the catcache hash tables into growing (RehashCatCache()). Which then makes all subsequent ResetCatalogCache() scans slower. Not that that changes much - your proposed fix still seems reasonable. > I noticed that all of the slowest queries in that test were dependent > on the binary_coercible() plpgsql function that it uses. Now, that > function has always been a rather lame attempt to approximate the > behavior of the parser's IsBinaryCoercible() function, so I've been > thinking for some time that we ought to get rid of it in favor of > actually using IsBinaryCoercible(). 
I tried that, by adding a > shim function to regress.c, and got a most gratifying result: > on my machine opr_sanity's runtime with > debug_invalidate_system_caches_always = 1 drops from > 29m9s to 3m19s. Without CCA the speedup is far less impressive, > 360ms to 305ms, but that's still useful. Especially since this > makes the test strictly more accurate. Cool! > Anyway, I propose that we ought to sneak this into HEAD, since > it's only touching test code and not anything production-critical. +1 Greetings, Andres Freund
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 07:39:17PM +0200, Laurenz Albe wrote: > On Mon, 2021-05-10 at 12:38 -0400, Bruce Momjian wrote: > > I came up with this release note text: > > > > > > > > > > > > When editing the previous query or a file with psql's \e, ignore the > > contents if the editor exits without saving (Laurenz Albe) > > > > > > > > Previously, editing the previous query or a file and not saving the > > editor contents would still execute the editor contents. The \ef and > > \ev commands also now have this behavior. > > > > > > Thanks, that looks much better. > > The second paragraph starts describing the previous behavior, but the second > sentence details on the changes. Perhaps it would be better to put that into > the first paragraph: > > > When editing the previous query or a file with psql's \e, or when a > view or function definition are edited with \ev or \ef, ignore the > contents if the editor exits without saving (Laurenz Albe) > > > > Previously, editing the previous query or a file and not saving the > editor contents would still execute the editor contents. > Uh, I try to keep the first sentence short so people can scan it more easily, so I am hesitant to make this change. I went with this change: When editing the previous query or a file with psql's \e, or using \ef and \ev, ignore the contents if the editor exits without saving (Laurenz Albe) Previously, such edits would still execute the editor contents. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Andres Freund writes: > On 2021-05-08 15:44:57 -0400, Tom Lane wrote: >> I was able to complete a bisection using just that test, and >> got an unexpected result: most of the slowdown appeared at >> ab596105b (BRIN minmax-multi indexes). Apparently the additional >> time is simply from having to check the additional pg_amop and >> pg_amproc entries, which that patch added quite a few of. > I suspect that it might not be just that. From a quick profile it looks > like debug_invalidate_system_caches_always spends a good chunk of its > time in ResetCatalogCache() and hash_seq_search(). Their cost is linear > in the size of the underlying hash tables. > So what might be happening is that the additional catalog entries > pushed some of the catcache hash tables into growing > (RehashCatCache()). Which then makes all subsequent ResetCatalogCache() > scans slower. Hm. But constantly flushing the caches should mean that they're never populated with very many entries at one time, which ought to forestall that, at least to some extent. I wonder if there's anything we could do to make ResetCatalogCache faster? It wouldn't help much for normal execution of course, but it might do something to bring CCA testing time down out of the stratosphere. regards, tom lane
Re: AlterSubscription_refresh "wrconn" wrong variable?
On 2021-May-10, Peter Smith wrote: > PSA v5 of the patch. It is the same as v4 but with the v4-0001 part > omitted because that was already pushed. I made a few whitespace adjustments on Friday that I didn't get time to push, so I left the whole set to after the minors are finalized this week. I'll get them pushed on Wednesday or so. (The back branches have a few conflicts, on every release, but I see no reason to post those and it'd upset the cfbot). -- Álvaro Herrera Valdivia, Chile >From 8494316bef64cde7476146bce0bda9a93890def7 Mon Sep 17 00:00:00 2001 From: Peter Smith Date: Fri, 7 May 2021 17:51:12 +1000 Subject: [PATCH v6] Rename the logical replication global wrconn. The worker.c global wrconn is only meant to be used for logical apply/tablesync workers, but there are other variables with the same name. To reduce future confusion the global has renamed from "wrconn" to "LogRepWorkerWalRcvConn". --- src/backend/replication/logical/launcher.c | 4 +-- src/backend/replication/logical/tablesync.c | 35 - src/backend/replication/logical/worker.c| 23 +++--- src/include/replication/worker_internal.h | 2 +- 4 files changed, 36 insertions(+), 28 deletions(-) diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c index cb462a052a..f97ab12675 100644 --- a/src/backend/replication/logical/launcher.c +++ b/src/backend/replication/logical/launcher.c @@ -649,8 +649,8 @@ static void logicalrep_worker_onexit(int code, Datum arg) { /* Disconnect gracefully from the remote side. 
*/ - if (wrconn) - walrcv_disconnect(wrconn); + if (LogRepWorkerWalRcvConn) + walrcv_disconnect(LogRepWorkerWalRcvConn); logicalrep_worker_detach(); diff --git a/src/backend/replication/logical/tablesync.c b/src/backend/replication/logical/tablesync.c index 0638f5c7f8..67f907cdd9 100644 --- a/src/backend/replication/logical/tablesync.c +++ b/src/backend/replication/logical/tablesync.c @@ -302,8 +302,11 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) MyLogicalRepWorker->relstate, MyLogicalRepWorker->relstate_lsn); - /* End wal streaming so wrconn can be re-used to drop the slot. */ - walrcv_endstreaming(wrconn, &tli); + /* + * End streaming so that LogRepWorkerWalRcvConn can be used to drop + * the slot. + */ + walrcv_endstreaming(LogRepWorkerWalRcvConn, &tli); /* * Cleanup the tablesync slot. @@ -322,7 +325,7 @@ process_syncing_tables_for_sync(XLogRecPtr current_lsn) * otherwise, it won't be dropped till the corresponding subscription * is dropped. So passing missing_ok = false. */ - ReplicationSlotDropAtPubNode(wrconn, syncslotname, false); + ReplicationSlotDropAtPubNode(LogRepWorkerWalRcvConn, syncslotname, false); finish_sync_worker(); } @@ -642,7 +645,7 @@ copy_read_data(void *outbuf, int minread, int maxread) for (;;) { /* Try read the data. */ - len = walrcv_receive(wrconn, &buf, &fd); + len = walrcv_receive(LogRepWorkerWalRcvConn, &buf, &fd); CHECK_FOR_INTERRUPTS(); @@ -715,7 +718,8 @@ fetch_remote_table_info(char *nspname, char *relname, " AND c.relname = %s", quote_literal_cstr(nspname), quote_literal_cstr(relname)); - res = walrcv_exec(wrconn, cmd.data, lengthof(tableRow), tableRow); + res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, + lengthof(tableRow), tableRow); if (res->status != WALRCV_OK_TUPLES) ereport(ERROR, @@ -752,9 +756,11 @@ fetch_remote_table_info(char *nspname, char *relname, " AND a.attrelid = %u" " ORDER BY a.attnum", lrel->remoteid, - (walrcv_server_version(wrconn) >= 12 ? 
"AND a.attgenerated = ''" : ""), + (walrcv_server_version(LogRepWorkerWalRcvConn) >= 12 ? + "AND a.attgenerated = ''" : ""), lrel->remoteid); - res = walrcv_exec(wrconn, cmd.data, lengthof(attrRow), attrRow); + res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, + lengthof(attrRow), attrRow); if (res->status != WALRCV_OK_TUPLES) ereport(ERROR, @@ -841,7 +847,7 @@ copy_table(Relation rel) appendStringInfo(&cmd, " FROM %s) TO STDOUT", quote_qualified_identifier(lrel.nspname, lrel.relname)); } - res = walrcv_exec(wrconn, cmd.data, 0, NULL); + res = walrcv_exec(LogRepWorkerWalRcvConn, cmd.data, 0, NULL); pfree(cmd.data); if (res->status != WALRCV_OK_COPY_OUT) ereport(ERROR, @@ -957,8 +963,9 @@ LogicalRepSyncTableStart(XLogRecPtr *origin_startpos) * application_name, so that it is different from the main apply worker, * so that synchronous replication can distinguish them. */ - wrconn = walrcv_connect(MySubscription->conninfo, true, slotname, &err); - if (wrconn == NULL) + LogRepWorkerWalRcvConn = + walrcv_connect(MySubscription->conninfo, true, slotname, &err); + if (LogRepWorkerWalRcvConn == NULL) ereport(ERROR, (errmsg("could not connect to the publisher: %s", err))); @@ -985,7 +992,7 @@ LogicalRepS
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Hi, On 2021-05-10 14:06:16 -0400, Tom Lane wrote: > Hm. But constantly flushing the caches should mean that they're never > populated with very many entries at one time, which ought to forestall > that, at least to some extent. That's probably true... > I wonder if there's anything we could do to make ResetCatalogCache > faster? It wouldn't help much for normal execution of course, > but it might do something to bring CCA testing time down out of > the stratosphere. We could make the hashtables shrink, not just grow... There's also the issue that most people, I assume, run CCA tests with -O0. In a quick test that does make a big difference in e.g. ResetCatalogCache(). I just added a function specific annotation to optimize just that function and the overall time in my test shrank 10% or so. Greetings, Andres Freund
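For readers who have not used it, the per-function annotation Andres mentions can be sketched roughly as follows. This is a hypothetical standalone example, not the actual annotation used; the "optimize" attribute is GCC-specific (Clang ignores it), so it is guarded to a no-op on other compilers.

```c
#include <assert.h>

/*
 * GCC lets a single function be optimized even when the whole file is
 * compiled at -O0 for easier debugging.  The attribute is GCC-only,
 * so it degrades to nothing elsewhere.
 */
#if defined(__GNUC__) && !defined(__clang__)
#define OPTIMIZE_HOT __attribute__((optimize("O2")))
#else
#define OPTIMIZE_HOT
#endif

/* Stand-in for a hot loop such as ResetCatalogCache()'s bucket scan. */
static int OPTIMIZE_HOT
sum_buckets(const int *buckets, int n)
{
	int		total = 0;

	for (int i = 0; i < n; i++)
		total += buckets[i];
	return total;
}
```

With this, a file built at -O0 for comfortable debugging still gets an optimized body for the one hot function, which is enough to shift profiles like the ResetCatalogCache() case discussed here.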
libpq_pipeline in tmp_install
The test program libpq_pipeline produced by the test suite in src/test/modules/libpq_pipeline/ is installed into tmp_install as part of make check. This isn't a real problem by itself, but I think it creates a bit of an asymmetric situation that might be worth cleaning up. Before, the contents of tmp_install exactly matched an actual installation. There were no extra test programs installed. Also, the test suite code doesn't actually use that installed version, so it's not of any use, and it creates confusion about which copy is in use. The reason this is there is that the test suite uses PGXS to build the test program, and so things get installed automatically. I suggest that we should either write out the build system by hand to avoid this, or maybe extend PGXS to support building programs but not installing them. The advantage of the former approach is that it would allow additional test programs to be added later as well. (We should really collect the libpq tests under src/interfaces/libpq/ anyway at some point.)
Re: Another modest proposal for reducing CLOBBER_CACHE_ALWAYS runtime
David Rowley writes: > On Mon, 10 May 2021 at 18:04, Tom Lane wrote: >> real 293m31.054s >> to >> real 1m47.807s >> Yes, really. > That's quite impressive. > I'm very much in favour of this change. Making it more realistic to > run the regression tests on a CLOBBER_CACHE_ALWAYS build before a > commit is a very worthy goal and this is a big step towards that. > Nice. It occurred to me to check hyrax's results on the older branches (it also tests v12 and v13), and the slope of the curve is bad: Branch Latest "check" phase runtime HEAD 13:56:11 v13 11:00:33 v12 6:05:30 Seems like we'd better do something about that. About 2.5 hours worth of the jump from 12 to 13 can be blamed on the privileges test, looks like. The slowdown in that evidently can be blamed on 0c882e52a86, which added this: +-- results below depend on having quite accurate stats for atest12 +SET default_statistics_target = 1; VACUUM ANALYZE atest12; +RESET default_statistics_target; The slow queries in that test all cause the planner to apply the "leak()" function to every histogram entry for atest12, so this one change caused a 100X increase in the amount of work there. I find it a bit remarkable that we barely noticed that in normal operation. In CCA mode, though, each leak() call takes circa 100ms (at least on my workstation), so kaboom. Anyway, I'm now feeling that what I should do with this patch is wait for the release cycle to finish and then apply it to v13 as well as HEAD. The other patch I proposed, to cut opr_sanity's runtime, may be too invasive for back-patch. More generally, there is an upward creep in the test runtimes that doesn't seem to be entirely accounted for by our constantly adding more tests. I am suspicious that plpgsql may be largely to blame for this. The smoking gun I found for that is the runtimes for the plpgsql_control test, which hasn't changed *at all* since it was added in v11; but hyrax shows these runtimes: HEAD: test plpgsql_control ... 
ok 56105 ms v13: test plpgsql_control ... ok 46879 ms v12: test plpgsql_control ... ok 30809 ms In normal builds that test's time has held pretty steady. So I'm not sure what's underneath this rock, but I plan to try to find out. regards, tom lane
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Andres Freund writes: > On 2021-05-10 14:06:16 -0400, Tom Lane wrote: >> I wonder if there's anything we could do to make ResetCatalogCache >> faster? It wouldn't help much for normal execution of course, >> but it might do something to bring CCA testing time down out of >> the stratosphere. > We could make the hashtables shrink, not just grow... Maybe ... > There's also the issue that most people, I assume, run CCA tests with -O0. In > a quick test that does make a big difference in e.g. ResetCatalogCache(). I > just added a function specific annotation to optimize just that function and > the overall time in my test shrank 10% or so. If they do I think they're nuts ;-). CCA is slow enough already without hobbling it. hyrax appears to use the usual -O2, as does/did avocet. Not sure if we have any other CCA buildfarm members right now. regards, tom lane
Re: PG 14 release notes, first draft
Hi Bruce, Thanks for doing this work again! > Add date_bin function (John Naylor) > > WHAT DOES THIS DO? Hard to describe in a one-liner, but it lines up timestamps into regular intervals as specified by the user. It is clearer after seeing examples: https://www.postgresql.org/docs/devel/functions-datetime.html#FUNCTIONS-DATETIME-BIN > Dramatically improve Unicode normalization (John Naylor) > > WHAT OPERATIONS USE THIS? PG13 added the normalize() function to normalize Unicode sequences, as well as the IS NORMALIZED syntax to test for that. The commits* here do not change behavior and only improve performance. As such, this really belongs in the performance section. *There is one additional commit that belongs to this entry: Author: Michael Paquier 2020-10-11 [80f8eb79e] Use perfect hash for NFC and NFKC Unicode Normalization quick check -- John Naylor EDB: http://www.enterprisedb.com
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Andres Freund writes: > On 2021-05-10 14:06:16 -0400, Tom Lane wrote: >> I wonder if there's anything we could do to make ResetCatalogCache >> faster? It wouldn't help much for normal execution of course, >> but it might do something to bring CCA testing time down out of >> the stratosphere. > We could make the hashtables shrink, not just grow... I noticed that we already have counters that can tell whether a catcache or dynahash table is empty, so I experimented with the attached patch. Testing one of the slow queries from privileges.sql (which might not be very representative of the overall issue), I see: HEAD: Time: 536429.715 ms (08:56.430) with ResetCatalogCache hack: Time: 488938.597 ms (08:08.939) plus hash_seq_search hack: Time: 475400.634 ms (07:55.401) Of course, the issue with these patches is that they change these counters from things that (I think) we only trust for statistical purposes into things that had better be right or you're going to have impossible-to-track-down bugs with sometimes failing to invalidate cache entries. My gut feeling is that the risk-to-reward ratio is worth it for changing ResetCatalogCache, but not for hash_seq_search. This is partly because of the greater absolute payback and partly because ResetCatalogCache doesn't deal with shared data structures, reducing the risk of counting issues. 
regards, tom lane diff --git a/src/backend/utils/cache/catcache.c b/src/backend/utils/cache/catcache.c index 4fbdc62d8c..df5f219254 100644 --- a/src/backend/utils/cache/catcache.c +++ b/src/backend/utils/cache/catcache.c @@ -644,6 +644,10 @@ ResetCatalogCache(CatCache *cache) dlist_mutable_iter iter; int i; + /* If the cache is entirely empty, there's nothing to do */ + if (cache->cc_ntup == 0) + return; + /* Remove each list in this cache, or at least mark it dead */ dlist_foreach_modify(iter, &cache->cc_lists) { diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c index 6546e3c7c7..1ac87725c0 100644 --- a/src/backend/utils/hash/dynahash.c +++ b/src/backend/utils/hash/dynahash.c @@ -1455,11 +1455,23 @@ hash_seq_search(HASH_SEQ_STATUS *status) } /* - * Search for next nonempty bucket starting at curBucket. + * Nothing to do if hash table is entirely empty. (For a partitioned + * hash table, we'd have to check all the freelists, which is unlikely + * to be a win.) */ - curBucket = status->curBucket; hashp = status->hashp; hctl = hashp->hctl; + if (hctl->freeList[0].nentries == 0 && + !IS_PARTITIONED(hctl)) + { + hash_seq_term(status); + return NULL; /* table is empty */ + } + + /* + * Search for next nonempty bucket starting at curBucket. + */ + curBucket = status->curBucket; ssize = hashp->ssize; max_bucket = hctl->max_bucket;
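The effect of the cc_ntup early exit in the first hunk can be modeled outside the catcache. In this toy sketch (hypothetical names and structures, not the real catcache API), a CCA-style workload that resets after every operation mostly hits the counter check and skips the bucket scan entirely:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define TOY_NBUCKETS 128

typedef struct ToyEntry
{
	struct ToyEntry *next;
	int			key;
} ToyEntry;

typedef struct ToyCache
{
	ToyEntry   *buckets[TOY_NBUCKETS];
	int			ntup;			/* live entries, like catcache's cc_ntup */
	long		buckets_scanned;	/* instrumentation: reset work done */
} ToyCache;

static void
toy_insert(ToyCache *cache, int key)
{
	int			h = (unsigned) key % TOY_NBUCKETS;
	ToyEntry   *e = malloc(sizeof(ToyEntry));

	e->key = key;
	e->next = cache->buckets[h];
	cache->buckets[h] = e;
	cache->ntup++;
}

static void
toy_reset(ToyCache *cache)
{
	/* The patch's optimization: an empty cache needs no bucket scan. */
	if (cache->ntup == 0)
		return;

	for (int i = 0; i < TOY_NBUCKETS; i++)
	{
		ToyEntry   *e = cache->buckets[i];

		cache->buckets_scanned++;
		while (e != NULL)
		{
			ToyEntry   *next = e->next;

			free(e);
			e = next;
		}
		cache->buckets[i] = NULL;
	}
	cache->ntup = 0;
}
```

Under constant invalidation most resets find the cache already empty, so the counter check replaces a full 128-bucket walk with an immediate return — which is why the win shows up under CCA but barely registers in normal execution.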
Re: PG 14 release notes, first draft
On Mon, 10 May 2021 at 19:34, Bruce Momjian wrote: > > On Mon, May 10, 2021 at 01:44:12PM +0200, Matthias van de Meent wrote: > > On Mon, 10 May 2021 at 08:03, Bruce Momjian wrote: > > > > > > I have committed the first draft of the PG 14 release notes. You can > > > see the most current build of them here: > > > https://momjian.us/pgsql_docs/release-14.html > > > > > > I need clarification on many items, and the document still needs its > > > items properly ordered, and markup added. I also expect a lot of > > > feedback. > > > > I noticed that the improvement in bloat control in the HeapAM that I > > know of (3c3b8a4b, 0ff8bbde) weren't documented here. Although each > > can be considered minor, they together can decrease the bloating > > behaviour of certain workloads significantly (and limit the total > > damage), and in my opinion this should be mentioned. > > > > 3c3b8a4b: Returns space claimed for the line pointer array back to the > > page's empty space, so that it can also be used for tuple data. > > > > 0ff8bbde: Allows large tuples to be inserted on pages which have only > > a small amount of data, regardless of fillfactor. > > > > Together they should be able to help significantly in both bloat > > prevention and bloat reduction. > > I looked at those items. I try to mention performance items that enable > new workloads or require some user action to benefit from it. 0ff8bbde enables a workload that inserts (and non-locally updates) large (> FILLFACTOR %) tuples in tables that have a low FILLFACTOR. Previously this would fail dramatically by only inserting on new pages; this would extend the table indefinitely. See the thread [0]. 3c3b8a4b improves workloads with high local update-then-delete churn. Previously this would irreversibly claim space on the page for tuple identifiers even when they were later deleted; now we can reclaim this space when a tuple is deleted from the page. 
I see these two improvements in a similar light as the bottom-up index deletion in btree: No user action required, works out-of-the-box, decreases bloat / disk usage, but good to note as it fixes (known) bloating footguns that a user might have encountered. > I am not sure these two qualify, but can others comments? Thanks. I'd like to refer to Peter Geoghegan's reply [1] upthread. Thank you for your effort, Matthias van de Meent [0] https://www.postgresql.org/message-id/flat/6e263217180649339720afe2176c50aa%40opammb0562.comp.optiver.com [1] https://www.postgresql.org/message-id/CAH2-Wz%3D-A%3DjRxpB2Owj3KQadCue7%2BNLqj56Q566ees7TapMRvA%40mail.gmail.com
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Hi, On 2021-05-10 16:17:18 -0400, Tom Lane wrote: > I noticed that we already have counters that can tell whether a > catcache or dynahash table is empty, so I experimented with the > attached patch. Testing one of the slow queries from privileges.sql > (which might not be very representative of the overall issue), > I see: > > HEAD: > Time: 536429.715 ms (08:56.430) > > with ResetCatalogCache hack: > Time: 488938.597 ms (08:08.939) > > plus hash_seq_search hack: > Time: 475400.634 ms (07:55.401) Oh, nice. Perhaps we generally ought to lower the initial syscache sizes further? 20cb18db4668 did that, but we still have things like PROCNAMEARGNSP, PROCOID, RELNAMENSP, RELOID, STATRELATTINH, ... using 128 as the initial size. Not hard to imagine that some of these are larger than what simple workloads or CCA encounter. > Of course, the issue with these patches is that they change these > counters from things that (I think) we only trust for statistical > purposes into things that had better be right or you're going to > have impossible-to-track-down bugs with sometimes failing to > invalidate cache entries. My gut feeling is that the risk-to-reward > ratio is worth it for changing ResetCatalogCache, but not for > hash_seq_search. This is partly because of the greater absolute > payback and partly because ResetCatalogCache doesn't deal with > shared data structures, reducing the risk of counting issues. That sounds reasonable. We could mitigate the risk for dynahash by testing HASH_SHARED_MEM (which we don't store right now), but it's not clear it's worth it here. But I wonder if there are other cases where it could help? If we did make the check support shared memory *and* partitioned tables, I could easily see it be a win for things like LockReleaseAll(). Greetings, Andres Freund
Re: Enhanced error message to include hint messages for redundant options error
On 2021-May-10, vignesh C wrote: > That sounds fine to me, Attached v6 patch which has the changes for the same. What about defining a function (maybe a static inline function in defrem.h) that is marked noreturn and receives the DefElem and optionally pstate, and throws the error? I think that would avoid the patch's need to have half a dozen copies of the "duplicate_error:" label and ereport stanza. -- Álvaro Herrera 39°49'30"S 73°17'W
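A rough standalone sketch of the kind of helper being suggested here — the names, message text, and the fprintf()/exit() stand-in for ereport() are all illustrative assumptions, not the committed API:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Simplified stand-ins for PostgreSQL's DefElem and ParseState. */
typedef struct DefElem
{
	const char *defname;
	int			location;
} DefElem;

typedef struct ParseState
{
	const char *p_sourcetext;
} ParseState;

/* Build the message once, so every option-parsing loop shares the text. */
static void
format_conflicting_defelem(char *buf, size_t buflen, const DefElem *defel)
{
	snprintf(buf, buflen, "conflicting or redundant options: %s",
			 defel->defname);
}

/*
 * One noreturn helper replaces the repeated "duplicate_error:" label and
 * ereport() stanza in each caller.  ereport() is stood in for by
 * fprintf()+exit() to keep this sketch self-contained.
 */
static _Noreturn void
error_conflicting_defelem(ParseState *pstate, const DefElem *defel)
{
	char		msg[128];

	format_conflicting_defelem(msg, sizeof(msg), defel);
	fprintf(stderr, "ERROR:  %s at character %d\n", msg, defel->location);
	(void) pstate;				/* would supply error position context */
	exit(1);
}
```

Each caller's goto-label stanza then collapses to a single `error_conflicting_defelem(pstate, defel);` call at the point where the duplicate option is detected.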
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
On Tue, May 11, 2021 at 8:52 AM Andres Freund wrote: > ... If we did make the check support shared memory *and* > partitioned tables, I could easily see it be a win for things like > LockReleaseAll(). For that case, has the idea of maintaining a dlist of local locks been considered?
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 07:54:24AM -0700, Peter Geoghegan wrote: > On Mon, May 10, 2021 at 4:44 AM Matthias van de Meent > wrote: > > I noticed that the improvement in bloat control in the HeapAM that I > > know of (3c3b8a4b, 0ff8bbde) weren't documented here. Although each > > can be considered minor, they together can decrease the bloating > > behaviour of certain workloads significantly (and limit the total > > damage), and in my opinion this should be mentioned. > > > > 3c3b8a4b: Returns space claimed for the line pointer array back to the > > page's empty space, so that it can also be used for tuple data. > > > > 0ff8bbde: Allows large tuples to be inserted on pages which have only > > a small amount of data, regardless of fillfactor. > > +1 on mentioning both things. OK, you are confirming what Matthias suggested. I added these two items, which both seem to apply only to heap pages, not index pages: --- Deallocate space reserved by trailing unused heap line pointers (Matthias van de Meent, Peter Geoghegan) --- Allow wide tuples to be always added to almost-empty heap pages (John Naylor, Floris van Nee) Previously tuples whose insertion would have exceeded the page's fill factor were instead added to new pages. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 3:58 PM Bruce Momjian wrote: > OK, you are confirming what Matthias suggested. I added these two > items, which both seem to apply only to heap pages, not index pages: That's right -- these two relate to heap pages only. I think that Matthias compared these two to bottom-up index deletion because all three patches are concerned about avoiding "a permanent solution to a temporary problem". They're conceptually similar despite being in fairly different areas. Evidently Matthias has a similar mental model to my own when it comes to this stuff. Unfortunately the practical significance of the line pointer patch is hard to demonstrate with a benchmark. I believe that it is very useful on a sufficiently long timeline and with certain workloads because of the behavior it avoids. As I pointed out on that other thread recently, once you have irreversible bloat very small adverse events will eventually add up and cause big problems. When this happens it'll be very hard or impossible to detect, since it just looks like heap fragmentation. That said, it's clearly an issue with one of the TPC-C tables if you run BenchmarkSQL for days and days (just one table, though). So there is hard evidence that line pointer bloat could get really out of hand in at least some tables. -- Peter Geoghegan
Re: Reducing opr_sanity test's runtime under CLOBBER_CACHE_ALWAYS
Hi, On 2021-05-11 10:57:03 +1200, Thomas Munro wrote: > On Tue, May 11, 2021 at 8:52 AM Andres Freund wrote: > > ... If we did make the check support shared memory *and* > > partitioned tables, I could easily see it be a win for things like > > LockReleaseAll(). Errr, that's not even a shared hashtable... So it would help even if we just excluded shared memory hashtables. > For that case, has the idea of maintaining a dlist of local locks been > considered? Yea, there's been a long discussion on that for LockReleaseAll(). Combined with alternatives around shrinking the hashtable... Greetings, Andres Freund
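For context, the dlist idea being discussed can be shown in miniature: each held lock is chained onto an intrusive doubly-linked list, so a LockReleaseAll()-style sweep costs O(locks actually held) rather than O(hash table size). The types below are a hypothetical toy, not the real LOCALLOCK structures; in PostgreSQL the entries would of course also stay in their dynahash for lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Minimal intrusive doubly-linked list, in the spirit of PostgreSQL's dlist. */
typedef struct dnode
{
	struct dnode *prev;
	struct dnode *next;
} dnode;

typedef struct LockState
{
	dnode		held;			/* circular list of all held locks */
	int			nheld;
} LockState;

typedef struct LocalLock
{
	dnode		node;			/* chains this lock into LockState.held */
	int			lockid;
} LocalLock;

static void
lockstate_init(LockState *s)
{
	s->held.prev = s->held.next = &s->held;
	s->nheld = 0;
}

static LocalLock *
lock_acquire(LockState *s, int lockid)
{
	LocalLock  *l = malloc(sizeof(LocalLock));

	l->lockid = lockid;
	l->node.next = s->held.next;
	l->node.prev = &s->held;
	s->held.next->prev = &l->node;
	s->held.next = &l->node;
	s->nheld++;
	return l;
}

static void
lock_release(LockState *s, LocalLock *l)
{
	l->node.prev->next = l->node.next;
	l->node.next->prev = l->node.prev;
	free(l);
	s->nheld--;
}

/* LockReleaseAll-style sweep: O(locks held), not O(hash table size). */
static int
lock_release_all(LockState *s)
{
	int			released = 0;

	while (s->held.next != &s->held)
	{
		LocalLock  *l = (LocalLock *)
			((char *) s->held.next - offsetof(LocalLock, node));

		lock_release(s, l);
		released++;
	}
	return released;
}
```

The trade-off mentioned in the thread is the extra per-acquire/release bookkeeping versus the cheaper sweep; a hash table that has been rehashed larger never shrinks, so the sweep's advantage grows over a backend's lifetime.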
Is element access after HASH_REMOVE ever OK?
Hi, After hearing from a couple of directions about systems spending too much time scanning the local lock hash table, I wrote the trivial patch to put them in a linked list, before learning that people have considered that before, so I should probably go and read some history on that and find out why it hasn't been done... However, I noticed in passing that RemoveLocalLock() accesses *locallock after removing it from the hash table (in assertion builds only). So one question I have is whether it's actually a programming rule that you can't do that (at most you can compare the pointer against NULL), or whether it's supposed to be safe-if-you-know-what-you're-doing, as the existing comments hints. Here also is a patch that does wipe_mem on removed elements, as threatened last time this topic came up[1], which reveals the problem. I'm also not exactly sure why it's only a WARNING if your local lock table is out of sync, but perhaps that's in the archives too. [1] https://www.postgresql.org/message-id/flat/CAHut%2BPs-pL%2B%2Bf6CJwPx2%2BvUqXuew%3DXt-9Bi-6kCyxn%2BFwi2M7w%40mail.gmail.com From d267c3c0cb26ea97616ca1184ab13664acb4feb8 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 11 May 2021 11:11:15 +1200 Subject: [PATCH 1/3] Clobber memory released by HASH_REMOVE. 
--- src/backend/utils/hash/dynahash.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/src/backend/utils/hash/dynahash.c b/src/backend/utils/hash/dynahash.c index 6546e3c7c7..b977d0395c 100644 --- a/src/backend/utils/hash/dynahash.c +++ b/src/backend/utils/hash/dynahash.c @@ -102,6 +102,7 @@ #include "storage/shmem.h" #include "storage/spin.h" #include "utils/dynahash.h" +#include "utils/memdebug.h" #include "utils/memutils.h" @@ -1076,10 +1077,12 @@ hash_search_with_hash_value(HTAB *hashp, SpinLockRelease(&hctl->freeList[freelist_idx].mutex); /* - * better hope the caller is synchronizing access to this - * element, because someone else is going to reuse it the next - * time something is added to the table + * Clobber the memory, to find bugs in code that accesses + * it after removing. */ +#ifdef CLOBBER_FREED_MEMORY +wipe_mem(ELEMENTKEY(currBucket), hctl->entrysize); +#endif return (void *) ELEMENTKEY(currBucket); } return NULL; -- 2.30.2 From 1cf44ab0f7bfc9d5221ce19f162b3e07ab42c567 Mon Sep 17 00:00:00 2001 From: Thomas Munro Date: Tue, 11 May 2021 11:11:37 +1200 Subject: [PATCH 2/3] Don't access a local locks after freeing them. --- src/backend/storage/lmgr/lock.c | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/backend/storage/lmgr/lock.c b/src/backend/storage/lmgr/lock.c index 108b4d9023..8642afc275 100644 --- a/src/backend/storage/lmgr/lock.c +++ b/src/backend/storage/lmgr/lock.c @@ -1390,15 +1390,15 @@ RemoveLocalLock(LOCALLOCK *locallock) SpinLockRelease(&FastPathStrongRelationLocks->mutex); } - if (!hash_search(LockMethodLocalHash, - (void *) &(locallock->tag), - HASH_REMOVE, NULL)) - elog(WARNING, "locallock table corrupted"); - /* * Indicate that the lock is released for certain types of locks */ CheckAndSetLockHeld(locallock, false); + + if (!hash_search(LockMethodLocalHash, + (void *) &(locallock->tag), + HASH_REMOVE, NULL)) + elog(WARNING, "locallock table corrupted"); } /* -- 2.30.2
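The behavior the first patch adds is easy to reproduce in a toy HASH_REMOVE analogue (hypothetical code, not dynahash's real path): unlink the entry, fill it with the 0x7F pattern that wipe_mem() uses, and still return the pointer, so a caller that dereferences the result afterwards reads conspicuous garbage rather than stale-but-plausible data.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 8
#define CLOBBER_BYTE 0x7f		/* same fill pattern wipe_mem() uses */

typedef struct Entry
{
	struct Entry *next;
	int			key;
	int			payload;
} Entry;

static Entry *buckets[NBUCKETS];

static Entry *
toy_insert(int key, int payload)
{
	int			h = (unsigned) key % NBUCKETS;
	Entry	   *e = calloc(1, sizeof(Entry));

	e->key = key;
	e->payload = payload;
	e->next = buckets[h];
	buckets[h] = e;
	return e;
}

/*
 * HASH_REMOVE analogue: unlink the entry, clobber it, and (as dynahash
 * does) still return the pointer.  A later dereference now reads
 * 0x7f7f... garbage instead of the old, plausible-looking contents.
 * The entry is deliberately not free()d so its clobbered bytes stay
 * inspectable; dynahash would move it to a freelist for reuse instead.
 */
static Entry *
toy_remove(int key)
{
	int			h = (unsigned) key % NBUCKETS;

	for (Entry **p = &buckets[h]; *p != NULL; p = &(*p)->next)
	{
		if ((*p)->key == key)
		{
			Entry	   *e = *p;

			*p = e->next;
			memset(e, CLOBBER_BYTE, sizeof(Entry));
			return e;
		}
	}
	return NULL;
}
```

This is exactly the failure mode the RemoveLocalLock() assertion would hit under the patch: the access doesn't crash, but it can no longer silently read the old field values.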
Re: Is element access after HASH_REMOVE ever OK?
Thomas Munro writes: > However, I noticed in passing that RemoveLocalLock() accesses > *locallock after removing it from the hash table (in assertion builds > only). So one question I have is whether it's actually a programming > rule that you can't do that (at most you can compare the pointer > against NULL), or whether it's supposed to be > safe-if-you-know-what-you're-doing, as the existing comments hints. I'd say it's, at best, unwarranted familiarity with the dynahash implementation ... > Here also is a patch that does wipe_mem on removed elements, as > threatened last time this topic came up[1], which reveals the problem. ... one good reason being that it'll fail under this sort of entirely-reasonable debugging aid. Can we get rid of the unsafe access easily? regards, tom lane
Re: Is element access after HASH_REMOVE ever OK?
I wrote: > ... Can we get rid of the unsafe > access easily? Oh, shoulda read your second patch first. Looking at that, I fear it might not be quite that simple, because the comment on CheckAndSetLockHeld says very clearly * It is callers responsibility that this function is called after * acquiring/releasing the relation extension/page lock. so your proposed patch violates that specification. I'm inclined to think that this API spec is very poorly thought out and should be changed --- why is it that the flags should change *after* the lock change in both directions? But we'd have to take a look at the usage of these flags to understand what's going on exactly. regards, tom lane
Re: seawasp failing, maybe in glibc allocator
On Mon, May 10, 2021 at 11:21 PM Fabien COELHO wrote: > > And of course this time it succeeded :-) > > Hmmm. ISTM that failures are on and off every few attempts. OK we got the SIGABRT this time, but still no backtrace. If the kernel's core_pattern is "core" and gdb is installed, then considering that the buildfarm core_file_glob is "core*" and the script version is recent (REL_12), I'm out of ideas. ulimit -c XXX shouldn't be needed because the perl script does that with rlimit.
Re: PG 14 release notes, first draft
Thanks for making the updates. On Tue, 11 May 2021 at 05:07, Bruce Momjian wrote: > > On Mon, May 10, 2021 at 08:52:44PM +1200, David Rowley wrote: > > > Improve the performance of parallel sequential scans (Thomas Munro, David > > > Rowley) > > > > I think it is worth mentioning "I/O" before "performance". This > > change won't really help cases if all the table's pages are already in > > shared buffers. > > I went with: > > Improve the performance of parallel sequential I/O scans (Thomas > Munro, > David Rowley) I think I'd have gone with: "Improve I/O performance of parallel sequential scans (Thomas Munro, David Rowley)" The operation we're speeding up is called sequential scan. We don't have any operation that's named sequential I/O scan. David
Zeroing partial pages in walreceiver
Hi, In https://www.postgresql.org/message-id/20210505010835.umylslxgq4a6rbwg%40alap3.anarazel.de I commented that we have a number of hacky workarounds to deal with the fact that walreceiver writes partial WAL pages into recycled segments. The problem with that practice is that within a page we cannot reliably detect invalid record headers. This is especially true when the record header spans across a page boundary - currently the only check in that case is if the record length is smaller than 1GB, and even that is just checked in the frontend. Note that we cannot rely on the CRC checksum here - it can only be validated once the whole record has been read. On a primary we *do* pad partial pages, see AdvanceXLInsertBuffer(): /* * Be sure to re-zero the buffer so that bytes beyond what we've * written will look like zeroes and not valid XLOG records... */ MemSet((char *) NewPage, 0, XLOG_BLCKSZ); Particularly the logic in allocate_recordbuf() is scary: In a completely working setup we'll regularly try to allocate large buffers that we'll never need - and the record buffer is not freed until the startup process exits. And we have no corresponding size check in the frontend (which doesn't make any sense to me). In the case of a record header across a page boundary, this check will pass in roughly 1/4 of the cases! As an example of the difference this makes, I ran a primary/standby setup with continuously running regression tests, and had a psql \watch terminate walsender every 1.5 s.
Decoding failures without zero-padding: 2021-05-10 16:52:51.448 PDT [2481446][1/0] LOG: record with incorrect prev-link 103FF/73 at 4/C154FD50 2021-05-10 16:52:53.001 PDT [2481446][1/0] LOG: record with incorrect prev-link 0/ at 4/C3531A88 2021-05-10 16:52:57.848 PDT [2481446][1/0] LOG: invalid resource manager ID 32 at 4/C3B67AD8 2021-05-10 16:52:58.773 PDT [2481446][1/0] LOG: record with incorrect prev-link 403FF/12 at 4/C47F35E8 2021-05-10 16:53:03.771 PDT [2481446][1/0] LOG: invalid page at 4/C562E000 2021-05-10 16:53:04.945 PDT [2481446][1/0] LOG: invalid record length at 4/C6E1C1E8: wanted 24, got 0 2021-05-10 16:53:06.176 PDT [2481446][1/0] LOG: invalid page at 4/C704 2021-05-10 16:53:07.624 PDT [2481446][1/0] LOG: record with incorrect prev-link 2FF/64 at 4/C7475078 ... With zero-padding: 2021-05-10 16:58:20.186 PDT [2489042][1/0] LOG: invalid record length at 5/7049A40: wanted 24, got 0 2021-05-10 16:58:22.832 PDT [2489042][1/0] LOG: invalid record length at 5/801AD70: wanted 24, got 0 2021-05-10 16:58:27.548 PDT [2489042][1/0] LOG: invalid record length at 5/8319908: wanted 24, got 0 2021-05-10 16:58:30.945 PDT [2489042][1/0] LOG: invalid record length at 5/AFDC770: wanted 24, got 0 ... 2021-05-10 16:59:24.546 PDT [2489042][1/0] LOG: invalid page at 5/19284000 The "invalid page" cases are a lot rarer - previously we would hit them whenever the record header itself passed [minimal] muster, even though it was just padding passing as e.g. a valid record length. Now it's only when the end of WAL actually is at the page boundary. On 13+ we could do a bit better than the current approach, and use pg_pwritev() to append the zeroed data. However, I'm not convinced it is a good idea - when pg_pwritev is emulated, we'd always do the zeroing as part of a separate write, which does seem like it increases the likelihood of encountering such partially written pages a bit. But perhaps it's too insignificant to matter. 
Greetings, Andres Freund diff --git c/src/backend/replication/walreceiver.c i/src/backend/replication/walreceiver.c index 9a0e3806fcf..3188743e59b 100644 --- c/src/backend/replication/walreceiver.c +++ i/src/backend/replication/walreceiver.c @@ -865,16 +865,38 @@ XLogWalRcvProcessMsg(unsigned char type, char *buf, Size len) /* * Write XLOG data to disk. + * + * This is a bit more complicated than immediately obvious: WAL files are + * recycled, which means they can contain almost arbitrary data. To ensure + * that we can reliably detect the end of the WAL, we need to pad partial + * pages with zeroes. If we didn't do so, the data beyond the end of the WAL + * is interpreted as a record header - while we can do some checks on the + * record, they are not bulletproof. Particularly because the record's CRC + * checksum can only be validated once the entire record has been read. + * + * There is no corresponding risk at the start of a page, because + * XLogPageHeader.xlp_pageaddr inside recycled segment data will always differ + * from the expected pageaddr. + * + * We cannot just pad 'buf' by whatever amount is necessary, as it is + * allocated outside of our control, preventing us from resizing it to be big + * enough. Copying the entire incoming data into a temporary buffer would be a + * noticeable overhead. Instead we separately write pages that need to be + * padded, and copy just that pa
Re: seawasp failing, maybe in glibc allocator
On 2021-05-11 12:16:44 +1200, Thomas Munro wrote: > OK we got the SIGABRT this time, but still no backtrace. If the > kernel's core_pattern is "core", gdb is installed, then considering > that the buildfarm core_file_glob is "core*" and the script version is > recent (REL_12), then I'm out of ideas. ulimit -c XXX shouldn't be > needed because the perl script does that with rlimit. Unless perhaps the hard rlimit for -c is set? ulimit -c -H should show that.
RE: Parallel scan with SubTransGetTopmostTransaction assert coredump
Hi Andres, Reproduce steps. 1, Modify and adjust NUM_SUBTRANS_BUFFERS to 128 from 32 in the file "src/include/access/subtrans.h" line number 15. 2, configure with enable assert and build it. 3, init a new database cluster. 4, modify postgres.conf and add some parameters as below. As the coredump comes from a parallel scan, we adjust the parallel settings to make it easier to reproduce. max_connections = 2000 parallel_setup_cost=0 parallel_tuple_cost=0 min_parallel_table_scan_size=0 max_parallel_workers_per_gather=8 max_parallel_workers = 32 5, start the database cluster. 6, use the script init_test.sql in the attachment to create tables. 7, use pgbench with the script sub_120.sql in the attachment to test it. Try it a few times; you should get a coredump file. pgbench -d postgres -p 33550 -n -r -f sub_120.sql -c 200 -j 200 -T 120 Thanks Pengcheng -Original Message- From: Andres Freund Sent: 7 May 2021 11:55 To: Pengchengliu Cc: pgsql-hack...@postgresql.org Subject: Re: Parallel scan with SubTransGetTopmostTransaction assert coredump Hi, On 2021-05-07 11:32:57 +0800, Pengchengliu wrote: > Hi Hackers, > > Last email, format error, missing some information, so I resend this email. > > With PG 13.2(3fb4c75e857adee3da4386e947ba58a75f3e74b7), I tested > subtransaction with parallel scan, I got a subtransaction coredump as below: > So the root cause is the Parallel Workers process set the TransactionXmin > with later transaction snapshot. When parallel scan, Parallel Workers process > use the older active snapshot. > > It leads to subtrans assert coredump. I don't know how to fix it. Is there > any ideas? Do you have steps to reliably reproduce this? Greetings, Andres Freund test_script.tar Description: Unix tar archive
Re: Is element access after HASH_REMOVE ever OK?
Hi, On 2021-05-10 20:15:41 -0400, Tom Lane wrote: > I wrote: > > ... Can we get rid of the unsafe > > access easily? > > Oh, shoulda read your second patch first. Looking at that, > I fear it might not be quite that simple, because the > comment on CheckAndSetLockHeld says very clearly > > * It is callers responsibility that this function is called after > * acquiring/releasing the relation extension/page lock. > > so your proposed patch violates that specification. It wouldn't be too hard to fix this though - we can just copy the locktag into a local variable. Or use one of the existing local copies, higher in the stack. But: > I'm inclined to think that this API spec is very poorly thought out > and should be changed --- why is it that the flags should change > *after* the lock change in both directions? But we'd have to take > a look at the usage of these flags to understand what's going on > exactly. I can't see a need to do it after the HASH_REMOVE at least - as we don't return early if that fails, there's no danger of getting out of sync if we reverse the order. I think the comment could just be changed to say that the function has to be called after it is inevitable that the lock is acquired/released. Greetings, Andres Freund
Re: Small issues with CREATE TABLE COMPRESSION
On Mon, May 10, 2021 at 12:17:19PM +0530, Dilip Kumar wrote: > Even I was confused about that's the reason I used liblz4_static.lib, > but I see you have changed to liblz4.lib to make it consistent I > guess? That's the name the upstream code is using, yes. > Patch looks good to me, I can not verify though because I don't have > such an environment. Thanks for improving the patch. Thanks, I got that applied to finish the work of this thread for beta1. At least this will give people an option to test LZ4 on Windows. Perhaps this will require some adjustments, but let's see if that's necessary when that comes up. -- Michael
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 07:50:14AM -0400, Joe Conway wrote: > On 5/10/21 2:03 AM, Bruce Momjian wrote: > > I have committed the first draft of the PG 14 release notes. You can > > see the most current build of them here: > > > > https://momjian.us/pgsql_docs/release-14.html > > > > I need clarification on many items, and the document still needs its > > items properly ordered, and markup added. I also expect a lot of > > feedback. > > > > I plan to work on completing this document this coming week in > > preparation for beta next week. > > While only a small change, this commit does affect user visible behavior and > so should probably be noted: > > https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=b12bd4869b5e I see your point. Here is the release entry I added: Return false for has_column_privilege() checks on non-existent or dropped columns (Joe Conway) Previously such columns returned an invalid column error. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: PG 14 release notes, first draft
On Tue, May 11, 2021 at 12:35:28PM +1200, David Rowley wrote: > Thanks for making the updates. > > On Tue, 11 May 2021 at 05:07, Bruce Momjian wrote: > > > > On Mon, May 10, 2021 at 08:52:44PM +1200, David Rowley wrote: > > > > Improve the performance of parallel sequential scans (Thomas Munro, > > > > David Rowley) > > > > > > I think it is worth mentioning "I/O" before "performance". This > > > change won't really help cases if all the table's pages are already in > > > shared buffers. > > > > I went with: > > > > Improve the performance of parallel sequential I/O scans (Thomas > > Munro, > > David Rowley) > > I think I'd have gone with: > > "Improve I/O performance of parallel sequential scans (Thomas Munro, > David Rowley)" > > The operation we're speeding up is called sequential scan. We don't > have any operation that's named sequential I/O scan. OK, new text: Improve the I/O performance of parallel sequential scans (Thomas Munro, David Rowley) This was done by allocating blocks in groups to parallel workers. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
Re: Replication slot stats misgivings
On Fri, May 7, 2021 at 8:03 AM Amit Kapila wrote: > > Thanks for the summarization. I don't find anything that is left > unaddressed. I think we can wait for a day or two to see if Andres or > anyone else sees anything that is left unaddressed and then we can > close the open item. > I have closed this open item. -- With Regards, Amit Kapila.
Re: PG 14 release notes, first draft
On Mon, May 10, 2021 at 04:14:56PM -0700, Peter Geoghegan wrote: > On Mon, May 10, 2021 at 3:58 PM Bruce Momjian wrote: > > OK, you are confirming what Matthias suggested. I added these two > > items, which both seem to apply only to heap pages, not index pages: > > That's right -- these two relate to heap pages only. > > I think that Matthias compared these two to bottom-up index deletion > because all three patches are concerned about avoiding "a permanent > solution to a temporary problem". They're conceptually similar despite > being in fairly different areas. Evidently Matthias has a similar > mental model to my own when it comes to this stuff. Agreed, that is a very interesting distinction. > Unfortunately the practical significance of the line pointer patch is > hard to demonstrate with a benchmark. I believe that it is very useful > on a sufficiently long timeline and with certain workloads because of > the behavior it avoids. As I pointed out on that other thread > recently, once you have irreversible bloat very small adverse events > will eventually add up and cause big problems. When this happens it'll > be very hard or impossible to detect, since it just looks like heap > fragmentation. > > That said, it's clearly an issue with one of the TPC-C tables if you > run BenchmarkSQL for days and days (just one table, though). So there > is hard evidence that line pointer bloat could get really out of hand > in at least some tables. OK, once I dug into what you two were saying, I see the significance. I was frankly surprised we didn't already have these optimizations, and you are right they can lead to long-term problems. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com If only the physical world exists, free will is an illusion.
walreceiver that is behind doesn't quit, send replies
Hi, There are no interrupt checks in the WalReceiverMain() sub-loop for receiving WAL. There's one above /* See if we can read data immediately */ len = walrcv_receive(wrconn, &buf, &wait_fd); but none in the loop below: /* * Process the received data, and any subsequent data we * can read without blocking. */ for (;;) Similarly, that inner loop doesn't send status updates or fsyncs while there's network data - but that matters a bit less, because we'll send status updates upon request, and flush WAL at segment boundaries. This may explain why a low-ish wal_sender_timeout / wal_receiver_status_interval combo still sees plenty of timeouts. I suspect this is a lot easier to hit when the IO system on the standby is the bottleneck (with the kernel slowing us down inside the pg_pwrite()), because that makes it easier to always have incoming network data. It's probably not a good idea to just remove that two-level loop - we don't want to fsync at a much higher rate. But just putting a ProcessWalRcvInterrupts() call in the inner loop also seems unsatisfying, we should respect wal_receiver_status_interval... I've a couple times gotten into a situation where I was shutting down the primary while the standby was behind, and the system appeared to just lock up, with neither primary nor standby reacting to normal shutdown attempts. This seems to happen more often with larger wal segment size... Greetings, Andres Freund