Re: POC: GROUP BY optimization
On 5/27/24 19:41, Alexander Korotkov wrote:
> On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov wrote:
>> While there are some particular use-cases by Jian He, I hope that
>> above could give some rationale.
>
> I've assembled patches in this thread into one patchset.
> 0001 The patch fixing asymmetry in setting EquivalenceClass.ec_sortref
> by Andrei [1].  I've revised comments and wrote the commit message.
> 0002 The patch for handling duplicates of SortGroupClause.  I didn't
> get the sense of Andrei implementation.  It seems to care about
> duplicate pointers in group clauses list.  But the question is the
> equal SortGroupClause's comprising different pointers.  I think we
> should group duplicate SortGroupClause's together as
> preprocess_groupclause() used to do.  Reimplemented patch to do so.
> 0003 Rename PathKeyInfo to GroupByOrdering by Andres [3].  I only
> revised comments and wrote the commit message.
> 0004 Turn back preprocess_groupclause() for the reason I described upthread [4].
>
> Any thoughts?

About 0001:
Having overviewed it, I don't see any issues (but I'm the author), except grammatical ones - but I'm not a native to judge it.
Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear to me. It may be better to write something like: 'building pathkeys by the list of grouping clauses'.

0002:
The part under USE_ASSERT_CHECKING looks good to me. But the code in group_keys_reorder_by_pathkeys looks suspicious: of course, we do some doubtful work without any possible way to reproduce, but if we envision some duplicated elements in the group_clauses, we should avoid usage of the list_concat_unique_ptr.
What's more, why do you not exit from foreach_ptr immediately after SortGroupClause has been found? I think the new_group_clauses should be consistent with the new_group_pathkeys.

0003:
Looks good

0004:
I was also thinking about reintroducing the preprocess_groupclause because with the re-arrangement of GROUP-BY clauses according to incoming pathkeys, it doesn't make sense to have a user-defined order—at least while cost_sort doesn't differ costs for alternative column orderings.
So, I'm okay with the code. But why don't you use the same approach with foreach_ptr as before?

--
regards, Andrei Lepikhov
Postgres Professional
Re: Improving the latch handling between logical replication launcher and worker processes.
On Fri, 10 May 2024 at 07:39, Hayato Kuroda (Fujitsu) wrote:
>
> Dear Vignesh,
>
> Thanks for raising idea!
>
> > a) Introduce a new latch to handle worker attach and exit.
>
> Just to confirm - there are three wait events for launchers, so I feel we may be
> able to create latches per wait event. Is there a reason to introduce
> "a" latch?

One latch is enough, we can use the new latch for both worker starting and worker exiting. The other existing latch can be used for other purposes. Something like the attached patch.

Regards,
Vignesh

From 072cfefd717a0c7c496a5ca66e9239ccb808ece6 Mon Sep 17 00:00:00 2001
From: Vignesh C
Date: Wed, 29 May 2024 14:55:49 +0530
Subject: [PATCH v1] Improving the latch handling between logical replication
 launcher and the worker processes.

Currently the launcher's latch is used for the following: a) worker process
attach b) worker process exit and c) subscription creation. Since this same
latch is used for multiple cases, the launcher process is not able to handle
concurrent scenarios like: a) Launcher started a new apply worker and waiting
for apply worker to attach and b) create subscription sub2 sending launcher
wake up signal. Fixed it by having a different latch for worker start and exit.
---
 .../replication/logical/applyparallelworker.c |  1 +
 src/backend/replication/logical/launcher.c    | 19 ++-
 src/backend/replication/logical/worker.c      |  1 +
 src/include/replication/worker_internal.h     |  6 ++
 4 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/src/backend/replication/logical/applyparallelworker.c b/src/backend/replication/logical/applyparallelworker.c
index e7f7d4c5e4..62c3791ceb 100644
--- a/src/backend/replication/logical/applyparallelworker.c
+++ b/src/backend/replication/logical/applyparallelworker.c
@@ -912,6 +912,7 @@ ParallelApplyWorkerMain(Datum main_arg)
 	 * the uninitialized memory queue.
 	 */
 	logicalrep_worker_attach(worker_slot);
+	SetLatch(&MyLogicalRepWorker->launcherWakeupLatch);
 
 	/*
 	 * Register the shutdown callback after we are attached to the worker
diff --git a/src/backend/replication/logical/launcher.c b/src/backend/replication/logical/launcher.c
index 27c3a91fb7..74bfdb9b6e 100644
--- a/src/backend/replication/logical/launcher.c
+++ b/src/backend/replication/logical/launcher.c
@@ -187,6 +187,7 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 	BgwHandleStatus status;
 	int			rc;
 
+	OwnLatch(&worker->launcherWakeupLatch);
 	for (;;)
 	{
 		pid_t		pid;
@@ -199,6 +200,7 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 		if (!worker->in_use || worker->proc)
 		{
 			LWLockRelease(LogicalRepWorkerLock);
+			DisownLatch(&worker->launcherWakeupLatch);
 			return worker->in_use;
 		}
 
@@ -214,6 +216,7 @@ WaitForReplicationWorkerAttach(LogicalRepWorker *worker,
 			if (generation == worker->generation)
 				logicalrep_worker_cleanup(worker);
 			LWLockRelease(LogicalRepWorkerLock);
+			DisownLatch(&worker->launcherWakeupLatch);
 			return false;
 		}
 
@@ -221,13 +224,13 @@
 		 * We need timeout because we generally don't get notified via latch
 		 * about the worker attach.  But we don't expect to have to wait long.
 		 */
-		rc = WaitLatch(MyLatch,
+		rc = WaitLatch(&worker->launcherWakeupLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 					   10L, WAIT_EVENT_BGWORKER_STARTUP);
 
 		if (rc & WL_LATCH_SET)
 		{
-			ResetLatch(MyLatch);
+			ResetLatch(&worker->launcherWakeupLatch);
 			CHECK_FOR_INTERRUPTS();
 		}
 	}
@@ -573,6 +576,8 @@ logicalrep_worker_stop_internal(LogicalRepWorker *worker, int signo)
 			break;
 	}
 
+	OwnLatch(&worker->launcherWakeupLatch);
+
 	/* Now terminate the worker ... */
 	kill(worker->proc->pid, signo);
 
@@ -583,18 +588,21 @@ logicalrep_worker_stop_internal(LogicalRepWorker *worker, int signo)
 
 		/* is it gone? */
 		if (!worker->proc || worker->generation != generation)
+		{
+			DisownLatch(&worker->launcherWakeupLatch);
 			break;
+		}
 
 		LWLockRelease(LogicalRepWorkerLock);
 
 		/* Wait a bit --- we don't expect to have to wait long. */
-		rc = WaitLatch(MyLatch,
+		rc = WaitLatch(&worker->launcherWakeupLatch,
 					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
 					   10L, WAIT_EVENT_BGWORKER_SHUTDOWN);
 
 		if (rc & WL_LATCH_SET)
 		{
-			ResetLatch(MyLatch);
+			ResetLatch(&worker->launcherWakeupLatch);
 			CHECK_FOR_INTERRUPTS();
 		}
 
@@ -837,7 +845,7 @@ logicalrep_worker_onexit(int code, Datum arg)
 	if (!InitializingApplyWorker)
 		LockReleaseAll(DEFAULT_LOCKMETHOD, true);
 
-	ApplyLauncherWakeup();
+	SetLatch(&MyLogicalRepWorker->launcherWakeupLatch);
 }
 
 /*
@@ -976,6 +984,7 @@ ApplyLauncherShmemInit(void)
 
 			memset(worker, 0, sizeof(LogicalRepWorker));
 			SpinLockInit(&worker->relmutex);
+			InitSharedLatch(&worker->launcherWakeupLatch);
 		}
 	}
 }
diff --git a/src/backend/replication/logical/worker.c b/src/backend/replication/logica
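For readers unfamiliar with the shared-latch API the patch leans on, here is a minimal sketch of the ownership lifecycle (the SharedSlot type, the slot_* function names, and slot_condition_satisfied() are illustrative placeholders, not part of the patch; the real API is in storage/latch.h):

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

/* Illustrative shared-memory structure containing a latch. */
typedef struct SharedSlot
{
	Latch		wakeupLatch;
} SharedSlot;

/* Placeholder for whatever condition the waiter is looking for. */
extern bool slot_condition_satisfied(SharedSlot *slot);

/* Run once while setting up shared memory. */
void
slot_shmem_init(SharedSlot *slot)
{
	InitSharedLatch(&slot->wakeupLatch);
}

/* In the waiting process (here: the launcher). */
void
slot_wait(SharedSlot *slot)
{
	OwnLatch(&slot->wakeupLatch);	/* a shared latch must be owned before waiting on it */

	while (!slot_condition_satisfied(slot))
	{
		int			rc;

		rc = WaitLatch(&slot->wakeupLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					   10L, WAIT_EVENT_BGWORKER_STARTUP);

		if (rc & WL_LATCH_SET)
		{
			ResetLatch(&slot->wakeupLatch);
			CHECK_FOR_INTERRUPTS();
		}
	}

	DisownLatch(&slot->wakeupLatch);
}

/* In the other process (here: the worker), to wake the waiter. */
void
slot_notify(SharedSlot *slot)
{
	SetLatch(&slot->wakeupLatch);
}

Because the new latch is dedicated to attach/exit notifications, a SetLatch from CREATE SUBSCRIPTION no longer races with this wait loop.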
Re: Improving the latch handling between logical replication launcher and worker processes.
On Wed, 29 May 2024 at 10:41, Peter Smith wrote:
>
> On Thu, Apr 25, 2024 at 6:59 PM vignesh C wrote:
> >
> > c) Don't reset the latch at worker attach and allow launcher main to
> > identify and handle it. For this there is a patch v6-0002 available at
> > [2].
>
> This option c seems the easiest. Can you explain what are the
> drawbacks of using this approach?

This solution will resolve the issue. However, one drawback to consider is that because we're not resetting the latch, in this scenario, the launcher process will need to perform an additional round of acquiring subscription details and determining whether the worker should start, regardless of any changes in subscriptions.

Regards,
Vignesh
Timeout gets unset on a syntax error.
On a particular query, I start an alarm (say for 5 seconds) using RegisterTimeout, and when the alarm fires, I log something. This works fine. But if I run a query with a syntax error within that time window, then the alarm never fires. Is there some code within Postgres that resets/removes the signals in case a query hits any error?

TimeoutId timer = RegisterTimeout(USER_TIMEOUT, interval_handler);
enable_timeout_after(timer, 5 * 1000);

Thanks,
Ishan.
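For reference, a minimal sketch of the register-and-rearm pattern being described (extension-style code; the my_* names and the handler are hypothetical). One likely explanation, stated here as an assumption rather than a confirmed answer, is that the backend's error-recovery path disables pending user timeouts without unregistering them, so the timeout has to be re-armed for every query it should cover:

#include <signal.h>

#include "postgres.h"
#include "utils/timeout.h"

static TimeoutId my_timer;
static volatile sig_atomic_t alarm_fired = false;

/* Timeout handler: runs in signal context, so keep it trivial. */
static void
interval_handler(void)
{
	alarm_fired = true;
}

/* Call once per backend, e.g. from an extension's _PG_init(). */
void
my_timer_register(void)
{
	my_timer = RegisterTimeout(USER_TIMEOUT, interval_handler);
}

/* Re-arm at the start of every query of interest (5 seconds here). */
void
my_timer_arm(void)
{
	alarm_fired = false;
	enable_timeout_after(my_timer, 5 * 1000);
}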
001_rep_changes.pl fails due to publisher stuck on shutdown
Hello hackers,

As a recent buildfarm test failure [1] shows:
[14:33:02.374](0.333s) ok 23 - update works with dropped subscriber column
### Stopping node "publisher" using mode fast
# Running: pg_ctl -D /home/bf/bf-build/adder/HEAD/pgsql.build/testrun/subscription/001_rep_changes/data/t_001_rep_changes_publisher_data/pgdata -m fast stop
waiting for server to shut down.. ... ... ... .. failed
pg_ctl: server does not shut down
# pg_ctl stop failed: 256
# Postmaster PID for node "publisher" is 549
[14:39:04.375](362.001s) Bail out!  pg_ctl stop failed

001_rep_changes_publisher.log
2024-05-16 14:33:02.907 UTC [2238704][client backend][4/22:0] LOG:  statement: DELETE FROM tab_rep
2024-05-16 14:33:02.925 UTC [2238704][client backend][:0] LOG:  disconnection: session time: 0:00:00.078 user=bf database=postgres host=[local]
2024-05-16 14:33:02.939 UTC [549][postmaster][:0] LOG:  received fast shutdown request
2024-05-16 14:33:03.000 UTC [549][postmaster][:0] LOG:  aborting any active transactions
2024-05-16 14:33:03.049 UTC [549][postmaster][:0] LOG:  background worker "logical replication launcher" (PID 2223110) exited with exit code 1
2024-05-16 14:33:03.062 UTC [901][checkpointer][:0] LOG:  shutting down
2024-05-16 14:39:04.377 UTC [549][postmaster][:0] LOG:  received immediate shutdown request
2024-05-16 14:39:04.382 UTC [549][postmaster][:0] LOG:  database system is shut down

the publisher node may hang on stopping.

I reproduced the failure (with aggressive autovacuum) locally and discovered that it happens because:
1) checkpointer calls WalSndInitStopping() (which sends PROCSIG_WALSND_INIT_STOPPING to walsender), and then spins inside WalSndWaitStopping() indefinitely, because:
2) walsender receives the signal, sets got_STOPPING = true, but can't exit WalSndLoop():
3) it never sets got_SIGUSR2 (to get into WalSndDone()) in XLogSendLogical():
4) it never sets WalSndCaughtUp to true:
5) logical_decoding_ctx->reader->EndRecPtr can't reach flushPtr in XLogSendLogical():
6) EndRecPtr doesn't advance in XLogNextRecord():
7) XLogDecodeNextRecord() fails to decode a record that crosses a page boundary:
8) ReadPageInternal() (commented "Wait for the next page to become available") constantly returns XLREAD_FAIL:
9) state->routine.page_read()/logical_read_xlog_page() constantly returns -1:
10) flushptr = WalSndWaitForWal() stops advancing, because got_STOPPING == true (see 2).

That is, walsender doesn't let itself catch up, if it gets the stop signal when it's lagging behind and decoding a record requires reading the next wal page.

Please look at the reproducing test (based on 001_rep_changes.pl) attached. It fails for me as below:
# 17
Bailout called. Further testing stopped:  pg_ctl stop failed
FAILED--Further testing stopped: pg_ctl stop failed
make: *** [Makefile:21: check] Ошибка 255

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=adder&dt=2024-05-16%2014%3A22%3A38
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dikkop&dt=2024-04-24%2014%3A38%3A35 (apparently the same)

Best regards,
Alexander

099_walsender_stop.pl
Description: Perl program
RE: Psql meta-command conninfo+
Hi,

Is there someone willing to help me with this development and guide the patch so that it can be committed one day?

Regards,
Maiquel.
Re: POC: GROUP BY optimization
Hi, Andrei!

Thank you for your feedback.

On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov wrote:
> On 5/27/24 19:41, Alexander Korotkov wrote:
> > On Tue, Apr 23, 2024 at 1:19 AM Alexander Korotkov wrote:
> >> While there are some particular use-cases by Jian He, I hope that
> >> above could give some rationale.
> >
> > I've assembled patches in this thread into one patchset.
> > 0001 The patch fixing asymmetry in setting EquivalenceClass.ec_sortref
> > by Andrei [1].  I've revised comments and wrote the commit message.
> > 0002 The patch for handling duplicates of SortGroupClause.  I didn't
> > get the sense of Andrei implementation.  It seems to care about
> > duplicate pointers in group clauses list.  But the question is the
> > equal SortGroupClause's comprising different pointers.  I think we
> > should group duplicate SortGroupClause's together as
> > preprocess_groupclause() used to do.  Reimplemented patch to do so.
> > 0003 Rename PathKeyInfo to GroupByOrdering by Andres [3].  I only
> > revised comments and wrote the commit message.
> > 0004 Turn back preprocess_groupclause() for the reason I described upthread [4].
> >
> > Any thoughts?
> About 0001:
> Having overviewed it, I don't see any issues (but I'm the author),
> except grammatical ones - but I'm not a native to judge it.
> Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear
> to me. It may be better to write something like: 'building pathkeys by
> the list of grouping clauses'.

OK, thank you.  I'll run once again for the grammar issues.

> 0002:
> The part under USE_ASSERT_CHECKING looks good to me. But the code in
> group_keys_reorder_by_pathkeys looks suspicious: of course, we do some
> doubtful work without any possible way to reproduce, but if we envision
> some duplicated elements in the group_clauses, we should avoid usage of
> the list_concat_unique_ptr.

As I understand Tom, there is a risk that clauses list may contain multiple instances of equivalent SortGroupClause, not duplicate pointers.

> What's more, why do you not exit from
> foreach_ptr immediately after SortGroupClause has been found? I think
> the new_group_clauses should be consistent with the new_group_pathkeys.

I wanted this to be consistent with preprocess_groupclause(), where duplicate SortGroupClause'es are grouped together.  Otherwise, we could just delete redundant SortGroupClause'es.

> 0003:
> Looks good
>
> 0004:
> I was also thinking about reintroducing the preprocess_groupclause
> because with the re-arrangement of GROUP-BY clauses according to
> incoming pathkeys, it doesn't make sense to have a user-defined order—at
> least while cost_sort doesn't differ costs for alternative column orderings.
> So, I'm okay with the code. But why don't you use the same approach with
> foreach_ptr as before?

I restored the function as it was before 0452b461bc with minimal edits to support the incremental sort.  I think it would be more valuable to keep the difference with pg16 code small rather than refactor to simplify existing code.

--
Regards,
Alexander Korotkov
Re: Volatile write caches on macOS and Windows, redux
On 25.05.24 04:01, Jelte Fennema-Nio wrote:
> Is this the only reason why you're suggesting adding fsync=full,
> instead of simply always setting F_FULLFSYNC when fsync=true on MacOS.
> If so, I'm not sure we really gain anything by this tri-state. I think
> people either care about data loss on power loss, or they don't. I
> doubt many people want this third intermediate option, which afaict
> basically means lose data on powerloss less often than fsync=false but
> still lose data most of the time.

I agree, two states should be enough.  It could basically just be

pg_fsync(int fd)
{
#if macos
    fcntl(fd, F_FULLFSYNC);
#else
    fsync(fd);
#endif
}
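For completeness, here is a slightly fuller sketch of that idea; it is an illustration only, not the actual fd.c code, and the pg_fsync_sketch name and the fallback behaviour are assumptions. Since F_FULLFSYNC can be rejected by some filesystems, falling back to a plain fsync() seems prudent:

#include <unistd.h>
#include <fcntl.h>

/*
 * Sketch of an fsync wrapper that asks macOS to flush the drive's
 * volatile write cache.  On other platforms, or if the filesystem
 * rejects F_FULLFSYNC, fall back to a regular fsync().
 */
static int
pg_fsync_sketch(int fd)
{
#if defined(F_FULLFSYNC)
	if (fcntl(fd, F_FULLFSYNC) != -1)
		return 0;
	/* F_FULLFSYNC not supported here; fall through to fsync(). */
#endif
	return fsync(fd);
}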
Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
On Wed, 29 May 2024 at 16:30, Alexander Lakhin wrote:
>
> Hello hackers,
>
> As a recent buildfarm test failure [1] shows:
> [14:33:02.374](0.333s) ok 23 - update works with dropped subscriber column
> ### Stopping node "publisher" using mode fast
> # Running: pg_ctl -D /home/bf/bf-build/adder/HEAD/pgsql.build/testrun/subscription/001_rep_changes/data/t_001_rep_changes_publisher_data/pgdata -m fast stop
> waiting for server to shut down.. ... ... ... .. failed
> pg_ctl: server does not shut down
> # pg_ctl stop failed: 256
> # Postmaster PID for node "publisher" is 549
> [14:39:04.375](362.001s) Bail out!  pg_ctl stop failed
>
> 001_rep_changes_publisher.log
> 2024-05-16 14:33:02.907 UTC [2238704][client backend][4/22:0] LOG:  statement: DELETE FROM tab_rep
> 2024-05-16 14:33:02.925 UTC [2238704][client backend][:0] LOG:  disconnection: session time: 0:00:00.078 user=bf database=postgres host=[local]
> 2024-05-16 14:33:02.939 UTC [549][postmaster][:0] LOG:  received fast shutdown request
> 2024-05-16 14:33:03.000 UTC [549][postmaster][:0] LOG:  aborting any active transactions
> 2024-05-16 14:33:03.049 UTC [549][postmaster][:0] LOG:  background worker "logical replication launcher" (PID 2223110) exited with exit code 1
> 2024-05-16 14:33:03.062 UTC [901][checkpointer][:0] LOG:  shutting down
> 2024-05-16 14:39:04.377 UTC [549][postmaster][:0] LOG:  received immediate shutdown request
> 2024-05-16 14:39:04.382 UTC [549][postmaster][:0] LOG:  database system is shut down
>
> the publisher node may hang on stopping.
>
> I reproduced the failure (with aggressive autovacuum) locally and discovered that it happens because:
> 1) checkpointer calls WalSndInitStopping() (which sends PROCSIG_WALSND_INIT_STOPPING to walsender), and then spins inside WalSndWaitStopping() indefinitely, because:
> 2) walsender receives the signal, sets got_STOPPING = true, but can't exit WalSndLoop():
> 3) it never sets got_SIGUSR2 (to get into WalSndDone()) in XLogSendLogical():
> 4) it never sets WalSndCaughtUp to true:
> 5) logical_decoding_ctx->reader->EndRecPtr can't reach flushPtr in XLogSendLogical():
> 6) EndRecPtr doesn't advance in XLogNextRecord():
> 7) XLogDecodeNextRecord() fails to decode a record that crosses a page boundary:
> 8) ReadPageInternal() (commented "Wait for the next page to become available") constantly returns XLREAD_FAIL:
> 9) state->routine.page_read()/logical_read_xlog_page() constantly returns -1:
> 10) flushptr = WalSndWaitForWal() stops advancing, because got_STOPPING == true (see 2).
>
> That is, walsender doesn't let itself catch up, if it gets the stop signal when it's lagging behind and decoding a record requires reading the next wal page.
>
> Please look at the reproducing test (based on 001_rep_changes.pl) attached. It fails for me as below:
> # 17
> Bailout called. Further testing stopped:  pg_ctl stop failed
> FAILED--Further testing stopped: pg_ctl stop failed
> make: *** [Makefile:21: check] Ошибка 255

Thank you, Alexander, for sharing the script. I was able to reproduce the issue using the provided script. Furthermore, while investigating its origins, I discovered that this problem persists across all branches up to PG10 (the script needs slight adjustments to run it on older versions). It's worth noting that this issue isn't a result of recent version changes.

Regards,
Vignesh
Re: pgsql: Add more SQL/JSON constructor functions
Amit Langote writes:
> On Mon, May 27, 2024 at 7:10 PM Alvaro Herrera wrote:
>> On 2024-May-27, Alvaro Herrera wrote:
>> I just noticed this behavior, which looks like a bug to me:
>>
>> select json_serialize('{"a":1, "a":2}' returning varchar(5));
>> json_serialize
>>
>> {"a":
>>
>> I think this function should throw an error if the destination type
>> doesn't have room for the output json.  Otherwise, what good is the
>> serialization function?

> This behavior comes from using COERCE_EXPLICIT_CAST when creating the
> coercion expression to convert json_*() functions' argument to the
> RETURNING type.

Yeah, I too think this is a cast, and truncation is the spec-defined behavior for casting to varchar with a specific length limit.  I see little reason that this should work differently from

select json_serialize('{"a":1, "a":2}' returning text)::varchar(5);
 json_serialize
 {"a":
(1 row)

regards, tom lane
Re: Remove last traces of HPPA support
On Sat, Oct 21, 2023 at 02:18:19AM -0400, Tom Lane wrote:
> Andres Freund writes:
> > It'd be one thing to continue supporting an almost-guaranteed-to-be-unused
> > platform, if we expected it to become more popular or complete enough to be
> > usable like e.g. risc-v a few years ago. But I doubt we'll find anybody out
> > there believing that there's a potential future upward trend for HPPA.
>
> Indeed. I would have bet that Postgres on HPPA was extinct in the wild,
> until I noticed this message a few days ago:
>
> https://www.postgresql.org/message-id/BYAPR02MB42624ED41C15BFA82DAE2C359BD5A%40BYAPR02MB4262.namprd02.prod.outlook.com
>
> But we already cut that user off at the knees by removing HP-UX support.
>
> The remaining argument for worrying about this architecture being in
> use in the field is the idea that somebody is using it on top of
> NetBSD or OpenBSD. But having used both of those systems (or tried
> to), I feel absolutely confident in asserting that nobody is using
> it in production today, let alone hoping to continue using it.
>
> > IMO a single person looking at HPPA code for a few minutes is a cost that more
> > than outweighs the potential benefits of continuing "supporting" this dead
> > arch. Even code that doesn't need to change has costs, particularly if it's
> > intermingled with actually important code (which spinlocks certainly are).
>
> Yup, that. It's not zero cost to carry this stuff.

+1 for dropping it.
Re: Avoid an odd undefined behavior with memcmp (src/bin/pg_rewind/pg_rewind.c)
Hi Ranier,

> IMO, I think that pg_rewind can have a security issue,
> if two files are exactly the same, they are considered different.
> Because use of structs with padding values is unspecified.

Logically you are right. But I don't understand what scenario would require memcmp to compare ControlFileData. In general, we read ControlFileData from a pg_control file and then use members of ControlFileData directly. So the two ControlFileData are not directly compared byte by byte.

> Fix by explicitly initializing with memset to avoid this.

And, even if there are scenarios that use memcmp comparisons, your modifications are not complete. There are three calls to digestControlFile in the main() of pg_rewind.c, and as you said (if that is right), these should do memory initialization every time.

--
Best Regards,
Long
Re: Improving the latch handling between logical replication launcher and worker processes.
On Wed, May 29, 2024 at 7:53 PM vignesh C wrote:
>
> On Wed, 29 May 2024 at 10:41, Peter Smith wrote:
> >
> > On Thu, Apr 25, 2024 at 6:59 PM vignesh C wrote:
> > >
> > > c) Don't reset the latch at worker attach and allow launcher main to identify and handle it. For this there is a patch v6-0002 available at [2].
> >
> > This option c seems the easiest. Can you explain what are the drawbacks of using this approach?
>
> This solution will resolve the issue. However, one drawback to consider is that because we're not resetting the latch, in this scenario, the launcher process will need to perform an additional round of acquiring subscription details and determining whether the worker should start, regardless of any changes in subscriptions.

Hmm. IIUC the WaitLatch of the Launcher.WaitForReplicationWorkerAttach was not expecting to get notified.

e.g.1. The WaitLatch comment in the function says so:

/*
 * We need timeout because we generally don't get notified via latch
 * about the worker attach. But we don't expect to have to wait long.
 */

e.g.2 The logicalrep_worker_attach() function (which is AFAIK what WaitForReplicationWorkerAttach was waiting for) is not doing any SetLatch. So that matches what the comment said.

~~~

AFAICT the original problem reported by this thread happened because the SetLatch (from CREATE SUBSCRIPTION) has been inadvertently gobbled by the WaitForReplicationWorkerAttach.WaitLatch/ResetLatch, which BTW wasn't expecting to be notified at all.

~~~

Your option c removes the ResetLatch done by WaitForReplicationWorkerAttach:

You said above that one drawback is "the launcher process will need to perform an additional round of acquiring subscription details and determining whether the worker should start, regardless of any changes in subscriptions".

I think you mean that if some CREATE SUBSCRIPTION (i.e. SetLatch) happens during the attaching of other workers, then the latch would (now, after option c) remain set, and so the WaitLatch of ApplyLauncherMain would be notified and/or return immediately, causing an immediate re-iteration of the "foreach(lc, sublist)" loop.

But I don't understand why that is a problem.

a) I didn't know what you meant by "regardless of any changes in subscriptions", because I think the troublesome SetLatch originated from the CREATE SUBSCRIPTION, and so there *is* a change to subscriptions.

b) We will need to repeat that sublist loop anyway to start the worker for the new CREATE SUBSCRIPTION, and we need to do that at the earliest opportunity, because the whole point of the SetLatch is so the CREATE SUBSCRIPTION worker can get started promptly. So the earlier we do that the better, right?

c) AFAICT there is no danger of accidentally trying to start workers who are still in the middle of trying to start (from the previous iteration), because those cases should be guarded by the ApplyLauncherGetWorkerStartTime logic.

~~

To summarise, I felt removing the ResetLatch and the WL_LATCH_SET bit set (like your v6-0002 patch does) is not only an easy way of fixing the problem reported by this thread, but it also makes that WaitForReplicationWorkerAttach code behave like the comment ("we generally don't get notified via latch about the worker attach") is saying.

(Unless there are some other problems with it that I can't see)

==
Kind Regards,
Peter Smith.
Fujitsu Australia
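To make the "gobbled wakeup" concrete, here is a hedged illustration, not the actual launcher code (worker_has_attached() is a hypothetical placeholder), of how an inner wait loop on MyLatch can consume a SetLatch that the outer ApplyLauncherMain() loop needed to see:

#include "postgres.h"
#include "miscadmin.h"
#include "storage/latch.h"
#include "utils/wait_event.h"

extern bool worker_has_attached(void);	/* hypothetical placeholder */

static void
inner_wait_for_attach(void)
{
	while (!worker_has_attached())
	{
		int			rc;

		/*
		 * If CREATE SUBSCRIPTION does SetLatch(MyLatch) while we are in
		 * here, the wakeup is delivered to this WaitLatch ...
		 */
		rc = WaitLatch(MyLatch,
					   WL_LATCH_SET | WL_TIMEOUT | WL_EXIT_ON_PM_DEATH,
					   10L, WAIT_EVENT_BGWORKER_STARTUP);

		if (rc & WL_LATCH_SET)
		{
			/*
			 * ... and this ResetLatch clears it, so the outer loop's
			 * WaitLatch in ApplyLauncherMain() never sees it and may
			 * sleep for its full timeout before starting the new worker.
			 * Option (c) simply skips this ResetLatch (and the
			 * WL_LATCH_SET bit), leaving the latch set for the caller.
			 */
			ResetLatch(MyLatch);
			CHECK_FOR_INTERRUPTS();
		}
	}
}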
Implementing CustomScan over an index
I've built a custom type of index. I implemented an index access method, but have run into roadblocks. I need to:

1. See the other quals in the where clause.
2. Extract some things from the projection.
3. Insert some things in the projection.
4. Push some aggregations down into the index.

So I started implementing a CustomScan. It's not trivial.

I've learned that the system executes ExecInitCustomScan automatically, but I probably need it to do most of the stuff in ExecInitIndexScan, and then execute the scan mostly the way it's done in IndexNext. Basically, I want just a normal index scan, but with the ability to do custom things with the quals and the projection.

So... what's the best approach?

Is there any sample code that does this? A search of github doesn't turn up much.

Is there any way to do this without duplicating everything in nodeIndexscan.c myself?

--
Chris Cleveland
312-339-2677 mobile
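For what it's worth, below is a heavily trimmed sketch of the planner side of a custom scan provider (all my_* names are hypothetical, the costing is a placeholder, and this is an illustration rather than a recommended implementation). The point is that the pathlist hook sees the whole rel, so the other WHERE-clause quals (rel->baserestrictinfo) and the projection (rel->reltarget, root->processed_tlist) are visible before anything reaches the executor:

#include "postgres.h"
#include "fmgr.h"
#include "nodes/extensible.h"
#include "optimizer/pathnode.h"
#include "optimizer/paths.h"

PG_MODULE_MAGIC;

static set_rel_pathlist_hook_type prev_set_rel_pathlist_hook = NULL;

/* Callback tables; fill these in with your planner/executor callbacks. */
static CustomPathMethods my_path_methods;	/* PlanCustomPath builds the CustomScan */
static CustomScanMethods my_scan_methods;	/* CreateCustomScanState */
static CustomExecMethods my_exec_methods;	/* Begin/Exec/End/ReScanCustomScan */

static void
my_set_rel_pathlist(PlannerInfo *root, RelOptInfo *rel,
					Index rti, RangeTblEntry *rte)
{
	CustomPath *cpath;

	if (prev_set_rel_pathlist_hook)
		prev_set_rel_pathlist_hook(root, rel, rti, rte);

	/*
	 * Decide here whether the custom index applies, e.g. by inspecting
	 * rel->baserestrictinfo (the quals) and rel->reltarget (the columns
	 * the rel must produce).  Aggregate pushdown would additionally need
	 * upper-path hooks (create_upper_paths_hook), not shown here.
	 */

	cpath = makeNode(CustomPath);
	cpath->path.pathtype = T_CustomScan;
	cpath->path.parent = rel;
	cpath->path.pathtarget = rel->reltarget;
	cpath->path.rows = rel->rows;
	cpath->path.startup_cost = 0;		/* placeholder: cost this properly */
	cpath->path.total_cost = rel->rows;	/* placeholder */
	cpath->flags = 0;
	cpath->custom_private = NIL;		/* stash anything PlanCustomPath needs */
	cpath->methods = &my_path_methods;

	add_path(rel, &cpath->path);
}

void
_PG_init(void)
{
	prev_set_rel_pathlist_hook = set_rel_pathlist_hook;
	set_rel_pathlist_hook = my_set_rel_pathlist;
}

The executor side then mirrors nodeIndexscan.c: CreateCustomScanState returns a node whose BeginCustomScan can open the index (index_beginscan) and whose ExecCustomScan returns tuples, so nodeIndexscan.c is the natural template for that part.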
Re: Avoid an odd undefined behavior with memcmp (src/bin/pg_rewind/pg_rewind.c)
On Wed, 29 May 2024 at 07:02, Ranier Vilela wrote:
> The function *perform_rewind* has an odd undefined behavior.
> The function memcmp/, compares bytes to bytes.
>
> IMO, I think that pg_rewind can have a security issue,
> if two files are exactly the same, they are considered different.
> Because use of structs with padding values is unspecified.
>
> Fix by explicitly initializing with memset to avoid this.

It's unclear to me why you think this makes it any safer. If you look at the guts of digestControlFile(), you'll see that the memory you've just carefully memset to zero is subsequently overwritten by a memcpy(). Do you not notice that? Or do you think memcpy() somehow has some ability to skip over padding bytes and leave them set to the value they were set to previously? It doesn't.

If your patch fixes the Coverity warning, then that's just a demonstration of how smart the tool is. A warning does not mean there's a problem.

The location where we should ensure the memory is zeroed is when writing the control file. That's done in InitControlFile(). If you look in there you'll see "memset(ControlFile, 0, sizeof(ControlFileData));"

It would be great if you add a bit more thinking between noticing a Coverity warning and raising it on the mailing list. I'm not sure how much time you dwell on these warnings before raising them here, but certainly, if we wanted a direct tie between a Coverity warning and this mailing list, we'd script it. But we don't, so we won't. So many of your reports appear to rely on someone on the list doing that thinking for you. That does not scale well when in the majority of the reports there's no actual problem. Please remember that false reports waste people's time.

If you're keen to do some other small hobby projects around here, then I bet there'd be a lot of suggestions for things that could be improved. You could raise a thread to ask for suggestions for places to start. I don't think your goal should be that Coverity runs on the PostgreSQL source warning free. If that was a worthy goal, it would have happened a long time ago.

David
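A tiny standalone demonstration of that point (the Padded struct is hypothetical): memcpy() copies padding bytes from the source, so zeroing the destination first changes nothing about a later byte-wise comparison:

#include <stdio.h>
#include <string.h>

/* Hypothetical struct with internal padding between "flag" and "value". */
typedef struct Padded
{
	char	flag;
	int		value;
} Padded;

int
main(void)
{
	Padded		src;
	Padded		dst;

	/* Fill the source, padding bytes included, with 0xAA. */
	memset(&src, 0xAA, sizeof(src));
	src.flag = 1;
	src.value = 42;

	/* Zero the destination first, as the proposed patch does. */
	memset(&dst, 0, sizeof(dst));

	/*
	 * memcpy() copies every byte, padding included, so the earlier
	 * memset(0) on dst has no lasting effect: dst's padding now holds
	 * whatever src's padding held.
	 */
	memcpy(&dst, &src, sizeof(src));

	printf("memcmp after copy: %d\n", memcmp(&dst, &src, sizeof(src)));
	return 0;
}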
Re: POC: GROUP BY optimization
On 5/29/24 19:53, Alexander Korotkov wrote:
> Hi, Andrei!
>
> Thank you for your feedback.
>
> On Wed, May 29, 2024 at 11:08 AM Andrei Lepikhov wrote:
>> On 5/27/24 19:41, Alexander Korotkov wrote:
>>> Any thoughts?
>> About 0001:
>> Having overviewed it, I don't see any issues (but I'm the author), except grammatical ones - but I'm not a native to judge it.
>> Also, the sentence 'turning GROUP BY clauses into pathkeys' is unclear to me. It may be better to write something like: 'building pathkeys by the list of grouping clauses'.
>
> OK, thank you.  I'll run once again for the grammar issues.
>
>> 0002:
>> The part under USE_ASSERT_CHECKING looks good to me. But the code in group_keys_reorder_by_pathkeys looks suspicious: of course, we do some doubtful work without any possible way to reproduce, but if we envision some duplicated elements in the group_clauses, we should avoid usage of the list_concat_unique_ptr.
>
> As I understand Tom, there is a risk that clauses list may contain multiple instances of equivalent SortGroupClause, not duplicate pointers.

Maybe. I just implemented the worst-case scenario with the intention of maximum safety. So, it's up to you.

>> What's more, why do you not exit from foreach_ptr immediately after SortGroupClause has been found? I think the new_group_clauses should be consistent with the new_group_pathkeys.
>
> I wanted this to be consistent with preprocess_groupclause(), where duplicate SortGroupClause'es are grouped together.  Otherwise, we could just delete redundant SortGroupClause'es.

Hm, preprocess_groupclause is called before the standard_qp_callback forms the 'canonical form' of group_pathkeys, and such behaviour is needed there. But the code that chooses the reordering strategy uses already processed grouping clauses, where we don't have duplicates in the first num_groupby_pathkeys elements, do we?

>> 0004:
>> I was also thinking about reintroducing the preprocess_groupclause because with the re-arrangement of GROUP-BY clauses according to incoming pathkeys, it doesn't make sense to have a user-defined order—at least while cost_sort doesn't differ costs for alternative column orderings.
>> So, I'm okay with the code. But why don't you use the same approach with foreach_ptr as before?
>
> I restored the function as it was before 0452b461bc with minimal edits to support the incremental sort.  I think it would be more valuable to keep the difference with pg16 code small rather than refactor to simplify existing code.

Got it.

--
regards, Andrei Lepikhov
Postgres Professional
Re: 001_rep_changes.pl fails due to publisher stuck on shutdown
On Thu, May 30, 2024 at 2:09 AM vignesh C wrote:
>
> On Wed, 29 May 2024 at 16:30, Alexander Lakhin wrote:
> >
> > Hello hackers,
> >
> > As a recent buildfarm test failure [1] shows:
> > [14:33:02.374](0.333s) ok 23 - update works with dropped subscriber column
> > ### Stopping node "publisher" using mode fast
> > # Running: pg_ctl -D /home/bf/bf-build/adder/HEAD/pgsql.build/testrun/subscription/001_rep_changes/data/t_001_rep_changes_publisher_data/pgdata -m fast stop
> > waiting for server to shut down.. ... ... ... .. failed
> > pg_ctl: server does not shut down
> > # pg_ctl stop failed: 256
> > # Postmaster PID for node "publisher" is 549
> > [14:39:04.375](362.001s) Bail out!  pg_ctl stop failed
> >
> > 001_rep_changes_publisher.log
> > 2024-05-16 14:33:02.907 UTC [2238704][client backend][4/22:0] LOG:  statement: DELETE FROM tab_rep
> > 2024-05-16 14:33:02.925 UTC [2238704][client backend][:0] LOG:  disconnection: session time: 0:00:00.078 user=bf database=postgres host=[local]
> > 2024-05-16 14:33:02.939 UTC [549][postmaster][:0] LOG:  received fast shutdown request
> > 2024-05-16 14:33:03.000 UTC [549][postmaster][:0] LOG:  aborting any active transactions
> > 2024-05-16 14:33:03.049 UTC [549][postmaster][:0] LOG:  background worker "logical replication launcher" (PID 2223110) exited with exit code 1
> > 2024-05-16 14:33:03.062 UTC [901][checkpointer][:0] LOG:  shutting down
> > 2024-05-16 14:39:04.377 UTC [549][postmaster][:0] LOG:  received immediate shutdown request
> > 2024-05-16 14:39:04.382 UTC [549][postmaster][:0] LOG:  database system is shut down
> >
> > the publisher node may hang on stopping.
> >
> > I reproduced the failure (with aggressive autovacuum) locally and discovered that it happens because:
> > 1) checkpointer calls WalSndInitStopping() (which sends PROCSIG_WALSND_INIT_STOPPING to walsender), and then spins inside WalSndWaitStopping() indefinitely, because:
> > 2) walsender receives the signal, sets got_STOPPING = true, but can't exit WalSndLoop():
> > 3) it never sets got_SIGUSR2 (to get into WalSndDone()) in XLogSendLogical():
> > 4) it never sets WalSndCaughtUp to true:
> > 5) logical_decoding_ctx->reader->EndRecPtr can't reach flushPtr in XLogSendLogical():
> > 6) EndRecPtr doesn't advance in XLogNextRecord():
> > 7) XLogDecodeNextRecord() fails to decode a record that crosses a page boundary:
> > 8) ReadPageInternal() (commented "Wait for the next page to become available") constantly returns XLREAD_FAIL:
> > 9) state->routine.page_read()/logical_read_xlog_page() constantly returns -1:
> > 10) flushptr = WalSndWaitForWal() stops advancing, because got_STOPPING == true (see 2).
> >
> > That is, walsender doesn't let itself catch up, if it gets the stop signal when it's lagging behind and decoding a record requires reading the next wal page.
> >
> > Please look at the reproducing test (based on 001_rep_changes.pl) attached. It fails for me as below:
> > # 17
> > Bailout called. Further testing stopped:  pg_ctl stop failed
> > FAILED--Further testing stopped: pg_ctl stop failed
> > make: *** [Makefile:21: check] Ошибка 255
>
> Thank you, Alexander, for sharing the script. I was able to reproduce the issue using the provided script. Furthermore, while investigating its origins, I discovered that this problem persists across all branches up to PG10 (the script needs slight adjustments to run it on older versions). It's worth noting that this issue isn't a result of recent version changes.

Hi, FWIW using the provided scripting I was also able to reproduce the problem on HEAD but for me, it was more rare -- the script passed ok 3 times all 100 iterations; it eventually failed on the 4th run on the 75th iteration.

==
Kind Regards,
Peter Smith.
Fujitsu Australia
Re: Improving the latch handling between logical replication launcher and worker processes.
On Thu, 30 May 2024 at 08:46, Peter Smith wrote:
>
> On Wed, May 29, 2024 at 7:53 PM vignesh C wrote:
> >
> > On Wed, 29 May 2024 at 10:41, Peter Smith wrote:
> > >
> > > On Thu, Apr 25, 2024 at 6:59 PM vignesh C wrote:
> > > >
> > > > c) Don't reset the latch at worker attach and allow launcher main to identify and handle it. For this there is a patch v6-0002 available at [2].
> > >
> > > This option c seems the easiest. Can you explain what are the drawbacks of using this approach?
> >
> > This solution will resolve the issue. However, one drawback to consider is that because we're not resetting the latch, in this scenario, the launcher process will need to perform an additional round of acquiring subscription details and determining whether the worker should start, regardless of any changes in subscriptions.
>
> Hmm. IIUC the WaitLatch of the Launcher.WaitForReplicationWorkerAttach was not expecting to get notified.
>
> e.g.1. The WaitLatch comment in the function says so:
> /*
>  * We need timeout because we generally don't get notified via latch
>  * about the worker attach. But we don't expect to have to wait long.
>  */
>
> e.g.2 The logicalrep_worker_attach() function (which is AFAIK what WaitForReplicationWorkerAttach was waiting for) is not doing any SetLatch. So that matches what the comment said.
>
> ~~~
>
> AFAICT the original problem reported by this thread happened because the SetLatch (from CREATE SUBSCRIPTION) has been inadvertently gobbled by the WaitForReplicationWorkerAttach.WaitLatch/ResetLatch, which BTW wasn't expecting to be notified at all.
>
> ~~~
>
> Your option c removes the ResetLatch done by WaitForReplicationWorkerAttach:
>
> You said above that one drawback is "the launcher process will need to perform an additional round of acquiring subscription details and determining whether the worker should start, regardless of any changes in subscriptions".
>
> I think you mean that if some CREATE SUBSCRIPTION (i.e. SetLatch) happens during the attaching of other workers, then the latch would (now, after option c) remain set, and so the WaitLatch of ApplyLauncherMain would be notified and/or return immediately, causing an immediate re-iteration of the "foreach(lc, sublist)" loop.
>
> But I don't understand why that is a problem.
>
> a) I didn't know what you meant by "regardless of any changes in subscriptions", because I think the troublesome SetLatch originated from the CREATE SUBSCRIPTION, and so there *is* a change to subscriptions.

The process of setting the latch unfolds as follows: Upon creating a new subscription, the launcher process initiates a request to the postmaster, prompting it to start a new apply worker process. Subsequently, the postmaster starts the apply worker process and dispatches a SIGUSR1 signal to the launcher process (this is done from do_start_bgworker & ReportBackgroundWorkerPID). Upon receiving this signal, the launcher process sets the latch.

Now, there are two potential scenarios:
a) Concurrent creation of another subscription: in this situation, the launcher traverses the subscription list, detects the creation of a new subscription, and proceeds to start a new apply worker for the concurrently created subscription. This is ok.
b) Absence of concurrent subscription creation: in this case, since the latch remains unset, the launcher iterates through the subscription list and identifies the absence of new subscriptions. This verification occurs as the latch remains unset. Here there is an additional check.

I'm talking about the second scenario, where no subscription is concurrently created. In this case, as the latch remains unset, we perform an additional check on the subscription list. There is no problem with this. This additional check can occur in the existing code too, if the function WaitForReplicationWorkerAttach returns from the initial if check, i.e. if the worker already started when this check happens.

Regards,
Vignesh