RE: Design of pg_stat_subscription_workers vs pgstats
On Saturday, February 26, 2022 11:51 AM Amit Kapila wrote:
> I have reviewed the latest version and made a few changes along with fixing
> some of the pending comments by Peter Smith. The changes are as follows:
> (a) Removed m_databaseid in PgStat_MsgSubscriptionError as that is not
> required now; (b) changed the struct name PgStat_MsgSubscriptionPurge to
> PgStat_MsgSubscriptionDrop to make it similar to DropDb; (c) changed the
> view name to pg_stat_subscription_stats, we can reconsider it in future if
> there is a consensus on some other name, accordingly changed the reset
> function name to pg_stat_reset_subscription_stats; (d) moved some of the
> newly added subscription stats functions adjacent to slots to maintain the
> consistency in code; (e) changed comments at a few places; (f) added LATERAL
> back to the system_views query as we refer to pg_subscription's oid in the
> function call, previously that was not clear.
>
> Do let me know what you think of the attached?

Hi, thank you for updating the patch! I have a couple of comments on v4.

(1) I'm not sure if I'm correct, but I'd say sync_error_count could come right after subname in the column order. There are cases where the column order is related to the time/processing order (I'm thinking of pg_stat_replication's LSN-related columns). If that applies here, the table-sync-related column could be the first counter column in this patch.

(2) doc/src/sgml/monitoring.sgml

+Resets statistics for a single subscription shown in the
+pg_stat_subscription_stats view to zero. If
+the argument is NULL, reset statistics for all
+subscriptions.

I think we could improve the first sentence.

From: Resets statistics for a single subscription shown in the..
To (idea 1): Resets statistics for a single subscription defined by the argument to zero.
Or, to (idea 2): Resets statistics to zero for a single subscription or for all subscriptions.

Best Regards,
	Takamichi Osumi
Re: [PATCH] pg_permissions
On Fri, Feb 25, 2022, at 22:12, Chapman Flack wrote:
> I would be happy to review this patch, but a look through the email leaves me
> thinking it may still be waiting on a C implementation of pg_get_acl(). Is that
> right?

Not sure.

> And perhaps a view rename to pg_privileges, following Peter's comment?

+1

/Joel
Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
On Sat, Feb 26, 2022 at 1:08 AM Nathan Bossart wrote:
> On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
> > Thanks Satya and others for the inputs. Here's the v1 patch that
> > basically allows async wal senders to wait until the sync standbys
> > report their flush lsn back to the primary. Please let me know your
> > thoughts.
>
> I haven't had a chance to look too closely yet, but IIUC this adds a new
> function that waits for synchronous replication. This new function
> essentially spins until the synchronous LSN has advanced.
>
> I don't think it's a good idea to block sending any WAL like this. AFAICT
> it is possible that there will be a lot of synchronously replicated WAL
> that we can send, and it might just be the last several bytes that cannot
> yet be replicated to the asynchronous standbys. I believe this patch will
> cause the server to avoid sending _any_ WAL until the synchronous LSN
> advances.
>
> Perhaps we should instead just choose the SendRqstPtr based on the current
> synchronous LSN. Presumably there are other things we'd need to consider,
> but in general, I think we ought to send as much WAL as possible for a
> given call to XLogSendPhysical().

A global min LSN of SendRqstPtr of all the sync standbys can be calculated, and the async standbys can send WAL up to that global min LSN. This is unlike what the v1 patch does, i.e. async standbys waiting until the sync standbys report their flush LSN back to the primary. The problem with the global min LSN approach is that there can still be a small window where async standbys get ahead of sync standbys. Imagine async standbys being closer to the primary than the sync standbys: if a failover has to happen while the WAL at SendRqstPtr hasn't yet been received by the sync standbys, the async standbys may still have received it, as they are closer. We hit the same problem that we are trying to solve with this patch. This is the reason we are waiting till the sync flush LSN, as it guarantees more transactional protection.

Do you think we should allow async standbys to optionally wait for either remote write, flush, apply, or the global min LSN of SendRqstPtr, so that users can choose what they want?

> > I've done pgbench testing to see if the patch causes any problems. I
> > ran tests two times, there isn't much difference in the txns per
> > seconds (tps), although there's a delay in the async standby receiving
> > the WAL, after all, that's the feature we are pursuing.
>
> I'm curious what a longer pgbench run looks like when the synchronous
> replicas are in the same region. That is probably a more realistic
> use-case.

We are performing more tests; I will share the results once done.

Regards,
Bharath Rupireddy.
Re: PATCH: add "--config-file=" option to pg_rewind
On 26.02.22 at 06:51, Michael Paquier wrote:
> On Fri, Feb 25, 2022 at 10:35:49AM +0100, Gunnar "Nick" Bluth wrote:
>> On 24.02.22 at 14:46, Daniel Gustafsson wrote:
>>> Actually, I think this looks like a saner approach. Putting a config
>>> setting in two places (postgresql.conf and on the commandline for
>>> pg_rewind) is a recipe for them diverging.
>
> FWIW, I have a bad feeling about passing down directly a command through
> an option itself part of a command, so what's discussed on this thread is
> refreshing.

Ta.

> +	} else {
> +		snprintf(postgres_cmd, sizeof(postgres_cmd),
> +				 "\"%s\" -D \"%s\" --config_file=\"%s\" -C restore_command",
> +				 postgres_exec_path, datadir_target, config_file);
> +	}
>
> Shouldn't this one use appendShellString() on config_file?

It probably should, yes. I don't fancy this repetitive code myself. But then, pg_rewind as a whole could use an overhaul. I don't see any use of PQExpBuffer in it, but a lot of char* ...

I myself am not a C hacker (as you can probably tell already) and don't really feel fit (enough) to rewrite pg_rewind.c to use fe_utils/string_utils.h :/

I might give it a try anyhow, but that may be a bit... heavy... just to add one option?

While we're at it (I'm really good at opening cans of worms):

09:47 $ grep -r -l --include \*.c fe_utils/string_utils.h src/bin/ | wc -l
28
09:48 $ grep -r -L --include \*.c fe_utils/string_utils.h src/bin/ | wc -l
66

GSOC project? ;-)

Best,
--
Gunnar "Nick" Bluth
Eimermacherweg 106
D-48159 Münster
Mobil +49 172 8853339
Email: gunnar.bl...@pro-open.de
__
"Ceterum censeo SystemD esse delendam" - Cato
Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
On Sat, Feb 26, 2022 at 3:22 AM Hsu, John wrote:
> > On Fri, Feb 25, 2022 at 08:31:37PM +0530, Bharath Rupireddy wrote:
> >> Thanks Satya and others for the inputs. Here's the v1 patch that
> >> basically allows async wal senders to wait until the sync standbys
> >> report their flush lsn back to the primary. Please let me know your
> >> thoughts.
> >
> > I haven't had a chance to look too closely yet, but IIUC this adds a new
> > function that waits for synchronous replication. This new function
> > essentially spins until the synchronous LSN has advanced.
> >
> > I don't think it's a good idea to block sending any WAL like this. AFAICT
> > it is possible that there will be a lot of synchronously replicated WAL
> > that we can send, and it might just be the last several bytes that cannot
> > yet be replicated to the asynchronous standbys. I believe this patch will
> > cause the server to avoid sending _any_ WAL until the synchronous LSN
> > advances.
> >
> > Perhaps we should instead just choose the SendRqstPtr based on the current
> > synchronous LSN. Presumably there are other things we'd need to consider,
> > but in general, I think we ought to send as much WAL as possible for a
> > given call to XLogSendPhysical().
>
> I think you're right that we'll avoid sending any WAL until sync_lsn
> advances. We could set up a contrived situation where the async walsender
> never advances because it terminates before the flush_lsn of the
> synchronous node catches up. And when the async walsender restarts,
> it'll start with the latest flushed on the primary and we could go into
> a perpetual loop.

The async walsender looks at the flush LSN from walsndctl->lsn[SYNC_REP_WAIT_FLUSH] after it comes up and decides to send the WAL up to it. If there are no sync replicas after it comes up (users can make sync standbys async without a postmaster restart, because synchronous_standby_names is effective with SIGHUP), then it doesn't wait at all and continues to send WAL. I don't see any problem with it. Am I missing something here?

> I took a look at the patch and tested basic streaming with async
> replicas ahead of the synchronous standby and with logical clients as
> well and it works as expected.

Thanks for reviewing and testing the patch.

> > ereport(LOG,
> >         (errmsg("async standby WAL sender with request LSN %X/%X is waiting as sync standbys are ahead with flush LSN %X/%X",
> >                 LSN_FORMAT_ARGS(flushLSN), LSN_FORMAT_ARGS(sendRqstPtr)),
> >          errhidestmt(true)));
>
> I think this log formatting is incorrect.
> s/sync standbys are ahead/sync standbys are behind/ and I think you need
> to swap flushLsn and sendRqstPtr

I will correct it: "async standby WAL sender with request LSN %X/%X is waiting as sync standbys are ahead with flush LSN %X/%X", LSN_FORMAT_ARGS(sendRqstPtr), LSN_FORMAT_ARGS(flushLSN). I will think more about having better wording of these messages; any suggestions here?

> When a walsender is waiting for the lsn on the synchronous replica to
> advance and a database stop is issued to the writer, the pg_ctl stop
> isn't able to proceed and the database seems to never shutdown.

I too observed this once or twice. It looks like the walsender isn't detecting postmaster death in for (;;) with WalSndWait. Not sure if this is expected or true with other wait-loops in walsender code. Any more thoughts here?

> > Assert(priority >= 0);
>
> What's the point of the assert here?

Just for safety. I can remove it, as the sync_standby_priority can never be negative.

> Also the comments/code refer to AsyncStandbys, however it's also used
> for logical clients, which may or may not be standbys. Don't feel too
> strongly about the naming here but something to note.

I will try to be more informative by adding something like "async standbys and logical replication subscribers".

> > if (!ShouldWaitForSyncRepl())
> >     return;
> > ...
> > for (;;)
> > {
> >     // rest of work
> > }
>
> If we had a walsender already waiting for an ack, and the conditions of
> ShouldWaitForSyncRepl() change, such as disabling
> async_standbys_wait_for_sync_replication or synchronous replication,
> it'll still wait since we never re-check the condition.

Yeah, I will add the checks inside the async walsender wait-loop.

> postgres=# select wait_event from pg_stat_activity where wait_event like 'AsyncWal%';
>                wait_event
> --------------------------------------
>  AsyncWalSenderWaitForSyncReplication
>  AsyncWalSenderWaitForSyncReplication
>  AsyncWalSenderWaitForSyncReplication
> (3 rows)
>
> postgres=# show synchronous_standby_names;
>  synchronous_standby_names
> ---------------------------
>
> (1 row)
>
> postgres=# show async_standbys_wait_for_sync_replication;
>  async_standbys_wait_for_sync_replication
> ------------------------------------------
>  off
> (1 row)
>
> > LWLockAcquire(SyncRepLock, LW_SHARED);
> > flushLSN = walsndctl->lsn[SYNC_REP_WAIT_FLUSH];
Re: C++ Trigger Framework
OK, great, thank you, Bruce! I'll check back in with any interesting results when I have something running (probably in a few months to a year, since I don't yet know C++ very well :-)).
Document ordering guarantees on INSERT/UPDATE RETURNING clause
Hi all,

I've seen various discussions around whether PG makes any guarantees on the ordering of rows returned by the RETURNING clause (e.g. [1]). In a nutshell, when executing a statement such as the following:

CREATE TABLE foo (id INT PRIMARY KEY GENERATED ALWAYS AS IDENTITY, data INT);
INSERT INTO foo (data) VALUES (8), (9), (10) RETURNING id, data;

... is the INSERT guaranteed to return the rows (1,8), (2,9) and (3,10) (and in that order)? This point is important when inserting multiple rows and wanting to e.g. match a database-generated ID back to memory structures on the client.

FWIW I've received feedback from a SQL Server engineer that one definitely should *not* depend on such ordering there, and that future optimizations (e.g. parallel insertion of many rows) could result in row ordering which differs from the lexical ordering of the VALUES clause.

That seems very reasonable; if the situation is similar on PostgreSQL, then I'd suggest making that very clear in the INSERT[2] and UPDATE[3] docs. I'd also possibly point to the workaround of wrapping the INSERT/UPDATE in a CTE which then defines the ordering.

Thanks,

Shay

[1] https://stackoverflow.com/questions/5439293/is-insert-returning-guaranteed-to-return-things-in-the-right-order
[2] https://www.postgresql.org/docs/current/sql-insert.html
[3] https://www.postgresql.org/docs/current/sql-update.html
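For reference, the CTE workaround mentioned above could look something like the following (an illustrative sketch reusing the foo table from the example; this is not text proposed for the docs):

```sql
-- Wrap the INSERT in a data-modifying CTE; the outer ORDER BY, not the
-- lexical order of the VALUES list, is what defines the result ordering.
WITH inserted AS (
    INSERT INTO foo (data)
    VALUES (8), (9), (10)
    RETURNING id, data
)
SELECT id, data
FROM inserted
ORDER BY id;
```

Ordering by id only maps rows back to the VALUES order if identity assignment follows that order; ordering by an input-derived key such as data avoids even that assumption.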
Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause
On Sat, Feb 26, 2022 at 5:42 AM Shay Rojansky wrote: > FWIW I've received feedback from a SQL Server engineer that one definitely > should *not* depend on such ordering there, and that future optimizations > (e.g. parallel insertion of many rows) could result in row ordering which > differs from the lexical ordering of the VALUES clause. > > That seems very reasonable; if the situation is similar on PostgreSQL, > then I'd suggest making that very clear in the INSERT[2] and UPDATE[3] docs. > There is clearly no mention of such a guarantee in our documentation. David J.
Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause
Hi, On Sat, Feb 26, 2022 at 06:25:22AM -0700, David G. Johnston wrote: > On Sat, Feb 26, 2022 at 5:42 AM Shay Rojansky wrote: > > > FWIW I've received feedback from a SQL Server engineer that one definitely > > should *not* depend on such ordering there, and that future optimizations > > (e.g. parallel insertion of many rows) could result in row ordering which > > differs from the lexical ordering of the VALUES clause. > > > > > > That seems very reasonable; if the situation is similar on PostgreSQL, > > then I'd suggest making that very clear in the INSERT[2] and UPDATE[3] docs. > > > > There is clearly no mention of such a guarantee in our documentation. Yes, which is just how SQL works: a set doesn't have any ordering unless an explicit one has been defined, RETURNING is no exception to that.
Re: why do hash index builds use smgrextend() for new splitpoint pages
On Fri, Feb 25, 2022 at 11:17 PM Amit Kapila wrote:
> On Sat, Feb 26, 2022 at 3:01 AM Melanie Plageman wrote:
> > Since _hash_alloc_buckets() WAL-logs the last page of the
> > splitpoint, is it safe to skip the smgrimmedsync()? What if the last
> > page of the splitpoint doesn't end up having any tuples added to it
> > during the index build and the redo pointer is moved past the WAL for
> > this page and then later there is a crash sometime before this page
> > makes it to permanent storage. Does it matter that this page is lost? If
> > not, then why bother WAL-logging it?
>
> I think we don't care if the page is lost before we update the
> meta-page in the caller because we will try to reallocate in that
> case. But we do care after the meta-page update (having the updated
> information about this extension via different masks), in which case we
> won't lose this last page because it would have registered the sync
> request for it via smgrextend before the meta-page update.

And could it happen that during smgrextend() for the last page, a checkpoint starts and finishes between FileWrite() and register_dirty_segment(), then the index build finishes, and then a crash occurs before another checkpoint completes the pending fsync for that last page?
Re: Allow async standbys wait for sync replication (was: Disallow quorum uncommitted (with synchronous standbys) txns in logical replication subscribers)
On Sat, Feb 26, 2022 at 02:17:50PM +0530, Bharath Rupireddy wrote:
> A global min LSN of SendRqstPtr of all the sync standbys can be
> calculated and the async standbys can send WAL up to global min LSN.
> This is unlike what the v1 patch does i.e. async standbys will wait
> until the sync standbys report flush LSN back to the primary. Problem
> with the global min LSN approach is that there can still be a small
> window where async standbys can get ahead of sync standbys. Imagine
> async standbys being closer to the primary than sync standbys and if
> the failover has to happen while the WAL at SendRqstPtr isn't received
> by the sync standbys, but the async standbys can receive them as they
> are closer. We hit the same problem that we are trying to solve with
> this patch. This is the reason, we are waiting till the sync flush LSN
> as it guarantees more transactional protection.

Do you mean that the application of WAL gets ahead on your async standbys, or that the writing/flushing of WAL gets ahead?

If synchronous_commit is set to 'remote_write' or 'on', I think either approach can lead to situations where the async standbys are ahead of the sync standbys with WAL application. For example, a conflict between WAL replay and a query on your sync standby could delay WAL replay, but the primary will not wait for this conflict to resolve before considering a transaction synchronously replicated and sending it to the async standbys.

If writing/flushing WAL gets ahead on async standbys, I think something is wrong with the patch. If you aren't sending WAL to async standbys until it is synchronously replicated to the sync standbys, it should by definition be impossible for this to happen.

If you wanted to make sure that WAL was not applied to async standbys before it was applied to sync standbys, I think you'd need to set synchronous_commit to 'remote_apply'. This would ensure that the WAL is replayed on sync standbys before the primary considers the transaction synchronously replicated and sends it to the async standbys.

> Do you think allowing async standbys optionally wait for either remote
> write or flush or apply or global min LSN of SendRqstPtr so that users
> can choose what they want?

I'm not sure I follow the difference between "global min LSN of SendRqstPtr" and remote write/flush/apply. IIUC you are saying that we could use the LSN of what is being sent to sync standbys instead of the LSN of what the primary considers synchronously replicated. I don't think we should do that because it provides no guarantee that the WAL has even been sent to the sync standbys before it is sent to the async standbys. For this feature, I think we always need to consider what the primary considers synchronously replicated.

My suggested approach doesn't change that. I'm saying that instead of spinning in a loop waiting for the WAL to be synchronously replicated, we just immediately send WAL up to the LSN that is presently known to be synchronously replicated.

You do bring up an interesting point, though. Is there a use-case for specifying synchronous_commit='on' but not sending WAL to async replicas until it is synchronously applied? Or alternatively, would anyone want to set synchronous_commit='remote_apply' but send WAL to async standbys as soon as it is written to the sync standbys? My initial reaction is that we should depend on the synchronous replication setup. As long as the primary considers an LSN synchronously replicated, it would be okay to send it to the async standbys. I personally don't think it is worth taking on the extra complexity for that level of configuration just yet.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Mingw task for Cirrus CI
On 2/25/22 19:27, Andres Freund wrote:
> Hi,
>
> Andrew, CCing you both because you might be interested in the CI bit, and
> because you might know the answer.
>
> > 2- src/pl/plpython/Makefile looks under "C:/Windows/system32/" for
> > PYTHONDLL. gendef cannot open that file, gives an error like "failed to
> > open()" when creating a .def file.
>
> On my win10 VM in which I installed python.org python I don't see a python dll
> in c:/windows/system32 either. Looks like none of our windows mingw animals
> build with python. So it might just be bitrot.

There certainly was a time when python from python.org used to install its DLL in the system32 directory, so I imagine that's why it's there. I'm very glad to see that's no longer the case.

cheers

andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
Fix a typo in pg_receivewal.c's get_destination_dir()
Hi,

It looks like there's a typo in pg_receivewal.c's get_destination_dir(): the error message should use the function parameter dest_folder instead of the global variable basedir. Although there's no harm in it, as basedir is a global variable in pg_receivewal.c, let's use the function parameter to be in sync with close_destination_dir(). Attaching a tiny patch to fix it.

Regards,
Bharath Rupireddy.

v1-0001-Fix-a-typo-in-pg_receivewal.c-s-get_destination_d.patch
Description: Binary data
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
The following review has been posted through the commitfest application:
make installcheck-world: tested, passed
Implements feature: tested, passed
Spec compliant: not tested
Documentation: not tested

This patch applies cleanly for me and passes installcheck-world. I have not yet studied all of the changes in detail.

Some proofreading nits in the documentation: the pg_stop_backup with arguments has lost the 'exclusive' argument, but still shows a comma before the 'wait_for_archive' argument. And the niladic pg_stop_backup is still documented, though it no longer exists.

My biggest concerns are the changes to the SQL-visible pg_start_backup and pg_stop_backup functions. When the non-exclusive API was implemented (in 7117685), that was done with care (with a new optional argument to pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate breakage of working backup scripts. With this patch, even scripts that were dutifully migrated to that new API and now invoke pg_start_backup(label, false) or (label, exclusive => false) will immediately and unnecessarily break.

What I would suggest for this patch would be to change the exclusive default from true to false, and have the function report an ERROR if true is passed. Otherwise, for sites using a third-party backup solution, there will be an unnecessary requirement to synchronize a PostgreSQL upgrade with an upgrade of the backup solution that won't be broken by the change. For a site with their backup procedures scripted in-house, there will be an unnecessarily urgent need for the current admin team to study and patch the currently-working scripts. That can be avoided by just changing the default to false and rejecting calls where true is passed. That will break only scripts that never got the memo about moving to non-exclusive backup, available for six years now.

Assuming the value is false, so no error is thrown, is it practical to determine from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would further suggest reporting a deprecation WARNING if it was explicitly supplied, with a HINT that the argument can simply be removed at the call site, and will become unrecognized in some future release.

pg_stop_backup needs thought, because 7117685 added a new overload for that function, rather than just an optional argument. This patch removes the old niladic version that returned pg_lsn, leaving just one version, with an optional argument, that returns a record. Here again, the old niladic one was only suitable for exclusive backups, so there can't be any script existing in 2022 that still calls that unless it has never been updated in six years to non-exclusive backups, and that breakage can't be helped. Any scripts that did get dutifully updated over the last six years will be calling the record-returning version, passing false, or exclusive => false. This patch as it stands will unnecessarily break those, but here again I think that can be avoided just by making the exclusive parameter optional with default false, and reporting an error if true is passed. Here again, I would consider also issuing a deprecation warning if the argument is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't looked yet to see how practical that is.)

Regards,
-Chap
Re: Document ordering guarantees on INSERT/UPDATE RETURNING clause
> > > That seems very reasonable; if the situation is similar on PostgreSQL,
> > > then I'd suggest making that very clear in the INSERT[2] and UPDATE[3] docs.
> >
> > There is clearly no mention of such a guarantee in our documentation.
>
> Yes, which is just how SQL works: a set doesn't have any ordering unless an
> explicit one has been defined, RETURNING is no exception to that.

Thanks for confirming that such a guarantee doesn't exist. I would still suggest explicitly calling that out in the docs around RETURNING, since it seems like an understandable pitfall; personally speaking, this certainly wasn't clear to me when first looking at it (even if it is now).
Re: Reducing power consumption on idle servers
On Mon, Feb 21, 2022 at 5:11 PM Simon Riggs wrote:
> On Sat, 19 Feb 2022 at 17:03, Andres Freund wrote:
> > Hi,
> >
> > On 2022-02-19 14:10:39 +0000, Simon Riggs wrote:
> > IMO we should instead consider either deprecating file based promotion, or
> > adding an optional dependency on filesystem monitoring APIs (i.e. inotify etc)
> > that avoid the need to poll for file creation.

Came here to suggest exactly that :)

> Deprecating explicit file-based promotion is possible and simple, so
> that is the approach in the latest version of the patch.

Is there any actual use-case for this other than backwards compatibility? The alternative has certainly been around for some time now, so if we don't know a specific use-case for the file-based one, it's definitely time to deprecate it properly.

--
Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/
Re: Reducing power consumption on idle servers
Magnus Hagander writes:
>> Deprecating explicit file-based promotion is possible and simple, so
>> that is the approach in the latest version of the patch.

> Is there any actual use-case for this other than backwards
> compatibility?

The fundamental problem with signal-based promotion is that it's an edge-triggered rather than level-triggered condition, i.e. if you miss the single notification event you're screwed. I'm not sure why we'd want to go there, considering all the problems we're having with edge-triggered APIs in nearby threads.

We could combine the two approaches, perhaps: have "pg_ctl promote" create a trigger file and then send a signal. The trigger file would record the existence of the promotion request, so that if the postmaster isn't running right now or misses the signal, it'd still promote when restarted or otherwise at the next opportunity. But with an approach like this, we could expect quick response to the signal in normal conditions, without need for constant wakeups to check for the file. Also, this route would allow overloading of the signal, since it would just mean "wake up, I asked you to do something" rather than needing to carry the full semantics of what is to be done.

regards, tom lane
Missed condition-variable wakeups on FreeBSD
About once a month over the last six months, my buildfarm animal florican has gotten stuck while running the core regression tests. The symptoms have looked very much the same each time: there is a backend with two parallel worker processes that are just sitting and not consuming any CPU time. Each time I've attached to these processes with gdb to check their stack traces, and seen pretty much the same story every time (traces below). What is really interesting is that after I detach from the second worker, the processes resume running and finish out the test successfully! I don't know much about how gdb interacts with kernel calls on FreeBSD, but I speculate that the poll(2) call returns with EINTR after gdb releases the process, and then things resume fine, suggesting that we lost an interrupt somewhere. I have observed this three times in the REL_11 branch, once in REL_12, and a couple of times last summer before it occurred to me to start keeping notes. Over that time the machine has been running various patchlevels of FreeBSD 13.0. Here's the stack trace from the leader process in the most recent event (REL_11 as of yesterday). 
It's not always the same query that gets stuck, but it's always a parallel hash join:

(gdb) p debug_query_string
$1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, occurred_events=, nevents=) at latch.c:1171
#4  WaitEventSetWait (set=0x219dc138, timeout=-1, occurred_events=, nevents=1, wait_event_info=134217738) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2bf0b230, wait_event_info=134217738) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2bf0b218, wait_event_info=134217738) at barrier.c:191
#7  0x00873441 in ExecHashJoinImpl (pstate=, parallel=true) at nodeHashjoin.c:328
#8  ExecParallelHashJoin (pstate=0x2a887740) at nodeHashjoin.c:600
#9  0x0086a509 in ExecProcNode (node=0x2a887740) at ../../../src/include/executor/executor.h:248
#10 fetch_input_tuple (aggstate=aggstate@entry=0x2a887500) at nodeAgg.c:407
#11 0x00869646 in agg_retrieve_direct (aggstate=) at nodeAgg.c:1746
#12 ExecAgg (pstate=0x2a887500) at nodeAgg.c:1561
#13 0x0086e181 in ExecProcNode (node=0x2a887500) at ../../../src/include/executor/executor.h:248
#14 gather_getnext (gatherstate=0x2a887414) at nodeGather.c:276
#15 ExecGather (pstate=0x2a887414) at nodeGather.c:207
#16 0x0086a509 in ExecProcNode (node=0x2a887414) at ../../../src/include/executor/executor.h:248
#17 fetch_input_tuple (aggstate=aggstate@entry=0x2a8871b8) at nodeAgg.c:407
#18 0x00869646 in agg_retrieve_direct (aggstate=) at nodeAgg.c:1746
#19 ExecAgg (pstate=0x2a8871b8) at nodeAgg.c:1561
#20 0x0085956b in ExecProcNode (node=0x2a8871b8) at ../../../src/include/executor/executor.h:248
#21 ExecutePlan (estate=0x2a887090, planstate=0x2a8871b8, use_parallel_mode=, operation=CMD_SELECT, sendTuples=, numberTuples=, direction=, dest=0x2ab629b4, execute_once=) at execMain.c:1712
#22 standard_ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, count=0, execute_once=) at execMain.c:353
#23 0x00859445 in ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, count=0, execute_once=) at execMain.c:296
#24 0x009a3d57 in PortalRunSelect (portal=portal@entry=0x2197f090, forward=, count=0, dest=0x2ab629b4) at pquery.c:941
#25 0x009a39d6 in PortalRun (portal=0x2197f090, count=2147483647, isTopLevel=, run_once=, dest=0x2ab629b4, altdest=0x2ab629b4, completionTag=0xffbfdc70 "") at pquery.c:782
#26 0x009a2b53 in exec_simple_query (query_string=query_string@entry=0x21873090 "select count(*) from simple r join simple s using (id);") at postgres.c:1145
#27 0x009a04ec in PostgresMain (argc=1, argv=0x21954ed4, dbname=, username=0x21954d38 "buildfarm") at postgres.c:4052

Worker 1 looks like this:

(gdb) bt
#0  _poll () at _poll.S:4
#1  0x21701361 in __thr_poll (fds=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
#2  0x215eaf3f in poll (pfd=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
#3  0x0097b0fd in WaitEventSetWaitBlock (set=, cur_timeout=-1, occurred_events=, nevents=) at latch.c:1171
#4  WaitEventSetWait (set=0x2187cb10, timeout=-1, occurred_events=, nevents=1, wait_event_info=134217738) at latch.c:1000
#5  0x0099842b in ConditionVariableSleep (cv=0x2196b230, wait_event_info=134217738) at condition_variable.c:157
#6  0x00977e12 in BarrierArriveAndWait (barrier=0x2196b218, wait_event_info=134217738) at barrier.c:191
#7  0x00873441 in ExecHashJoinImpl (pstate=, parallel=true) at nodeHashjoin.c:328
#8  ExecParallelHashJoin (pstate=0x2a8378d8) at nodeHashjoin.c:600
#9  0x00
Re: Missed condition-variable wakeups on FreeBSD
Hi,

On 2022-02-26 14:07:05 -0500, Tom Lane wrote:
> About once a month over the last six months, my buildfarm animal
> florican has gotten stuck while running the core regression tests.
> The symptoms have looked very much the same each time: there is
> a backend with two parallel worker processes that are just sitting
> and not consuming any CPU time. Each time I've attached to these
> processes with gdb to check their stack traces, and seen pretty
> much the same story every time (traces below). What is really
> interesting is that after I detach from the second worker, the
> processes resume running and finish out the test successfully!
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.
>
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes. Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.

It's certainly interesting that it appears to happen only in the branches using poll rather than kqueue to implement latches. That changed between 12 and 13.

Have you tried running the core regression tests with force_parallel_mode = on, or with the parallel costs lowered, to see if that makes the problem appear more often?

> Here's the stack trace from the leader process in the most
> recent event (REL_11 as of yesterday). It's not always the
> same query that gets stuck, but it's always a parallel hash join:

Which does make me wonder if it's a latch issue or a logic issue in hashjoin / barriers. HJ is the only user of barrier.c... There's this commit, which subsequently was reverted due to some issue, and not re-applied since...
commit 4e0f0995e923948631c4114ab353b256b51b58ad
Author: Thomas Munro
Date:   2021-03-17 17:46:39 +1300

    Fix race in Parallel Hash Join batch cleanup.

It doesn't seem super likely to be related, but...

> (gdb) p debug_query_string
> $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at
> /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at
> /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1,
> occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171

The next time this happens / if you still have this open, perhaps it could be worth checking if there's a byte in the self pipe?

> Thoughts? Ideas on debugging this?

Besides trying to make the issue more likely as suggested above, it might be worth checking if signalling the stuck processes with SIGUSR1 gets them unstuck.

Greetings,

Andres Freund
Re: Missed condition-variable wakeups on FreeBSD
On Sat, Feb 26, 2022 at 02:07:05PM -0500, Tom Lane wrote:
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.

I've seen some similar interactions with strace under Linux causing a process to be woken up or otherwise incur a different behavior (not necessarily postgres).

> Thoughts? Ideas on debugging this?

Before attaching a debugger, figure out what syscall each process is in. On Linux, that's "ps O wchan PID".

>> Besides trying to make the issue more likely as suggested above, it might be
>> worth checking if signalling the stuck processes with SIGUSR1 gets them
>> unstuck.

And SIGCONT. Maybe you already did this, but you can also dump a core file of the running processes to allow future inspection.

--
Justin
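For readers following along, the pre-debugger inspection suggested above can be sketched roughly like this on Linux. A throwaway `sleep` child stands in for a stuck backend; the process names and the exact `ps` column set are illustrative choices, not something prescribed in this thread:

```shell
# Rough sketch of the suggested inspection, using a throwaway "sleep"
# child as a stand-in for a stuck backend (Linux).
sleep 30 &
pid=$!
sleep 1                            # give the child time to enter the syscall

# Which kernel function is the process sleeping in?  (cf. "ps O wchan PID")
ps -o pid,state,wchan -p "$pid"

# Nudge it with SIGCONT, as suggested; harmless if it is already running.
kill -CONT "$pid"

# Clean up the stand-in process.
kill "$pid" 2>/dev/null
wait "$pid" 2>/dev/null || true
echo inspected
```

For a real stuck backend one would of course skip the cleanup and instead dump a core file (e.g. with gcore) for later inspection, as suggested above.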
Re: explain_regress, explain(MACHINE), and default to explain(BUFFERS) (was: BUFFERS enabled by default in EXPLAIN (ANALYZE))
Rebased over ebf6c5249b7db525e59563fb149642665c88f747.

It looks like that patch handles only query_id, and this patch also tries to handle a bunch of other stuff. If it's helpful, feel free to kick this patch to a future CF.

>From e58fffedc6f1cf471228fb3234faba35898678c3 Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 22 Feb 2020 21:17:10 -0600
Subject: [PATCH 1/6] Add GUC: explain_regress

This changes the defaults for explain to: costs off, timing off, summary off.
It'd be reasonable to use this for new regression tests which are not intended
to be backpatched.
---
 contrib/postgres_fdw/postgres_fdw.c     |  2 +-
 src/backend/commands/explain.c          | 13 +++--
 src/backend/utils/misc/guc.c            | 13 +
 src/bin/psql/describe.c                 |  2 +-
 src/include/commands/explain.h          |  2 ++
 src/test/regress/expected/explain.out   |  3 +++
 src/test/regress/expected/inherit.out   |  2 +-
 src/test/regress/expected/stats_ext.out |  2 +-
 src/test/regress/pg_regress.c           |  3 ++-
 src/test/regress/sql/explain.sql        |  4 
 src/test/regress/sql/inherit.sql        |  2 +-
 src/test/regress/sql/stats_ext.sql      |  2 +-
 12 files changed, 41 insertions(+), 9 deletions(-)

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 56654844e8..54da40d662 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -3124,7 +3124,7 @@ estimate_path_cost_size(PlannerInfo *root,
 	 * values, so don't request params_list.
 	 */
 	initStringInfo(&sql);
-	appendStringInfoString(&sql, "EXPLAIN ");
+	appendStringInfoString(&sql, "EXPLAIN (COSTS)");
 	deparseSelectStmtForRel(&sql, root, foreignrel, fdw_scan_tlist,
 							remote_conds, pathkeys,
 							fpextra ? fpextra->has_final_sort : false,
diff --git a/src/backend/commands/explain.c b/src/backend/commands/explain.c
index de81379da3..22a3cf6349 100644
--- a/src/backend/commands/explain.c
+++ b/src/backend/commands/explain.c
@@ -154,6 +154,7 @@ static void ExplainJSONLineEnding(ExplainState *es);
 static void ExplainYAMLLineStarting(ExplainState *es);
 static void escape_yaml(StringInfo buf, const char *str);
 
+bool explain_regress = false; /* GUC */
 
 /*
@@ -172,6 +173,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 	ListCell   *lc;
 	bool		timing_set = false;
 	bool		summary_set = false;
+	bool		costs_set = false;
 
 	/* Parse options list. */
 	foreach(lc, stmt->options)
@@ -183,7 +185,11 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 		else if (strcmp(opt->defname, "verbose") == 0)
 			es->verbose = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "costs") == 0)
+		{
+			/* Need to keep track if it was explicitly set to ON */
+			costs_set = true;
 			es->costs = defGetBoolean(opt);
+		}
 		else if (strcmp(opt->defname, "buffers") == 0)
 			es->buffers = defGetBoolean(opt);
 		else if (strcmp(opt->defname, "wal") == 0)
@@ -227,13 +233,16 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 					 parser_errposition(pstate, opt->location)));
 	}
 
+	/* if the costs option was not set explicitly, set default value */
+	es->costs = (costs_set) ? es->costs : es->costs && !explain_regress;
+
 	if (es->wal && !es->analyze)
 		ereport(ERROR,
 				(errcode(ERRCODE_INVALID_PARAMETER_VALUE),
 				 errmsg("EXPLAIN option WAL requires ANALYZE")));
 
 	/* if the timing was not set explicitly, set default value */
-	es->timing = (timing_set) ? es->timing : es->analyze;
+	es->timing = (timing_set) ? es->timing : es->analyze && !explain_regress;
 
 	/* check that timing is used with EXPLAIN ANALYZE */
 	if (es->timing && !es->analyze)
@@ -242,7 +251,7 @@ ExplainQuery(ParseState *pstate, ExplainStmt *stmt,
 			 errmsg("EXPLAIN option TIMING requires ANALYZE")));
 
 	/* if the summary was not set explicitly, set default value */
-	es->summary = (summary_set) ? es->summary : es->analyze;
+	es->summary = (summary_set) ? es->summary : es->analyze && !explain_regress;
 
 	query = castNode(Query, stmt->query);
 	if (IsQueryIdEnabled())
diff --git a/src/backend/utils/misc/guc.c b/src/backend/utils/misc/guc.c
index 1e3650184b..87266cf7bd 100644
--- a/src/backend/utils/misc/guc.c
+++ b/src/backend/utils/misc/guc.c
@@ -46,6 +46,7 @@
 #include "catalog/pg_authid.h"
 #include "catalog/storage.h"
 #include "commands/async.h"
+#include "commands/explain.h"
 #include "commands/prepare.h"
 #include "commands/tablespace.h"
 #include "commands/trigger.h"
@@ -1474,6 +1475,18 @@ static struct config_bool ConfigureNamesBool[] =
 		true,
 		NULL, NULL, NULL
 	},
+
+	{
+		{"explain_regress", PGC_USERSET, DEVELOPER_OPTIONS,
+			gettext_noop("Change defaults of EXPLAIN to avoid unstable output."),
+			NULL,
+			GUC_NOT_IN_SAMPLE | GUC_EXPLAIN
+		},
+		&explain_regress,
+		false,
+		NULL, NULL, NULL
+	},
+
 	{
 		{"log_parser_stats", PGC_SUSET, STATS_MONITORING,
 			gettext_noop("Writes parser performance statistics to the server log."),
 di
Re: Commitfest manager for 2022-03
On Sat, 26 Feb 2022 at 01:33, Julien Rouhaud wrote:
>
> On Sat, Feb 26, 2022 at 02:42:33PM +0900, Michael Paquier wrote:
> > On Fri, Feb 25, 2022 at 01:58:55PM -0600, David Steele wrote:
> > > On 2/25/22 12:39, Greg Stark wrote:
> > >> I would like to volunteer.
> > >
> > > I've been hoping somebody would volunteer, so I'm all in favor of you
> > > being CFM.
> >
> > Greg as CFM would be fine. It's nice to see someone volunteer.
>
> +1, thanks a lot Greg!
>
> Note that the last CF of a major version is probably even more work than a
> "regular" CF, so if other people are interested there's probably room for
> multiple CFMs.

I do have the time available. What I don't necessarily have is insight into everything that needs to be done, especially behind the scenes. So it would be great if someone could give me tips about things I'm missing, or review what I'm doing to check that I've covered everything.

--
greg
Checkpointer sync queue fills up / loops around pg_usleep() are bad
Hi,

In two recent investigations of occasional test failures (019_replslot_limit.pl failures, AIO rebase) the problems are somehow tied to checkpointer. I don't yet know if it is actually causally related to precisely those failures, but when running e.g. 027_stream_regress.pl, I see phases in which many backends are looping in RegisterSyncRequest() repeatedly, each time sleeping with pg_usleep(1L). Without adding instrumentation this is completely invisible at any log level. There are no log messages, no wait events, nothing.

ISTM, we should not have any loops around pg_usleep(). And shorter term, we shouldn't have any loops around pg_usleep() that don't emit log messages / set wait events. Therefore I propose that we "prohibit" such loops without at least a DEBUG2 elog() or so. It's just too hard to debug.

The reason for the sync queue filling up in 027_stream_regress.pl is actually fairly simple:

1) The test runs with shared_buffers = 1MB, leading to a small sync queue of 128 entries.

2) CheckpointWriteDelay() does pg_usleep(10L)

ForwardSyncRequest() wakes up the checkpointer using SetLatch() if the sync queue is more than half full. But at least on Linux and FreeBSD that doesn't actually interrupt pg_usleep() anymore (due to using signalfd / kqueue rather than a signal handler). And on all platforms the signal might arrive just before the pg_usleep() rather than during it, also not causing usleep to be interrupted.

If I shorten the sleep in CheckpointWriteDelay() the problem goes away. This actually reduces the time for a single run of 027_stream_regress.pl on my workstation noticeably: with the default sleep time it's ~32s, with the shortened time it's ~27s.

I suspect we need to do something about this concrete problem for 14 and master, because it's certainly worse than before on Linux / FreeBSD. I suspect the easiest is to just convert that usleep to a WaitLatch(). That'd require adding a new enum value to WaitEventTimeout in 14. Which probably is fine?
Greetings, Andres Freund
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On 02/26/22 11:48, Chapman Flack wrote:
> This patch applies cleanly for me and passes installcheck-world.
> I have not yet studied all of the changes in detail.

I've now looked through the rest, and the only further thing I noticed was that xlog.c's do_pg_start_backup still has a tablespaces parameter to receive a List* of tablespaces if the caller wants, but this patch removes the comment describing it:

- * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs
- * describing the cluster's tablespaces.

which seems like collateral damage.

Regards,
-Chap
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Sat, Feb 26, 2022 at 04:48:52PM +0000, Chapman Flack wrote:
> My biggest concerns are the changes to the SQL-visible pg_start_backup
> and pg_stop_backup functions. When the non-exclusive API was implemented
> (in 7117685), that was done with care (with a new optional argument to
> pg_start_backup, and a new overload of pg_stop_backup) to avoid immediate
> breakage of working backup scripts.
>
> With this patch, even scripts that were dutifully migrated to that new API and
> now invoke pg_start_backup(label, false) or (label, exclusive => false) will
> immediately and unnecessarily break. What I would suggest for this patch
> would be to change the exclusive default from true to false, and have the
> function report an ERROR if true is passed.
>
> Otherwise, for sites using a third-party backup solution, there will be an
> unnecessary requirement to synchronize a PostgreSQL upgrade with an
> upgrade of the backup solution that won't be broken by the change. For
> a site with their backup procedures scripted in-house, there will be an
> unnecessarily urgent need for the current admin team to study and patch
> the currently-working scripts.
>
> That can be avoided by just changing the default to false and rejecting calls
> where true is passed. That will break only scripts that never got the memo
> about moving to non-exclusive backup, available for six years now.
>
> Assuming the value is false, so no error is thrown, is it practical to determine
> from flinfo->fn_expr whether the value was defaulted or supplied? If so, I would
> further suggest reporting a deprecation WARNING if it was explicitly supplied,
> with a HINT that the argument can simply be removed at the call site, and will
> become unrecognized in some future release.

This is a good point. I think I agree with your proposed changes. I believe it is possible to add a deprecation warning only when 'exclusive' is specified. If anything, we can create a separate function that accepts the 'exclusive' parameter and that always emits a NOTICE or WARNING.

> pg_stop_backup needs thought, because 7117685 added a new overload for that
> function, rather than just an optional argument. This patch removes the old
> niladic version that returned pg_lsn, leaving just one version, with an optional
> argument, that returns a record.
>
> Here again, the old niladic one was only suitable for exclusive backups, so there
> can't be any script existing in 2022 that still calls that unless it has never been
> updated in six years to nonexclusive backups, and that breakage can't be helped.
>
> Any scripts that did get dutifully updated over the last six years will be calling the
> record-returning version, passing false, or exclusive => false. This patch as it
> stands will unnecessarily break those, but here again I think that can be avoided
> just by making the exclusive parameter optional with default false, and reporting
> an error if true is passed.
>
> Here again, I would consider also issuing a deprecation warning if the argument
> is explicitly supplied, if it is practical to determine that from fn_expr. (I haven't
> looked yet to see how practical that is.)

Agreed. I will look into updating this one, too. I think the 'exclusive' parameter should remain documented for now for both pg_start_backup() and pg_stop_backup(), but this documentation will just note that it is there for backward compatibility and must be set to false.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Postgres restart in the middle of exclusive backup and the presence of backup_label file
On Sat, Feb 26, 2022 at 05:03:04PM -0500, Chapman Flack wrote:
> I've now looked through the rest, and the only further thing I noticed
> was that xlog.c's do_pg_start_backup still has a tablespaces parameter
> to receive a List* of tablespaces if the caller wants, but this patch
> removes the comment describing it:
>
> - * If "tablespaces" isn't NULL, it receives a list of tablespaceinfo structs
> - * describing the cluster's tablespaces.
>
> which seems like collateral damage.

Thanks. I will fix this and the proofreading nits you noted upthread in the next revision.

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
Re: Missed condition-variable wakeups on FreeBSD
On Sat, Feb 26, 2022 at 2:07 PM Tom Lane wrote:
>
> About once a month over the last six months, my buildfarm animal
> florican has gotten stuck while running the core regression tests.
> The symptoms have looked very much the same each time: there is
> a backend with two parallel worker processes that are just sitting
> and not consuming any CPU time. Each time I've attached to these
> processes with gdb to check their stack traces, and seen pretty
> much the same story every time (traces below). What is really
> interesting is that after I detach from the second worker, the
> processes resume running and finish out the test successfully!
> I don't know much about how gdb interacts with kernel calls on
> FreeBSD, but I speculate that the poll(2) call returns with EINTR
> after gdb releases the process, and then things resume fine,
> suggesting that we lost an interrupt somewhere.
>
> I have observed this three times in the REL_11 branch, once
> in REL_12, and a couple of times last summer before it occurred
> to me to start keeping notes. Over that time the machine has
> been running various patchlevels of FreeBSD 13.0.
>
> Here's the stack trace from the leader process in the most
> recent event (REL_11 as of yesterday). It's not always the
> same query that gets stuck, but it's always a parallel hash join:
>
> (gdb) p debug_query_string
> $1 = 0x21873090 "select count(*) from simple r join simple s using (id);"
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x219dc170, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171
> #4  WaitEventSetWait (set=0x219dc138, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
> #5  0x0099842b in ConditionVariableSleep (cv=0x2bf0b230, wait_event_info=134217738) at condition_variable.c:157
> #6  0x00977e12 in BarrierArriveAndWait (barrier=0x2bf0b218, wait_event_info=134217738) at barrier.c:191
> #7  0x00873441 in ExecHashJoinImpl (pstate=<optimized out>, parallel=true) at nodeHashjoin.c:328
> #8  ExecParallelHashJoin (pstate=0x2a887740) at nodeHashjoin.c:600
> #9  0x0086a509 in ExecProcNode (node=0x2a887740) at ../../../src/include/executor/executor.h:248
> #10 fetch_input_tuple (aggstate=aggstate@entry=0x2a887500) at nodeAgg.c:407
> #11 0x00869646 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:1746
> #12 ExecAgg (pstate=0x2a887500) at nodeAgg.c:1561
> #13 0x0086e181 in ExecProcNode (node=0x2a887500) at ../../../src/include/executor/executor.h:248
> #14 gather_getnext (gatherstate=0x2a887414) at nodeGather.c:276
> #15 ExecGather (pstate=0x2a887414) at nodeGather.c:207
> #16 0x0086a509 in ExecProcNode (node=0x2a887414) at ../../../src/include/executor/executor.h:248
> #17 fetch_input_tuple (aggstate=aggstate@entry=0x2a8871b8) at nodeAgg.c:407
> #18 0x00869646 in agg_retrieve_direct (aggstate=<optimized out>) at nodeAgg.c:1746
> #19 ExecAgg (pstate=0x2a8871b8) at nodeAgg.c:1561
> #20 0x0085956b in ExecProcNode (node=0x2a8871b8) at ../../../src/include/executor/executor.h:248
> #21 ExecutePlan (estate=0x2a887090, planstate=0x2a8871b8, use_parallel_mode=<optimized out>, operation=CMD_SELECT, sendTuples=<optimized out>, numberTuples=<optimized out>, direction=<optimized out>, dest=0x2ab629b4, execute_once=<optimized out>) at execMain.c:1712
> #22 standard_ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:353
> #23 0x00859445 in ExecutorRun (queryDesc=0x218a1490, direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at execMain.c:296
> #24 0x009a3d57 in PortalRunSelect (portal=portal@entry=0x2197f090, forward=<optimized out>, count=0, dest=0x2ab629b4) at pquery.c:941
> #25 0x009a39d6 in PortalRun (portal=0x2197f090, count=2147483647, isTopLevel=<optimized out>, run_once=<optimized out>, dest=0x2ab629b4, altdest=0x2ab629b4, completionTag=0xffbfdc70 "") at pquery.c:782
> #26 0x009a2b53 in exec_simple_query (query_string=query_string@entry=0x21873090 "select count(*) from simple r join simple s using (id);") at postgres.c:1145
> #27 0x009a04ec in PostgresMain (argc=1, argv=0x21954ed4, dbname=<optimized out>, username=0x21954d38 "buildfarm") at postgres.c:4052
>
> Worker 1 looks like this:
>
> (gdb) bt
> #0  _poll () at _poll.S:4
> #1  0x21701361 in __thr_poll (fds=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libthr/thread/thr_syscalls.c:338
> #2  0x215eaf3f in poll (pfd=0x2187cb48, nfds=2, timeout=-1) at /usr/src/lib/libc/sys/poll.c:47
> #3  0x0097b0fd in WaitEventSetWaitBlock (set=<optimized out>, cur_timeout=-1, occurred_events=<optimized out>, nevents=<optimized out>) at latch.c:1171
> #4  WaitEventSetWait (set=0x2187cb10, timeout=-1, occurred_events=<optimized out>, nevents=1, wait_event_info=134217738) at latch.c:1000
> #5  0x0099842b in ConditionVariableSleep (cv=0x2196b230, wait_event_info=134217738) at condition_variable.c:157
> #6  0x00977e
Re: [Proposal] Global temporary tables
I read through this. Find attached some language fixes. You should be able to apply each "fix" patch on top of your own local branch with git am, and then squish them together. Let me know if you have trouble with that.

I think get_seqence_start_value() should be static. (Otherwise, it should be in lsyscache.c.)

The include added to execPartition.c seems to be unused.

+#define RELATION_IS_TEMP_ON_CURRENT_SESSION(relation) \
+#define RELATION_IS_TEMP(relation) \
+#define RelpersistenceTsTemp(relpersistence) \
+#define RELATION_GTT_ON_COMMIT_DELETE(relation)\

=> These macros can evaluate their arguments multiple times. You should add a comment to warn about that. And maybe avoid passing them a function argument, like: RelpersistenceTsTemp(get_rel_persistence(rte->relid))

+list_all_backend_gtt_frozenxids should return TransactionId, not int. And the function name should say "oldest", not "all"?

I think the GUC should have a longer name; max_active_gtt is too short for a global var.

+#define	MIN_NUM_ACTIVE_GTT		0
+#define	DEFAULT_NUM_ACTIVE_GTT		1000
+#define	MAX_NUM_ACTIVE_GTT		100
+int	max_active_gtt = MIN_NUM_ACTIVE_GTT

It's being initialized to MIN, but then the GUC machinery sets it to DEFAULT. By convention, it should be initialized to the default.

fout->remoteVersion >= 14
=> should say 15

describe.c has gettext_noop("session"), which is a half-truth: the data is per-session, but the table definition is persistent.

You redirect stats from pg_class and pg_statistic to a local hash table. This is pretty hairy :( I guess you'd also need to handle pg_statistic_ext and ext_data. pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to look at pg_get_gtt_statistics.

I wonder if there's a better way to do it, like updating pg_statistic but forcing the changes to be rolled back when the session ends... But I think that would make long-running sessions behave badly, the same as "long-running transactions".

Have you looked at Gilles Darold's GTT extension ?
>From b89f3cc5c78e7b4c3e10ab39ef527b524d0d112d Mon Sep 17 00:00:00 2001
From: Justin Pryzby
Date: Sat, 1 Jan 2022 17:02:30 -0600
Subject: [PATCH 02/11] f!0002-gtt-v64-doc.patch

---
 doc/src/sgml/ref/create_table.sgml | 42 --
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/doc/src/sgml/ref/create_table.sgml b/doc/src/sgml/ref/create_table.sgml
index 408373323c..9cd5fc17f2 100644
--- a/doc/src/sgml/ref/create_table.sgml
+++ b/doc/src/sgml/ref/create_table.sgml
@@ -174,18 +174,18 @@ WITH ( MODULUS numeric_literal, REM
       If specified, the table is created as a temporary table. Optionally,
       GLOBAL or LOCAL
-      can be written before TEMPORARY or TEMP.
+      may be written before TEMPORARY or TEMP.
       They represent two types of temporary tables supported by PostgreSQL:
-      global temporary table and local temporary table. Without specified
-      GLOBAL or LOCAL, a local temporary table is created by default.
+      global temporary table and local temporary table. If neither
+      GLOBAL or LOCAL is specified, a local temporary table is created by default.
 
-      Both types of temporary tables’ data are truncated at the
+      Data of both types of temporary tables is truncated at the
       end of a session or optionally at the end of the current transaction.
-      (see ON COMMIT below). For global temporary table,
-      its schema is reserved and reused by future sessions or transactions.
-      For local temporary table, both its data and its schema are dropped.
+      (see ON COMMIT below). For a global temporary table,
+      its table definition is preserved and reused by future sessions or transactions.
+      For a local temporary table, both its data and its table definition are dropped.
@@ -194,10 +194,10 @@ WITH ( MODULUS numeric_literal, REM
       Global temporary table are defined just once and automatically exist
-      (starting with empty contents) in every session that needs them.
-      The schema definition of temporary tables is persistent and shared among sessions.
-      However, the data in temporary tables are kept private to sessions themselves,
-      even though they use same name and same schema.
+      (initially with empty contents) in every session.
+      The schema definition of a temporary table is persistent and common to all sessions.
+      However, the data in temporary tables is private to each sessions,
+      even though they share the same name and same table definition.
@@ -206,15 +206,15 @@ WITH ( MODULUS numeric_literal, REM
       Local Temporary Table
 
-      Local temporary table are automatically dropped at the end of a
-      session (include schema and data). Future sessions need to create
-      their own temporary tables when they are used.
+      Local tem
Re: Commitfest manager for 2022-03
Can I suggest to update the CF app to allow:
| Target version: 16

I also suggest to update patches to indicate which are (not) being considered for v15. A few specific ones from myself:

| https://commitfest.postgresql.org/37/2573/ | pg_dump - read data for some options from external file | Ready for Committer | 15

Marked RFC by Daniel Gustafsson since 2021-10-01. Is it more likely than not to be included in v15, or should the "v15" be removed ?

| https://commitfest.postgresql.org/37/3499/ | libpq compression | Waiting on Author | 15

I re-raised the same concerns made ~12 months ago but haven't heard back. WOA since 35 days. Unlikely to be considered/included in v15.

| https://commitfest.postgresql.org/37/3571/ | Add LZ4 compression in pg_dump | Needs review

This patch was just submitted on 2022-02-25. I did a lot of the same things this patch does for a previous patch submission (32/2888) for pg_dump/ZSTD, so I could review this, if there were interest to include it in v15.

| https://commitfest.postgresql.org/37/2349/ | Global temporary table | Needs review | 15

The handling/hijacking of pg_class and pg_statistic is not likely to be acceptable. I doubt this will be progressed for v15.

| https://commitfest.postgresql.org/37/3448/ | reduce impact of lengthy startup and checkpoint tasks | Needs review | 15

I think this is in the design/concept phase, and not likely for v15, except maybe in reduced scope.

| https://commitfest.postgresql.org/37/2906/ | Row filtering for logical replication

If I'm not wrong, this is merged and should be closed?

For my own patches, it'd be helpful if someone would suggest if any are (not) likely to be progressed for v15. These are marked RFC, but have no "committer" set. Are they all targeting v15?

https://commitfest.postgresql.org/37/2863/ | Function to log backtrace of postgres processes | Ready for Committer

Should this target v15 ?

https://commitfest.postgresql.org/37/2897/ | Faster pglz compression | Ready for Committer | 15

Tomas, are you still planning to merge this one ?

https://commitfest.postgresql.org/37/2992/ | Allow batched insert during cross-partition updates | Ready for Committer
https://commitfest.postgresql.org/37/3073/ | Add callback table access method to reset filenode when dropping relation | Ready for Committer
https://commitfest.postgresql.org/37/3384/ | use has_privs_for_role for predefined roles | Ready for Committer | 15

Do all of the RFC patches target v15 (except bugfixes) ?
https://commitfest.postgresql.org/37/?status=3

These are not marked as targeting any version .. are any of them being considered for v15 ?

https://commitfest.postgresql.org/37/2903/ | Parallel Hash Full Join | Needs review
https://commitfest.postgresql.org/37/3508/ | Avoid smgrimmedsync() during index build and add unbuffered IO API | Needs review
https://commitfest.postgresql.org/37/3182/ | automatically generating node support functions | Needs review
https://commitfest.postgresql.org/37/3183/ | Detectable crashes and unlogged table resets | Needs review | Jeff Davis (jdavis)

These are set as "targeting v15", but have no committer set. If someone thinks it's unrealistic they'll be included in v15, I suggest to update them not to say so.

https://commitfest.postgresql.org/37/3272/ | Add system view tracking shared buffer actions | Needs review | 15
https://commitfest.postgresql.org/37/3224/ | Fix ExecRTCheckPerms() inefficiency with many prunable partitions | Needs review | 15
https://commitfest.postgresql.org/37/3270/ | Cache tuple routing info during bulk loads into partitioned tables | Needs review | 15
https://commitfest.postgresql.org/37/3461/ | In-place persistence change of a relation (fast ALTER TABLE ... SET LOGGED with wal_level=minimal) | Needs review | 15
https://commitfest.postgresql.org/37/3539/ | Allow parallel plan for referential integrity checks | Needs review | 15
https://commitfest.postgresql.org/37/3397/ | Prefetching in recovery, take II | Needs review | 15 | Thomas Munro (macdice)
https://commitfest.postgresql.org/37/2482/ | jsonpath syntax extensions | Needs review | 15
https://commitfest.postgresql.org/37/2901/ | SQL/JSON: functions | Needs review | 15
https://commitfest.postgresql.org/37/2902/ | SQL/JSON: JSON_TABLE | Needs review | 15
https://commitfest.postgresql.org/37/3099/ | Asymmetric partition-wise JOIN | Needs review | 15
https://commitfest.postgresql.org/37/3293/ | Tags in errordata | Needs review | 15
https://commitfest.postgresql.org/37/3358/ | Update relfrozenxmin when tru
Re: Fix a typo in pg_receivewal.c's get_destination_dir()
Hi,

On 2022-02-26 22:04:13 +0530, Bharath Rupireddy wrote:
> It looks like there's a typo in pg_receivewal.c's
> get_destination_dir(), that is, use the function parameter dest_folder
> instead of global variable basedir in the error message. Although
> there's no harm in it as basedir is a global variable in
> pg_receivewal.c, let's use the function parameter to be in sync with
> close_destination_dir.
>
> Attaching a tiny patch to fix it.

Good catch, pushed.

Greetings,

Andres Freund
Re: Adding CI to our tree
Hi, > Subject: [PATCH 2/7] cirrus: upload changed html docs as artifacts I still think the determination of the base branch needs to be resolved before this can be considered. > Always run doc build; to allow them to be shown in cfbot, they should not be > skipped if the linux build fails. > > This could be done on the client side (cfbot). One advantage of doing it here > is that fewer docs are uploaded - many patches won't upload docs at all. Imo this stuff is largely independent from the commit subject > XXX: if this is run in the same task, the configure flags should probably be > consistent ? What do you mean? > Subject: [PATCH 3/7] s!build docs as a separate task.. Could you reorder this to earlier, then we can merge it before resolving the branch issues. And we don't waffle on the CompilerWarnings dependency. > I believe this'll automatically show up as a separate "column" on the cfbot > page. Yup. > +# Verify docs can be built, and upload changed docs as artifacts > +task: > + name: HTML docs > + > + env: > +CPUS: 1 > + > + only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || > $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*' > + > + container: > +image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest > +cpu: $CPUS > + how about using something like (the syntax might be slightly off) skip: !changesInclude('doc/**') to avoid running it for the many pushes where no docs are changed? > + sysinfo_script: | > +id > +uname -a > +cat /proc/cmdline > +ulimit -a -H && ulimit -a -S > +export > + > +git remote -v > +git branch -a > +git remote add postgres https://github.com/postgres/postgres > +time git fetch -v postgres master > +git log -1 postgres/master > +git diff --name-only postgres/master.. Hardly "sysinfo"? 
> Subject: [PATCH 4/7] wip: cirrus: code coverage
>
> XXX: lcov should be installed in the OS image

FWIW, you can open a PR in https://github.com/anarazel/pg-vm-images/ ; both the debian docker and VM images have their packages installed via scripts/linux_debian_install_deps.sh

> From 226699150e3e224198fc297689add21bece51c4b Mon Sep 17 00:00:00 2001
> From: Justin Pryzby
> Date: Sun, 9 Jan 2022 18:25:02 -0600
> Subject: [PATCH 5/7] cirrus/vcregress: test modules/contrib with
>  NO_INSTALLCHECK=1

I don't want to commit the vcregress.pl part myself. But if you split it off, I'm happy to push the --temp-config bits.

> From 08933bcd93d4f57ad73ab6df2f1627b93e61b459 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby
> Date: Sun, 16 Jan 2022 12:51:13 -0600
> Subject: [PATCH 6/7] wip: cirrus/windows: save tmp_install as an artifact..
>
> ..to allow users to easily download compiled binaries to try a patch,
> if they don't have a development environment handy or are not familiar
> with compilation.
>
> XXX: maybe this should be conditional or commented out ?

Yea, I don't want to do this by default; that's a fair bit of data that very likely nobody will ever access. One can make entire tasks triggered manually, but that'd then require building again :/.

> From a7d2bba6f51d816412fb645b0d4821c36ee5c400 Mon Sep 17 00:00:00 2001
> From: Justin Pryzby
> Date: Sun, 20 Feb 2022 15:01:59 -0600
> Subject: [PATCH 7/7] wip: cirrus/windows: add compiler_warnings_script
>
> I'm not sure how to write this test in windows shell; it's also not easy to
> write it in posix sh, since windows shell is somehow interpreting && and
> ||...

You could put the script in src/tools/ci and call it from the script to avoid the quoting issues.

Would be good to add a comment explaining the fileLoggerParameters1 thing and a warning that compiler_warnings_script should be the last script.

Greetings,

Andres Freund
Re: Adding CI to our tree
On 2022-02-26 17:09:08 -0800, Andres Freund wrote: > You could put the script in src/tools/ci and call it from the script to avoid > the quoting issues. Might also be a good idea for the bulk of the docs / coverage stuff, even if there are no quoting issues.
Re: range_agg with multirange inputs
The following review has been posted through the commitfest application:

make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           not tested
Documentation:            not tested

This applies (with some fuzz) and passes installcheck-world, but a rebase is needed, because 3 lines of context aren't enough to get the doc changes in the right place in the aggregate function table. (I think generating the patch with 4 lines of context would be enough to keep that from being a recurring issue.)

One thing that seems a bit funny is this message in the new multirange_agg_transfn:

+	if (!type_is_multirange(mltrngtypoid))
+		ereport(ERROR,
+				(errcode(ERRCODE_DATATYPE_MISMATCH),
+				 errmsg("range_agg must be called with a multirange")));

It's clearly copied from the corresponding test and message in range_agg_transfn. They both say "range_agg must be called ...", which makes perfect sense, as from the user's perspective both messages come from (different overloads of) a function named range_agg. Still, it could be odd to have (again from the user's perspective) a function named range_agg that sometimes says "range_agg must be called with a range" and other times says "range_agg must be called with a multirange". I'm not sure how to tweak the wording (of either message or both) to make that less weird, but there's probably a way.

I kind of wonder whether either message is really reachable, at least through the aggregate machinery in the expected way. Won't that machinery ensure that it is calling the right transfn with the right type of argument? If that's the case, maybe the messages could only be seen by someone calling the transfn directly ... which also seems ruled out for these transfns because the state type is internal. Is this whole test more of the nature of an assertion?

Regards,
-Chap

The new status of this patch is: Waiting on Author
Re: convert libpq uri-regress tests to tap test
Hi,

On 2022-02-25 17:52:29 -0800, Andres Freund wrote:
> I'd like to commit 0001 and 0002 soon, unless somebody sees a reason not to?

Pushed. Attached is the remainder, 0003, the move of libpq_pipeline to src/interfaces/libpq, that I'm not planning to push for now.

Regards,

Andres

From 61a89721c8aab3a1fd2300067c470807b4fe87bc Mon Sep 17 00:00:00 2001
From: Andres Freund
Date: Thu, 24 Feb 2022 08:27:41 -0800
Subject: [PATCH v4] Move libpq_pipeline test into src/interfaces/libpq.

---
 .../libpq/t/002_pipeline.pl}                  |  2 +-
 src/interfaces/libpq/test/.gitignore          |  1 +
 src/interfaces/libpq/test/Makefile            |  2 +-
 .../libpq/test}/libpq_pipeline.c              |  0
 .../test}/traces/disallowed_in_pipeline.trace |  0
 .../libpq/test}/traces/multi_pipelines.trace  |  0
 .../libpq/test}/traces/nosync.trace           |  0
 .../libpq/test}/traces/pipeline_abort.trace   |  0
 .../libpq/test}/traces/prepared.trace         |  0
 .../libpq/test}/traces/simple_pipeline.trace  |  0
 .../libpq/test}/traces/singlerow.trace        |  0
 .../libpq/test}/traces/transaction.trace      |  0
 src/test/modules/Makefile                     |  1 -
 src/test/modules/libpq_pipeline/.gitignore    |  5 ----
 src/test/modules/libpq_pipeline/Makefile      | 25 -------------------
 src/test/modules/libpq_pipeline/README        |  1 -
 16 files changed, 3 insertions(+), 34 deletions(-)
 rename src/{test/modules/libpq_pipeline/t/001_libpq_pipeline.pl => interfaces/libpq/t/002_pipeline.pl} (96%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/libpq_pipeline.c (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/disallowed_in_pipeline.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/multi_pipelines.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/nosync.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/pipeline_abort.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/prepared.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/simple_pipeline.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/singlerow.trace (100%)
 rename src/{test/modules/libpq_pipeline => interfaces/libpq/test}/traces/transaction.trace (100%)
 delete mode 100644 src/test/modules/libpq_pipeline/.gitignore
 delete mode 100644 src/test/modules/libpq_pipeline/Makefile
 delete mode 100644 src/test/modules/libpq_pipeline/README

diff --git a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl b/src/interfaces/libpq/t/002_pipeline.pl
similarity index 96%
rename from src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
rename to src/interfaces/libpq/t/002_pipeline.pl
index 0c164dcaba5..2e288d8ba7c 100644
--- a/src/test/modules/libpq_pipeline/t/001_libpq_pipeline.pl
+++ b/src/interfaces/libpq/t/002_pipeline.pl
@@ -49,7 +49,7 @@ for my $testname (@tests)
 	my $expected;
 	my $result;
 
-	$expected = slurp_file_eval("traces/$testname.trace");
+	$expected = slurp_file_eval("test/traces/$testname.trace");
 	next unless $expected ne "";
 	$result = slurp_file_eval($traceout);
 	next unless $result ne "";
diff --git a/src/interfaces/libpq/test/.gitignore b/src/interfaces/libpq/test/.gitignore
index 5e803d8816a..e24d7f64dc3 100644
--- a/src/interfaces/libpq/test/.gitignore
+++ b/src/interfaces/libpq/test/.gitignore
@@ -1 +1,2 @@
 /uri-regress
+/libpq_pipeline
diff --git a/src/interfaces/libpq/test/Makefile b/src/interfaces/libpq/test/Makefile
index 54212159065..9f99309653f 100644
--- a/src/interfaces/libpq/test/Makefile
+++ b/src/interfaces/libpq/test/Makefile
@@ -11,7 +11,7 @@ endif
 override CPPFLAGS := -I$(libpq_srcdir) $(CPPFLAGS)
 LDFLAGS_INTERNAL += $(libpq_pgport)
 
-PROGS = uri-regress
+PROGS = uri-regress libpq_pipeline
 
 all: $(PROGS)
diff --git a/src/test/modules/libpq_pipeline/libpq_pipeline.c b/src/interfaces/libpq/test/libpq_pipeline.c
similarity index 100%
rename from src/test/modules/libpq_pipeline/libpq_pipeline.c
rename to src/interfaces/libpq/test/libpq_pipeline.c
diff --git a/src/test/modules/libpq_pipeline/traces/disallowed_in_pipeline.trace b/src/interfaces/libpq/test/traces/disallowed_in_pipeline.trace
similarity index 100%
rename from src/test/modules/libpq_pipeline/traces/disallowed_in_pipeline.trace
rename to src/interfaces/libpq/test/traces/disallowed_in_pipeline.trace
diff --git a/src/test/modules/libpq_pipeline/traces/multi_pipelines.trace b/src/interfaces/libpq/test/traces/multi_pipelines.trace
similarity index 100%
rename from src/test/modules/libpq_pipeline/traces/multi_pipelines.trace
rename to src/interfaces/libpq/test/traces/multi_pipelines.trace
diff --git a/src/test/modules/libpq_pipeline/traces/nosync.
Re: Adding CI to our tree
On Sat, Feb 26, 2022 at 05:09:08PM -0800, Andres Freund wrote:
> > XXX: if this is run in the same task, the configure flags should probably be
> > consistent ?
>
> What do you mean?

I mean that the commit to run CompilerWarnings unconditionally built docs with different flags than the other stuff in that task. If it's going to be a separate task, then that doesn't matter.

> > +# Verify docs can be built, and upload changed docs as artifacts
> > +task:
> > +  name: HTML docs
> > +
> > +  env:
> > +    CPUS: 1
> > +
> > +  only_if: $CIRRUS_CHANGE_MESSAGE !=~ '.*\nci-os-only:.*' || $CIRRUS_CHANGE_MESSAGE =~ '.*\nci-os-only:[^\n]*(docs|html).*'
> > +
> > +  container:
> > +    image: $CONTAINER_REPO/linux_debian_bullseye_ci:latest
> > +    cpu: $CPUS
> > +
>
> how about using something like (the syntax might be slightly off)
>   skip: !changesInclude('doc/**')
> to avoid running it for the many pushes where no docs are changed?

This doesn't do the right thing - I just tried.

https://cirrus-ci.org/guide/writing-tasks/#environment-variables
| changesInclude function can be very useful for skipping some tasks when no changes to sources have been made since the last successful Cirrus CI build.

That means it will not normally rebuild docs (and then this still requires resolving the "base branch").

--
Justin
Re: Adding CI to our tree
Hi,

On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> This doesn't do the right thing - I just tried.
> https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> | changesInclude function can be very useful for skipping some tasks when no
> changes to sources have been made since the last successful Cirrus CI build.
>
> That means it will not normally rebuild docs (and then this still requires
> resolving the "base branch").

Why would we want to rebuild docs if they're the same as in the last build for the same branch? For cfbot purposes each commit is independent from the prior commit, so it should rebuild every time if the CF entry has changes to the docs.

Greetings,

Andres Freund
Re: Adding CI to our tree
On Sat, Feb 26, 2022 at 06:50:00PM -0800, Andres Freund wrote:
> Hi,
>
> On 2022-02-26 20:43:52 -0600, Justin Pryzby wrote:
> > This doesn't do the right thing - I just tried.
> > https://cirrus-ci.org/guide/writing-tasks/#environment-variables
> > | changesInclude function can be very useful for skipping some tasks when
> > no changes to sources have been made since the last successful Cirrus CI
> > build.
> >
> > That means it will not normally rebuild docs (and then this still requires
> > resolving the "base branch").
>
> Why would we want to rebuild docs if they're the same as in the last build for
> the same branch? For cfbot purposes each commit is independent from the prior
> commit, so it should rebuild it every time if the CF entry has changes to the
> docs.

I did git commit --amend --no-edit and repushed to github to trigger a new CI run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714

This is in a branch with changes to doc. I wasn't intending it to skip building docs on this branch just because the same, changed docs were previously built.

Why wouldn't the docs be built following the same logic as the rest of the sources? If someone renames or removes an xref target, shouldn't CI fail on its next run for a patch which tries to reference it? It would fail on the buildfarm, and I think one major use for the CI is to minimize the post-push cleanup cycles.

Are you sure about cfbot? AIUI cirrus would see that docs didn't change relative to the previous run for branch: commitfest/NN/.

--
Justin
Re: [Proposal] Global temporary tables
Hi

> You redirect stats from pg_class and pg_statistics to a local hash table.
> This is pretty hairy :(
>
> I guess you'd also need to handle pg_statistic_ext and ext_data.
> pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
> look at pg_get_gtt_statistics.

Without this, the GTT will be terribly slow like current temporary tables, with a lot of problems with bloating of the pg_class, pg_attribute and pg_depend tables.

Regards

Pavel
Re: Adding CI to our tree
Hi,

On 2022-02-26 21:10:57 -0600, Justin Pryzby wrote:
> I did git commit --amend --no-edit and repushed to github to trigger a new CI
> run, and it did this: https://github.com/justinpryzby/postgres/runs/5347878714
>
> This is in a branch with changes to doc. I wasn't intending it to skip
> building docs on this branch just because the same, changed docs were
> previously built.

But why not? If nothing in docs/ changes, there's little point? It'd probably be good to check .cirrus.yml as well as docs/**, to make sure that .cirrus logic changes rerun as well.

> Why wouldn't the docs be built following the same logic as the rest of the
> sources?

Tests have a certain rate of spurious failure, so rerunning them makes sense. But what do we gain by rebuilding the docs? And especially, what do we gain by uploading the docs e.g. in the postgres/postgres repo?

> If someone renames or removes an xref target, shouldn't CI fail on its next
> run for a patch which tries to reference it ?

Why wouldn't it?

> It would fail on the buildfarm, and I think one major use for the CI is to
> minimize the post-push cleanup cycles.

I personally see it more as access to a "mini buildfarm" before patches are committable, but that's of course compatible.

> Are you sure about cfbot ? AIUI cirrus would see that docs didn't change
> relative to the previous run for branch: commitfest/NN/.

Not entirely sure, but it's what I remember observing when amending commits in a repo using changesInclude. If I were to build a feature like it, I'd look at the list of files of

  git diff $(git merge-base last_green new_commit)..new_commit

or such. cfbot doesn't commit incrementally but replaces the prior commit, so I suspect it'll always be viewed as new. But who knows, shouldn't be hard to figure out.

Greetings,

Andres Freund
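(A rough sketch of the merge-base approach described above, in shell. Here `origin/master` stands in for the last-green commit, and the `doc/` filter is an assumption for illustration; a real cfbot-side implementation would track the last successful build itself.)

```shell
#!/bin/sh
# Sketch: list files changed on the current branch since its merge-base
# with a reference branch, then decide whether the docs task is needed.
base=$(git merge-base origin/master HEAD)
changed=$(git diff --name-only "$base"..HEAD)

if printf '%s\n' "$changed" | grep -q '^doc/'; then
    echo "doc changes found: run the HTML docs task"
else
    echo "no doc changes: the task could be skipped"
fi
```

Unlike changesInclude, this comparison is against a fixed base rather than the last successful build, so amending and force-pushing a commit would still trigger a docs rebuild whenever the branch touches doc/.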
Re: [Proposal] Global temporary tables
Hi,

On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
> > You redirect stats from pg_class and pg_statistics to a local hash table.
> > This is pretty hairy :(

As is, I think the patch is architecturally completely unacceptable. Having code everywhere to redirect to manually written in-memory catalog table code isn't maintainable.

> > I guess you'd also need to handle pg_statistic_ext and ext_data.
> > pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
> > look at pg_get_gtt_statistics.
>
> Without this, the GTT will be terribly slow like current temporary tables
> with a lot of problems with bloating of pg_class, pg_attribute and
> pg_depend tables.

I think it's not a great idea to solve multiple complicated problems at once...

Greetings,

Andres Freund
Re: [Proposal] Global temporary tables
On Sun, Feb 27, 2022 at 5:13 AM, Andres Freund wrote:
> Hi,
>
> On 2022-02-27 04:17:52 +0100, Pavel Stehule wrote:
> > > You redirect stats from pg_class and pg_statistics to a local hash table.
> > > This is pretty hairy :(
>
> As is I think the patch is architecturally completely unacceptable. Having
> code everywhere to redirect to manually written in-memory catalog table code
> isn't maintainable.
>
> > > I guess you'd also need to handle pg_statistic_ext and ext_data.
> > > pg_stats doesn't work, since the data isn't in pg_statistic - it'd need to
> > > look at pg_get_gtt_statistics.
> >
> > Without this, the GTT will be terribly slow like current temporary tables
> > with a lot of problems with bloating of pg_class, pg_attribute and
> > pg_depend tables.
>
> I think it's not a great idea to solve multiple complicated problems at
> once...

I thought about this issue for a very long time, and I didn't find any better solution (without more significant rewriting of pg storage). In a lot of projects that I know, temporary tables are strictly prohibited due to the possible devastating impact of system catalog bloat. It is a serious problem. So any implementation of GTT should solve these questions: a) how to reduce catalog bloat, b) how to allow session-related statistics for GTT.

I agree that an implementation of GTT as template-based LTT (local temporary tables) can be very simple (it is possible as an extension), but with the same unhappy performance impacts.

I don't say the current design should be accepted without any discussion and without changes. Maybe GTT based on LTT can be better than nothing (what we have now), and can be good enough for a lot of projects where the load is not too high (and almost all projects have a low load). Unfortunately, it can be a trap for a lot of projects in the future, so there should be discussion and proposed solutions for fixing the related issues.
The performance of GTT should be fixable, so any discussion about this topic should include a part about protections against catalog bloat and about the cost related to frequent catalog updates. But anyway, I invite (and probably not just me) any discussion on how to implement this feature, how to solve the performance issues, and how to divide the implementation into smaller steps. I am sure that a fast GTT implementation can be used for a fast implementation of LTT too, and maybe for all other temporary objects.

Regards

Pavel

> Greetings,
>
> Andres Freund
Add WAL recovery messages with log_wal_traffic GUC (was: add recovery, backup, archive, streaming etc. activity messages to server logs along with ps display)
On Wed, Dec 29, 2021 at 6:50 AM Bharath Rupireddy wrote:
>
> On Thu, Dec 9, 2021 at 9:28 PM Alvaro Herrera wrote:
> >
> > Maybe some tunable like
> >   log_wal_traffic = { none, medium, high }
> > where "none" is current behavior of no noise, "medium" gets (say) once
> > every 256 segments (e.g., when going from XFF to (X+1)00), "high" gets
> > you one message per segment.
>
> On Fri, Dec 24, 2021 at 7:19 PM Justin Pryzby wrote:
> >
> > If you're talking about a new feature that uses the infrastructure from 9ce3,
> > but is controlled by a separate GUC like log_wal_traffic, that could be
> > okay.
>
> Thanks for the suggestion. I've added a new GUC log_wal_traffic as
> suggested above. PSA v7 patch.

Here's the rebased v8 patch, please review it.

https://commitfest.postgresql.org/37/3443/

Regards,

Bharath Rupireddy.

v8-0001-Add-WAL-recovery-messages-with-log_wal_traffic-GU.patch
Description: Binary data
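(For illustration of the "medium" frequency suggested in the quoted proposal: one message per 256 segments, i.e. when going from XFF to (X+1)00, reduces to a modulus test on the segment number. This is a hedged sketch of that arithmetic only, not the patch's actual implementation.)

```shell
#!/bin/sh
# Hypothetical check for log_wal_traffic = medium: emit a message only
# when the segment number crosses a 256-segment boundary.
should_log_medium() {
    segno=$1
    [ $(( segno % 256 )) -eq 0 ]
}

for seg in 255 256 512 513; do
    if should_log_medium "$seg"; then
        echo "segment $seg: log"
    else
        echo "segment $seg: skip"
    fi
done
```

The "high" setting would simply drop the modulus test and log every segment, while "none" would skip the check entirely.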