Re: race condition in pg_class
Hello Noah, 28.06.2024 08:13, Noah Misch wrote: Pushed. A recent buildfarm test failure [1] showed that the intra-grant-inplace-db.spec test added with 0844b3968 may fail on a slow machine (per my understanding): test intra-grant-inplace-db ... FAILED 4302 ms @@ -21,8 +21,7 @@ WHERE datname = current_catalog AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness); -?column? --- -datfrozenxid retreated -(1 row) +?column? + +(0 rows) whilst the previous (successful) run shows a much shorter duration: test intra-grant-inplace-db ... ok 540 ms I reproduced this failure on a VM slowed down so that the test duration reached 4+ seconds, with 100 test: intra-grant-inplace-db in isolation_schedule: test intra-grant-inplace-db ... ok 4324 ms test intra-grant-inplace-db ... FAILED 4633 ms test intra-grant-inplace-db ... ok 4649 ms But as the test is going to be modified by the inplace110-successors-v8.patch and the modified test (with all three latest patches applied) passes reliably in the same conditions, maybe this failure doesn't deserve a deeper exploration. What do you think? [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08 Best regards, Alexander
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log
On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote: > v4 is attached. Removal of the PinBufferForBlock() comment about the "persistence = RELPERSISTENCE_PERMANENT" fallback started to feel like a loss. I looked for a way to re-add a comment about the fallback. https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html shows no test coverage of that fallback, and I think the fallback is unreachable. Hence, I've removed the fallback in a separate commit. I've pushed that and your three patches. Thanks.
Re: race condition in pg_class
On Sat, Jul 20, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote: > 28.06.2024 08:13, Noah Misch wrote: > > Pushed. > > A recent buildfarm test failure [1] showed that the > intra-grant-inplace-db.spec test added with 0844b3968 may fail > on a slow machine > But as the test is going to be modified by the inplace110-successors-v8.patch > and the modified test (with all three latest patches applied) passes > reliably in the same conditions, maybe this failure doesn't deserve a > deeper exploration. Agreed. Let's just wait for code review of the actual bug fix, not develop a separate change to stabilize the test. One flake in three weeks is low enough to make that okay. > [1] > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08
Re: UUID v7
> On 8 May 2024, at 18:37, Andrey M. Borodin wrote: > > It's RFC now. PFA version with references to RFC instead of drafts. In a nearby thread [0] we found out that most systems have enough precision to fill an additional 12 bits of sub-millisecond information. So I switched the implementation to this method. We have a portable gettimeofday(), but unfortunately it gives only 10 bits of sub-millisecond information. So I created a portable get_real_time_ns() for this purpose: it reads clock_gettime() on non-Windows platforms and GetSystemTimePreciseAsFileTime() on Windows. Thanks! Best regards, Andrey Borodin. [0] https://www.postgresql.org/message-id/flat/be0339cc-1ae1-4892-9445-8e6d8995a44d%40eisentraut.org v25-0001-Implement-UUID-v7.patch Description: Binary data
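For illustration, a minimal sketch (not taken from the attached patch) of how a 48-bit unix_ts_ms timestamp plus 12 bits of sub-millisecond precision could be laid out in a UUIDv7, per RFC 9562's "replace leftmost random bits with increased clock precision" method; all names and the scaling choice here are assumptions:

```c
#include <stdint.h>

/*
 * Hypothetical sketch: fill the timestamp and version portions of a
 * 16-byte UUIDv7 buffer from a nanosecond-resolution clock reading.
 */
static void
uuidv7_set_timestamp(uint8_t *uuid, uint64_t ns_since_epoch)
{
	uint64_t	unix_ts_ms = ns_since_epoch / 1000000;
	uint64_t	ns_within_ms = ns_since_epoch % 1000000;

	/* scale 0..999999 ns into 0..4095 so it fills exactly 12 bits */
	uint16_t	sub_ms = (uint16_t) ((ns_within_ms << 12) / 1000000);

	uuid[0] = (uint8_t) (unix_ts_ms >> 40);
	uuid[1] = (uint8_t) (unix_ts_ms >> 32);
	uuid[2] = (uint8_t) (unix_ts_ms >> 24);
	uuid[3] = (uint8_t) (unix_ts_ms >> 16);
	uuid[4] = (uint8_t) (unix_ts_ms >> 8);
	uuid[5] = (uint8_t) unix_ts_ms;

	/* version 7 in the top 4 bits of octet 6, sub-millisecond bits below */
	uuid[6] = (uint8_t) (0x70 | (sub_ms >> 8));
	uuid[7] = (uint8_t) (sub_ms & 0xFF);
}
```

The remaining octets would be filled with random data and the variant bits, as for any UUIDv7.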
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log
Hi, On Sat, 20 Jul 2024 at 14:27, Noah Misch wrote: > > On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote: > > v4 is attached. > > Removal of the PinBufferForBlock() comment about the "persistence = > RELPERSISTENCE_PERMANENT" fallback started to feel like a loss. I looked for > a way to re-add a comment about the fallback. > https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html > shows no test coverage of that fallback, and I think the fallback is > unreachable. Hence, I've removed the fallback in a separate commit. I've > pushed that and your three patches. Thanks. Thanks for the separate commit and push! With the separate commit (e00c45f685), does it make sense to rename the smgr_persistence parameter of the ReadBuffer_common() to persistence? Because, ExtendBufferedRelTo() calls ReadBuffer_common() with rel's persistence now, not with smgr's persistence. -- Regards, Nazir Bilal Yavuz Microsoft
Re: Report search_path value back to the client.
On Fri, 19 Jul 2024 at 21:48, Tomas Vondra wrote: > > > > On 7/19/24 17:16, Jelte Fennema-Nio wrote: > > That's been moving forward, even relatively fast imho for the > > size/impact of that patchset. But those changes are much less > > straight-forward than this patch. And while I hope that they can get > > committed for PG18 this year, I'm not confident about that. > > I don't know. My experience is that whenever we decided to not do a > simple improvement because a more complex patch was expected to do > something better/smarter, we often ended up getting nothing. > > So maybe it'd be best to get this committed, and then if the other patch > manages to get into PG18, we can revert this change (or rather that > patch would do that). Yeah, I think we're aligned on this. Because that's totally what I meant to say, but I don't seem to have written down that conclusion explicitly. > Maybe it's crazy-talk, but couldn't we make the GUC settable exactly > once? That should be doable using the GUC hooks, I think. That means > pgbouncer would set the GUC right after opening the connection, and then > the following attempts to set it would either error out, or silently do > nothing? > > Perhaps it could even allow enabling reporting for more GUCs, but once a > GUC is on the list, it's reported forever. This could maybe work, I'll think a bit about it. The main downside I see is that PgBouncer can then not act as a transparent proxy, because it cannot reset value to the value that the client expects the GUC to be. But in practice that doesn't matter much for this specific case because all that happens in this case is that the client gets a few more ParameterStatus messages that it did not want.
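To make the quoted "settable exactly once" idea concrete, here is a rough sketch using a GUC check hook; the function name, error message, and source threshold are assumptions rather than a worked-out design:

```c
#include "postgres.h"
#include "utils/guc.h"

/* illustrative state: has a session-level SET of the hypothetical GUC happened? */
static bool set_once_done = false;

static bool
check_report_parameters(char **newval, void **extra, GucSource source)
{
	/* let defaults, config-file and startup-packet values through */
	if (source < PGC_S_SESSION)
		return true;

	/* reject any session-level SET after the first one */
	if (set_once_done)
	{
		GUC_check_errmsg("parameter may only be set once per session");
		return false;
	}

	/*
	 * A real implementation would need to handle RESET and rolled-back
	 * transactions more carefully, probably via an assign hook.
	 */
	set_once_done = true;
	return true;
}
```

Under this sketch, a pooler's first SET right after connecting succeeds, and later attempts from the client error out, matching the "error out" branch of the proposal.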
Re: meson vs windows perl
On 2024-05-28 Tu 6:13 PM, Andres Freund wrote: Hi, On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote: OK, this has been fixed and checked. The attached is what I propose. The perl command is pretty hard to read. What about using python's shlex module instead? Rough draft attached. Still not very pretty, but seems easier to read? It'd be even better if we could just get perl to print out the flags in an easier to parse way, but I couldn't immediately see a way. Thanks for looking. The attached should be easier to read. The perl package similar to shlex is Text::ParseWords, which is already a requirement. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com diff --git a/meson.build b/meson.build index 5387bb6d5f..e244cbac92 100644 --- a/meson.build +++ b/meson.build @@ -993,20 +993,17 @@ if not perlopt.disabled() # Config's ccdlflags and ldflags. (Those are the choices of those who # built the Perl installation, which are not necessarily appropriate # for building PostgreSQL.) -ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip() -undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split() -undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split() +perl_ldopts = run_command(perl, '-e', ''' +use ExtUtils::Embed; +use Text::ParseWords; +# tell perl to suppress including these in ldopts +*ExtUtils::Embed::_ldflags =*ExtUtils::Embed::_ccdlflags = sub { return ""; }; +# adding an argument to ldopts makes it return a value instead of printing +# print one of these per line so splitting will preserve spaces in file names. +print "$_\n" foreach shellwords(ldopts(undef)); +''', + check: true).stdout().strip().split('\n') -perl_ldopts = [] -foreach ldopt : ldopts.split(' ') - if ldopt == '' or ldopt in undesired -continue - endif - - perl_ldopts += ldopt.strip('"') -endforeach - -message('LDFLAGS recommended by perl: "@0@"'.format(ldopts)) message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts))) perl_dep_int = declare_dependency(
Re: Logical Replication of sequences
On Wed, 10 Jul 2024 at 13:46, Peter Smith wrote: > > Here are a few comments for patch v20240705-0003. > > (This is a WIP. I have only looked at the docs so far.) > > == > doc/src/sgml/config.sgml > > nitpick - max_logical_replication_workers: /and sequence > synchornization worker/and a sequence synchornization worker/ > > == > doc/src/sgml/logical-replication.sgml > > nitpick - max_logical_replication_workers: re-order list of workers to > be consistent with other docs 1-apply,2-parallel,3-tablesync,4-seqsync > > == > doc/src/sgml/ref/alter_subscription.sgml > > 1. > IIUC the existing "REFRESH PUBLICATION" command will fetch and sync > all new sequences, etc., and/or remove old ones no longer in the > publication. But current docs do not say anything at all about > sequences here. It should say something about sequence behaviour. > > ~~~ > > 2. > For the existing "REFRESH PUBLICATION" there is a sub-option > "copy_data=true/false". Won't this need some explanation about how it > behaves for sequences? Or will there be another option > "copy_sequences=true/false". > > ~~~ > > 3. > IIUC the main difference between REFRESH PUBLICATION and REFRESH > PUBLICATION SEQUENCES is that the 2nd command will try synchronize > with all the *existing* sequences to bring them to the same point as > on the publisher, but otherwise, they are the same command. If that is > correct understanding I don't think that distinction is made very > clear in the current docs. > > ~~~ > > nitpick - the synopsis is misplaced. It should not be between ENABLE > and DISABLE. I moved it. Also, it should say "REFRESH PUBLICATION > SEQUENCES" because that is how the new syntax is defined in gram.y > > nitpick - REFRESH SEQUENCES. Renamed to "REFRESH PUBLICATION > SEQUENCES". And, shouldn't "from the publisher" say "with the > publisher"? > > nitpick - changed the varlistentry "id". > > == > 99. > Please also see the attached diffs patch which implements any nitpicks > mentioned above. All these comments are handled in the v20240720 version patch attached at [1]. [1] - https://www.postgresql.org/message-id/CALDaNm2vuO7Ya4QVTZKR9jY_mkFFcE_hKUJiXx4KUknPgGFjSg%40mail.gmail.com Regards, Vignesh
Re: Logical Replication of sequences
On Fri, 12 Jul 2024 at 08:22, Peter Smith wrote: > > Hi Vignesh. Here are the rest of my comments for patch v20240705-0003. > == > src/backend/catalog/pg_subscription.c > > 1. GetSubscriptionSequences > > +/* > + * Get the sequences for the subscription. > + * > + * The returned list is palloc'ed in the current memory context. > + */ > > Is that comment right? The palloc seems to be done in > CacheMemoryContext, not in the current context. This function is removed and GetSubscriptionRelations is being used instead. > ~ > > 2. > The code is very similar to the other function > GetSubscriptionRelations(). In fact I did not understand how the 2 > functions know what they are returning: > > E.g. how does GetSubscriptionRelations not return sequences too? > E.g. how does GetSubscriptionSequences not return relations too? GetSubscriptionRelations can be used, so removed the GetSubscriptionSequences function. > > 3. AlterSubscription_refresh > > - logicalrep_worker_stop(sub->oid, relid); > + /* Stop the worker if relation kind is not sequence*/ > + if (relkind != RELKIND_SEQUENCE) > + logicalrep_worker_stop(sub->oid, relid); > > Can you give more reasons in the comment why skip the stop for sequence > worker? > > ~ > > nitpick - period and space in the comment > > ~~~ > > 8. logicalrep_sequence_sync_worker_find > > +/* > + * Walks the workers array and searches for one that matches given > + * subscription id. > + * > + * We are only interested in the sequence sync worker. > + */ > +LogicalRepWorker * > +logicalrep_sequence_sync_worker_find(Oid subid, bool only_running) > > There are other similar functions for walking the workers array to > search for a worker. Instead of having different functions for > different cases, wouldn't it be cleaner to combine these into a single > function, where you pass a parameter (e.g. a mask of worker types that > you are interested in finding)? I will address this in a future version once the patch has become more stable. > ~~~ > > 11. > - Assert(is_tablesync_worker == OidIsValid(relid)); > + Assert(is_tablesync_worker == OidIsValid(relid) || > is_sequencesync_worker == OidIsValid(relid)); > > IIUC there is only a single sequence sync worker for handling all the > sequences. So, what does the 'relid' actually mean here when there are > multiple sequences? Sequence sync workers will not have relid, modified the assert. > ~~~ > > 12. logicalrep_seqsyncworker_failuretime > 12b. > Curious if this had to be a separate exit handler or if may this could > have been handled by the existing logicalrep_worker_onexit handler. > See also other review comments int this post about this area -- > perhaps all this can be removed? This function cannot be combined with logicalrep_worker_onexit as this function should be called only in failure case and this exit handler should be removed in case of success case. This cannot be removed because of the following reason: Consider the following situation: a sequence sync worker starts and then encounters a failure while syncing sequences. At the same time, a user initiates a "refresh publication sequences" operation. Given only the start time, it's not possible to distinguish whether the sequence sync worker failed or completed successfully. This is because the "refresh publication sequences" operation would have re-added the sequences, making it unclear whether the sync worker's failure or success occurred. > > 14. > The reason for the addition logic "(last_value + log_cnt)" is not > obvious. 
I am guessing it might be related to code from > 'nextval_internal' (fetch = log = fetch + SEQ_LOG_VALS;) but it is > complicated. It is unfortunate that the field 'log_cnt' seems hardly > commented anywhere at all. > > Also, I am not 100% sure if I trust the logic in the first place. The > caller of this function is doing: > sequence_value = fetch_sequence_data(conn, remoteid, &lsn); > /* sets the sequence with sequence_value */ > SetSequenceLastValue(RelationGetRelid(rel), sequence_value); > > Won't that mean you can get to a situation where subscriber-side > result of lastval('s') can be *ahead* from lastval('s') on the > publisher? That doesn't seem good. Added comments for "last_value + log_cnt" Yes it can be ahead in subscribers. This will happen because every change of the sequence is not wal logged. It is WAL logged once in SEQ_LOG_VALS. This was discussed earlier and the sequence value being ahead was ok. https://www.postgresql.org/message-id/CA%2BTgmoaVLiKDD5vr1bzL-rxhMA37KCS_2xrqjbKVwGyqK%2BPCXQ%40mail.gmail.com > 15. > + /* > + * Logical replication of sequences is based on decoding WAL records, > + * describing the "next" state of the sequence the current state in the > + * relfilenode is yet to reach. But during the initial sync we read the > + * current state, so we need to reconstruct the WAL record logged when we > + * started the current batch of sequence values. > + * > + * Otherwise we might get duplicate values (on subscrib
Re: Lock-free compaction. Why not?
Wow I was busy for a couple of days and now I’m again fully committed to this initiative. These ideas are extremely useful to me. I’ll first start by reading the old in-place implementation, but meanwhile I have the following questions: 1- I’m thinking of adding only one simple step to the auto-vacuum. This means that there will be neither excessive locking nor excessive resource utilization. I guess my question is: does that simple step make the current lazy auto-vacuum much worse? 2- Can you point me to a resource explaining why this might lead to index bloating? On Thu, 18 Jul 2024 at 15:21, Tom Lane wrote: > Robert Haas writes: > > On Thu, Jul 18, 2024 at 7:08 AM David Rowley > wrote: > >> I think the primary issue with the old way was index bloat wasn't > >> fixed. The release notes for 9.0 do claim the CLUSTER method "is > >> substantially faster in most cases", however, I imagine there are > >> plenty of cases where it wouldn't be. e.g, it's hard to imagine > >> rewriting the entire 1TB table and indexes is cheaper than moving 1 > >> row out of place row. > > > The other thing I remember besides index bloat is that it was > > crushingly slow. > > Yeah. The old way was great if there really were just a few tuples > needing to be moved ... but by the time you decide you need VACUUM > FULL rather than plain VACUUM, that's unlikely to be the case. > > regards, tom lane >
Re: Slow catchup of 2PC (twophase) transactions on replica in LR
On Thu, 18 Jul 2024 at 07:41, Hayato Kuroda (Fujitsu) wrote: > > Dear Peter, > > Thanks for giving comments! PSA new version. A couple of suggestions: 1) How will the user know which transactions should be rolled back, given that the prepared transaction name in the subscriber will be different (like pg_gid_16398_750)? Can we mention some info on how the user can identify the prepared transactions that should be rolled back in the subscriber, or, if this information is already available, can we point to it from here: + When altering two_phase + from true to false, the backend + process reports and an error if any prepared transactions done by the + logical replication worker (from when two_phase + parameter was still true) are found. You can resolve + prepared transactions on the publisher node, or manually roll back them + on the subscriber, and then try again. 2) I'm not sure if InvalidRepOriginId is correct here, how about using OidIsValid in the below: +void +TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid, int szgid) +{ + Assert(subid != InvalidRepOriginId); Regards, Vignesh
Re: Use read streams in CREATE DATABASE command when the strategy is wal_log
On Sat, Jul 20, 2024 at 03:01:31PM +0300, Nazir Bilal Yavuz wrote: > On Sat, 20 Jul 2024 at 14:27, Noah Misch wrote: > > > > On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote: > > > v4 is attached. > > > > Removal of the PinBufferForBlock() comment about the "persistence = > > RELPERSISTENCE_PERMANENT" fallback started to feel like a loss. I looked > > for > > a way to re-add a comment about the fallback. > > https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html > > shows no test coverage of that fallback, and I think the fallback is > > unreachable. Hence, I've removed the fallback in a separate commit. I've > > pushed that and your three patches. Thanks. > > Thanks for the separate commit and push! > > With the separate commit (e00c45f685), does it make sense to rename > the smgr_persistence parameter of the ReadBuffer_common() to > persistence? Because, ExtendBufferedRelTo() calls ReadBuffer_common() > with rel's persistence now, not with smgr's persistence. BMR_REL() doesn't set relpersistence, so bmr.relpersistence is associated with bmr.smgr and is unset if bmr.rel is set. That is to say, bmr.relpersistence is an smgr_persistence. It could make sense to change ReadBuffer_common() to take a BufferManagerRelation instead of the three distinct arguments. On a different naming topic, my review missed that field name copy_storage_using_buffer_read_stream_private.last_block doesn't fit how the field is used. Code uses it like an nblocks. So let's either rename the field or change the code to use it as a last_block (e.g. initialize it to nblocks-1, not nblocks).
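For illustration, a minimal sketch of the suggested convention, assuming the block-number callback signature from read_stream.h; the struct and function names below are stand-ins for the thread's copy_storage_using_buffer_read_stream_private, not the committed code:

```c
#include "postgres.h"
#include "storage/block.h"
#include "storage/read_stream.h"

/* illustrative private state for the copy callback */
typedef struct copy_stream_private
{
	BlockNumber blocknum;		/* next block to return */
	BlockNumber last_block;		/* last block to copy, inclusive */
} copy_stream_private;

static BlockNumber
copy_stream_next_block(ReadStream *stream,
					   void *callback_private_data,
					   void *per_buffer_data)
{
	copy_stream_private *p = (copy_stream_private *) callback_private_data;

	if (p->blocknum > p->last_block)
		return InvalidBlockNumber;	/* stream exhausted */

	return p->blocknum++;
}
```

Setup per the suggestion would then be blocknum = 0 and last_block = nblocks - 1, so the field name matches how it is used.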
Re: Why do we define HAVE_GSSAPI_EXT_H?
On 2024-07-08 15:56:59 -0700, Andres Freund wrote: > While looking at this I also found an argument omission present in the commit > adding meson support. I plan to fix that with the attached commit. Pushed that portion.
Re: Meson far from ready on Windows
Hi, On 2024-07-17 09:49:47 -0700, Andres Freund wrote: > On 2024-07-16 15:53:45 -0500, Tristan Partin wrote: > > Other than that, it looks pretty solid. > > Thanks for looking! I'm thinking of pushing the first few patches soon-ish. > > I'm debating between going for 17 + HEAD or also applying it to 16, to keep > the trees more similar. Pushed a number of these to 16 - HEAD. Greetings, Andres Freund
Re: Parallelize correlated subqueries that execute within each worker
Hi, I was going through the "needs review" patches with no recent messages, trying to figure out what is needed to move them forward, and this one caught my eye because I commented on it before. And it's also a bit sad example, because it started in 2021 and is moving at glacier speed :-( I read through the thread, to understand how the design changed over time, and I like the current approach (proposed by Robert) much more than the initial idea of adding new flag next to parallel_safe etc. And in general the patch looks reasonably simple and clean, but my knowledge of PARAM intricacies is pretty much nil, so I'm hesitant to claim the patch is correct. And I'm not sure what exactly needs to happen to validate the approach :-( The regression tests currently fail, due to a different plan for one of the new queries in select_parallel. I guess it's due to some committed patch, and it looks like a sensible change, but I haven't looked closely. Also, I do get this warning when building with GCC 12.2.0 on Debian: clauses.c: In function ‘max_parallel_hazard_walker’: clauses.c:961:49: warning: ‘save_safe_param_ids’ may be used uninitialized [-Wmaybe-uninitialized] 961 | context->safe_param_ids = save_safe_param_ids; | ^ clauses.c:943:29: note: ‘save_safe_param_ids’ was declared here 943 | List *save_safe_param_ids; | ^~~ It's harmless, the compiler simply does not realize the two blocks (where save_safe_param_ids is set and used) have exactly the same if conditions, but it's a bit confusing for people too. I was wondering if this could affect some queries in TPC-H, but the only query affected seems to be Q2 - where it helps, cutting the time in half, but Q2 is generally pretty fast / the expensive part was already parallelized quite well (i.e. the correlated subquery is fairly cheap). However, it's not difficult to construct a query where this helps a lot. If the correlated subquery does something expensive (e.g. aggregation of non-trivial amounts of data), this would help. So I wonder if e.g. TPC-DS would benefit from this more ... A couple review comments about the code: 1) new fields in max_parallel_hazard_context should have comments: + boolcheck_params; + Bitmapset **required_params; 2) Do we need both is_parallel_safe and is_parallel_safe_with_params? ISTM the main difference is the for() loop, so why not add an extra parameter to is_parallel_safe() and skip that loop if it's null? Or, if we worry about external code, keep is_parallel_safe_with_params() and define is_parallel_safe() as is_parallel_safe_with_params(...,NULL)? 3) Isn't it a bit weird that is_parallel_safe_with_params() actually sets check_params=false, which seems like it doesn't actually check parameters? I'm a bit confused / unsure if this is a bug or how it actually checks parameters. If it's correct, it may need a comment. 4) The only place setting check_params is max_parallel_hazard, which is called only for the whole Query from standard_planner(). But it does not set required_params - can't this be an issue if the pointer happens to be random garbage? 5) It probably needs a pgindent run, there's a bunch of rather long lines and those are likely to get wrapped and look weird. 6) Is the comment in max_parallel_hazard_walker() still accurate? It talks about PARAM_EXTERN and PARAM_EXEC, but the patch removes the PARAM_EXTERN check entirely. So maybe the comment needs updating? 7) I don't like the very complex if condition very much, it's hard to understand. 
I'd split that into two separate conditions, and add a short comment for each of them. I.e. instead of: if (param->paramkind != PARAM_EXEC || !(context->check_params || context->required_params != NULL)) return false; I'd do /* ... comment ... */ if (param->paramkind != PARAM_EXEC) return false; /* ... comment ... */ if (!(context->check_params || context->required_params != NULL)) return false; or something like that. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: psql: fix variable existence tab completion
On 19.07.2024 01:10, Tom Lane wrote: Actually, I think we ought to just reject this change. Debian 10 will be two years past EOL before PG 17 ships. So I don't see a reason to support it in the tests anymore. One of the points of such testing is to expose broken platforms, not mask them. Obviously, if anyone can point to a still-in-support platform with the same bug, that calculus might change. The bug where broken versions of libedit want to backslash some symbols (e.g. double quotes, curly braces, the question mark) I encountered only on Debian 10 (buster). If anyone has encountered a similar error on some other system, please share such information. With respect to the other hacks Alexander mentions, maybe we could clean some of those out too? I don't recall what platform we had in mind there, but we've moved our goalposts on what we support pretty far in the last couple years. Agreed that there is no reason to keep workarounds for unsupported systems. Here is the patch that removes the fixes for the Buster bug mentioned above. With the best regards, -- Anton A. Melnikov Postgres Professional: http://www.postgrespro.com The Russian Postgres Company From 971184c49cf2c5932bb682ee977a6fef9ba4eba5 Mon Sep 17 00:00:00 2001 From: "Anton A. Melnikov" Date: Sun, 21 Jul 2024 01:37:03 +0300 Subject: [PATCH] Remove workarounds for Debian 10 (Buster) with broken libedit from the 010_tab_completion.pl test. --- src/bin/psql/t/010_tab_completion.pl | 27 --- 1 file changed, 8 insertions(+), 19 deletions(-) diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl index b45c39f0f5..678a0f7a2c 100644 --- a/src/bin/psql/t/010_tab_completion.pl +++ b/src/bin/psql/t/010_tab_completion.pl @@ -167,26 +167,18 @@ check_completion( qr/"mytab123" +"mytab246"/, "offer multiple quoted table choices"); -# note: broken versions of libedit want to backslash the closing quote; -# not much we can do about that -check_completion("2\t", qr/246\\?" /, +check_completion("2\t", qr/246" /, "finish completion of one of multiple quoted table choices"); -# note: broken versions of libedit may leave us in a state where psql -# thinks there's an unclosed double quote, so that we have to use -# clear_line not clear_query here -clear_line(); +clear_query(); # check handling of mixed-case names -# note: broken versions of libedit want to backslash the closing quote; -# not much we can do about that check_completion( "select * from \"mi\t", - qr/"mixedName\\?" /, + qr/"mixedName" /, "complete a mixed-case name"); -# as above, must use clear_line not clear_query here -clear_line(); +clear_query(); # check case folding check_completion("select * from TAB\t", qr/tab1 /, "automatically fold case"); @@ -198,8 +190,7 @@ clear_query(); # differently, so just check that the replacement comes out correctly check_completion("\\DRD\t", qr/drds /, "complete \\DRD to \\drds"); -# broken versions of libedit require clear_line not clear_query here -clear_line(); +clear_query(); # check completion of a schema-qualified name check_completion("select * from pub\t", @@ -261,18 +252,16 @@ check_completion( qr|tab_comp_dir/af\a?ile|, "filename completion with multiple possibilities"); -# broken versions of libedit require clear_line not clear_query here +# here we inside a string literal 'afile*' and wait for a sub-choice or second TAB. So use clear_line(). 
clear_line(); # COPY requires quoting -# note: broken versions of libedit want to backslash the closing quote; -# not much we can do about that check_completion( "COPY foo FROM tab_comp_dir/some\t", - qr|'tab_comp_dir/somefile\\?' |, + qr|'tab_comp_dir/somefile' |, "quoted filename completion with one possibility"); -clear_line(); +clear_query(); check_completion( "COPY foo FROM tab_comp_dir/af\t", -- 2.45.2
Re: Report search_path value back to the client.
On 7/20/24 14:09, Jelte Fennema-Nio wrote: > On Fri, 19 Jul 2024 at 21:48, Tomas Vondra > wrote: >> >> >> >> On 7/19/24 17:16, Jelte Fennema-Nio wrote: >>> That's been moving forward, even relatively fast imho for the >>> size/impact of that patchset. But those changes are much less >>> straight-forward than this patch. And while I hope that they can get >>> committed for PG18 this year, I'm not confident about that. >> >> I don't know. My experience is that whenever we decided to not do a >> simple improvement because a more complex patch was expected to do >> something better/smarter, we often ended up getting nothing. >> >> So maybe it'd be best to get this committed, and then if the other patch >> manages to get into PG18, we can revert this change (or rather that >> patch would do that). > > Yeah, I think we're aligned on this. Because that's totally what I > meant to say, but I don't seem to have written down that conclusion > explicitly. > >> Maybe it's crazy-talk, but couldn't we make the GUC settable exactly >> once? That should be doable using the GUC hooks, I think. That means >> pgbouncer would set the GUC right after opening the connection, and then >> the following attempts to set it would either error out, or silently do >> nothing? >> >> Perhaps it could even allow enabling reporting for more GUCs, but once a >> GUC is on the list, it's reported forever. > > This could maybe work, I'll think a bit about it. The main downside I > see is that PgBouncer can then not act as a transparent proxy, because > it cannot reset value to the value that the client expects the GUC to > be. But in practice that doesn't matter much for this specific case > because all that happens in this case is that the client gets a few > more ParameterStatus messages that it did not want. I understand that point of view, but I don't quite share it. I don't really see this as a GUC, even if it's implemented as one. It's just a convenient way to enable a protocol feature without having to modify the protocol ... It could easily be an enable_param_report() function, doing the same thing, for example. It's a bit of a tangent, but this reminds me of how Oracle uses logon triggers to set "application context" for their variant of RLS. In postgres that was not possible because we did not have "logon" triggers, but in 17 that will change. I wonder if that could be handy, but maybe not - it seems unnecessarily complex. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: REINDEX not updating partition progress
On 19 July 2024, at 04:17, Michael Paquier wrote: On Fri, Jul 12, 2024 at 11:07:49PM +0100, Ilya Gladyshev wrote: While working on CIC for partitioned tables [1], I noticed that REINDEX for partitioned tables is not keeping track of progress of partitioned tables, so I'm creating a separate thread for this fix as suggested. This limitation is not a bug, because we already document that partitions_total and partitions_done are 0 during a REINDEX. Compared to CREATE INDEX, a REINDEX is a bit wilder as it would work on all the indexes for all the partitions, providing this information makes sense. Agreed that this could be better, but what's now on HEAD is not wrong either. Thanks for pointing it out, I didn’t notice it. You’re right, this is not a bug, but an improvement. Will remove the bits from the documentation as well. Still it does not do it because we know that the fields are going to be overwritten pretty quickly anyway, no? For now, we could just do without progress_index_partition_done(), I guess, keeping it to the incrementation of PROGRESS_CREATEIDX_PARTITIONS_DONE. There's an argument that this makes the code slightly easier to follow, with less wrappers around the progress reporting. The use-case that I thought this would improve is REINDEX CONCURRENT, where data from later stages of the previous partitions would linger for quite a while before it gets to the same stage of the current partition. I don’t think this is of big importance, so I’m ok with making code simpler and leaving it out. +intprogress_params[3] = { +PROGRESS_CREATEIDX_COMMAND, +PROGRESS_CREATEIDX_PHASE, +PROGRESS_CREATEIDX_PARTITIONS_TOTAL +}; +int64progress_values[3]; +OidheapId = relid; Rather than having new code blocks, let's use a style consistent with DefineIndex() where we have the pgstat_progress_update_multi_param(), with a single {} block. Fixed. Adding the counter increment at the end of the loop in ReindexMultipleInternal() is a good idea. It considers both the concurrent and non-concurrent cases. Another reason to do so is that reindex_relation calls itself recursively, so it'd require some additional mechanism so that it wouldn't increment multiple times per partition. + progress_values[2] = list_length(partitions); + pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId); Hmm. Is setting the relid only once for pg_stat_progress_create_index the best choice there is? Could it be better to report the partition OID instead? Technically, we could do this, but it looks like no other commands do it and currently there's no API to do it without erasing the rest of progress information. Let's remember that indexes may have been attached with names inconsistent with the partitioned table's index. It is a bit confusing to stick to the relid of the partitioned table all the time, for all the indexes of all the partitions reindexed. I agree that it’s a bit confusing, especially because the documentation gives no hints about it. Judging by the documentation, I would expect relid and index_relid to correspond to the same table. However, if we say that this field represents the oid of the relation on which the command was invoked, then I think it does make sense. Edited documentation to say that in the new patch. Could it be better to actually introduce an entirely new field to the progress table? What I would look for is more information: 1) the partitioned table OID on which the REINDEX runs 2) the partition table being processed 3) the index OID being processed (old and new for concurrent case). 
The patch sets 1) to the OID of the partitioned table, lacks 2) and sets 3) each time an index is rebuilt. — Michael I like this approach more, but I’m not sure whether adding another field for partition oid is worth it, since we already have index_relid that we can use to join with pg_index to get that. On the other hand, index_relid is missing during regular CREATE INDEX, so this new field could be useful to indicate which table is being indexed in this case. I'm on the fence about this, attached this as a separate patch, if you think it's a good idea. Thank you for the review! From 61e82b773e1c21dcb50a298bb205449ff17c90dc Mon Sep 17 00:00:00 2001 From: Ilya Gladyshev Date: Sun, 21 Jul 2024 01:40:15 +0100 Subject: [PATCH v2 2/2] partition_relid column for create index progress --- doc/src/sgml/monitoring.sgml | 9 + src/backend/catalog/system_views.sql | 3 ++- src/backend/commands/indexcmds.c | 18 +- src/include/commands/progress.h | 1 + src/test/regress/expected/rules.out | 3 ++- 5 files changed, 27 insertions(+), 7 deletions(-) diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml index 00bb423288..595f19a8f8 100644 --- a/doc/src/sgml/monitoring.sgml +++ b/doc/src/sgml/monitoring.sgml @@ -6006,6 +6006,15 @@ FROM pg_stat_get_backend_idset() AS backendid; 0 whe
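As an aside for readers following the style discussion above, a minimal sketch of the single-block pgstat_progress_update_multi_param() pattern borrowed from DefineIndex(); heapId and partitions are assumed to come from the surrounding reindex code, and the values shown are illustrative, not the patch's final form:

```c
/*
 * Illustrative fragment only: report the REINDEX command, an initial
 * phase, and the partition count in one call, in the style of
 * DefineIndex().  Assumes "commands/progress.h" and
 * "utils/backend_progress.h" are included by the surrounding file.
 */
{
	const int	progress_cols[] = {
		PROGRESS_CREATEIDX_COMMAND,
		PROGRESS_CREATEIDX_PHASE,
		PROGRESS_CREATEIDX_PARTITIONS_TOTAL
	};
	const int64 progress_vals[] = {
		PROGRESS_CREATEIDX_COMMAND_REINDEX,
		0,						/* initializing */
		list_length(partitions)
	};

	pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);
	pgstat_progress_update_multi_param(3, progress_cols, progress_vals);
}
```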
Re: Add mention of execution time memory for enable_partitionwise_* GUCs
On Fri, 19 Jul 2024 at 17:24, Ashutosh Bapat wrote: > According to [1] these GUCs were added because of increased > memory consumption during planning and time taken to plan the query. > The execution time memory consumption issue was known even back then > but the GUC was not set to default because of that. But your patch > proposes to change the narrative. In the same thread [1], you will > find the discussion about turning the default to ON once we have fixed > planner's memory and time consumption. We have patches addressing > those issues [2] [3]. With your proposed changes we will almost never > have a chance to turn those GUCs ON by default. That seems a rather > sad prospect. Sad prospect for who? If the day comes and we're considering enabling these GUCs by default, I think we'll likely also be considering if it'll be sad for users who get an increased likelihood of OOM kills because the chosen plan uses $nparts times more memory to execute than the old plan. Can you honestly say that you have no concern about increased executor memory consumption if we switched on these GUCs by default? I was certainly concerned about it in [5], so I dropped that patch after realising what could happen. > I am fine if we want to mention that the executor may consume a large > amount of memory when these GUCs are turned ON. Users may decide to > turn those OFF if they can not afford to spend that much memory during > execution. But I don't like tying execution time consumption with > default OFF. If you were to fix the planner issues then this text would need to be revisited anyway. However, I seriously question your judgment on fixing those alone being enough to allow us to switch these on by default. It seems unlikely that the planner would use anything near any realistic work_mem setting extra memory per partition, but that's what the executor would do, given enough data per partition. I'd say any analysis that only found planner memory and time to be the only reason to turn these GUCs off by default was incomplete analysis. Maybe it was a case of stopping looking for all the reasons once enough had been found to make the choice. If not, then they found the molehill and failed to notice the mountain. David [4] https://postgr.es/m/3603c380-d094-136e-e333-610914fb3e80%40gmx.net [5] https://postgr.es/m/caaphdvojkdbr3mr59jxmacybyhb6q_5qpru+dy93en8wm+x...@mail.gmail.com
Re: Lock-free compaction. Why not?
On Sun, 21 Jul 2024 at 04:00, Ahmed Yarub Hani Al Nuaimi wrote: > 2- Can you point me to a resource explaining why this might lead to index > bloating? No resource links, but if you move a tuple to another page then you must also adjust the index. If you have no exclusive lock on the table, then you must assume older transactions still need the old tuple version, so you need to create another index entry rather than re-pointing the existing index entry's ctid to the new tuple version. It's not hard to imagine that would cause the index to become larger if you had to move some decent portion of the tuples to other pages. FWIW, I think it would be good if we had some easier way to compact tables without blocking concurrent users. My primary interest in TID Range Scans was to allow easier identification of tuples near the end of the heap that could be manually UPDATEd after a vacuum to allow the heap to be shrunk during the next vacuum. David
Re: pg_upgrade and logical replication
On Fri, Jul 19, 2024 at 03:44:22PM -0500, Nathan Bossart wrote: > I've been looking into optimizing pg_upgrade's once-in-each-database steps > [0], and I noticed that we are opening a connection to every database in > the cluster and running a query like > > SELECT count(*) FROM pg_catalog.pg_subscription WHERE subdbid = %d; > > Then, later on, we combine all of these values in > count_old_cluster_subscriptions() to verify that max_replication_slots is > set high enough. AFAICT these per-database subscription counts aren't used > for anything else. > > This is an extremely expensive way to perform that check, and so I'm > wondering why we don't just do > > SELECT count(*) FROM pg_catalog.pg_subscription; > > once in count_old_cluster_subscriptions(). Like so... -- nathan >From a90db3fa008f73f7f97cc73e4453ebf2e981cb83 Mon Sep 17 00:00:00 2001 From: Nathan Bossart Date: Sat, 20 Jul 2024 21:01:29 -0500 Subject: [PATCH v1 1/1] pg_upgrade: retrieve subscription count more efficiently --- src/bin/pg_upgrade/check.c | 9 +++- src/bin/pg_upgrade/info.c | 39 - src/bin/pg_upgrade/pg_upgrade.h | 3 +-- 3 files changed, 13 insertions(+), 38 deletions(-) diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c index 27924159d6..39d14b7b92 100644 --- a/src/bin/pg_upgrade/check.c +++ b/src/bin/pg_upgrade/check.c @@ -1797,17 +1797,14 @@ check_new_cluster_subscription_configuration(void) { PGresult *res; PGconn *conn; - int nsubs_on_old; int max_replication_slots; /* Subscriptions and their dependencies can be migrated since PG17. */ if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700) return; - nsubs_on_old = count_old_cluster_subscriptions(); - /* Quick return if there are no subscriptions to be migrated. */ - if (nsubs_on_old == 0) + if (old_cluster.nsubs == 0) return; prep_status("Checking for new cluster configuration for subscriptions"); @@ -1821,10 +1818,10 @@ check_new_cluster_subscription_configuration(void) pg_fatal("could not determine parameter settings on new cluster"); max_replication_slots = atoi(PQgetvalue(res, 0, 0)); - if (nsubs_on_old > max_replication_slots) + if (old_cluster.nsubs > max_replication_slots) pg_fatal("\"max_replication_slots\" (%d) must be greater than or equal to the number of " "subscriptions (%d) on the old cluster", -max_replication_slots, nsubs_on_old); +max_replication_slots, old_cluster.nsubs); PQclear(res); PQfinish(conn); diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c index 95c22a7200..78398ec58b 100644 --- a/src/bin/pg_upgrade/info.c +++ b/src/bin/pg_upgrade/info.c @@ -28,7 +28,7 @@ static void print_db_infos(DbInfoArr *db_arr); static void print_rel_infos(RelInfoArr *rel_arr); static void print_slot_infos(LogicalSlotInfoArr *slot_arr); static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check); -static void get_db_subscription_count(DbInfo *dbinfo); +static void get_subscription_count(ClusterInfo *cluster); /* @@ -298,12 +298,12 @@ get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check) * count for the old cluster. */ if (cluster == &old_cluster) - { get_old_cluster_logical_slot_infos(pDbInfo, live_check); - get_db_subscription_count(pDbInfo); - } } + if (cluster == &old_cluster) + get_subscription_count(cluster); + if (cluster == &old_cluster) pg_log(PG_VERBOSE, "\nsource databases:"); else @@ -750,14 +750,14 @@ count_old_cluster_logical_slots(void) /* * get_db_subscription_count() * - * Gets the number of subscriptions in the database referred to by "dbinfo". 
+ * Gets the number of subscriptions in the cluster. * * Note: This function will not do anything if the old cluster is pre-PG17. * This is because before that the logical slots are not upgraded, so we will * not be able to upgrade the logical replication clusters completely. */ static void -get_db_subscription_count(DbInfo *dbinfo) +get_subscription_count(ClusterInfo *cluster) { PGconn *conn; PGresult *res; @@ -766,36 +766,15 @@ get_db_subscription_count(DbInfo *dbinfo) if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700) return; - conn = connectToServer(&old_cluster, dbinfo->db_name); + conn = connectToServer(&old_cluster, "template1"); res = executeQueryOrDie(conn, "SELECT count(*) " - "FROM pg_catalog.pg_subscripti
Re: Lock-free compaction. Why not?
David Rowley writes: > No resource links, but if you move a tuple to another page then you > must also adjust the index. If you have no exclusive lock on the > table, then you must assume older transactions still need the old > tuple version, so you need to create another index entry rather than > re-pointing the existing index entry's ctid to the new tuple version. The actually tricky part about that is that you have to ensure that any concurrent scan will see one of the two copies --- not both, and not neither. This is fairly hard when the concurrent query might be using any of several scan methods, and might or might not have visited the original tuple before you commenced the move. You can solve it by treating the move more like an UPDATE, that is the new tuple gets a new XID, but that has its own downsides; notably that it must block/be blocked by concurrent real UPDATEs. regards, tom lane
Re: Provide a pg_truncate_freespacemap function
On 2024/03/07 16:59, Ronan Dunklau wrote: Le mercredi 6 mars 2024, 20:28:44 CET Stephen Frost a écrit : I agree that this would generally be a useful thing to have. Thanks ! Does that seem correct ? Definitely needs to have a 'REVOKE ALL ON FUNCTION' at the end of the upgrade script, similar to what you'll find at the bottom of pg_visibility--1.1.sql in the tree today, otherwise anyone could run it. Beyond that, I'd suggest a function-level comment above the definition of the function itself (which is where we tend to put those- not at the point where we declare the function). Thank you for the review. Here is an updated patch for both of those. Here are my review comments: The documentation for pg_freespace needs updating. A regression test for pg_truncate_freespace_map() should be added. + /* Only some relkinds have a freespace map */ + if (!RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind)) + ereport(ERROR, + (errcode(ERRCODE_WRONG_OBJECT_TYPE), +errmsg("relation \"%s\" is of wrong relation kind", + RelationGetRelationName(rel)), + errdetail_relkind_not_supported(rel->rd_rel->relkind))); An index can have an FSM, but this code doesn't account for that. + smgrtruncate(RelationGetSmgr(rel), &fork, 1, &block); Shouldn't truncation be performed after WAL-logging due to the WAL rule? I'm not sure if the current order might actually cause significant issues in FSM truncation case, though. Regards, -- Fujii Masao Advanced Computing Technology Center Research and Development Headquarters NTT DATA CORPORATION
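To make the relkind comment above concrete, here is one possible shape of a widened check — an assumption for illustration, not the patch author's wording — that admits any relation kind with physical storage, so indexes with an FSM are not rejected:

```c
/*
 * Illustrative sketch only: accept every relation kind that has physical
 * storage (tables, materialized views, indexes, ...), since those can
 * carry a free space map.
 */
if (!RELKIND_HAS_STORAGE(rel->rd_rel->relkind))
	ereport(ERROR,
			(errcode(ERRCODE_WRONG_OBJECT_TYPE),
			 errmsg("relation \"%s\" is of wrong relation kind",
					RelationGetRelationName(rel)),
			 errdetail_relkind_not_supported(rel->rd_rel->relkind)));
```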
Make reorder buffer max_changes_in_memory adjustable?
Hi hackers. Recently I ran into an issue with logically replicating very big transactions. Since we already have logical_decoding_work_mem to cap the memory usage, there is no risk of OOM during decoding. However, the memory usage still goes out of control in the 'Tuples' memory context of the reorder buffer. It seems that when restoring spilled transactions from disk, memory usage is bounded only by max_changes_in_memory, which is hard-coded to 4096 changes, just as decoding itself was before v13. For big transactions, we have supported streaming mode since v14, which should solve this issue, but using streaming mode relies on the subscriber's support. There are still a lot of PostgreSQL installations running v12/13 in production, or maybe v11 or older even though EOLed. Also, many CDC tools that logically replicate from PostgreSQL do not seem to support streaming either. Would it be possible to make max_changes_in_memory a GUC so it can be adjusted dynamically? The default could stay at the current 4096. When facing a big transaction on a memory-constrained machine, we could at least lower max_changes_in_memory so the logical WAL sender can get through the transaction. Otherwise the WAL sender may get killed (kill -9) and recovery is needed. After recovery, the WAL sender has to restart from a point before the transaction started, and keeps repeating this loop without making any progress. It would never get through the transaction short of adding more memory to the machine, which is usually not practical in reality. Sincerely, Jingtang
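To make the proposal concrete, a sketch of what the knob might look like as an entry in the ConfigureNamesInt[] table in guc_tables.c — the name, context, group, and limits here are assumptions for illustration, not a worked-out patch:

```c
/* illustrative guc_tables.c entry; the backing variable would replace the
 * current hard-coded constant in reorderbuffer.c */
{
	{"max_changes_in_memory", PGC_SIGHUP, REPLICATION_SENDING,
		gettext_noop("Maximum number of changes kept in memory per transaction "
					 "when restoring spilled changes in logical decoding."),
		NULL
	},
	&max_changes_in_memory,
	4096, 64, INT_MAX,
	NULL, NULL, NULL
},
```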
Re: Provide a pg_truncate_freespacemap function
Fujii Masao writes: >> Le mercredi 6 mars 2024, 20:28:44 CET Stephen Frost a écrit : >>> I agree that this would generally be a useful thing to have. Personally, I want to push back on whether this has any legitimate use-case. Even if the FSM is corrupt, it should self-heal over time, and I'm not seeing the argument why truncating it would speed convergence towards correct values. Worse, in the interim where you don't have any FSM, you will suffer table bloat because insertions will be appended at the end of the table. So this looks like a foot-gun, and the patch's lack of user-visible documentation surely does nothing to make it safer. (The analogy to pg_truncate_visibility_map seems forced. If you are in a situation with a trashed visibility map, you are probably getting wrong query answers, and truncating the map will make that better. But a trashed FSM doesn't result in incorrect output, and zeroing it will make things worse not better.) regards, tom lane