Re: race condition in pg_class

2024-07-20 Thread Alexander Lakhin

Hello Noah,

28.06.2024 08:13, Noah Misch wrote:

Pushed.


A recent buildfarm test failure [1] showed that the
intra-grant-inplace-db.spec test added with 0844b3968 may fail
on a slow machine (per my understanding):

test intra-grant-inplace-db   ... FAILED 4302 ms

@@ -21,8 +21,7 @@
 WHERE datname = current_catalog
 AND age(datfrozenxid) > (SELECT min(age(x)) FROM frozen_witness);

-?column?
---
-datfrozenxid retreated
-(1 row)
+?column?
+
+(0 rows)

whilst the previous (successful) run showed a much shorter duration:
test intra-grant-inplace-db   ... ok  540 ms

I reproduced this failure on a VM slowed down so that the test duration
reached 4+ seconds, with 100 copies of "test: intra-grant-inplace-db" in
isolation_schedule:
test intra-grant-inplace-db   ... ok 4324 ms
test intra-grant-inplace-db   ... FAILED 4633 ms
test intra-grant-inplace-db   ... ok 4649 ms

But as the test is going to be modified by the inplace110-successors-v8.patch,
and the modified test (with all three latest patches applied) passes
reliably under the same conditions, maybe this failure doesn't deserve
deeper exploration.

What do you think?

[1] 
https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08

Best regards,
Alexander




Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

2024-07-20 Thread Noah Misch
On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote:
> v4 is attached.

Removal of the PinBufferForBlock() comment about the "persistence =
RELPERSISTENCE_PERMANENT" fallback started to feel like a loss.  I looked for
a way to re-add a comment about the fallback.
https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
shows no test coverage of that fallback, and I think the fallback is
unreachable.  Hence, I've removed the fallback in a separate commit.  I've
pushed that and your three patches.  Thanks.




Re: race condition in pg_class

2024-07-20 Thread Noah Misch
On Sat, Jul 20, 2024 at 11:00:00AM +0300, Alexander Lakhin wrote:
> 28.06.2024 08:13, Noah Misch wrote:
> > Pushed.
> 
> A recent buildfarm test failure [1] showed that the
> intra-grant-inplace-db.spec test added with 0844b3968 may fail
> on a slow machine

> But as the test is going to be modified by the inplace110-successors-v8.patch,
> and the modified test (with all three latest patches applied) passes
> reliably under the same conditions, maybe this failure doesn't deserve
> deeper exploration.

Agreed.  Let's just wait for code review of the actual bug fix, not develop a
separate change to stabilize the test.  One flake in three weeks is low enough
to make that okay.

> [1] 
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=habu&dt=2024-07-18%2003%3A08%3A08




Re: UUID v7

2024-07-20 Thread Andrey M. Borodin


> On 8 May 2024, at 18:37, Andrey M. Borodin  wrote:
> 
> It's RFC now.

PFA version with references to RFC instead of drafts.
In a nearby thread [0] we found out that most systems have enough precision to
fill an additional 12 bits of sub-millisecond information, so I switched the
implementation to this method.
We have a portable gettimeofday(), but unfortunately it gives only 10 bits of
sub-millisecond information. So I created a portable get_real_time_ns() for this
purpose: it reads clock_gettime() on non-Windows platforms and
GetSystemTimePreciseAsFileTime() on Windows.
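
For illustration, here is a minimal sketch of deriving both timestamp fields
from a nanosecond clock, following the RFC 9562 "Replace Leftmost Random Bits
with Increased Clock Precision" method. This is hypothetical, not the actual
get_real_time_ns() code from the patch:

#include <stdint.h>
#include <time.h>

static void
uuidv7_time_parts(uint64_t *unix_ts_ms, uint16_t *subms_frac12)
{
	struct timespec ts;

	clock_gettime(CLOCK_REALTIME, &ts);

	/* 48-bit Unix timestamp in milliseconds */
	*unix_ts_ms = (uint64_t) ts.tv_sec * 1000 + ts.tv_nsec / 1000000;

	/* Scale the 0..999999 ns remainder into the 12-bit range 0..4095. */
	*subms_frac12 = (uint16_t) (((uint64_t) (ts.tv_nsec % 1000000) << 12) / 1000000);
}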

Thanks!


Best regards, Andrey Borodin.

[0] 
https://www.postgresql.org/message-id/flat/be0339cc-1ae1-4892-9445-8e6d8995a44d%40eisentraut.org



v25-0001-Implement-UUID-v7.patch
Description: Binary data


Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

2024-07-20 Thread Nazir Bilal Yavuz
Hi,

On Sat, 20 Jul 2024 at 14:27, Noah Misch  wrote:
>
> On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote:
> > v4 is attached.
>
> Removal of the PinBufferForBlock() comment about the "persistence =
> RELPERSISTENCE_PERMANENT" fallback started to feel like a loss.  I looked for
> a way to re-add a comment about the fallback.
> https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
> shows no test coverage of that fallback, and I think the fallback is
> unreachable.  Hence, I've removed the fallback in a separate commit.  I've
> pushed that and your three patches.  Thanks.

Thanks for the separate commit and push!

With the separate commit (e00c45f685), does it make sense to rename
the smgr_persistence parameter of ReadBuffer_common() to
persistence? Because ExtendBufferedRelTo() calls ReadBuffer_common()
with the rel's persistence now, not with the smgr's persistence.

-- 
Regards,
Nazir Bilal Yavuz
Microsoft




Re: Report search_path value back to the client.

2024-07-20 Thread Jelte Fennema-Nio
On Fri, 19 Jul 2024 at 21:48, Tomas Vondra
 wrote:
>
>
>
> On 7/19/24 17:16, Jelte Fennema-Nio wrote:
> > That's been moving forward, even relatively fast imho for the
> > size/impact of that patchset. But those changes are much less
> > straight-forward than this patch. And while I hope that they can get
> > committed for PG18 this year, I'm not confident about that.
>
> I don't know. My experience is that whenever we decided to not do a
> simple improvement because a more complex patch was expected to do
> something better/smarter, we often ended up getting nothing.
>
> So maybe it'd be best to get this committed, and then if the other patch
> manages to get into PG18, we can revert this change (or rather that
> patch would do that).

Yeah, I think we're aligned on this. Because that's totally what I
meant to say, but I don't seem to have written down that conclusion
explicitly.

> Maybe it's crazy-talk, but couldn't we make the GUC settable exactly
> once? That should be doable using the GUC hooks, I think. That means
> pgbouncer would set the GUC right after opening the connection, and then
> the following attempts to set it would either error out, or silently do
> nothing?
>
> Perhaps it could even allow enabling reporting for more GUCs, but once a
> GUC is on the list, it's reported forever.

This could maybe work, I'll think a bit about it. The main downside I
see is that PgBouncer could then not act as a transparent proxy, because
it cannot reset the value to the one the client expects the GUC to
have. But in practice that doesn't matter much for this specific case,
because all that happens is that the client gets a few
more ParameterStatus messages than it wanted.
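
For concreteness, the "settable exactly once" idea could look something like
this GUC check hook. This is a hypothetical sketch: the hook name and the
session-only restriction are my assumptions, not code from any posted patch:

/*
 * Accept the first session-level assignment of the GUC, reject later
 * ones.  Startup-time sources (defaults, postgresql.conf, connection
 * options) remain allowed.
 */
static bool
check_report_parameters(char **newval, void **extra, GucSource source)
{
	static bool set_in_session = false;

	if (source != PGC_S_SESSION)
		return true;

	if (set_in_session)
	{
		GUC_check_errmsg("parameter may only be set once per session");
		return false;
	}
	set_in_session = true;
	return true;
}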




Re: meson vs windows perl

2024-07-20 Thread Andrew Dunstan


On 2024-05-28 Tu 6:13 PM, Andres Freund wrote:

Hi,

On 2024-04-05 16:12:12 -0400, Andrew Dunstan wrote:

OK, this has been fixed and checked. The attached is what I propose.

The perl command is pretty hard to read. What about using python's shlex
module instead? Rough draft attached.  Still not very pretty, but seems easier
to read?

It'd be even better if we could just get perl to print out the flags in an
easier to parse way, but I couldn't immediately see a way.




Thanks for looking.

The attached should be easier to read. The perl package similar to shlex 
is Text::ParseWords, which is already a requirement.



cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com
diff --git a/meson.build b/meson.build
index 5387bb6d5f..e244cbac92 100644
--- a/meson.build
+++ b/meson.build
@@ -993,20 +993,17 @@ if not perlopt.disabled()
 # Config's ccdlflags and ldflags.  (Those are the choices of those who
 # built the Perl installation, which are not necessarily appropriate
 # for building PostgreSQL.)
-ldopts = run_command(perl, '-MExtUtils::Embed', '-e', 'ldopts', check: true).stdout().strip()
-undesired = run_command(perl_conf_cmd, 'ccdlflags', check: true).stdout().split()
-undesired += run_command(perl_conf_cmd, 'ldflags', check: true).stdout().split()
+perl_ldopts = run_command(perl, '-e', '''
+use ExtUtils::Embed;
+use Text::ParseWords;
+# tell perl to suppress including these in ldopts
+*ExtUtils::Embed::_ldflags = *ExtUtils::Embed::_ccdlflags = sub { return ""; };
+# adding an argument to ldopts makes it return a value instead of printing
+# print one of these per line so splitting will preserve spaces in file names.
+print "$_\n" foreach shellwords(ldopts(undef));
+''',
+ check: true).stdout().strip().split('\n')
 
-perl_ldopts = []
-foreach ldopt : ldopts.split(' ')
-  if ldopt == '' or ldopt in undesired
-continue
-  endif
-
-  perl_ldopts += ldopt.strip('"')
-endforeach
-
-message('LDFLAGS recommended by perl: "@0@"'.format(ldopts))
 message('LDFLAGS for embedding perl: "@0@"'.format(' '.join(perl_ldopts)))
 
 perl_dep_int = declare_dependency(


Re: Logical Replication of sequences

2024-07-20 Thread vignesh C
On Wed, 10 Jul 2024 at 13:46, Peter Smith  wrote:
>
> Here are a few comments for patch v20240705-0003.
>
> (This is a WIP. I have only looked at the docs so far.)
>
> ==
> doc/src/sgml/config.sgml
>
> nitpick - max_logical_replication_workers: /and sequence
> synchronization worker/and a sequence synchronization worker/
>
> ==
> doc/src/sgml/logical-replication.sgml
>
> nitpick - max_logical_replication_workers: re-order list of workers to
> be consistent with other docs 1-apply,2-parallel,3-tablesync,4-seqsync
>
> ==
> doc/src/sgml/ref/alter_subscription.sgml
>
> 1.
> IIUC the existing "REFRESH PUBLICATION" command will fetch and sync
> all new sequences, etc., and/or remove old ones no longer in the
> publication. But current docs do not say anything at all about
> sequences here. It should say something about sequence behaviour.
>
> ~~~
>
> 2.
> For the existing "REFRESH PUBLICATION" there is a sub-option
> "copy_data=true/false". Won't this need some explanation about how it
> behaves for sequences? Or will there be another option
> "copy_sequences=true/false".
>
> ~~~
>
> 3.
> IIUC the main difference between REFRESH PUBLICATION and REFRESH
> PUBLICATION SEQUENCES is that the 2nd command will try to synchronize
> all the *existing* sequences to bring them to the same point as
> on the publisher, but otherwise, they are the same command. If that is
> the correct understanding, I don't think that distinction is made very
> clear in the current docs.
>
> ~~~
>
> nitpick - the synopsis is misplaced. It should not be between ENABLE
> and DISABLE. I moved it. Also, it should say "REFRESH PUBLICATION
> SEQUENCES" because that is how the new syntax is defined in gram.y
>
> nitpick - REFRESH SEQUENCES. Renamed to "REFRESH PUBLICATION
> SEQUENCES". And, shouldn't "from the publisher" say "with the
> publisher"?
>
> nitpick - changed the varlistentry "id".
>
> ==
> 99.
> Please also see the attached diffs patch which implements any nitpicks
> mentioned above.

All these comments are handled in the v20240720 version patch attached at [1].
[1] - 
https://www.postgresql.org/message-id/CALDaNm2vuO7Ya4QVTZKR9jY_mkFFcE_hKUJiXx4KUknPgGFjSg%40mail.gmail.com

Regards,
Vignesh




Re: Logical Replication of sequences

2024-07-20 Thread vignesh C
On Fri, 12 Jul 2024 at 08:22, Peter Smith  wrote:
>
> Hi Vignesh. Here are the rest of my comments for patch v20240705-0003.
> ==
> src/backend/catalog/pg_subscription.c
>
> 1. GetSubscriptionSequences
>
> +/*
> + * Get the sequences for the subscription.
> + *
> + * The returned list is palloc'ed in the current memory context.
> + */
>
> Is that comment right? The palloc seems to be done in
> CacheMemoryContext, not in the current context.

This function is removed and GetSubscriptionRelations is being used instead.

> ~
>
> 2.
> The code is very similar to the other function
> GetSubscriptionRelations(). In fact I did not understand how the 2
> functions know what they are returning:
>
> E.g. how does GetSubscriptionRelations not return sequences too?
> E.g. how does GetSubscriptionSequences not return relations too?

GetSubscriptionRelations can be used, so removed the
GetSubscriptionSequences function.

>
> 3. AlterSubscription_refresh
>
> - logicalrep_worker_stop(sub->oid, relid);
> + /* Stop the worker if relation kind is not sequence*/
> + if (relkind != RELKIND_SEQUENCE)
> + logicalrep_worker_stop(sub->oid, relid);
>
> Can you give more reasons in the comment for why the stop is skipped for
> the sequence worker?
>
> ~
>
> nitpick - period and space in the comment
>
> ~~~
>
> 8. logicalrep_sequence_sync_worker_find
>
> +/*
> + * Walks the workers array and searches for one that matches given
> + * subscription id.
> + *
> + * We are only interested in the sequence sync worker.
> + */
> +LogicalRepWorker *
> +logicalrep_sequence_sync_worker_find(Oid subid, bool only_running)
>
> There are other similar functions for walking the workers array to
> search for a worker. Instead of having different functions for
> different cases, wouldn't it be cleaner to combine these into a single
> function, where you pass a parameter (e.g. a mask of worker types that
> you are interested in finding)?

I will address this in a future version once the patch has become more stable.
> ~~~
>
> 11.
> - Assert(is_tablesync_worker == OidIsValid(relid));
> + Assert(is_tablesync_worker == OidIsValid(relid) ||
> is_sequencesync_worker == OidIsValid(relid));
>
> IIUC there is only a single sequence sync worker for handling all the
> sequences. So, what does the 'relid' actually mean here when there are
> multiple sequences?

Sequence sync workers will not have a relid; modified the assert.

> ~~~
>
> 12. logicalrep_seqsyncworker_failuretime
> 12b.
> Curious if this had to be a separate exit handler or if maybe this could
> have been handled by the existing logicalrep_worker_onexit handler.
> See also other review comments in this post about this area --
> perhaps all this can be removed?

This function cannot be combined with logicalrep_worker_onexit, as this
function should be called only in the failure case, and this exit handler
should be removed in the success case.

This cannot be removed, for the following reason:
consider a situation where a sequence sync worker starts and
then encounters a failure while syncing sequences. At the same time, a
user initiates a "refresh publication sequences" operation. Given only
the start time, it's not possible to distinguish whether the sequence
sync worker failed or completed successfully, because the
"refresh publication sequences" operation would have re-added the
sequences either way.

>
> 14.
> The reason for the addition logic "(last_value + log_cnt)" is not
> obvious. I am guessing it might be related to code from
> 'nextval_internal' (fetch = log = fetch + SEQ_LOG_VALS;) but it is
> complicated. It is unfortunate that the field 'log_cnt' seems hardly
> commented anywhere at all.
>
> Also, I am not 100% sure if I trust the logic in the first place. The
> caller of this function is doing:
> sequence_value = fetch_sequence_data(conn, remoteid, &lsn);
> /* sets the sequence with sequence_value */
> SetSequenceLastValue(RelationGetRelid(rel), sequence_value);
>
> Won't that mean you can get to a situation where subscriber-side
> result of lastval('s') can be *ahead* from lastval('s') on the
> publisher? That doesn't seem good.

Added comments for "last_value + log_cnt".
Yes, it can be ahead on the subscriber. This happens because not every
change of the sequence is WAL-logged; the sequence is WAL-logged once
every SEQ_LOG_VALS values. This was discussed earlier, and the sequence
value being ahead was deemed OK:
https://www.postgresql.org/message-id/CA%2BTgmoaVLiKDD5vr1bzL-rxhMA37KCS_2xrqjbKVwGyqK%2BPCXQ%40mail.gmail.com
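
As a rough model of that behavior (a simplified sketch of the idea, not the
actual sequence.c code), nextval() pre-logs a batch of values, so the
WAL-logged position that the sync copies can be ahead of the last value
actually handed out:

#include <stdint.h>

#define SEQ_LOG_VALS 32			/* fetches covered by one WAL record */

typedef struct SeqSketch
{
	int64_t		last_value;		/* last value handed out */
	int64_t		log_cnt;		/* fetches still covered by the last WAL record */
} SeqSketch;

static int64_t
nextval_sketch(SeqSketch *seq)
{
	if (seq->log_cnt == 0)
	{
		/* Stands in for XLogInsert(): the record describes the state
		 * after SEQ_LOG_VALS more fetches, i.e. last_value + log_cnt. */
		seq->log_cnt = SEQ_LOG_VALS;
	}
	seq->last_value++;
	seq->log_cnt--;
	return seq->last_value;
}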

> 15.
> + /*
> + * Logical replication of sequences is based on decoding WAL records,
> + * describing the "next" state of the sequence the current state in the
> + * relfilenode is yet to reach. But during the initial sync we read the
> + * current state, so we need to reconstruct the WAL record logged when we
> + * started the current batch of sequence values.
> + *
> + * Otherwise we might get duplicate values (on subscrib

Re: Lock-free compaction. Why not?

2024-07-20 Thread Ahmed Yarub Hani Al Nuaimi
Wow, I was busy for a couple of days and now I’m again fully committed to
this initiative. These ideas are extremely useful to me. I’ll first start
by reading the old in-place implementation, but meanwhile I have the
following questions:
1- I’m thinking of adding only one simple step to auto-vacuum. This
means that there will be neither excessive locking nor excessive resource
utilization. I guess my question is: does that simple step make the current
lazy auto-vacuum much worse?
2- Can you point me to a resource explaining why this might lead to index
bloating?

On Thu, Jul 18, 2024 at 3:21 PM, Tom Lane  wrote:

> Robert Haas  writes:
> > On Thu, Jul 18, 2024 at 7:08 AM David Rowley 
> wrote:
> >> I think the primary issue with the old way was index bloat wasn't
> >> fixed. The release notes for 9.0 do claim the CLUSTER method "is
> >> substantially faster in most cases", however, I imagine there are
> >> plenty of cases where it wouldn't be. E.g., it's hard to imagine
> >> rewriting the entire 1TB table and indexes is cheaper than moving
> >> one out-of-place row.
>
> > The other thing I remember besides index bloat is that it was
> > crushingly slow.
>
> Yeah.  The old way was great if there really were just a few tuples
> needing to be moved ... but by the time you decide you need VACUUM
> FULL rather than plain VACUUM, that's unlikely to be the case.
>
> regards, tom lane
>


Re: Slow catchup of 2PC (twophase) transactions on replica in LR

2024-07-20 Thread vignesh C
On Thu, 18 Jul 2024 at 07:41, Hayato Kuroda (Fujitsu)
 wrote:
>
> Dear Peter,
>
> Thanks for giving comments! PSA new version.

Couple of suggestions:
1) How will the user know which transactions should be rolled back,
since the prepared transaction name on the subscriber will be different,
like pg_gid_16398_750? Can we mention some info on how the user can
identify these prepared transactions that should be rolled back on the
subscriber or, if this information is already available, can we point to it
from here:
+  When altering two_phase
+  from true to false, the backend
+  process reports an error if any prepared transactions done by the
+  logical replication worker (from when two_phase
+  parameter was still true) are found. You can resolve
+  prepared transactions on the publisher node, or manually roll them back
+  on the subscriber, and then try again.

2) I'm not sure if InvalidRepOriginId is correct here; how about
using OidIsValid in the below:
+void
+TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid, int szgid)
+{
+   Assert(subid != InvalidRepOriginId);
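
i.e. a sketch of the suggested change (hypothetical; the snprintf format is
only inferred from the pg_gid_16398_750 example above, not taken from the
patch):

void
TwoPhaseTransactionGid(Oid subid, TransactionId xid, char *gid, int szgid)
{
	Assert(OidIsValid(subid));
	Assert(TransactionIdIsValid(xid));
	snprintf(gid, szgid, "pg_gid_%u_%u", subid, xid);
}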

Regards,
Vignesh




Re: Use read streams in CREATE DATABASE command when the strategy is wal_log

2024-07-20 Thread Noah Misch
On Sat, Jul 20, 2024 at 03:01:31PM +0300, Nazir Bilal Yavuz wrote:
> On Sat, 20 Jul 2024 at 14:27, Noah Misch  wrote:
> >
> > On Thu, Jul 18, 2024 at 02:11:13PM +0300, Nazir Bilal Yavuz wrote:
> > > v4 is attached.
> >
> > Removal of the PinBufferForBlock() comment about the "persistence =
> > RELPERSISTENCE_PERMANENT" fallback started to feel like a loss.  I looked 
> > for
> > a way to re-add a comment about the fallback.
> > https://coverage.postgresql.org/src/backend/storage/buffer/bufmgr.c.gcov.html
> > shows no test coverage of that fallback, and I think the fallback is
> > unreachable.  Hence, I've removed the fallback in a separate commit.  I've
> > pushed that and your three patches.  Thanks.
> 
> Thanks for the separate commit and push!
> 
> With the separate commit (e00c45f685), does it make sense to rename
> the smgr_persistence parameter of ReadBuffer_common() to
> persistence? Because ExtendBufferedRelTo() calls ReadBuffer_common()
> with the rel's persistence now, not with the smgr's persistence.

BMR_REL() doesn't set relpersistence, so bmr.relpersistence is associated with
bmr.smgr and is unset if bmr.rel is set.  That is to say, bmr.relpersistence
is an smgr_persistence.  It could make sense to change ReadBuffer_common() to
take a BufferManagerRelation instead of the three distinct arguments.

On a different naming topic, my review missed that field name
copy_storage_using_buffer_read_stream_private.last_block doesn't fit how the
field is used.  Code uses it like an nblocks.  So let's either rename the
field or change the code to use it as a last_block (e.g. initialize it to
nblocks-1, not nblocks).




Re: Why do we define HAVE_GSSAPI_EXT_H?

2024-07-20 Thread Andres Freund
On 2024-07-08 15:56:59 -0700, Andres Freund wrote:
> While looking at this I also found an argument omission present in the commit
> adding meson support. I plan to fix that with the attached commit.

Pushed that portion.




Re: Meson far from ready on Windows

2024-07-20 Thread Andres Freund
Hi,

On 2024-07-17 09:49:47 -0700, Andres Freund wrote:
> On 2024-07-16 15:53:45 -0500, Tristan Partin wrote:
> > Other than that, it looks pretty solid.
> 
> Thanks for looking!  I'm thinking of pushing the first few patches soon-ish.
> 
> I'm debating between going for 17 + HEAD or also applying it to 16, to keep
> the trees more similar.

Pushed a number of these to 16 - HEAD.

Greetings,

Andres Freund




Re: Parallelize correlated subqueries that execute within each worker

2024-07-20 Thread Tomas Vondra
Hi,

I was going through the "needs review" patches with no recent messages,
trying to figure out what is needed to move them forward, and this one
caught my eye because I commented on it before. It's also a bit of a sad
example, because it started in 2021 and is moving at glacial speed :-(

I read through the thread, to understand how the design changed over
time, and I like the current approach (proposed by Robert) much more
than the initial idea of adding new flag next to parallel_safe etc.

And in general the patch looks reasonably simple and clean, but my
knowledge of PARAM intricacies is pretty much nil, so I'm hesitant to
claim the patch is correct. And I'm not sure what exactly needs to
happen to validate the approach :-(


The regression tests currently fail, due to a different plan for one of
the new queries in select_parallel. I guess it's due to some committed
patch, and it looks like a sensible change, but I haven't looked closely.

Also, I do get this warning when building with GCC 12.2.0 on Debian:

clauses.c: In function ‘max_parallel_hazard_walker’:
clauses.c:961:49: warning: ‘save_safe_param_ids’ may be used uninitialized [-Wmaybe-uninitialized]
  961 |                 context->safe_param_ids = save_safe_param_ids;
      |                                           ^
clauses.c:943:29: note: ‘save_safe_param_ids’ was declared here
  943 |         List       *save_safe_param_ids;
      |                     ^~~

It's harmless, the compiler simply does not realize the two blocks
(where save_safe_param_ids is set and used) have exactly the same if
conditions, but it's a bit confusing for people too.


I was wondering if this could affect some queries in TPC-H, but the only
query affected seems to be Q2 - where it helps, cutting the time in
half, but Q2 is generally pretty fast / the expensive part was already
parallelized quite well (i.e. the correlated subquery is fairly cheap).

However, it's not difficult to construct a query where this helps a lot.
If the correlated subquery does something expensive (e.g. aggregation of
non-trivial amounts of data), this would help. So I wonder if e.g.
TPC-DS would benefit from this more ...


A couple review comments about the code:

1) new fields in max_parallel_hazard_context should have comments:

+   boolcheck_params;
+   Bitmapset **required_params;


2) Do we need both is_parallel_safe and is_parallel_safe_with_params?
ISTM the main difference is the for() loop, so why not add an extra
parameter to is_parallel_safe() and skip that loop if it's null? Or, if
we worry about external code, keep is_parallel_safe_with_params() and
define is_parallel_safe() as is_parallel_safe_with_params(...,NULL)?

3) Isn't it a bit weird that is_parallel_safe_with_params() actually
sets check_params=false, which seems like it doesn't actually check
parameters? I'm a bit confused / unsure if this is a bug or how it
actually checks parameters. If it's correct, it may need a comment.

4) The only place setting check_params is max_parallel_hazard, which is
called only for the whole Query from standard_planner(). But it does not
set required_params - can't this be an issue if the pointer happens to
be random garbage?

5) It probably needs a pgindent run, there's a bunch of rather long
lines and those are likely to get wrapped and look weird.

6) Is the comment in max_parallel_hazard_walker() still accurate? It
talks about PARAM_EXTERN and PARAM_EXEC, but the patch removes the
PARAM_EXTERN check entirely. So maybe the comment needs updating?

7) I don't like the very complex if condition very much, it's hard to
understand. I'd split that into two separate conditions, and add a short
comment for each of them. I.e. instead of:

  if (param->paramkind != PARAM_EXEC || !(context->check_params ||
context->required_params != NULL))
return false;

I'd do

  /* ... comment ... */
  if (param->paramkind != PARAM_EXEC)
return false;

  /* ... comment ... */
  if (!(context->check_params || context->required_params != NULL))
return false;

or something like that.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: psql: fix variable existence tab completion

2024-07-20 Thread Anton A. Melnikov

On 19.07.2024 01:10, Tom Lane wrote:

Actually, I think we ought to just reject this change.  Debian 10
will be two years past EOL before PG 17 ships.  So I don't see a
reason to support it in the tests anymore.  One of the points of
such testing is to expose broken platforms, not mask them.

Obviously, if anyone can point to a still-in-support platform
with the same bug, that calculus might change.


The bug where broken versions of libedit want to backslash some symbols
(e.g. double quotes, curly braces, the question mark)
I have only encountered on Debian 10 (buster).

If anyone has encountered a similar error on some other system,
please share such information.



With respect to the other hacks Alexander mentions, maybe we
could clean some of those out too?  I don't recall what platform
we had in mind there, but we've moved our goalposts on what
we support pretty far in the last couple years.


Agreed that there is no reason to keep workarounds for unsupported systems.
Here is a patch that removes the fixes for the Buster bug mentioned above.


With the best regards,

--
Anton A. Melnikov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

From 971184c49cf2c5932bb682ee977a6fef9ba4eba5 Mon Sep 17 00:00:00 2001
From: "Anton A. Melnikov" 
Date: Sun, 21 Jul 2024 01:37:03 +0300
Subject: [PATCH] Remove workarounds for Debian 10 (Buster)  with broken
 libedit from the 010_tab_completion.pl test.

---
 src/bin/psql/t/010_tab_completion.pl | 27 ---
 1 file changed, 8 insertions(+), 19 deletions(-)

diff --git a/src/bin/psql/t/010_tab_completion.pl b/src/bin/psql/t/010_tab_completion.pl
index b45c39f0f5..678a0f7a2c 100644
--- a/src/bin/psql/t/010_tab_completion.pl
+++ b/src/bin/psql/t/010_tab_completion.pl
@@ -167,26 +167,18 @@ check_completion(
 	qr/"mytab123" +"mytab246"/,
 	"offer multiple quoted table choices");
 
-# note: broken versions of libedit want to backslash the closing quote;
-# not much we can do about that
-check_completion("2\t", qr/246\\?" /,
+check_completion("2\t", qr/246" /,
 	"finish completion of one of multiple quoted table choices");
 
-# note: broken versions of libedit may leave us in a state where psql
-# thinks there's an unclosed double quote, so that we have to use
-# clear_line not clear_query here
-clear_line();
+clear_query();
 
 # check handling of mixed-case names
-# note: broken versions of libedit want to backslash the closing quote;
-# not much we can do about that
 check_completion(
 	"select * from \"mi\t",
-	qr/"mixedName\\?" /,
+	qr/"mixedName" /,
 	"complete a mixed-case name");
 
-# as above, must use clear_line not clear_query here
-clear_line();
+clear_query();
 
 # check case folding
 check_completion("select * from TAB\t", qr/tab1 /, "automatically fold case");
@@ -198,8 +190,7 @@ clear_query();
 # differently, so just check that the replacement comes out correctly
 check_completion("\\DRD\t", qr/drds /, "complete \\DRD to \\drds");
 
-# broken versions of libedit require clear_line not clear_query here
-clear_line();
+clear_query();
 
 # check completion of a schema-qualified name
 check_completion("select * from pub\t",
@@ -261,18 +252,16 @@ check_completion(
 	qr|tab_comp_dir/af\a?ile|,
 	"filename completion with multiple possibilities");
 
-# broken versions of libedit require clear_line not clear_query here
+# here we are inside the string literal 'afile*' and waiting for a sub-choice or a second TAB, so use clear_line().
 clear_line();
 
 # COPY requires quoting
-# note: broken versions of libedit want to backslash the closing quote;
-# not much we can do about that
 check_completion(
 	"COPY foo FROM tab_comp_dir/some\t",
-	qr|'tab_comp_dir/somefile\\?' |,
+	qr|'tab_comp_dir/somefile' |,
 	"quoted filename completion with one possibility");
 
-clear_line();
+clear_query();
 
 check_completion(
 	"COPY foo FROM tab_comp_dir/af\t",
-- 
2.45.2



Re: Report search_path value back to the client.

2024-07-20 Thread Tomas Vondra
On 7/20/24 14:09, Jelte Fennema-Nio wrote:
> On Fri, 19 Jul 2024 at 21:48, Tomas Vondra
>  wrote:
>>
>>
>>
>> On 7/19/24 17:16, Jelte Fennema-Nio wrote:
>>> That's been moving forward, even relatively fast imho for the
>>> size/impact of that patchset. But those changes are much less
>>> straight-forward than this patch. And while I hope that they can get
>>> committed for PG18 this year, I'm not confident about that.
>>
>> I don't know. My experience is that whenever we decided to not do a
>> simple improvement because a more complex patch was expected to do
>> something better/smarter, we often ended up getting nothing.
>>
>> So maybe it'd be best to get this committed, and then if the other patch
>> manages to get into PG18, we can revert this change (or rather that
>> patch would do that).
> 
> Yeah, I think we're aligned on this. Because that's totally what I
> meant to say, but I don't seem to have written down that conclusion
> explicitly.
> 
>> Maybe it's crazy-talk, but couldn't we make the GUC settable exactly
>> once? That should be doable using the GUC hooks, I think. That means
>> pgbouncer would set the GUC right after opening the connection, and then
>> the following attempts to set it would either error out, or silently do
>> nothing?
>>
>> Perhaps it could even allow enabling reporting for more GUCs, but once a
>> GUC is on the list, it's reported forever.
> 
> This could maybe work, I'll think a bit about it. The main downside I
> see is that PgBouncer could then not act as a transparent proxy, because
> it cannot reset the value to the one the client expects the GUC to
> have. But in practice that doesn't matter much for this specific case,
> because all that happens is that the client gets a few
> more ParameterStatus messages than it wanted.

I understand that point of view, but I don't quite share it. I don't
really see this as a GUC, even if it's implemented as one. It's just a
convenient way to enable a protocol feature without having to modify the
protocol ... It could easily be an enable_param_report() function doing
the same thing, for example.

It's a bit of a tangent, but this reminds me of how Oracle uses logon
triggers to set "application context" for their variant of RLS. In
Postgres that was not possible because we did not have "logon" triggers,
but in 17 that will change. I wonder if that could be handy, but maybe
not - it seems unnecessarily complex.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: REINDEX not updating partition progress

2024-07-20 Thread Ilya Gladyshev



On 19 Jul 2024, at 04:17, Michael Paquier  wrote:


On Fri, Jul 12, 2024 at 11:07:49PM +0100, Ilya Gladyshev wrote:
While working on CIC for partitioned tables [1], I noticed that
REINDEX for partitioned tables is not keeping track of the progress of
partitioned tables, so I'm creating a separate thread for this fix as
suggested.


This limitation is not a bug, because we already document that
partitions_total and partitions_done are 0 during a REINDEX.  Compared
to CREATE INDEX, a REINDEX is a bit wilder as it would work on all the
indexes for all the partitions, so providing this information makes
sense.

Agreed that this could be better, but what's now on HEAD is not wrong
either.


Thanks for pointing it out, I didn’t notice it. You’re right, this is not a
bug, but an improvement. I will remove the bits from the documentation as
well.




Still it does not do it because we know that the fields are going to
be overwritten pretty quickly anyway, no?  For now, we could just do
without progress_index_partition_done(), I guess, keeping it to the
incrementation of PROGRESS_CREATEIDX_PARTITIONS_DONE.  There's an
argument that this makes the code slightly easier to follow, with fewer
wrappers around the progress reporting.


The use case that I thought this would improve is REINDEX CONCURRENTLY,
where data from later stages of the previous partitions would linger for
quite a while before the command gets to the same stage of the current
partition. I don’t think this is of great importance, so I’m OK with making
the code simpler and leaving it out.




+	int			progress_params[3] = {
+		PROGRESS_CREATEIDX_COMMAND,
+		PROGRESS_CREATEIDX_PHASE,
+		PROGRESS_CREATEIDX_PARTITIONS_TOTAL
+	};
+	int64		progress_values[3];
+	Oid			heapId = relid;

Rather than having new code blocks, let's use a style consistent with
DefineIndex() where we have the pgstat_progress_update_multi_param(),
with a single {} block.


Fixed.


Adding the counter increment at the end of the loop in
ReindexMultipleInternal() is a good idea.  It considers both the
concurrent and non-concurrent cases.


Another reason to do so is that reindex_relation calls itself 
recursively, so it'd require some additional mechanism so that it 
wouldn't increment multiple times per partition.


+   progress_values[2] = list_length(partitions);
+   pgstat_progress_start_command(PROGRESS_COMMAND_CREATE_INDEX, heapId);

Hmm.  Is setting the relid only once for pg_stat_progress_create_index
the best choice there is?  Could it be better to report the partition
OID instead?


Technically, we could do this, but it looks like no other commands do it,
and currently there’s no API to do it without erasing the rest of the
progress information.



 Let's remember that indexes may have been attached with
names inconsistent with the partitioned table's index.  It is a bit
confusing to stick to the relid of the partitioned table all the
time, for all the indexes of all the partitions reindexed.


I agree that it’s a bit confusing, especially because the documentation
gives no hints about it. Judging by the documentation, I would expect relid
and index_relid to correspond to the same table. However, if we say that
this field represents the OID of the relation on which the command was
invoked, then I think it does make sense. I edited the documentation to say
that in the new patch.



 Could it
be better to actually introduce an entirely new field to the progress
table?  What I would look for is more information:
1) the partitioned table OID on which the REINDEX runs
2) the partition table being processed
3) the index OID being processed (old and new for concurrent case).

The patch sets 1) to the OID of the partitioned table, lacks 2) and
sets 3) each time an index is rebuilt.
—
Michael


I like this approach more, but I’m not sure whether adding another field
for the partition OID is worth it, since we already have index_relid, which
we can join with pg_index to get that. On the other hand,
index_relid is missing during a regular CREATE INDEX, so this new field
could be useful to indicate which table is being indexed in that case.
I'm on the fence about this, so I attached it as a separate patch, in case
you think it's a good idea.


Thank you for the review!
From 61e82b773e1c21dcb50a298bb205449ff17c90dc Mon Sep 17 00:00:00 2001
From: Ilya Gladyshev 
Date: Sun, 21 Jul 2024 01:40:15 +0100
Subject: [PATCH v2 2/2] partition_relid column for create index progress

---
 doc/src/sgml/monitoring.sgml |  9 +
 src/backend/catalog/system_views.sql |  3 ++-
 src/backend/commands/indexcmds.c | 18 +-
 src/include/commands/progress.h  |  1 +
 src/test/regress/expected/rules.out  |  3 ++-
 5 files changed, 27 insertions(+), 7 deletions(-)

diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
index 00bb423288..595f19a8f8 100644
--- a/doc/src/sgml/monitoring.sgml
+++ b/doc/src/sgml/monitoring.sgml
@@ -6006,6 +6006,15 @@ FROM pg_stat_get_backend_idset() AS backendid;
0 whe

Re: Add mention of execution time memory for enable_partitionwise_* GUCs

2024-07-20 Thread David Rowley
On Fri, 19 Jul 2024 at 17:24, Ashutosh Bapat
 wrote:
> According to [1] these GUCs were added because of increased
> memory consumption during planning and time taken to plan the query.
> The execution time memory consumption issue was known even back then
> but the GUC was not set to default because of that. But your patch
> proposes to change the narrative. In the same thread [1], you will
> find the discussion about turning the default to ON once we have fixed
> planner's memory and time consumption. We have patches addressing
> those issues [2] [3]. With your proposed changes we will almost never
> have a chance to turn those GUCs ON by default. That seems a rather
> sad prospect.

Sad prospect for whom? If the day comes and we're considering enabling
these GUCs by default, I think we'll likely also be considering if
it'll be sad for users who get an increased likelihood of OOM kills
because the chosen plan uses $nparts times more memory to execute than
the old plan.

Can you honestly say that you have no concern about increased executor
memory consumption if we switched on these GUCs by default? I was
certainly concerned about it in [5], so I dropped that patch after
realising what could happen.

> I am fine if we want to mention that the executor may consume a large
> amount of memory when these GUCs are turned ON. Users may decide to
> turn those OFF if they can not afford to spend that much memory during
> execution. But I don't like tying execution time consumption with
> default OFF.

If you were to fix the planner issues then this text would need to be
revisited anyway. However, I seriously question your judgment that
fixing those alone would be enough to allow us to switch these on by
default.  It seems unlikely that the planner would use anything near a
realistic work_mem setting's worth of extra memory per partition, but
that's what the executor would do, given enough data per partition.

I'd say any analysis that found planner memory and time to be the
only reasons to turn these GUCs off by default was incomplete.
Maybe it was a case of no longer looking for further reasons once
enough had been found to make the choice. If not, then they found the
molehill and failed to notice the mountain.

David

[4] https://postgr.es/m/3603c380-d094-136e-e333-610914fb3e80%40gmx.net
[5] 
https://postgr.es/m/caaphdvojkdbr3mr59jxmacybyhb6q_5qpru+dy93en8wm+x...@mail.gmail.com




Re: Lock-free compaction. Why not?

2024-07-20 Thread David Rowley
On Sun, 21 Jul 2024 at 04:00, Ahmed Yarub Hani Al Nuaimi
 wrote:
> 2- Can you point me to a resource explaining why this might lead to index 
> bloating?

No resource links, but if you move a tuple to another page then you
must also adjust the index.  If you have no exclusive lock on the
table, then you must assume older transactions still need the old
tuple version, so you need to create another index entry rather than
re-pointing the existing index entry's ctid to the new tuple version.
It's not hard to imagine that would cause the index to become larger
if you had to move some decent portion of the tuples to other pages.

FWIW, I think it would be good if we had some easier way to compact
tables without blocking concurrent users.  My primary interest in TID
Range Scans was to allow easier identification of tuples near the end
of the heap that could be manually UPDATEd after a vacuum to allow the
heap to be shrunk during the next vacuum.

David




Re: pg_upgrade and logical replication

2024-07-20 Thread Nathan Bossart
On Fri, Jul 19, 2024 at 03:44:22PM -0500, Nathan Bossart wrote:
> I've been looking into optimizing pg_upgrade's once-in-each-database steps
> [0], and I noticed that we are opening a connection to every database in
> the cluster and running a query like
> 
>   SELECT count(*) FROM pg_catalog.pg_subscription WHERE subdbid = %d;
> 
> Then, later on, we combine all of these values in
> count_old_cluster_subscriptions() to verify that max_replication_slots is
> set high enough.  AFAICT these per-database subscription counts aren't used
> for anything else.
> 
> This is an extremely expensive way to perform that check, and so I'm
> wondering why we don't just do
> 
>   SELECT count(*) FROM pg_catalog.pg_subscription;
> 
> once in count_old_cluster_subscriptions().

Like so...

-- 
nathan
From a90db3fa008f73f7f97cc73e4453ebf2e981cb83 Mon Sep 17 00:00:00 2001
From: Nathan Bossart 
Date: Sat, 20 Jul 2024 21:01:29 -0500
Subject: [PATCH v1 1/1] pg_upgrade: retrieve subscription count more
 efficiently

---
 src/bin/pg_upgrade/check.c  |  9 +++-
 src/bin/pg_upgrade/info.c   | 39 -
 src/bin/pg_upgrade/pg_upgrade.h |  3 +--
 3 files changed, 13 insertions(+), 38 deletions(-)

diff --git a/src/bin/pg_upgrade/check.c b/src/bin/pg_upgrade/check.c
index 27924159d6..39d14b7b92 100644
--- a/src/bin/pg_upgrade/check.c
+++ b/src/bin/pg_upgrade/check.c
@@ -1797,17 +1797,14 @@ check_new_cluster_subscription_configuration(void)
 {
PGresult   *res;
PGconn *conn;
-   int nsubs_on_old;
int max_replication_slots;
 
/* Subscriptions and their dependencies can be migrated since PG17. */
if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700)
return;
 
-   nsubs_on_old = count_old_cluster_subscriptions();
-
/* Quick return if there are no subscriptions to be migrated. */
-   if (nsubs_on_old == 0)
+   if (old_cluster.nsubs == 0)
return;
 
prep_status("Checking for new cluster configuration for subscriptions");
@@ -1821,10 +1818,10 @@ check_new_cluster_subscription_configuration(void)
pg_fatal("could not determine parameter settings on new 
cluster");
 
max_replication_slots = atoi(PQgetvalue(res, 0, 0));
-   if (nsubs_on_old > max_replication_slots)
+   if (old_cluster.nsubs > max_replication_slots)
pg_fatal("\"max_replication_slots\" (%d) must be greater than 
or equal to the number of "
 "subscriptions (%d) on the old cluster",
-max_replication_slots, nsubs_on_old);
+max_replication_slots, old_cluster.nsubs);
 
PQclear(res);
PQfinish(conn);
diff --git a/src/bin/pg_upgrade/info.c b/src/bin/pg_upgrade/info.c
index 95c22a7200..78398ec58b 100644
--- a/src/bin/pg_upgrade/info.c
+++ b/src/bin/pg_upgrade/info.c
@@ -28,7 +28,7 @@ static void print_db_infos(DbInfoArr *db_arr);
 static void print_rel_infos(RelInfoArr *rel_arr);
 static void print_slot_infos(LogicalSlotInfoArr *slot_arr);
 static void get_old_cluster_logical_slot_infos(DbInfo *dbinfo, bool live_check);
-static void get_db_subscription_count(DbInfo *dbinfo);
+static void get_subscription_count(ClusterInfo *cluster);
 
 
 /*
@@ -298,12 +298,12 @@ get_db_rel_and_slot_infos(ClusterInfo *cluster, bool live_check)
 * count for the old cluster.
 */
if (cluster == &old_cluster)
-   {
get_old_cluster_logical_slot_infos(pDbInfo, live_check);
-   get_db_subscription_count(pDbInfo);
-   }
}
 
+   if (cluster == &old_cluster)
+   get_subscription_count(cluster);
+
if (cluster == &old_cluster)
pg_log(PG_VERBOSE, "\nsource databases:");
else
@@ -750,14 +750,14 @@ count_old_cluster_logical_slots(void)
 /*
  * get_db_subscription_count()
  *
- * Gets the number of subscriptions in the database referred to by "dbinfo".
+ * Gets the number of subscriptions in the cluster.
  *
  * Note: This function will not do anything if the old cluster is pre-PG17.
  * This is because before that the logical slots are not upgraded, so we will
  * not be able to upgrade the logical replication clusters completely.
  */
 static void
-get_db_subscription_count(DbInfo *dbinfo)
+get_subscription_count(ClusterInfo *cluster)
 {
PGconn *conn;
PGresult   *res;
@@ -766,36 +766,15 @@ get_db_subscription_count(DbInfo *dbinfo)
if (GET_MAJOR_VERSION(old_cluster.major_version) < 1700)
return;
 
-   conn = connectToServer(&old_cluster, dbinfo->db_name);
+   conn = connectToServer(&old_cluster, "template1");
res = executeQueryOrDie(conn, "SELECT count(*) "
-   "FROM 
pg_catalog.pg_subscripti

Re: Lock-free compaction. Why not?

2024-07-20 Thread Tom Lane
David Rowley  writes:
> No resource links, but if you move a tuple to another page then you
> must also adjust the index.  If you have no exclusive lock on the
> table, then you must assume older transactions still need the old
> tuple version, so you need to create another index entry rather than
> re-pointing the existing index entry's ctid to the new tuple version.

The actually tricky part about that is that you have to ensure that
any concurrent scan will see one of the two copies --- not both,
and not neither.  This is fairly hard when the concurrent query
might be using any of several scan methods, and might or might not
have visited the original tuple before you commenced the move.
You can solve it by treating the move more like an UPDATE, that
is the new tuple gets a new XID, but that has its own downsides;
notably that it must block/be blocked by concurrent real UPDATEs.

regards, tom lane




Re: Provide a pg_truncate_freespacemap function

2024-07-20 Thread Fujii Masao




On 2024/03/07 16:59, Ronan Dunklau wrote:

On Wednesday, March 6, 2024 at 20:28:44 CET, Stephen Frost wrote:

I agree that this would generally be a useful thing to have.


Thanks !




Does that seem correct?


Definitely needs to have a 'REVOKE ALL ON FUNCTION' at the end of the
upgrade script, similar to what you'll find at the bottom of
pg_visibility--1.1.sql in the tree today, otherwise anyone could run it.

Beyond that, I'd suggest a function-level comment above the definition
of the function itself (which is where we tend to put those- not at the
point where we declare the function).


Thank you for the review. Here is an updated patch for both of those.


Here are my review comments:

The documentation for pg_freespace needs updating.


A regression test for pg_truncate_freespace_map() should be added.


+   /* Only some relkinds have a freespace map */
+   if (!RELKIND_HAS_TABLE_AM(rel->rd_rel->relkind))
+   ereport(ERROR,
+   (errcode(ERRCODE_WRONG_OBJECT_TYPE),
+errmsg("relation \"%s\" is of wrong relation 
kind",
+   RelationGetRelationName(rel)),
+
errdetail_relkind_not_supported(rel->rd_rel->relkind)));

An index can have an FSM, but this code doesn't account for that.


+   smgrtruncate(RelationGetSmgr(rel), &fork, 1, &block);

Shouldn't truncation be performed after WAL-logging, due to the WAL rule?
I'm not sure whether the current order actually causes significant issues
in the FSM truncation case, though.

Regards,

--
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION




Make reorder buffer max_changes_in_memory adjustable?

2024-07-20 Thread Jingtang Zhang
Hi hackers.

Recently I ran into an issue with logically replicating very big
transactions. Since we already have logical_decoding_work_mem to cap
the memory usage, there is no risk of OOM during decoding. However, the
memory usage still goes out of control in the 'Tuples' memory context of
the reorder buffer. It seems that when restoring spilled transactions
from disk, the memory usage is limited only by max_changes_in_memory,
which is hard-coded to 4096, just as decoding was before v13.

For big transactions, we have supported streaming mode since
v14, which should solve this issue, but using streaming mode relies on
the subscriber's support. There are still a lot of PostgreSQL installations
running v12/13 in production, or maybe v11 or older even though EOLed.
Also, a lot of CDC tools that logically replicate PostgreSQL do not seem
to support streaming either.

Would it be possible to make max_changes_in_memory a GUC so it can be
adjusted dynamically? Keep the default value at 4096, as it currently is.
When facing a big transaction on a memory-constrained machine, we could
then at least lower max_changes_in_memory to let the logical WAL sender
get through the transaction. Otherwise, the WAL sender may get killed
(kill -9) and recovery is needed. After recovery, the WAL sender needs to
restart from a point before the transaction started, and it repeats this
loop without making any progress. It would never get through the
transaction except by adding more memory to the machine, which is
usually not practical in reality.
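
For illustration, the knob might look like the following guc_tables.c entry.
This is a hypothetical sketch: the GUC name, context, group, limits, and the
assumed global variable max_changes_in_memory are only guesses on my part,
not a worked-out patch:

	{
		{"logical_replication_max_changes_in_memory", PGC_SIGHUP, REPLICATION_SENDING,
			gettext_noop("Maximum number of changes kept in memory per transaction "
						 "when restoring spilled transactions in the reorder buffer."),
			NULL,
		},
		&max_changes_in_memory,
		4096, 64, INT_MAX,
		NULL, NULL, NULL
	},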

Sincerely, Jingtang


Re: Provide a pg_truncate_freespacemap function

2024-07-20 Thread Tom Lane
Fujii Masao  writes:
>> On Wednesday, March 6, 2024 at 20:28:44 CET, Stephen Frost wrote:
>>> I agree that this would generally be a useful thing to have.

Personally, I want to push back on whether this has any legitimate
use-case.  Even if the FSM is corrupt, it should self-heal over
time, and I'm not seeing the argument why truncating it would
speed convergence towards correct values.  Worse, in the interim
where you don't have any FSM, you will suffer table bloat because
insertions will be appended at the end of the table.  So this
looks like a foot-gun, and the patch's lack of user-visible
documentation surely does nothing to make it safer.

(The analogy to pg_truncate_visibility_map seems forced.
If you are in a situation with a trashed visibility map,
you are probably getting wrong query answers, and truncating
the map will make that better.  But a trashed FSM doesn't
result in incorrect output, and zeroing it will make things
worse not better.)

regards, tom lane